freebsd-skq

Author	SHA1	Message	Date
kib	0d46b47153	Adjust the r205536, by allowing a non-zero offset for anonymous mappings for a.out binaries. Apparently, a.out ld.so from FreeBSD 1.1.5.1 can issue such requests. Reported and tested by: Dan Plassche <dplassche@gmail.com> MFC after: 1 week	2012-08-14 11:47:07 +00:00
kib	a3d0fb0175	Do not leave invalid pages in the object after the short read for a network file systems (not only NFS proper). Short reads cause pages other then the requested one, which were not filled by read response, to stay invalid. Change the vm_page_readahead_finish() interface to not take the error code, but instead to make a decision to free or to (de)activate the page only by its validity. As result, not requested invalid pages are freed even if the read RPC indicated success. Noted and reviewed by: alc MFC after: 1 week	2012-08-14 11:45:47 +00:00
alc	cd8266338a	Never sleep on busy pages in vm_pageout_launder(), always skip them. Long ago, sleeping on busy pages in vm_pageout_launder() made sense. The call to vm_pageout_flush() specified asynchronous I/O and sleeping on busy pages blocked vm_pageout_launder() until the flush had completed. However, in CVS revision 1.35 of vm/vm_contig.c, the call to vm_pageout_flush() was changed to request synchronous I/O, but the sleep on busy pages was not removed.	2012-08-07 04:48:14 +00:00
kib	cac2fe116f	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks	2012-08-05 14:11:42 +00:00
kib	4259905d31	Reduce code duplication and exposure of direct access to struct vm_page oflags by providing helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other then the requested one, for VOP_GETPAGES(). Reviewed by: alc MFC after: 1 week	2012-08-04 18:16:43 +00:00
attilio	c52a057b19	MFC	2012-08-03 15:58:05 +00:00
alc	5b4712b5a1	Inline vm_page_aflags_clear() and vm_page_aflags_set(). Add comments stating that neither these functions nor the flags that they are used to manipulate are part of the KBI.	2012-08-03 01:48:15 +00:00
alc	ceefb8bf17	Eliminate an unneeded declaration. (I should have removed this as part of r227568.)	2012-07-30 20:38:37 +00:00
kib	4f8212948b	Do not requeue held page or page for which locking failed, just leave them alone. Process the act_count updates for the held pages in the vm_pageout loop over the inactive queue, instead of refusing to do anything with such page. Clarify the intent of the addl_page_shortage counter and change its use for pages which are not processed in the loop according to the description. Reviewed by: alc MFC after: 2 weeks	2012-07-26 09:06:48 +00:00
alc	26fd7fb588	Addendum to r238604. If the inactive queue scan isn't restarted, then the variable "addl_page_shortage_init" isn't needed. X-MFC after: r238604	2012-07-24 02:35:30 +00:00
kib	80c3756a6f	Do not restart scan of the inactive queue when non-inactive page is found. Rather, we shall not find such pages on inactive queue at all. Requested and reviewed by: alc MFC after: 2 weeks	2012-07-18 21:47:50 +00:00
alc	e5949174d4	Move what remains of vm/vm_contig.c into vm/vm_pageout.c, where similar code resides. Rename vm_contig_grow_cache() to vm_pageout_grow_cache(). Reviewed by: kib	2012-07-18 05:21:34 +00:00
alc	ad2692aed9	Correct vm_page_alloc_contig()'s implementation of VM_ALLOC_NODUMP.	2012-07-17 02:36:59 +00:00
alc	8af6bec3e3	Various improvements to vm_contig_grow_cache(). Most notably, even when it can't sleep, it can still move clean pages from the inactive queue to the cache. Also, when a page is cached, there is no need to restart the scan. The "next" page pointer held by vm_contig_launder() is still valid. Finally, add a comment summarizing what vm_contig_grow_cache() does based upon the value of "tries". MFC after: 3 weeks	2012-07-16 18:13:43 +00:00
alc	8f708ce433	Correct an off-by-one error in vm_reserv_alloc_contig() that resulted in the last reservation of a multi-reservation allocation not being initialized.	2012-07-15 21:46:19 +00:00
mdf	a42ef9b109	Fix a bug with memguard(9) on 32-bit architectures without a VM_KMEM_MAX_SIZE. The code was not taking into account the size of the kernel_map, which the kmem_map is allocated from, so it could produce a sub-map size too large to fit. The simplest solution is to ignore VM_KMEM_MAX entirely and base the memguard map's size off the kernel_map's size, since this is always relevant and always smaller. Found by: Justin Hibbits	2012-07-15 20:29:48 +00:00
alc	2044619bd4	If vm_contig_grow_cache() is allowed to sleep, then invoke the vm_lowmem handlers.	2012-07-14 20:14:03 +00:00
alc	c0341f5875	Move kmem_alloc_{attr,contig}() to vm/vm_kern.c, where similarly named functions reside. Correct the comment describing kmem_alloc_contig().	2012-07-14 18:10:44 +00:00
attilio	26c71bd174	Style.	2012-07-12 11:02:57 +00:00
attilio	34064c9bef	Remove unused iterating functions.	2012-07-12 11:02:04 +00:00
attilio	cc7c02f8bf	Merge from vmcontention	2012-07-12 02:15:06 +00:00
attilio	675a214708	MFC	2012-07-12 02:07:21 +00:00
attilio	74cb07ed81	Document the object type movements, related to swp_pager_copy(), in vm_object_collapse() and vm_object_split(). In collabouration with: alc MFC after: 3 days	2012-07-11 01:04:59 +00:00
attilio	a23ed68137	- Move VM_RADIX_STACK in vm_object.c because it is the only consumer - Import the check for the return value of vm_radix_lookup() directly in the while removing the need to use a spourious check.	2012-07-08 23:50:57 +00:00
attilio	593da627a7	MFC	2012-07-08 23:20:15 +00:00
attilio	65bce42709	MFC	2012-07-08 23:17:04 +00:00
kib	aa091fdb2a	Avoid vm page queues lock leak after r238212. Reported and tested by: Michael Butler <imb protected-networks net> Reviewed by: alc Pointy hat to: kib MFC after: 20 days	2012-07-08 18:04:26 +00:00
attilio	e237fbcafa	Merge from vmcontention	2012-07-08 16:12:59 +00:00
attilio	9cc8fb25a6	MFC	2012-07-08 14:06:26 +00:00
attilio	ffa3f082ff	- Split the cached and resident pages tree into 2 distinct ones. This makes the RED/BLACK support go away and simplifies a lot vmradix functions used here. This happens because with patricia trie support the trie will be little enough that keeping 2 diffetnt will be efficient too. - Reduce differences with head, in places like backing scan where the optimizazions used shuffled the code a little bit around. Tested by: flo, Andrea Barberio	2012-07-08 14:01:25 +00:00
kib	80dc0e94a4	Drop page queues mutex on each iteration of vm_pageout_scan over the inactive queue, unless busy page is found. Dropping the mutex often should allow the other lock acquires to proceed without waiting for whole inactive scan to finish. On machines with lot of physical memory scan often need to iterate a lot before it finishes or finds a page which requires laundring, causing high latency for other lock waiters. Suggested and reviewed by: alc MFC after: 3 weeks	2012-07-07 19:39:08 +00:00
attilio	d659f0f6bf	MFC	2012-07-07 17:55:27 +00:00
eadler	16452223a2	Add missing sleep stat increase PR: kern/168211 Submitted by: linimon Reviewed by: alc Approved by: cperciva MFC after: 3 days	2012-07-07 17:46:11 +00:00
kib	dbb42d9c5d	Style. Reviewed by: alc (previous version) MFC after: 1 week	2012-07-06 20:13:16 +00:00
jhb	ab100847da	Honor db_pager_quit in 'show uma' and 'show malloc'. MFC after: 1 month	2012-07-02 16:14:52 +00:00
alc	c5e6daff9d	Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.	2012-06-27 03:45:25 +00:00
attilio	5e431fc401	Fix an inverted logic or. Reported by: pho	2012-06-24 01:28:36 +00:00
attilio	69fc91bff9	Fix a bug in the logic, otherwise exhausted is set everytime that outidx == cnt. Reported by: pho	2012-06-23 14:09:52 +00:00
attilio	5d9dd820d8	MFC	2012-06-23 02:08:15 +00:00
attilio	a8dc1f5270	Give vm_radix_lookupn() a way to specify that the whole range has been exhausted while searching and when a "maximum" value is passed as end (or end == 0). This allow for avoiding starting address overflow while searching through and avoids livelock with "start" wrapping up to "end". Reported by: pho (supposedly)	2012-06-23 01:30:51 +00:00
attilio	1053e2db44	Restart the scan from the busy page rather than 0.	2012-06-23 01:08:46 +00:00
attilio	295c086a4e	Fix a bug where the start address is not correctly pointing to the "next" index, scanning 2 times in a row the same object. This was hidden because when cache and resident tries are merged together there is a check to skip different objects in all the vm_radix_lookupn() usages, in order to fix a race with RED nodes.	2012-06-22 22:45:34 +00:00
attilio	a7497af6fd	- Add a comment explaining the locking of the cached pages pool held by vm_objects. - Add flags for the per-object lock and free pages queue mutex lock. Use the newly added flags to mark the cache root within the vm_object structure. Please note that other vm_object members should be marked with correct locking but they are left for other commits. In collabouration with: alc MFC after: 3 days3 days3 days	2012-06-22 18:34:11 +00:00
alc	4d96d753fe	Selectively inline vm_page_dirty().	2012-06-20 23:25:47 +00:00
jhb	12f0fa63b4	Move the per-thread deferred user map entries list into a private list in vm_map_process_deferred() which is then iterated to release map entries. This avoids having a nested vm map unlock operation called from the loop body attempt to recuse into vm_map_process_deferred(). This can happen if the vm_map_remove() triggers the OOM killer. Reviewed by: alc, kib MFC after: 1 week	2012-06-20 18:00:26 +00:00
attilio	5d0dc848b7	Do a more targeted check on the page cache and avoid to check the cache pointer directly in vnode_pager_setsize() by using newly introduced vm_page_is_cached() function. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039,234064	2012-06-16 21:39:00 +00:00
alc	6eeaee04e4	The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks	2012-06-16 18:56:19 +00:00
attilio	d692a159e6	MFC	2012-06-16 17:05:09 +00:00
kib	a280ada6e7	Use the previous stack entry protection and max protection to correctly propagate the stack execution permissions when stack is grown down. First, curproc->p_sysent->sv_stackprot specifies maximum allowed stack protection for current ABI, so the new stack entry was typically marked executable always. Second, for non-main stack MAP_STACK mapping, the PROT_ flags should be used which were specified at the mmap(2) call time, and not sv_stackprot. MFC after: 1 week	2012-06-10 11:31:50 +00:00
attilio	3fc8e26872	Introduce a new tree for dealing with cached pages separately and remove the RED/BLACK concept. This is based on the assumption that path-compressed tries will be small and fast enough that a separate trie for cached pages will make sense and will leave the trie code simple enough (along with removing a lot of differences in the userend code).	2012-06-09 12:25:30 +00:00
attilio	807db03f96	Revert r231027 and fix the prototype for vm_radix_remove(). The target of this is getting at the point where the recovery path is completely removed as we could count on pre-allocation once the path compressed trie is implemented.	2012-06-08 18:44:54 +00:00
attilio	56a6aa288e	Revert r234033, 234007. The target of this is getting at the point where the recovery path is completely removed as we could count on pre-allocation once the path compressed trie is implemented.	2012-06-08 18:11:21 +00:00
attilio	7b7f4887b9	Revert r236367. The target of this is getting at the point where the recovery path is completely removed as we could count on pre-allocation once the path compressed trie is implemented.	2012-06-08 18:08:31 +00:00
attilio	85fb4c5799	MFC	2012-06-07 22:47:53 +00:00
eadler	f5ab1922a6	Revert r236380 PR: kern/166780 Requested by: many Approved by: cperciva (implicit)	2012-06-01 18:58:50 +00:00
attilio	e761e0c4bc	MFC	2012-06-01 14:57:55 +00:00
eadler	01b578179f	Add sysctl to query amount of swap space free PR: kern/166780 Submitted by: Radim Kolar <hsn@sendmail.cz> Approved by: cperciva MFC after: 1 week	2012-06-01 04:42:52 +00:00
attilio	ab9d63eba7	Simplify insert path by using the same logic of vm_radix_remove() for the recovery path. The bulk of vm_radix_remove() is put into a generic function vm_radix_sweep() which allows 2 different modes (hard and soft): the soft one will deal with half-constructed paths by cleaning them up. Ideally all these complications should go once that a way to pre-allocate is implemented, possibly by implementing path compression. Requested and discussed with: jeff Tested by: pho	2012-05-31 22:54:08 +00:00
emax	0984d7ec39	Tweak condition for disabling allocation from per-CPU buckets in low memory situation. I've observed a situation where per-CPU allocations were disabled while there were enough free cached pages. Basically, cnt.v_free_count was sitting stable at a value lower than cnt.v_free_min and that caused massive performance drop. Reviewed by: alc MFC after: 1 week	2012-05-23 18:56:29 +00:00
kib	187a8c5cd6	Calculate the count of per-process cow faults. Export the count to userspace using the obscure spare int field in struct kinfo_proc. Submitted by: Andrey Zonov <andrey zonov org> MFC after: 1 week	2012-05-23 18:10:54 +00:00
avg	ea96526248	vm_pager_object_lookup: small performance optimization do not needlessly lock an object if its handle doesn't match Reviewed by: kib, alc MFC after: 1 week	2012-05-23 12:51:49 +00:00
andrew	39fa59acb9	Fix booting on ARM. In PHYS_TO_VM_PAGE() when VM_PHYSSEG_DENSE is set the check if we are past the end of vm_page_array was incorrect causing it to return NULL. This value is then used in vm_phys_add_page causing a data abort. Reviewed by: alc, kib, imp Tested by: stas	2012-05-22 07:04:23 +00:00
nwhitehorn	e83623fb1f	Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies range operations like pmap_remove() and pmap_protect() as well as allowing simple operations like pmap_extract() not to involve any global state. This substantially reduces lock coverages for the global table lock and improves concurrency.	2012-05-20 14:33:28 +00:00
kib	ac67bd3aa8	Do not double-reference the found vm object in cdev_pager_lookup(). vm_pager_object_lookup() already referenced the object. Note that there is no in-tree consumers of cdev_pager_lookup(). The only known user of the function is i915 gem driver, which is not yet imported. This should make the KPI change minor. Submitted by: avg MFC after: 1 week	2012-05-18 10:23:47 +00:00
kib	81841e52c6	Add new pager type, OBJT_MGTDEVICE. It provides the device pager which carries fictitous managed pages. In particular, the consumers of the new object type can remove all mappings of the device page with pmap_remove_all(). The range of physical addresses used for fake page allocation shall be registered with vm_phys_fictitious_reg_range() interface to allow the PHYS_TO_VM_PAGE() to work in pmap. Most likely, only i386 and amd64 pmaps can handle fictitious managed pages right now. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:49:58 +00:00
kib	9ff1ec42a4	Add a facility to register a range of physical addresses to be used for allocation of fictitious pages, for which PHYS_TO_VM_PAGE() returns proper fictitious vm_page_t. The range should be de-registered after consumer stopped using it. De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate over registered ranges. A hash container might be developed instead of range registration interface, and fake pages could be put automatically into the hash, were PHYS_TO_VM_PAGE() could look them up later. This should be considered before the MFC of the commit is done. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:42:56 +00:00
kib	1dfd5258de	Split the code from vm_page_getfake() to initialize the fake page struct vm_page into new interface vm_page_initfake(). Handle the case of fake page re-initialization with changed memattr. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:34:22 +00:00
kib	e5df51c93a	Assert that the page passed to vm_page_putfake() is unmanaged. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:27:51 +00:00
kib	af33014c48	Assert that fictitious or unmanaged pages do not appear on active/inactive lists. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:24:46 +00:00
kib	d67fed001c	Commit the change forgotten in r235356. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:10:18 +00:00
kib	84b22557f6	Make the vm_page_array_size long. Remove redundand zero initialization for vm_page_array_size and nearby variablees. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:03:06 +00:00
attilio	c72fe43a63	Add braces.	2012-05-12 19:54:57 +00:00
attilio	e5220032ec	On 32-bits architecture KTR has a bug as it cannot correctly grok 64-bits numbers. ktr_tracepoint() infacts casts all the passed value to u_long values as that is what the ktr entries can handle. However, we have to work a lot with vm_pindex_t which are always 64-bit also on 32-bits architectures (most notable case being i386). Use macros to split the 64 bits printing into 32-bits chunks which KTR can correctly handle. Reported and tested by: flo	2012-05-12 19:52:59 +00:00
attilio	1b73802874	MFC	2012-05-12 19:26:15 +00:00
attilio	3bd53aaf3c	- Fix a bug where lookupn can wrap up looking for the pages to scan, returning a non correct very low address again. - Stub out vm_lookup_foreach as it is not used	2012-05-12 19:22:57 +00:00
alc	323d529ebe	Give vm_fault()'s sequential access optimization a makeover. There are two aspects to the sequential access optimization: (1) read ahead of pages that are expected to be accessed in the near future and (2) unmap and cache behind of pages that are not expected to be accessed again. This revision changes both aspects. The read ahead optimization is now more effective. It starts with the same initial read window as before, but arithmetically grows the window on sequential page faults. This can yield increased read bandwidth. For example, on one of my machines, a program using mmap() to read a file that is several times larger than the machine's physical memory takes about 17% less time to complete. The unmap and cache behind optimization is now more selectively applied. The read ahead window must grow to its maximum size before unmap and cache behind is performed. This significantly reduces the number of times that pages are unmapped and cached only to be reactivated a short time later. The unmap and cache behind optimization now clears each page's referenced flag. Previously, in the case of dirty pages, if the containing file was still mapped at the time that the page daemon examined the dirty pages, they would be reactivated. From a stylistic standpoint, this revision also cleanly separates the implementation of the read ahead and unmap/cache behind optimizations. Glanced at: kib MFC after: 2 weeks	2012-05-10 15:16:42 +00:00
attilio	c7b668e647	MFC	2012-05-05 21:40:32 +00:00
nwhitehorn	d394621163	Avoid a lock order reversal in pmap_extract_and_hold() from relocking the page. This PMAP requires an additional lock besides the PMAP lock in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release. Suggested by: kib MFC after: 4 days	2012-04-22 17:58:30 +00:00
kib	bc03240549	When MAP_STACK mapping is created, the map entry is created only to cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent call to vm_map_wire() to wire the whole stack region fails due to VM_MAP_WIRE_NOHOLES flag. Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack. Reported and tested by: Sushanth Rai <sushanth_rai yahoo com> Reviewed by: alc MFC after: 1 week	2012-04-21 18:36:53 +00:00
alc	ad5c747d1d	As documented in vm_page.h, updates to the vm_page's flags no longer require the page queues lock. MFC after: 1 week	2012-04-21 18:26:16 +00:00
attilio	b721606b1f	Fix a brain-o. vm_page_lookup_cache() is not exported and cannot be used arbitrarely. Reported by: pho	2012-04-21 00:32:56 +00:00
attilio	af795c2c10	MFC	2012-04-11 14:54:06 +00:00
attilio	628004ddfb	- Introduce a cache-miss optimization for consistency with other accesses of the cache member of vm_object objects. - Use novel vm_page_is_cached() for checks outside of the vm subsystem. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039	2012-04-09 17:05:18 +00:00
alc	e30063988b	Fix mincore(2) so that it reports PG_CACHED pages as resident. MFC after: 2 weeks	2012-04-08 18:25:12 +00:00
alc	a317fe0618	If a page belonging a reservation is cached, then mark the reservation so that it will be freed to the cache pool rather than the default pool. Otherwise, the cached pages within the reservation may be recycled sooner than necessary. Reported by: Andrey Zonov	2012-04-08 17:00:46 +00:00
attilio	c85d28efad	Fix a typo. Reported by: pho	2012-04-08 11:04:08 +00:00
attilio	b07157326c	MFC	2012-04-08 00:14:15 +00:00
attilio	a539d5c569	If we already had a page with pindex == 0 in the device pager object (if not already fictious) the code can panic when trying to first insert a fictious page because of the overridden pindex. Fix this by applying the same spinning pattern of vm_page_rename(). Reported by: pho	2012-04-07 23:58:49 +00:00
attilio	4bd8f1d188	Staticize vm_page_cache_remove(). Reviewed by: alc	2012-04-06 20:34:00 +00:00
attilio	a729154742	MFC	2012-04-06 19:59:37 +00:00
attilio	ed06f155e1	Free a cached page rather than only removing it. vm_page_cache_remove() should only be used in very little and specific cases (and marked as static likely) where the callers is going to take care also of the page flags appropriately, otherwise one can end up with a corrupted page. Reported by: pho	2012-04-06 19:49:45 +00:00
nwhitehorn	9c79ae8eea	Reduce the frequency that the PowerPC/AIM pmaps invalidate instruction caches, by invalidating kernel icaches only when needed and not flushing user caches for shared pages. Suggested by: kib MFC after: 2 weeks	2012-04-06 16:03:38 +00:00
jhb	5829de48d9	Add new ktrace records for the start and end of VM faults. This gives a pair of records similar to syscall entry and return that a user can use to determine how long page faults take. The new ktrace records are enabled via the 'p' trace type, and are enabled in the default set of trace points. Reviewed by: kib MFC after: 2 weeks	2012-04-05 17:13:14 +00:00
attilio	3a71812376	The cached_pages_counter is not reset even when shared pages are present if the obj is not a vnode. Fix this. Reported and tested by: flo	2012-04-03 20:06:07 +00:00
attilio	081936d842	MFC	2012-03-30 16:54:21 +00:00
mckusick	9a7982e5a0	Keep track of the mount point associated with a special device to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too. Reviewed by: kib	2012-03-28 20:49:11 +00:00
alc	e02fd6b842	Handle spurious page faults that may occur in no-fault sections of the kernel. When access restrictions are added to a page table entry, we flush the corresponding virtual address mapping from the TLB. In contrast, when access restrictions are removed from a page table entry, we do not flush the virtual address mapping from the TLB. This is exactly as recommended in AMD's documentation. In effect, when access restrictions are removed from a page table entry, AMD's MMUs will transparently refresh a stale TLB entry. In short, this saves us from having to perform potentially costly TLB flushes. In contrast, Intel's MMUs are allowed to generate a spurious page fault based upon the stale TLB entry. Usually, such spurious page faults are handled by vm_fault() without incident. However, when we are executing no-fault sections of the kernel, we are not allowed to execute vm_fault(). This change introduces special-case handling for spurious page faults that occur in no-fault sections of the kernel. In collaboration with: kib Tested by: gibbs (an earlier version) I would also like to acknowledge Hiroki Sato's assistance in diagnosing this problem. MFC after: 1 week	2012-03-22 04:52:51 +00:00
jhb	102d1a0777	Bah, just revert my earlier change entirely. (Missed alc's request to do this earlier.) Requested by: alc	2012-03-19 19:06:40 +00:00
attilio	7ee59361d5	MFC	2012-03-19 18:54:01 +00:00
jhb	9628d3dbf8	Fix madvise(MADV_WILLNEED) to properly handle individual mappings larger than 4GB. Specifically, the inlined version of 'ptoa' of the the 'int' count of pages overflowed on 64-bit platforms. While here, change vm_object_madvise() to accept two vm_pindex_t parameters (start and end) rather than a (start, count) tuple to match other VM APIs as suggested by alc@.	2012-03-19 18:47:34 +00:00
jhb	2e0db42a5f	Alter the previous commit to use vm_size_t instead of vm_pindex_t. vm_pindex_t is not a count of pages per se, it is more like vm_ooffset_t, but a page index instead of a byte offset.	2012-03-19 18:43:44 +00:00
kib	2963c3c979	In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag if the filesystem performed short write and we are skipping the page due to this. Propogate write error from the pager back to the callers of vm_pageout_flush(). Report the failure to write a page from the requested range as the FALSE return value from vm_object_page_clean(), and propagate it back to msync(2) to return EIO to usermode. While there, convert the clearobjflags variable in the vm_object_page_clean() and arguments of the helper functions to boolean. PR: kern/165927 Reviewed by: alc MFC after: 2 weeks	2012-03-17 23:00:32 +00:00
attilio	5725f63f57	MFC	2012-03-16 15:46:44 +00:00
attilio	f9319cf885	Fix the nodes allocator in architectures without direct-mapping: - Fix bugs in the free path where the pages were not unwired and relevant locking wasn't acquired. - Introduce the rnode_map, submap of kernel_map, where to allocate from. The reason is that, in architectures without direct-mapping, kmem_alloc*() will try to insert the newly created mapping while holding the vm_object lock introducing a LOR or lock recursion. rnode_map is however a leafly-used submap, thus there cannot be any deadlock. Notes: Size the submap in order to be, by default, around 64 MB and decrase the size of the nodes as the allocation will be much smaller (and when the compacting code in the vm_radix will be implemented this will aim for much less space to be used). However note that the size of the submap can be changed at boot time via the hw.rnode_map_scale scaling factor. - Use uma_zone_set_max() covering the size of the submap. Tested by: flo	2012-03-16 15:41:07 +00:00
jhb	1e515523f3	Pedantic nit: use vm_pindex_t instead of long for a count of pages.	2012-03-14 20:57:48 +00:00
flo	83ea608b00	IFC at r232948 Approved by: attilio	2012-03-14 00:41:37 +00:00
jhb	19feaba08b	Add KTR_VFS traces to track modifications to a vnode's writecount.	2012-03-08 20:27:20 +00:00
attilio	86fae10111	MFC	2012-03-07 11:18:38 +00:00
attilio	9e63566650	Fix a compile time bug by adding a check just after the struct definition	2012-03-06 23:37:53 +00:00
alc	9181f45b9f	Eliminate stale incorrect ARGSUSED comments. Submitted by: bde	2012-03-02 17:33:51 +00:00
attilio	df89a6a2db	- Exclude vm_radix_shrink() from the interface but retain the code still as it can be useful. - Make most of the interface private as it is unnecessary public right now. This will help in making nodes changing with arch and still avoid namespace pollution.	2012-03-01 00:54:08 +00:00
attilio	3c5fbc2c09	MFC	2012-03-01 00:27:51 +00:00
alc	54c1d2e89a	Simplify kmem_alloc() by eliminating code that existed on account of external pagers in Mach. FreeBSD doesn't implement external pagers. Moreover, it don't pageout the kernel object. So, the reasons for having code don't hold. Reviewed by: kib MFC after: 6 weeks	2012-02-29 05:41:29 +00:00
alc	867c58a8cb	Simplify vm_mmap()'s control flow. Add a comment describing what vm_mmap_to_errno() does. Reviewed by: kib MFC after: 3 weeks X-MFC after: r232071	2012-02-25 21:06:39 +00:00
attilio	d4c43cbb8b	MFC	2012-02-25 18:24:45 +00:00
alc	7d737c65b5	Simplify vmspace_fork()'s control flow by copying immutable data before the vm map locks are acquired. Also, eliminate redundant initialization of the new vm map's timestamp. Reviewed by: kib MFC after: 3 weeks	2012-02-25 17:49:59 +00:00
kib	8c39852ba5	Place the if() at the right location, to activate the v_writecount accounting for shared writeable mappings for all filesystems, not only for the bypass layers. Submitted by: alc Pointy hat to: kib MFC after: 20 days	2012-02-24 10:41:58 +00:00
kib	f315a59476	Account the writeable shared mappings backed by file in the vnode v_writecount. Keep the amount of the virtual address space used by the mappings in the new vm_object un_pager.vnp.writemappings counter. The vnode v_writecount is incremented when writemappings gets non-zero value, and decremented when writemappings is returned to zero. Writeable shared vnode-backed mappings are accounted for in vm_mmap(), and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on the created map entry. During deferred map entry deallocation, vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and decrements writemappings for the vm object. Now, the writeable mount cannot be demoted to read-only while writeable shared mappings of the vnodes from the mount point exist. Also, execve(2) fails for such files with ETXTBUSY, as it should be. Noted by: tegge Reviewed by: tegge (long time ago, early version), alc Tested by: pho MFC after: 3 weeks	2012-02-23 21:07:16 +00:00
kib	ac9e4627bb	Remove wrong comment. Discussed with: alc MFC after: 3 days	2012-02-22 20:01:38 +00:00
alc	506ef17e7f	When vm_mmap() is used to map a vm object into a kernel vm_map, it makes no sense to check the size of the kernel vm_map against the user-level resource limits for the calling process. Reviewed by: kib	2012-02-16 06:45:51 +00:00
attilio	d58aaf5046	MFC	2012-02-14 19:58:00 +00:00
kib	dacbfe950a	Close a race due to dropping of the map lock between creating map entry for a shared mapping and marking the entry for inheritance. Other thread might execute vmspace_fork() in between (e.g. by fork(2)), resulting in the mapping becoming private. Noted and reviewed by: alc MFC after: 1 week	2012-02-11 17:29:07 +00:00
attilio	8fe59ec45c	MFC	2012-02-10 18:31:35 +00:00
ed	28b4a002d6	Remove direct access to si_name. Code should just use the devtoname() function to obtain the name of a character device. Also add const keywords to pieces of code that need it to build properly. MFC after: 2 weeks	2012-02-10 12:35:57 +00:00
flo	1e497814c3	fix KTR consistency I'm committing this on behalf of Attilio as he cannot access svn right now.	2012-02-05 18:55:20 +00:00
attilio	6587a6afdd	Remove the panic from vm_radix_insert() and propagate the error to the callers of vm_page_insert(). The default action for every caller is to unwind-back the operation besides vm_page_rename() where this has proven to be impossible to do. For that case, it just spins until the page is not available to be allocated. However, due to vm_page_rename() to be mostly rare (and having never hit this panic in the past) it is tought to be a very seldom thing and not a possible performance factor. The patch has been tested with an atomic counter returning NULL from the zone allocator every 1/100000 allocations. Per-printf, I've verified that a typical buildkernel could trigger this 30 times. The patch survived to 2 hours of repeated buildkernel/world. Several technical notes: - The vm_page_insert() is moved, in several callers, closer to failure points. This could be committed separately before vmcontention hits the tree just to verify -CURRENT is happy with it. - vm_page_rename() does not need to have the page lock in the callers as it hide that as an implementation detail. Do the locking internally. - now vm_page_insert() returns an int, with 0 meaning everything was ok, thus KPI is broken by this patch.	2012-02-05 17:37:26 +00:00
attilio	3454102b5b	MFC	2012-02-04 17:18:16 +00:00
mav	3cc9e27b24	Fix NULL dereference panic on attempt to turn off (on system shutdown) disconnected swap device. This is quick and imperfect solution, as swap device will still be opened and GEOM will not be able to destroy it. Proper solution would be to automatically turn off and close disconnected swap device, but with existing code it will cause panic if there is at least one page on device, even if it is unimportant page of the user-level process. It needs some work. Reviewed by: kib@ MFC after: 1 week	2012-02-01 20:12:44 +00:00
attilio	1b454e6b83	Fix a bug in vm_radix_leaf() where the shifting start address can wrap-up at some point. This bug is triggered very easilly by indirect blocks in UFS which grow negative resulting in very high counts. In collabouration with: flo	2012-01-29 16:44:21 +00:00
attilio	8bc5caadc8	Fix format string for the pindex members as they should be treated as uintmax_t for compatibility among 32/64 bits.	2012-01-29 16:29:06 +00:00
attilio	810afc9780	Make an assertion stronger and improve the printout for easier bug catching when it is not possible to dump	2012-01-29 16:11:25 +00:00
kmacy	84d434965a	exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64 excluding other allocations including UMA now entails the addition of a single flag to kmem_alloc or uma zone create Reviewed by: alc, avg MFC after: 2 weeks	2012-01-27 20:18:31 +00:00
nwhitehorn	bf2ee27f25	Revert r212360 now that PowerPC can handle large sparse arguments to pmap_remove() (changed in r228412). MFC after: 2 weeks	2012-01-17 00:31:09 +00:00
kib	247e21eaf0	Change the type of the paging_in_progress refcounter from u_short to u_int. With the auto-sized buffer cache on the modern machines, UFS metadata can generate more the 65535 pages belonging to the buffers undergoing i/o, overflowing the counter. Reported and tested by: jimharris Reviewed by: alc MFC after: 1 week	2012-01-10 18:05:44 +00:00
attilio	abdebe75b2	MFC	2012-01-05 23:12:19 +00:00
kib	aec33a2b90	Do not restart the scan in vm_object_page_clean() on the object generation change if requested mode is async. The object generation is only changed when the object is marked as OBJ_MIGHTBEDIRTY. For async mode it is enough to write each dirty page, not to make a guarantee that all pages are cleared after the vm_object_page_clean() returned. Diagnosed by: truckman Tested by: flo Reviewed by: alc, truckman MFC after: 2 weeks	2012-01-04 16:04:20 +00:00
attilio	7ba6fbeca7	Fix a spot missed during the last merge.	2012-01-01 21:46:16 +00:00
attilio	a4ffaeb982	MFC	2012-01-01 20:18:40 +00:00
alc	7f817ed8c5	Optimize vm_object_split()'s handling of reservations.	2011-12-28 20:27:18 +00:00
kib	c2a7f10253	Optimize the common case of msyncing the whole file mapping with MS_SYNC flag. The system must guarantee that all writes are finished before syscalls returned. Schedule the writes in async mode, which is much faster and allows the clustering to occur. Wait for writes using VOP_FSYNC(), since we are syncing the whole file mapping. Potentially, the restriction to only apply the optimization can be relaxed by not requiring that the mapping cover whole file, as it is done by other OSes. Reported and tested by: az Reviewed by: alc MFC after: 2 weeks	2011-12-23 09:09:42 +00:00
kib	cfa70889cb	Move kstack_cache_entry into the private header, and make the stack cache list header accessible outside vm_glue.c. MFC after: 1 week	2011-12-16 10:56:16 +00:00
eadler	d29af4d1bc	- The previous commit (r228449) accidentally moved the vm.stats.vm.* sysctls to vm.stats.sys. Move them back. Noticed by: pho Reviewed by: bde (earlier version) Approved by: bz MFC after: 1 week Pointy hat to: me	2011-12-14 13:25:00 +00:00
eadler	3072a90209	Document a large number of currently undocumented sysctls. While here fix some style(9) issues and reduce redundancy. PR: kern/155491 PR: kern/155490 PR: kern/155489 Submitted by: Galimov Albert <wtfcrap@mail.ru> Approved by: bde Reviewed by: jhb MFC after: 1 week	2011-12-13 00:38:50 +00:00
kib	2893f40afd	Fix printf. Submitted by: az MFC after: 1 week	2011-12-12 10:04:04 +00:00
attilio	1f27e97ae5	Use atomics for rn_count on leaf node because RED operations happen without the VM_OBJECT_LOCK held, thus can be concurrent with BLACK ones. However, also use a write memory barrier in order to not reorder the operation of decrementing rn_count in respect fetching the pointer. Discussed with: jeff	2011-12-06 22:57:48 +00:00
attilio	2436e63a9c	- Make rn_count 32-bits as it will naturally pad for 32-bit arches - Avoid to use atomic to manipulate it at level0 because it seems unneeded and introduces a bug on big-endian architectures where only the top half (2 bits) of the double-words are written (as sparc64, for example, doesn't support atomics at 16-bits) heading to a wrong handling of rn_count. Reported by: flo, andreast Found by: marius No answer by: jeff	2011-12-06 19:04:45 +00:00
alc	a8855af4c0	Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to use superpage reservations. So, for the first time, kernel virtual memory that is allocated by contigmalloc(), kmem_alloc_attr(), and kmem_alloc_contig() can be promoted to superpages. In fact, even a series of small contigmalloc() allocations may collectively result in a promoted superpage. Eliminate some duplication of code in vm_reserv_alloc_page(). Change the type of vm_reserv_reclaim_contig()'s first parameter in order that it be consistent with other vm_*_contig() functions. Tested by: marius (sparc64)	2011-12-05 18:29:25 +00:00
andreast	8c385e6008	Fix compilation issue on 32-bit targets. Reviewed by: attilio	2011-12-05 16:06:12 +00:00
attilio	8fbad61ab4	Revert a change that sneaked in during the last MFC.	2011-12-02 23:21:59 +00:00
attilio	b2701fb716	MFC	2011-12-02 21:45:46 +00:00
kib	d326d5565d	Rename vm_page_set_valid() to vm_page_set_valid_range(). The vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc	2011-11-30 17:39:00 +00:00
kib	a441abaf37	Hide the internals of vm_page_lock(9) from the loadable modules. Since the address of vm_page lock mutex depends on the kernel options, it is easy for module to get out of sync with the kernel. No vm_page_lockptr() accessor is provided for modules. It can be added later if needed, unless proper KPI is developed to serve the needs. Reviewed by: attilio, alc MFC after: 3 weeks	2011-11-29 13:07:32 +00:00
attilio	984f9ddc2b	- Remove unnecessary checks on rnode in KTR prints - Track rn_count in KTR prints - Improve KTR in a way it best fits rn_count tracking	2011-11-29 02:07:07 +00:00
attilio	27cea0290f	Fix compile. Submitted by: flo	2011-11-28 19:14:38 +00:00
attilio	77442309d8	Improve the diagnostic in the remove case.	2011-11-28 17:26:19 +00:00
attilio	4391f6c279	Fix a bug when the 'rnode' pointer can be NULL and we try to track the children. This helps in debugging case. Reported by: flo	2011-11-26 14:26:37 +00:00
attilio	aba6408184	MFC	2011-11-26 13:54:55 +00:00
attilio	b95134ea01	Introduce the same mutex-wise fix in r227758 for sx locks. The functions that offer file and line specifications are: - sx_assert_ - sx_downgrade_ - sx_slock_ - sx_slock_sig_ - sx_sunlock_ - sx_try_slock_ - sx_try_xlock_ - sx_try_upgrade_ - sx_unlock_ - sx_xlock_ - sx_xlock_sig_ - sx_xunlock_ Now vm_map locking is fully converted and can avoid to know specifics about locking procedures. Reviewed by: kib MFC after: 1 month	2011-11-21 12:59:52 +00:00
attilio	6a69e947d3	Introduce macro stubs in the mutex implementation that will be always defined and will allow consumers, willing to provide options, file and line to locking requests, to not worry about options redefining the interfaces. This is typically useful when there is the need to build another locking interface on top of the mutex one. The introduced functions that consumers can use are: - mtx_lock_flags_ - mtx_unlock_flags_ - mtx_lock_spin_flags_ - mtx_unlock_spin_flags_ - mtx_assert_ - thread_lock_flags_ Spare notes: - Likely we can get rid of all the 'INVARIANTS' specification in the ppbus code by using the same macro as done in this patch (but this is left to the ppbus maintainer) - all the other locking interfaces may require a similar cleanup, where the most notable case is sx which will allow a further cleanup of vm_map locking facilities - The patch should be fully compatible with older branches, thus a MFC is previewed (infact it uses all the underlying mechanisms already present). Comments review by: eadler, Ben Kaduk Discussed with: kib, jhb MFC after: 1 month	2011-11-20 16:33:09 +00:00
attilio	062b8ecde4	Add more KTR points for failure in vm_radix_insert().	2011-11-20 14:51:27 +00:00
attilio	c68bde9dbe	MFC	2011-11-18 09:54:14 +00:00
alc	e6f2f7a89d	Eliminate end-of-line white space.	2011-11-17 06:54:49 +00:00
alc	3692d01659	Refactor the code that performs physically contiguous memory allocation, yielding a new public interface, vm_page_alloc_contig(). This new function addresses some of the limitations of the current interfaces, contigmalloc() and kmem_alloc_contig(). For example, the physically contiguous memory that is allocated with those interfaces can only be allocated to the kernel vm object and must be mapped into the kernel virtual address space. It also provides functionality that vm_phys_alloc_contig() doesn't, such as wiring the returned pages. Moreover, unlike that function, it respects the low water marks on the paging queues and wakes up the page daemon when necessary. That said, at present, this new function can't be applied to all types of vm objects. However, that restriction will be eliminated in the coming weeks. From a design standpoint, this change also addresses an inconsistency between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions. Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew about vnodes and reservations. Now, vm_page_alloc_contig() is responsible for these things. Reviewed by: kib Discussed with: jhb	2011-11-16 16:46:09 +00:00
attilio	2cf76e3d27	Fix compilation for userland: - Use CTASSERT() only in the kernel. - the root pointer is required by struct vm_object which is accessible (maybe incorrectly?) by userland.	2011-11-15 23:37:15 +00:00
attilio	b13f89ff5b	MFC	2011-11-15 23:32:30 +00:00
kib	592f122323	Update the device pager interface, while keeping the compatibility layer for old KPI and KBI. New interface should be used together with d_mmap_single cdevsw method. Device pager can be allocated with the cdev_pager_allocate(9) function, which takes struct cdev_pager_ops, containing constructor/destructor and page fault handler methods supplied by driver. Constructor and destructor, called at the pager allocation and deallocation time, allow the driver to handle per-object private data. The pager handler is called to handle page fault on the vm map entry backed by the driver pager. Driver shall return either the vm_page_t which should be mapped, or error code (which does not cause kernel panic anymore). The page handler interface has a placeholder to specify the access mode causing the fault, but currently PROT_READ is always passed there. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2011-11-15 14:40:00 +00:00
kib	93c04deafa	Remove the condition that is always true. Submitted by: alc MFC after: 1 week	2011-11-15 14:09:53 +00:00
attilio	6e9e854884	MFC	2011-11-08 11:08:40 +00:00
ed	0c56cf839d	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
alc	eb6224d517	Wake up the page daemon in vm_page_alloc_freelist() if it couldn't allocate the requested page because too few pages are cached or free. Document the VM_ALLOC_COUNT() option to vm_page_alloc() and vm_page_alloc_freelist(). Make style changes to vm_page_alloc() and vm_page_alloc_freelist(), such as using a variable name that more closely corresponds to the comments.	2011-11-06 02:03:27 +00:00
kib	a527ed8008	Remove redundand definitions. The chunk was missed from r227102. MFC after: 2 weeks	2011-11-05 09:03:18 +00:00
kib	91c9020e20	Provide typedefs for the type of bit mask for the page bits. Use the defined types instead of int when manipulating masks. Supposedly, it could fix support for 32KB page size in the machine-independend VM layer. Reviewed by: alc MFC after: 2 weeks	2011-11-05 08:20:32 +00:00
attilio	65129c5c3e	MFC	2011-11-04 06:56:59 +00:00
alc	89a5842600	Simplify the implementation of the failure case in kmem_alloc_attr().	2011-11-04 04:41:58 +00:00
jhb	78c075174e	Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month	2011-11-04 04:02:50 +00:00
attilio	11fa9a13f6	MFC	2011-11-03 21:57:02 +00:00
alc	6dbf0412bb	Add support for VM_ALLOC_WIRED and VM_ALLOC_ZERO to vm_page_alloc_freelist() and use these new options in the mips pmap. Wake up the page daemon in vm_page_alloc_freelist() if the number of free and cached pages becomes too low. Tidy up vm_page_alloc_init(). In particular, add a comment about an important restriction on its use. Tested by: jchandra@	2011-11-02 05:42:51 +00:00
jeff	31902d5a7c	- Add some convenience inlines. - Update the copyright.	2011-11-01 04:21:57 +00:00
attilio	f8c5162413	Add kernel protection to the header file for vmradix. Likely this file needs some more restructuration (and we should make a lot of macros private to radix implementation) but leave them as they are so far because we may enrich the KPI much further.	2011-11-01 03:53:10 +00:00
attilio	7272c08497	vm_object_terminate() doesn't actually free the pages in the splay tree. Reclaim all the nodes related to the radix tree for a specified vm_object when calling vm_object_terminate() via the newly added interface vm_radix_reclaim_nodes(). The function is recursive, but we have a well-defined maximum depth, thus the amount of necessary stack can be easilly calculated. Reported by: alc Discussed and reviewed by: jeff	2011-11-01 03:40:38 +00:00
jeff	146842f2d2	- Extract part of vm_radix_lookupn() into a function that just finds the first leaf page in a specified range. This permits us to make many search & operate functions without much code duplication. - Make a generic iterator for radix items.	2011-10-30 22:57:42 +00:00
attilio	44bf71a91c	MFC	2011-10-30 11:43:12 +00:00
jeff	9e2e6a2980	- Support two types of nodes, red and black, within the same radix tree. Black nodes support standard active pages and red nodes support cached pages. Red nodes may be removed without the object lock but will not collapse unused tree nodes. Red nodes may not be directly inserted, instead a new function is supplied to convert between black and red. - Handle cached pages and active pages in the same loop in vm_object_split, vm_object_backing_scan, and vm_object_terminate. - Retire the splay page handling as the ifdefs are too difficult to maintain. - Slightly optimize the vm_radix_lookupn() function.	2011-10-30 11:11:04 +00:00
alc	57e8705396	Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt at eliminating duplicated code in the various pmap implementations. Micro-optimize vm_phys_free_pages(). Introduce vm_phys_free_contig(). It is fast routine for freeing an arbitrary number of physically contiguous pages. In particular, it doesn't require the number of pages to be a power of two. Use "u_long" instead of "unsigned long". Bruce Evans (bde@) has convinced me that the "boundary" parameters to kmem_alloc_contig(), vm_phys_alloc_contig(), and vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not "u_long". Make this change.	2011-10-30 05:06:14 +00:00
alc	15734d833a	Use "u_long" instead of "unsigned long".	2011-10-28 22:36:15 +00:00
jeff	1f2c60154d	- Use a single uintptr_t for the root of the radix node that encodes the height and a pointer so that the update to the root is atomic. This permits safe lookups in parallel with tree expansion. Shrinking the space requirements is a small bonus.	2011-10-28 03:42:41 +00:00
attilio	22f6379eef	Fix compilation.	2011-10-28 03:07:23 +00:00
attilio	963320ca59	MFC	2011-10-28 02:54:07 +00:00
attilio	588d89046c	Use an UMA zone for the radix node. This avoids the problem to check for the kernel_map/kmem_map recursion because it uses direct mapping provided by amd64 to avoid object and map search and recursion. Probabilly all the others architectures using UMA_MD_SMALL_ALLOC are also fixed by this, but other remains, where the most notable case is i386. For it a solution has still to be determined. A way to do this would be to have a reserved map just for radix node and mark all accesses to its lock to be witness safe, but that would still be unoptimal due to the large amount of virtual address space needed to cater the whole tree.	2011-10-28 01:56:36 +00:00
alc	f553bda045	Tidy up the comment at the head of vm_page_alloc, and mention that the returned page has the flag VPO_BUSY set.	2011-10-27 17:29:19 +00:00
alc	841afea2d9	Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to vm_page_alloc(). While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.	2011-10-27 16:39:17 +00:00
alc	0a7d6450d6	contigmalloc(9) and contigfree(9) are now implemented in terms of other more general VM system interfaces. So, their implementation can now reside in kern_malloc.c alongside the other functions that are declared in malloc.h.	2011-10-27 02:52:24 +00:00
alc	955d2b5af8	Speed up vm_page_cache() and vm_page_remove() by checking for a few common cases that can be handled in constant time. The insight being that a page's parent in the vm object's tree is very often its predecessor or successor in the vm object's ordered memq. Tested by: jhb MFC after: 10 days	2011-10-25 16:35:08 +00:00
attilio	5ce32374b4	Remove the stub for faking promotion failure now that the reservations are fully supported.	2011-10-23 21:37:01 +00:00
jeff	e6b90196c0	- Implement vm_radix_lookup_le(). - Fix vm_radix_lookupn() when max == -1 by making the end parameter inclusive.	2011-10-23 01:19:01 +00:00
attilio	c2583b6922	Check in an intial implementation of radix tree implementation to replace the vm object pages splay. TODO: - Handle differently the negative keys for having smaller depth index nodes (negative keys caming from indirect blocks) - Fix the get_node() by having support for a low reserved objects directly from UMA - Implement the lookup_le and re-enable VM_NRESERVELEVEL = 1 - Try to rework the superpage splay of idle pages and the cache splay for every vm object in order to regain space on vm_page structure - Verify performance and improve them (likely by having consumers to deal with several ranges of pages manually?) Obtained from: jeff, Mayur Shardul (GSoC 2009)	2011-10-22 23:34:37 +00:00
attilio	846436b8c8	VN_NRESERVLEVEL is used in this file but opt_vm is not included thus the stub switch won't be correctly handled. Include opt_vm.h. Submitted by: jeff MFC after: 3 days	2011-10-22 22:00:35 +00:00
kib	8e118d38cf	Control the execution permission of the readable segments for i386 binaries on the amd64 and ia64 with the sysctl, instead of unconditionally enabling it. Reviewed by: marcel	2011-10-15 12:35:18 +00:00
jhb	0d19c767ae	Fix a typo in a comment.	2011-10-14 11:48:32 +00:00
marcel	d5c0a67c82	In sys_obreak() and when compiling for amd64 or ia64, when the process is ILP32 (i.e. i386) grant execute permissions by default. The JDK 1.4.x depends on being able to execute from the heap on i386.	2011-10-13 16:20:10 +00:00
glebius	2522c42334	Make memguard(9) capable to guard uma(9) allocations.	2011-10-12 18:08:28 +00:00
kib	eca9de9f4b	Style nit. Submitted by: jhb MFC after: 2 weeks	2011-09-29 00:44:34 +00:00
kib	2e7081472b	Fix grammar. Submitted by: bf MFC after: 2 weeks	2011-09-28 16:12:15 +00:00
kib	e84b0ecd81	Use the trick of performing the atomic operation on the contained aligned word to handle the dirty mask updates in vm_page_clear_dirty_mask(). Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold() the sole purpose of which was to protect dirty on architectures which does not provide short or byte-wide atomics. Reviewed by: alc, attilio Tested by: flo (sparc64) MFC after: 2 weeks	2011-09-28 14:57:50 +00:00
kib	3383aff65a	Use the explicitly-sized types for the dirty and valid masks. Requested by: attilio Reviewed by: alc MFC after: 2 weeks	2011-09-28 14:51:28 +00:00
kmacy	99851f359e	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
kib	a9d505a22a	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
kib	97e23a4cb5	Update some comments in swap_pager.c. Reviewed and most wording by: alc MFC after: 1 week Approved by: re (bz)	2011-08-22 20:44:18 +00:00
kib	1d94389f8f	Apply the limit to avoid the overflows in the radix tree subr_blist.c after the conversion of the swap device size to the page size units, not before. That lifts the limit on the usable swap partition size from 32GB to 256GB, that is less depressing for the modern systems. Submitted by: Alexander V. Chernikov <melifaro ipfw ru> Reviewed by: alc Approved by: re (bz) MFC after: 2 weeks	2011-08-22 11:18:47 +00:00
rwatson	4af919b491	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
kib	f408aa11a3	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)	2011-08-09 21:01:36 +00:00
alc	24eb641020	Fix an error in kmem_alloc_attr(). Unless "tries" is updated, kmem_alloc_attr() could get stuck in a loop. Approved by: re (kib) MFC after: 3 days	2011-08-07 00:11:39 +00:00
kib	96a4fe50dc	Implement the linprocfs swaps file, providing information about the configured swap devices in the Linux-compatible format. Based on the submission by: Robert Millan <rmh debian org> PR: kern/159281 Reviewed by: bde Approved by: re (kensmith) MFC after: 2 weeks	2011-08-01 19:12:15 +00:00
kib	645f499b5d	Fix a race in the device pager allocation. If another thread won and allocated the device pager for the given handle, then the object fictitious pages list and the object membership in the global object list still need to be initialized. Otherwise, dev_pager_dealloc() will traverse uninitialized pointers. Reported and tested by: pho Reviewed by: jhb Approved by: re (kensmith) MFC after: 1 week	2011-07-30 14:13:57 +00:00
kib	e45048555a	Extract the code to translate VM error into errno, into an exported function vm_mmap_to_errno(). It is useful for the drivers that implement mmap(2)-like functionality, to be able to return error codes consistent with mmap(2). Sponsored by: The FreeBSD Foundation No objections from: alc MFC after: 1 week	2011-07-10 20:49:13 +00:00
kib	315e379ec2	Style. MFC after: 3 days	2011-07-10 20:45:13 +00:00
kib	61e3fec296	Add a facility to disable processing page faults. When activated, uiomove generates EFAULT if any accessed address is not mapped, as opposed to handling the fault. Sponsored by: The FreeBSD Foundation Reviewed by: alc (previous version)	2011-07-09 15:21:10 +00:00
trasz	4a17b24427	All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".	2011-07-06 20:06:44 +00:00
attilio	9be9b5e188	Handle a race between device_pager and devsw in a more graceful manner: return an error code rather than panic the kernel. Sponsored by: Sandvine Incorporated Reviewed by: kib Tested by: pho MFC after: 2 weeks	2011-07-06 15:09:52 +00:00
alc	dd0c3b188c	Initialize marker pages as held rather than fictitious/wired. Marking the page as held is more useful as a safety precaution in case someone forgets to check for PG_MARKER. Reviewed by: kib	2011-07-02 23:34:47 +00:00
alc	21902be08c	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib	2011-06-29 16:40:41 +00:00
alc	e7ea911039	Revert to using the page queues lock in vm_page_clear_dirty_mask() on MIPS. (At present, although atomic_clear_char() is defined by atomic.h on MIPS, it is not actually implemented by support.S.)	2011-06-23 05:23:59 +00:00
alc	95eeb54f18	Precisely document the synchronization rules for the page's dirty field. (Saying that the lock on the object that the page belongs to must be held only represents one aspect of the rules.) Eliminate the use of the page queues lock for atomically performing read- modify-write operations on the dirty field when the underlying architecture supports atomic operations on char and short types. Document the fact that 32KB pages aren't really supported. Reviewed by: attilio, kib	2011-06-19 19:13:24 +00:00
kib	6b9465356d	Assert that page is VPO_BUSY or page owner object is locked in vm_page_undirty(). The assert is not precise due to VPO_BUSY owner to tracked, so assertion does not catch the case when VPO_BUSY is owned by other thread. Reviewed by: alc	2011-06-11 20:15:19 +00:00
kib	27bd440e10	Fix a bug in r222586. Lock the page owner object around the modification of the m->dirty. Reported and tested by: nwhitehorn Reviewed by: alc	2011-06-11 20:13:28 +00:00
kib	ad5bd06523	In the VOP_PUTPAGES() implementations, change the default error from VM_PAGER_AGAIN to VM_PAGER_ERROR for the uwritten pages. Return VM_PAGER_AGAIN for the partially written page. Always forward at least one page in the loop of vm_object_page_clean(). VM_PAGER_ERROR causes the page reactivation and does not clear the page dirty state, so the write is not lost. The change fixes an infinite loop in vm_object_page_clean() when the filesystem returns permanent errors for some page writes. Reported and tested by: gavin Reviewed by: alc, rmacklem MFC after: 1 week	2011-06-01 21:00:28 +00:00
alc	2c928f6173	Correct an error in r222163. Unless UMA_MD_SMALL_ALLOC is defined, startup_alloc() must be used until uma_startup2() is called. Reported by: jh	2011-05-22 17:46:16 +00:00
alc	a899800e2a	1. Prior to r214782, UMA did not support multipage allocations before uma_startup2() was called. Thus, setting the variable "booted" to true in uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because any allocations made after uma_startup() but before uma_startup2() could be satisfied by uma_small_alloc(). Now, however, some multipage allocations are necessary before uma_startup2() just to allocate zone structures on machines with a large number of processors. Thus, a Boolean can no longer effectively describe the state of the UMA allocator. Instead, make "booted" have three values to describe how far initialization has progressed. This allows multipage allocations to continue using startup_alloc() until uma_startup2(), but single-page allocations may begin using uma_small_alloc() after uma_startup(). 2. With the aforementioned change, only a modest increase in boot pages is necessary to boot UMA on a large number of processors. 3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM. It has only been used between r182028 and r204128. Reviewed by: attilio [1], nwhitehorn [3] Tested by: sbruno	2011-05-21 17:43:43 +00:00
alc	3c08fd7d05	Fix spelling errors.	2011-05-20 17:28:00 +00:00
alc	4037bb644a	Eliminate a redundant #include. ("vm/vm_param.h" already includes "machine/vmparam.h".)	2011-05-20 15:26:31 +00:00
mdf	3d3b036f95	Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware. Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment). Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit. Requested by: alc MFC after: 1 week MFC with: r221853	2011-05-13 19:35:01 +00:00
mdf	9465c34001	Usa a globally visible region of zeros for both /dev/zero and the md device. There are likely other kernel uses of "blob of zeros" than can be converted. Reviewed by: alc MFC after: 1 week	2011-05-13 18:48:00 +00:00
mlaier	39f7e10a26	Another long standing vm bug found at Isilon: Fix a race between vm_object_collapse and vm_fault. Reviewed by: alc@ MFC after: 3 days	2011-05-09 20:27:49 +00:00
obrien	1b56a148b0	Reap old SPL comments. Reviewed by: alc	2011-04-26 22:18:53 +00:00
kib	d98bdded17	Fix two bugs in r218670. Hold the vnode around the region where object lock is dropped, until vnode lock is acquired. Do not drop the vnode reference for a case when the object was deallocated during unlock. Note that in this case, VV_TEXT is cleared by vnode_pager_dealloc(). Reported and tested by: pho Reviewed by: alc MFC after: 3 days	2011-04-23 21:38:21 +00:00
jhb	96b1d8b6d7	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week	2011-04-06 17:47:22 +00:00
trasz	440cd5face	In vm_daemon(), do not skip processes stopped with SIGSTOP.	2011-04-06 16:27:04 +00:00
trasz	71afa1f865	Add RACCT_RSS. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 16:24:24 +00:00
trasz	92bec9b84c	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-05 20:23:59 +00:00
kib	eeb1ebf124	Handle the corner case in vm_fault_quick_hold_pages(). If supplied length is zero, and user address is invalid, function might return -1, due to the truncation and rounding of the address. The callers interpret the situation as EFAULT. Instead of handling the zero length in caller, filter it in vm_fault_quick_hold_pages(). Sponsored by: The FreeBSD Foundation Reviewed by: alc	2011-03-25 16:38:10 +00:00
jhb	c7ac62aecd	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week	2011-03-24 18:40:11 +00:00
jeff	2d7d8c05e7	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
trasz	1eb6b91508	In vm_daemon(), when iterating over all processes in the system, skip those which are not yet fully initialized (i.e. ones with p_state == PRS_NEW). Without it, we could panic in _thread_lock_flags(). Note that there may be other instances of FOREACH_PROC_IN_SYSTEM() that require similar fix. Reported by: pho, keramida Discussed with: kib	2011-03-18 06:47:23 +00:00
alc	9e6c312311	Eliminate duplication of the fake page code and zone by the device and sg pagers. Reviewed by: jhb	2011-03-11 07:07:48 +00:00
brucec	3bd182f4eb	Change the return type of vmspace_swap_count to a long to match the other vmspace_*_count functions. MFC after: 3 days	2011-03-01 11:04:30 +00:00
pluknet	3061aea0d2	Remove sysctl vm.max_proc_mmap used to protect from KVA space exhaustion. As it was pointed out by Alan Cox, that no longer serves its purpose with the modern UMA allocator compared to the old one used in 4.x days. The removal of sysctl eliminates max_proc_mmap type overflow leading to the broken mmap(2) seen with large amount of physical memory on arches with factually unbound KVA space (such as amd64). It was found that slightly less than 256GB of physmem was enough to trigger the overflow. Reviewed by: alc, kib Approved by: avg (mentor) MFC after: 2 months	2011-02-24 09:22:56 +00:00
brucec	2d8d5824cb	Calculate and return the count in vmspace_swap_count as a vm_offset_t instead of an int to avoid overflow. While here, clean up some style(9) issues. PR: kern/152200 Reviewed by: kib MFC after: 2 weeks	2011-02-23 10:28:37 +00:00
alc	2f4da8e71e	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
kib	d20e0514a9	Since r218070 reenabled the call to vm_map_simplify_entry() from vm_map_insert(), the kmem_back() assumption about newly inserted entry might be broken due to interference of two factors. In the low memory condition, when vm_page_alloc() returns NULL, supplied map is unlocked. If another thread performs kmem_malloc() meantime, and its map entry is placed right next to our thread map entry in the map, both entries wire count is still 0 and entries are coalesced due to vm_map_simplify_entry(). Mark new entry with MAP_ENTRY_IN_TRANSITION to prevent coalesce. Fix some style issues, tighten the assertions to account for MAP_ENTRY_IN_TRANSITION state. Reported and tested by: pho Reviewed by: alc	2011-02-15 09:03:58 +00:00
kib	210cf47742	Lock the vnode around clearing of VV_TEXT flag. Remove mp_fixme() note mentioning that vnode lock is needed. Reviewed by: alc Tested by: pho MFC after: 1 week	2011-02-13 21:52:26 +00:00

... 3 4 5 6 7 ...

3238 Commits