freebsd-dev

Author	SHA1	Message	Date
Dag-Erling Smørgrav	db0390e833	No memory barrier is required. This was pointed out by kib@ a while ago, but I got distracted by other matters.	2012-09-04 19:04:02 +00:00
Andrey Zonov	cfe52ecf0e	- After r240026 sgrowsiz should be used in a safer maner. Approved by: kib (mentor) MCF after: 1 week	2012-09-03 09:34:46 +00:00
Andrey Zonov	e145130e71	- Remove accounting of locked memory from vsunlock(9) that I missed in r239818. Approved by: kib (mentor)	2012-08-30 08:03:33 +00:00
Andrey Zonov	126a63ce6c	- Don't take an account of locked memory for current process in vslock(9). There are two consumers of vslock(9): sysctl code and drm driver. These consumers are using locked memory as transient memory, it doesn't belong to a process's memory. Suggested by: avg Reviewed by: alc Approved by: kib (mentor) MFC after: 2 weeks	2012-08-29 11:23:59 +00:00
Sergey Kandaurov	9462305cbe	Typo in previous change: print half the theoretical maximum as maximum recommended amount. Reported by: <site freebsd at orientalsensation com> Reviewed by: des	2012-08-27 10:59:49 +00:00
Gleb Smirnoff	42321809c4	Fix function name in keg_cachespread_init() assert.	2012-08-26 09:54:11 +00:00
Dag-Erling Smørgrav	3ff863f1aa	- When running out of swzone, instead of spewing an error message every tick until the situation is resolved (if ever), just print a single message when running out and another when space becomes available. - When adding more swap, warn if the total amount exceeds half the theoretical maximum we can handle.	2012-08-16 08:29:49 +00:00
Konstantin Belousov	ee4116b8f7	For old mmap syscall, when executing on amd64 or ia64, enforce the PROT_EXEC if prot is non-zero, process is 32bit and kern.elf32.i386_read_exec syscal is enabled. This workaround is needed for old i386 a.out binaries, where dynamic linker did not specified PROT_EXEC for mapping of the text. The kern.elf32.i386_read_exec MIB name looks weird for a.out binaries, but I reused the existing knob which already has the needed semantic. MFC after: 1 week	2012-08-14 12:11:48 +00:00
Konstantin Belousov	7707ccabfb	Adjust the r205536, by allowing a non-zero offset for anonymous mappings for a.out binaries. Apparently, a.out ld.so from FreeBSD 1.1.5.1 can issue such requests. Reported and tested by: Dan Plassche <dplassche@gmail.com> MFC after: 1 week	2012-08-14 11:47:07 +00:00
Konstantin Belousov	b6c00483e9	Do not leave invalid pages in the object after the short read for a network file systems (not only NFS proper). Short reads cause pages other then the requested one, which were not filled by read response, to stay invalid. Change the vm_page_readahead_finish() interface to not take the error code, but instead to make a decision to free or to (de)activate the page only by its validity. As result, not requested invalid pages are freed even if the read RPC indicated success. Noted and reviewed by: alc MFC after: 1 week	2012-08-14 11:45:47 +00:00
Alan Cox	18f55b0171	Never sleep on busy pages in vm_pageout_launder(), always skip them. Long ago, sleeping on busy pages in vm_pageout_launder() made sense. The call to vm_pageout_flush() specified asynchronous I/O and sleeping on busy pages blocked vm_pageout_launder() until the flush had completed. However, in CVS revision 1.35 of vm/vm_contig.c, the call to vm_pageout_flush() was changed to request synchronous I/O, but the sleep on busy pages was not removed.	2012-08-07 04:48:14 +00:00
Konstantin Belousov	1c771f9222	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks	2012-08-05 14:11:42 +00:00
Konstantin Belousov	0055cbd3c5	Reduce code duplication and exposure of direct access to struct vm_page oflags by providing helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other then the requested one, for VOP_GETPAGES(). Reviewed by: alc MFC after: 1 week	2012-08-04 18:16:43 +00:00
Alan Cox	369763e31a	Inline vm_page_aflags_clear() and vm_page_aflags_set(). Add comments stating that neither these functions nor the flags that they are used to manipulate are part of the KBI.	2012-08-03 01:48:15 +00:00
Alan Cox	d26a90a8b2	Eliminate an unneeded declaration. (I should have removed this as part of r227568.)	2012-07-30 20:38:37 +00:00
Konstantin Belousov	311e34e260	Do not requeue held page or page for which locking failed, just leave them alone. Process the act_count updates for the held pages in the vm_pageout loop over the inactive queue, instead of refusing to do anything with such page. Clarify the intent of the addl_page_shortage counter and change its use for pages which are not processed in the loop according to the description. Reviewed by: alc MFC after: 2 weeks	2012-07-26 09:06:48 +00:00
Alan Cox	2cba1ccd0e	Addendum to r238604. If the inactive queue scan isn't restarted, then the variable "addl_page_shortage_init" isn't needed. X-MFC after: r238604	2012-07-24 02:35:30 +00:00
Konstantin Belousov	d4961bcb3a	Do not restart scan of the inactive queue when non-inactive page is found. Rather, we shall not find such pages on inactive queue at all. Requested and reviewed by: alc MFC after: 2 weeks	2012-07-18 21:47:50 +00:00
Alan Cox	85eeca35b9	Move what remains of vm/vm_contig.c into vm/vm_pageout.c, where similar code resides. Rename vm_contig_grow_cache() to vm_pageout_grow_cache(). Reviewed by: kib	2012-07-18 05:21:34 +00:00
Alan Cox	da1ab8a4a0	Correct vm_page_alloc_contig()'s implementation of VM_ALLOC_NODUMP.	2012-07-17 02:36:59 +00:00
Alan Cox	907e4524dc	Various improvements to vm_contig_grow_cache(). Most notably, even when it can't sleep, it can still move clean pages from the inactive queue to the cache. Also, when a page is cached, there is no need to restart the scan. The "next" page pointer held by vm_contig_launder() is still valid. Finally, add a comment summarizing what vm_contig_grow_cache() does based upon the value of "tries". MFC after: 3 weeks	2012-07-16 18:13:43 +00:00
Alan Cox	476f7f2423	Correct an off-by-one error in vm_reserv_alloc_contig() that resulted in the last reservation of a multi-reservation allocation not being initialized.	2012-07-15 21:46:19 +00:00
Matthew D Fleming	f806cdcf99	Fix a bug with memguard(9) on 32-bit architectures without a VM_KMEM_MAX_SIZE. The code was not taking into account the size of the kernel_map, which the kmem_map is allocated from, so it could produce a sub-map size too large to fit. The simplest solution is to ignore VM_KMEM_MAX entirely and base the memguard map's size off the kernel_map's size, since this is always relevant and always smaller. Found by: Justin Hibbits	2012-07-15 20:29:48 +00:00
Alan Cox	9757857c4f	If vm_contig_grow_cache() is allowed to sleep, then invoke the vm_lowmem handlers.	2012-07-14 20:14:03 +00:00
Alan Cox	0ff0fc84c2	Move kmem_alloc_{attr,contig}() to vm/vm_kern.c, where similarly named functions reside. Correct the comment describing kmem_alloc_contig().	2012-07-14 18:10:44 +00:00
Attilio Rao	571a1e92aa	Document the object type movements, related to swp_pager_copy(), in vm_object_collapse() and vm_object_split(). In collabouration with: alc MFC after: 3 days	2012-07-11 01:04:59 +00:00
Konstantin Belousov	a4156419ca	Avoid vm page queues lock leak after r238212. Reported and tested by: Michael Butler <imb protected-networks net> Reviewed by: alc Pointy hat to: kib MFC after: 20 days	2012-07-08 18:04:26 +00:00
Konstantin Belousov	48cc2fc774	Drop page queues mutex on each iteration of vm_pageout_scan over the inactive queue, unless busy page is found. Dropping the mutex often should allow the other lock acquires to proceed without waiting for whole inactive scan to finish. On machines with lot of physical memory scan often need to iterate a lot before it finishes or finds a page which requires laundring, causing high latency for other lock waiters. Suggested and reviewed by: alc MFC after: 3 weeks	2012-07-07 19:39:08 +00:00
Eitan Adler	c288b54837	Add missing sleep stat increase PR: kern/168211 Submitted by: linimon Reviewed by: alc Approved by: cperciva MFC after: 3 days	2012-07-07 17:46:11 +00:00
Konstantin Belousov	5d10ef2096	Style. Reviewed by: alc (previous version) MFC after: 1 week	2012-07-06 20:13:16 +00:00
John Baldwin	687c94aac9	Honor db_pager_quit in 'show uma' and 'show malloc'. MFC after: 1 month	2012-07-02 16:14:52 +00:00
Alan Cox	e30df26e7b	Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.	2012-06-27 03:45:25 +00:00
Attilio Rao	ffdd0c7db3	- Add a comment explaining the locking of the cached pages pool held by vm_objects. - Add flags for the per-object lock and free pages queue mutex lock. Use the newly added flags to mark the cache root within the vm_object structure. Please note that other vm_object members should be marked with correct locking but they are left for other commits. In collabouration with: alc MFC after: 3 days3 days3 days	2012-06-22 18:34:11 +00:00
Alan Cox	eddc92918e	Selectively inline vm_page_dirty().	2012-06-20 23:25:47 +00:00
John Baldwin	6fbe60fa8b	Move the per-thread deferred user map entries list into a private list in vm_map_process_deferred() which is then iterated to release map entries. This avoids having a nested vm map unlock operation called from the loop body attempt to recuse into vm_map_process_deferred(). This can happen if the vm_map_remove() triggers the OOM killer. Reviewed by: alc, kib MFC after: 1 week	2012-06-20 18:00:26 +00:00
Attilio Rao	db9ba57895	Do a more targeted check on the page cache and avoid to check the cache pointer directly in vnode_pager_setsize() by using newly introduced vm_page_is_cached() function. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039,234064	2012-06-16 21:39:00 +00:00
Alan Cox	6031c68de4	The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks	2012-06-16 18:56:19 +00:00
Konstantin Belousov	83ce08538a	Use the previous stack entry protection and max protection to correctly propagate the stack execution permissions when stack is grown down. First, curproc->p_sysent->sv_stackprot specifies maximum allowed stack protection for current ABI, so the new stack entry was typically marked executable always. Second, for non-main stack MAP_STACK mapping, the PROT_ flags should be used which were specified at the mmap(2) call time, and not sv_stackprot. MFC after: 1 week	2012-06-10 11:31:50 +00:00
Eitan Adler	0a4a2b8e62	Revert r236380 PR: kern/166780 Requested by: many Approved by: cperciva (implicit)	2012-06-01 18:58:50 +00:00
Eitan Adler	71ee98c97c	Add sysctl to query amount of swap space free PR: kern/166780 Submitted by: Radim Kolar <hsn@sendmail.cz> Approved by: cperciva MFC after: 1 week	2012-06-01 04:42:52 +00:00
Maksim Yevmenkin	251386b4b2	Tweak condition for disabling allocation from per-CPU buckets in low memory situation. I've observed a situation where per-CPU allocations were disabled while there were enough free cached pages. Basically, cnt.v_free_count was sitting stable at a value lower than cnt.v_free_min and that caused massive performance drop. Reviewed by: alc MFC after: 1 week	2012-05-23 18:56:29 +00:00
Konstantin Belousov	4d34e019c4	Calculate the count of per-process cow faults. Export the count to userspace using the obscure spare int field in struct kinfo_proc. Submitted by: Andrey Zonov <andrey zonov org> MFC after: 1 week	2012-05-23 18:10:54 +00:00
Andriy Gapon	b6062382be	vm_pager_object_lookup: small performance optimization do not needlessly lock an object if its handle doesn't match Reviewed by: kib, alc MFC after: 1 week	2012-05-23 12:51:49 +00:00
Andrew Turner	c415e17250	Fix booting on ARM. In PHYS_TO_VM_PAGE() when VM_PHYSSEG_DENSE is set the check if we are past the end of vm_page_array was incorrect causing it to return NULL. This value is then used in vm_phys_add_page causing a data abort. Reviewed by: alc, kib, imp Tested by: stas	2012-05-22 07:04:23 +00:00
Nathan Whitehorn	ccc4a5c761	Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies range operations like pmap_remove() and pmap_protect() as well as allowing simple operations like pmap_extract() not to involve any global state. This substantially reduces lock coverages for the global table lock and improves concurrency.	2012-05-20 14:33:28 +00:00
Konstantin Belousov	df2f557df6	Do not double-reference the found vm object in cdev_pager_lookup(). vm_pager_object_lookup() already referenced the object. Note that there is no in-tree consumers of cdev_pager_lookup(). The only known user of the function is i915 gem driver, which is not yet imported. This should make the KPI change minor. Submitted by: avg MFC after: 1 week	2012-05-18 10:23:47 +00:00
Konstantin Belousov	b7ac5a8571	Add new pager type, OBJT_MGTDEVICE. It provides the device pager which carries fictitous managed pages. In particular, the consumers of the new object type can remove all mappings of the device page with pmap_remove_all(). The range of physical addresses used for fake page allocation shall be registered with vm_phys_fictitious_reg_range() interface to allow the PHYS_TO_VM_PAGE() to work in pmap. Most likely, only i386 and amd64 pmaps can handle fictitious managed pages right now. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:49:58 +00:00
Konstantin Belousov	b6de32bd9b	Add a facility to register a range of physical addresses to be used for allocation of fictitious pages, for which PHYS_TO_VM_PAGE() returns proper fictitious vm_page_t. The range should be de-registered after consumer stopped using it. De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate over registered ranges. A hash container might be developed instead of range registration interface, and fake pages could be put automatically into the hash, were PHYS_TO_VM_PAGE() could look them up later. This should be considered before the MFC of the commit is done. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:42:56 +00:00
Konstantin Belousov	e461aae747	Split the code from vm_page_getfake() to initialize the fake page struct vm_page into new interface vm_page_initfake(). Handle the case of fake page re-initialization with changed memattr. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:34:22 +00:00
Konstantin Belousov	116c213502	Assert that the page passed to vm_page_putfake() is unmanaged. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:27:51 +00:00
Konstantin Belousov	7900f95d88	Assert that fictitious or unmanaged pages do not appear on active/inactive lists. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:24:46 +00:00
Konstantin Belousov	13a0b7bcc4	Commit the change forgotten in r235356. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:10:18 +00:00
Konstantin Belousov	0c26bb71f6	Make the vm_page_array_size long. Remove redundand zero initialization for vm_page_array_size and nearby variablees. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2012-05-12 20:03:06 +00:00
Alan Cox	13458803f4	Give vm_fault()'s sequential access optimization a makeover. There are two aspects to the sequential access optimization: (1) read ahead of pages that are expected to be accessed in the near future and (2) unmap and cache behind of pages that are not expected to be accessed again. This revision changes both aspects. The read ahead optimization is now more effective. It starts with the same initial read window as before, but arithmetically grows the window on sequential page faults. This can yield increased read bandwidth. For example, on one of my machines, a program using mmap() to read a file that is several times larger than the machine's physical memory takes about 17% less time to complete. The unmap and cache behind optimization is now more selectively applied. The read ahead window must grow to its maximum size before unmap and cache behind is performed. This significantly reduces the number of times that pages are unmapped and cached only to be reactivated a short time later. The unmap and cache behind optimization now clears each page's referenced flag. Previously, in the case of dirty pages, if the containing file was still mapped at the time that the page daemon examined the dirty pages, they would be reactivated. From a stylistic standpoint, this revision also cleanly separates the implementation of the read ahead and unmap/cache behind optimizations. Glanced at: kib MFC after: 2 weeks	2012-05-10 15:16:42 +00:00
Nathan Whitehorn	0b852c03eb	Avoid a lock order reversal in pmap_extract_and_hold() from relocking the page. This PMAP requires an additional lock besides the PMAP lock in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release. Suggested by: kib MFC after: 4 days	2012-04-22 17:58:30 +00:00
Konstantin Belousov	1472f4f4b9	When MAP_STACK mapping is created, the map entry is created only to cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent call to vm_map_wire() to wire the whole stack region fails due to VM_MAP_WIRE_NOHOLES flag. Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack. Reported and tested by: Sushanth Rai <sushanth_rai yahoo com> Reviewed by: alc MFC after: 1 week	2012-04-21 18:36:53 +00:00
Alan Cox	2aa163dc57	As documented in vm_page.h, updates to the vm_page's flags no longer require the page queues lock. MFC after: 1 week	2012-04-21 18:26:16 +00:00
Attilio Rao	a0f2c37b6f	- Introduce a cache-miss optimization for consistency with other accesses of the cache member of vm_object objects. - Use novel vm_page_is_cached() for checks outside of the vm subsystem. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039	2012-04-09 17:05:18 +00:00
Alan Cox	1c8279e4e7	Fix mincore(2) so that it reports PG_CACHED pages as resident. MFC after: 2 weeks	2012-04-08 18:25:12 +00:00
Alan Cox	908e3da10e	If a page belonging a reservation is cached, then mark the reservation so that it will be freed to the cache pool rather than the default pool. Otherwise, the cached pages within the reservation may be recycled sooner than necessary. Reported by: Andrey Zonov	2012-04-08 17:00:46 +00:00
Attilio Rao	d1aa86e151	Staticize vm_page_cache_remove(). Reviewed by: alc	2012-04-06 20:34:00 +00:00
Nathan Whitehorn	57bd5cce62	Reduce the frequency that the PowerPC/AIM pmaps invalidate instruction caches, by invalidating kernel icaches only when needed and not flushing user caches for shared pages. Suggested by: kib MFC after: 2 weeks	2012-04-06 16:03:38 +00:00
John Baldwin	35818d2e94	Add new ktrace records for the start and end of VM faults. This gives a pair of records similar to syscall entry and return that a user can use to determine how long page faults take. The new ktrace records are enabled via the 'p' trace type, and are enabled in the default set of trace points. Reviewed by: kib MFC after: 2 weeks	2012-04-05 17:13:14 +00:00
Kirk McKusick	1faacf5d09	Keep track of the mount point associated with a special device to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too. Reviewed by: kib	2012-03-28 20:49:11 +00:00
Alan Cox	5730afc9b6	Handle spurious page faults that may occur in no-fault sections of the kernel. When access restrictions are added to a page table entry, we flush the corresponding virtual address mapping from the TLB. In contrast, when access restrictions are removed from a page table entry, we do not flush the virtual address mapping from the TLB. This is exactly as recommended in AMD's documentation. In effect, when access restrictions are removed from a page table entry, AMD's MMUs will transparently refresh a stale TLB entry. In short, this saves us from having to perform potentially costly TLB flushes. In contrast, Intel's MMUs are allowed to generate a spurious page fault based upon the stale TLB entry. Usually, such spurious page faults are handled by vm_fault() without incident. However, when we are executing no-fault sections of the kernel, we are not allowed to execute vm_fault(). This change introduces special-case handling for spurious page faults that occur in no-fault sections of the kernel. In collaboration with: kib Tested by: gibbs (an earlier version) I would also like to acknowledge Hiroki Sato's assistance in diagnosing this problem. MFC after: 1 week	2012-03-22 04:52:51 +00:00
John Baldwin	d6e9b97b3f	Bah, just revert my earlier change entirely. (Missed alc's request to do this earlier.) Requested by: alc	2012-03-19 19:06:40 +00:00
John Baldwin	92a5994685	Fix madvise(MADV_WILLNEED) to properly handle individual mappings larger than 4GB. Specifically, the inlined version of 'ptoa' of the the 'int' count of pages overflowed on 64-bit platforms. While here, change vm_object_madvise() to accept two vm_pindex_t parameters (start and end) rather than a (start, count) tuple to match other VM APIs as suggested by alc@.	2012-03-19 18:47:34 +00:00
John Baldwin	8407f69657	Alter the previous commit to use vm_size_t instead of vm_pindex_t. vm_pindex_t is not a count of pages per se, it is more like vm_ooffset_t, but a page index instead of a byte offset.	2012-03-19 18:43:44 +00:00
Konstantin Belousov	126d60823a	In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag if the filesystem performed short write and we are skipping the page due to this. Propogate write error from the pager back to the callers of vm_pageout_flush(). Report the failure to write a page from the requested range as the FALSE return value from vm_object_page_clean(), and propagate it back to msync(2) to return EIO to usermode. While there, convert the clearobjflags variable in the vm_object_page_clean() and arguments of the helper functions to boolean. PR: kern/165927 Reviewed by: alc MFC after: 2 weeks	2012-03-17 23:00:32 +00:00
John Baldwin	df96bc9713	Pedantic nit: use vm_pindex_t instead of long for a count of pages.	2012-03-14 20:57:48 +00:00
John Baldwin	b47f624183	Add KTR_VFS traces to track modifications to a vnode's writecount.	2012-03-08 20:27:20 +00:00
Alan Cox	83cbe16ff4	Eliminate stale incorrect ARGSUSED comments. Submitted by: bde	2012-03-02 17:33:51 +00:00
Alan Cox	9ed54e79b5	Simplify kmem_alloc() by eliminating code that existed on account of external pagers in Mach. FreeBSD doesn't implement external pagers. Moreover, it don't pageout the kernel object. So, the reasons for having code don't hold. Reviewed by: kib MFC after: 6 weeks	2012-02-29 05:41:29 +00:00
Alan Cox	f9230ad6b8	Simplify vm_mmap()'s control flow. Add a comment describing what vm_mmap_to_errno() does. Reviewed by: kib MFC after: 3 weeks X-MFC after: r232071	2012-02-25 21:06:39 +00:00
Alan Cox	79e538388f	Simplify vmspace_fork()'s control flow by copying immutable data before the vm map locks are acquired. Also, eliminate redundant initialization of the new vm map's timestamp. Reviewed by: kib MFC after: 3 weeks	2012-02-25 17:49:59 +00:00
Konstantin Belousov	9d22083da8	Place the if() at the right location, to activate the v_writecount accounting for shared writeable mappings for all filesystems, not only for the bypass layers. Submitted by: alc Pointy hat to: kib MFC after: 20 days	2012-02-24 10:41:58 +00:00
Konstantin Belousov	84110e7e0b	Account the writeable shared mappings backed by file in the vnode v_writecount. Keep the amount of the virtual address space used by the mappings in the new vm_object un_pager.vnp.writemappings counter. The vnode v_writecount is incremented when writemappings gets non-zero value, and decremented when writemappings is returned to zero. Writeable shared vnode-backed mappings are accounted for in vm_mmap(), and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on the created map entry. During deferred map entry deallocation, vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and decrements writemappings for the vm object. Now, the writeable mount cannot be demoted to read-only while writeable shared mappings of the vnodes from the mount point exist. Also, execve(2) fails for such files with ETXTBUSY, as it should be. Noted by: tegge Reviewed by: tegge (long time ago, early version), alc Tested by: pho MFC after: 3 weeks	2012-02-23 21:07:16 +00:00
Konstantin Belousov	501f538675	Remove wrong comment. Discussed with: alc MFC after: 3 days	2012-02-22 20:01:38 +00:00
Alan Cox	a649296959	When vm_mmap() is used to map a vm object into a kernel vm_map, it makes no sense to check the size of the kernel vm_map against the user-level resource limits for the calling process. Reviewed by: kib	2012-02-16 06:45:51 +00:00
Konstantin Belousov	8211bd45bc	Close a race due to dropping of the map lock between creating map entry for a shared mapping and marking the entry for inheritance. Other thread might execute vmspace_fork() in between (e.g. by fork(2)), resulting in the mapping becoming private. Noted and reviewed by: alc MFC after: 1 week	2012-02-11 17:29:07 +00:00
Ed Schouten	7870adb640	Remove direct access to si_name. Code should just use the devtoname() function to obtain the name of a character device. Also add const keywords to pieces of code that need it to build properly. MFC after: 2 weeks	2012-02-10 12:35:57 +00:00
Alexander Motin	8f12d83ad9	Fix NULL dereference panic on attempt to turn off (on system shutdown) disconnected swap device. This is quick and imperfect solution, as swap device will still be opened and GEOM will not be able to destroy it. Proper solution would be to automatically turn off and close disconnected swap device, but with existing code it will cause panic if there is at least one page on device, even if it is unimportant page of the user-level process. It needs some work. Reviewed by: kib@ MFC after: 1 week	2012-02-01 20:12:44 +00:00
Kip Macy	263811f724	exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64 excluding other allocations including UMA now entails the addition of a single flag to kmem_alloc or uma zone create Reviewed by: alc, avg MFC after: 2 weeks	2012-01-27 20:18:31 +00:00
Nathan Whitehorn	8d01a3b281	Revert r212360 now that PowerPC can handle large sparse arguments to pmap_remove() (changed in r228412). MFC after: 2 weeks	2012-01-17 00:31:09 +00:00
Konstantin Belousov	5dda2db9c8	Change the type of the paging_in_progress refcounter from u_short to u_int. With the auto-sized buffer cache on the modern machines, UFS metadata can generate more the 65535 pages belonging to the buffers undergoing i/o, overflowing the counter. Reported and tested by: jimharris Reviewed by: alc MFC after: 1 week	2012-01-10 18:05:44 +00:00
Konstantin Belousov	e65919f9fc	Do not restart the scan in vm_object_page_clean() on the object generation change if requested mode is async. The object generation is only changed when the object is marked as OBJ_MIGHTBEDIRTY. For async mode it is enough to write each dirty page, not to make a guarantee that all pages are cleared after the vm_object_page_clean() returned. Diagnosed by: truckman Tested by: flo Reviewed by: alc, truckman MFC after: 2 weeks	2012-01-04 16:04:20 +00:00
Alan Cox	b5f359b7c3	Optimize vm_object_split()'s handling of reservations.	2011-12-28 20:27:18 +00:00
Konstantin Belousov	75ff604a78	Optimize the common case of msyncing the whole file mapping with MS_SYNC flag. The system must guarantee that all writes are finished before syscalls returned. Schedule the writes in async mode, which is much faster and allows the clustering to occur. Wait for writes using VOP_FSYNC(), since we are syncing the whole file mapping. Potentially, the restriction to only apply the optimization can be relaxed by not requiring that the mapping cover whole file, as it is done by other OSes. Reported and tested by: az Reviewed by: alc MFC after: 2 weeks	2011-12-23 09:09:42 +00:00
Konstantin Belousov	e878d99718	Move kstack_cache_entry into the private header, and make the stack cache list header accessible outside vm_glue.c. MFC after: 1 week	2011-12-16 10:56:16 +00:00
Eitan Adler	33fd7c5628	- The previous commit (r228449) accidentally moved the vm.stats.vm.* sysctls to vm.stats.sys. Move them back. Noticed by: pho Reviewed by: bde (earlier version) Approved by: bz MFC after: 1 week Pointy hat to: me	2011-12-14 13:25:00 +00:00
Eitan Adler	3eb9ab5255	Document a large number of currently undocumented sysctls. While here fix some style(9) issues and reduce redundancy. PR: kern/155491 PR: kern/155490 PR: kern/155489 Submitted by: Galimov Albert <wtfcrap@mail.ru> Approved by: bde Reviewed by: jhb MFC after: 1 week	2011-12-13 00:38:50 +00:00
Konstantin Belousov	134465d732	Fix printf. Submitted by: az MFC after: 1 week	2011-12-12 10:04:04 +00:00
Alan Cox	c68c35372e	Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to use superpage reservations. So, for the first time, kernel virtual memory that is allocated by contigmalloc(), kmem_alloc_attr(), and kmem_alloc_contig() can be promoted to superpages. In fact, even a series of small contigmalloc() allocations may collectively result in a promoted superpage. Eliminate some duplication of code in vm_reserv_alloc_page(). Change the type of vm_reserv_reclaim_contig()'s first parameter in order that it be consistent with other vm_*_contig() functions. Tested by: marius (sparc64)	2011-12-05 18:29:25 +00:00
Konstantin Belousov	dc874f9881	Rename vm_page_set_valid() to vm_page_set_valid_range(). The vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc	2011-11-30 17:39:00 +00:00
Konstantin Belousov	cf1911a9ad	Hide the internals of vm_page_lock(9) from the loadable modules. Since the address of vm_page lock mutex depends on the kernel options, it is easy for module to get out of sync with the kernel. No vm_page_lockptr() accessor is provided for modules. It can be added later if needed, unless proper KPI is developed to serve the needs. Reviewed by: attilio, alc MFC after: 3 weeks	2011-11-29 13:07:32 +00:00
Attilio Rao	9fde98bba3	Introduce the same mutex-wise fix in r227758 for sx locks. The functions that offer file and line specifications are: - sx_assert_ - sx_downgrade_ - sx_slock_ - sx_slock_sig_ - sx_sunlock_ - sx_try_slock_ - sx_try_xlock_ - sx_try_upgrade_ - sx_unlock_ - sx_xlock_ - sx_xlock_sig_ - sx_xunlock_ Now vm_map locking is fully converted and can avoid to know specifics about locking procedures. Reviewed by: kib MFC after: 1 month	2011-11-21 12:59:52 +00:00
Attilio Rao	ccdf233323	Introduce macro stubs in the mutex implementation that will be always defined and will allow consumers, willing to provide options, file and line to locking requests, to not worry about options redefining the interfaces. This is typically useful when there is the need to build another locking interface on top of the mutex one. The introduced functions that consumers can use are: - mtx_lock_flags_ - mtx_unlock_flags_ - mtx_lock_spin_flags_ - mtx_unlock_spin_flags_ - mtx_assert_ - thread_lock_flags_ Spare notes: - Likely we can get rid of all the 'INVARIANTS' specification in the ppbus code by using the same macro as done in this patch (but this is left to the ppbus maintainer) - all the other locking interfaces may require a similar cleanup, where the most notable case is sx which will allow a further cleanup of vm_map locking facilities - The patch should be fully compatible with older branches, thus a MFC is previewed (infact it uses all the underlying mechanisms already present). Comments review by: eadler, Ben Kaduk Discussed with: kib, jhb MFC after: 1 month	2011-11-20 16:33:09 +00:00
Alan Cox	5ff276b7f4	Eliminate end-of-line white space.	2011-11-17 06:54:49 +00:00
Alan Cox	fbd80bd047	Refactor the code that performs physically contiguous memory allocation, yielding a new public interface, vm_page_alloc_contig(). This new function addresses some of the limitations of the current interfaces, contigmalloc() and kmem_alloc_contig(). For example, the physically contiguous memory that is allocated with those interfaces can only be allocated to the kernel vm object and must be mapped into the kernel virtual address space. It also provides functionality that vm_phys_alloc_contig() doesn't, such as wiring the returned pages. Moreover, unlike that function, it respects the low water marks on the paging queues and wakes up the page daemon when necessary. That said, at present, this new function can't be applied to all types of vm objects. However, that restriction will be eliminated in the coming weeks. From a design standpoint, this change also addresses an inconsistency between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions. Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew about vnodes and reservations. Now, vm_page_alloc_contig() is responsible for these things. Reviewed by: kib Discussed with: jhb	2011-11-16 16:46:09 +00:00
Konstantin Belousov	286790a7dd	Update the device pager interface, while keeping the compatibility layer for old KPI and KBI. New interface should be used together with d_mmap_single cdevsw method. Device pager can be allocated with the cdev_pager_allocate(9) function, which takes struct cdev_pager_ops, containing constructor/destructor and page fault handler methods supplied by driver. Constructor and destructor, called at the pager allocation and deallocation time, allow the driver to handle per-object private data. The pager handler is called to handle page fault on the vm map entry backed by the driver pager. Driver shall return either the vm_page_t which should be mapped, or error code (which does not cause kernel panic anymore). The page handler interface has a placeholder to specify the access mode causing the fault, but currently PROT_READ is always passed there. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month	2011-11-15 14:40:00 +00:00

1 2 3 4 5 ...

3006 Commits