Commit Graph

2950 Commits

Author SHA1 Message Date
Dag-Erling Smørgrav
3ff863f1aa - When running out of swzone, instead of spewing an error message every
tick until the situation is resolved (if ever), just print a single
  message when running out and another when space becomes available.

- When adding more swap, warn if the total amount exceeds half the
  theoretical maximum we can handle.
2012-08-16 08:29:49 +00:00
Konstantin Belousov
ee4116b8f7 For old mmap syscall, when executing on amd64 or ia64, enforce the
PROT_EXEC if prot is non-zero, process is 32bit and
kern.elf32.i386_read_exec syscal is enabled. This workaround is needed
for old i386 a.out binaries, where dynamic linker did not specified
PROT_EXEC for mapping of the text.

The kern.elf32.i386_read_exec MIB name looks weird for a.out binaries,
but I reused the existing knob which already has the needed semantic.

MFC after:	1 week
2012-08-14 12:11:48 +00:00
Konstantin Belousov
7707ccabfb Adjust the r205536, by allowing a non-zero offset for anonymous
mappings for a.out binaries. Apparently, a.out ld.so from FreeBSD
1.1.5.1 can issue such requests.

Reported and tested by:	Dan Plassche <dplassche@gmail.com>
MFC after:	1 week
2012-08-14 11:47:07 +00:00
Konstantin Belousov
b6c00483e9 Do not leave invalid pages in the object after the short read for a
network file systems (not only NFS proper). Short reads cause pages
other then the requested one, which were not filled by read response,
to stay invalid.

Change the vm_page_readahead_finish() interface to not take the error
code, but instead to make a decision to free or to (de)activate the
page only by its validity. As result, not requested invalid pages are
freed even if the read RPC indicated success.

Noted and reviewed by:	alc
MFC after:	1 week
2012-08-14 11:45:47 +00:00
Alan Cox
18f55b0171 Never sleep on busy pages in vm_pageout_launder(), always skip them. Long
ago, sleeping on busy pages in vm_pageout_launder() made sense.  The call
to vm_pageout_flush() specified asynchronous I/O and sleeping on busy pages
blocked vm_pageout_launder() until the flush had completed.  However, in
CVS revision 1.35 of vm/vm_contig.c, the call to vm_pageout_flush() was
changed to request synchronous I/O, but the sleep on busy pages was not
removed.
2012-08-07 04:48:14 +00:00
Konstantin Belousov
1c771f9222 After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
Konstantin Belousov
0055cbd3c5 Reduce code duplication and exposure of direct access to struct
vm_page oflags by providing helper function
vm_page_readahead_finish(), which handles completed reads for pages
with indexes other then the requested one, for VOP_GETPAGES().

Reviewed by:	alc
MFC after:	1 week
2012-08-04 18:16:43 +00:00
Alan Cox
369763e31a Inline vm_page_aflags_clear() and vm_page_aflags_set().
Add comments stating that neither these functions nor the flags that they
are used to manipulate are part of the KBI.
2012-08-03 01:48:15 +00:00
Alan Cox
d26a90a8b2 Eliminate an unneeded declaration. (I should have removed this as part
of r227568.)
2012-07-30 20:38:37 +00:00
Konstantin Belousov
311e34e260 Do not requeue held page or page for which locking failed, just leave
them alone.

Process the act_count updates for the held pages in the vm_pageout
loop over the inactive queue, instead of refusing to do anything with
such page.

Clarify the intent of the addl_page_shortage counter and change its
use for pages which are not processed in the loop according to the
description.

Reviewed by:	alc
MFC after:	2 weeks
2012-07-26 09:06:48 +00:00
Alan Cox
2cba1ccd0e Addendum to r238604. If the inactive queue scan isn't restarted, then
the variable "addl_page_shortage_init" isn't needed.

X-MFC after:	r238604
2012-07-24 02:35:30 +00:00
Konstantin Belousov
d4961bcb3a Do not restart scan of the inactive queue when non-inactive page is
found. Rather, we shall not find such pages on inactive queue at all.

Requested and reviewed by:	alc
MFC after:    2 weeks
2012-07-18 21:47:50 +00:00
Alan Cox
85eeca35b9 Move what remains of vm/vm_contig.c into vm/vm_pageout.c, where similar
code resides.  Rename vm_contig_grow_cache() to vm_pageout_grow_cache().

Reviewed by:	kib
2012-07-18 05:21:34 +00:00
Alan Cox
da1ab8a4a0 Correct vm_page_alloc_contig()'s implementation of VM_ALLOC_NODUMP. 2012-07-17 02:36:59 +00:00
Alan Cox
907e4524dc Various improvements to vm_contig_grow_cache(). Most notably, even when
it can't sleep, it can still move clean pages from the inactive queue to
the cache.  Also, when a page is cached, there is no need to restart the
scan.  The "next" page pointer held by vm_contig_launder() is still
valid.  Finally, add a comment summarizing what vm_contig_grow_cache()
does based upon the value of "tries".

MFC after:	3 weeks
2012-07-16 18:13:43 +00:00
Alan Cox
476f7f2423 Correct an off-by-one error in vm_reserv_alloc_contig() that resulted in
the last reservation of a multi-reservation allocation not being
initialized.
2012-07-15 21:46:19 +00:00
Matthew D Fleming
f806cdcf99 Fix a bug with memguard(9) on 32-bit architectures without a
VM_KMEM_MAX_SIZE.

The code was not taking into account the size of the kernel_map, which
the kmem_map is allocated from, so it could produce a sub-map size too
large to fit.  The simplest solution is to ignore VM_KMEM_MAX entirely
and base the memguard map's size off the kernel_map's size, since this
is always relevant and always smaller.

Found by:	Justin Hibbits
2012-07-15 20:29:48 +00:00
Alan Cox
9757857c4f If vm_contig_grow_cache() is allowed to sleep, then invoke the vm_lowmem
handlers.
2012-07-14 20:14:03 +00:00
Alan Cox
0ff0fc84c2 Move kmem_alloc_{attr,contig}() to vm/vm_kern.c, where similarly named
functions reside.  Correct the comment describing kmem_alloc_contig().
2012-07-14 18:10:44 +00:00
Attilio Rao
571a1e92aa Document the object type movements, related to swp_pager_copy(),
in vm_object_collapse() and vm_object_split().

In collabouration with:	alc
MFC after:	3 days
2012-07-11 01:04:59 +00:00
Konstantin Belousov
a4156419ca Avoid vm page queues lock leak after r238212.
Reported and tested by:	Michael Butler <imb protected-networks net>
Reviewed by:	alc
Pointy hat to:	kib
MFC after:	20 days
2012-07-08 18:04:26 +00:00
Konstantin Belousov
48cc2fc774 Drop page queues mutex on each iteration of vm_pageout_scan over the
inactive queue, unless busy page is found.

Dropping the mutex often should allow the other lock acquires to
proceed without waiting for whole inactive scan to finish. On machines
with lot of physical memory scan often need to iterate a lot before it
finishes or finds a page which requires laundring, causing high
latency for other lock waiters.

Suggested and reviewed by:	alc
MFC after:	3 weeks
2012-07-07 19:39:08 +00:00
Eitan Adler
c288b54837 Add missing sleep stat increase
PR:		kern/168211
Submitted by:	linimon
Reviewed by:	alc
Approved by:	cperciva
MFC after:	3 days
2012-07-07 17:46:11 +00:00
Konstantin Belousov
5d10ef2096 Style.
Reviewed by:	alc (previous version)
MFC after:	1 week
2012-07-06 20:13:16 +00:00
John Baldwin
687c94aac9 Honor db_pager_quit in 'show uma' and 'show malloc'.
MFC after:	1 month
2012-07-02 16:14:52 +00:00
Alan Cox
e30df26e7b Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.
2012-06-27 03:45:25 +00:00
Attilio Rao
ffdd0c7db3 - Add a comment explaining the locking of the cached pages pool held
by vm_objects.
- Add flags for the per-object lock and free pages queue mutex lock.
  Use the newly added flags to mark the cache root within the vm_object
  structure.

Please note that other vm_object members should be marked with correct
locking but they are left for other commits.

In collabouration with:	alc

MFC after:	3 days3 days3 days
2012-06-22 18:34:11 +00:00
Alan Cox
eddc92918e Selectively inline vm_page_dirty(). 2012-06-20 23:25:47 +00:00
John Baldwin
6fbe60fa8b Move the per-thread deferred user map entries list into a private list
in vm_map_process_deferred() which is then iterated to release map entries.
This avoids having a nested vm map unlock operation called from the loop
body attempt to recuse into vm_map_process_deferred().  This can happen if
the vm_map_remove() triggers the OOM killer.

Reviewed by:	alc, kib
MFC after:	1 week
2012-06-20 18:00:26 +00:00
Attilio Rao
db9ba57895 Do a more targeted check on the page cache and avoid to check the cache
pointer directly in vnode_pager_setsize() by using newly introduced
vm_page_is_cached() function.

Reviewed by:	alc
MFC after:	2 weeks
X-MFC:		r234039,234064
2012-06-16 21:39:00 +00:00
Alan Cox
6031c68de4 The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer.  This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by:	kib
MFC after:	6 weeks
2012-06-16 18:56:19 +00:00
Konstantin Belousov
83ce08538a Use the previous stack entry protection and max protection to correctly
propagate the stack execution permissions when stack is grown down.

First, curproc->p_sysent->sv_stackprot specifies maximum allowed stack
protection for current ABI, so the new stack entry was typically marked
executable always. Second, for non-main stack MAP_STACK mapping,
the PROT_ flags should be used which were specified at the mmap(2) call
time, and not sv_stackprot.

MFC after:	1 week
2012-06-10 11:31:50 +00:00
Eitan Adler
0a4a2b8e62 Revert r236380
PR:		kern/166780
Requested by:	many
Approved by:	cperciva (implicit)
2012-06-01 18:58:50 +00:00
Eitan Adler
71ee98c97c Add sysctl to query amount of swap space free
PR:		kern/166780
Submitted by:	Radim Kolar <hsn@sendmail.cz>
Approved by:	cperciva
MFC after:	1 week
2012-06-01 04:42:52 +00:00
Maksim Yevmenkin
251386b4b2 Tweak condition for disabling allocation from per-CPU buckets in
low memory situation. I've observed a situation where per-CPU
allocations were disabled while there were enough free cached pages.
Basically, cnt.v_free_count was sitting stable at a value lower
than cnt.v_free_min and that caused massive performance drop.

Reviewed by:	alc
MFC after:	1 week
2012-05-23 18:56:29 +00:00
Konstantin Belousov
4d34e019c4 Calculate the count of per-process cow faults. Export the count to
userspace using the obscure spare int field in struct kinfo_proc.

Submitted by:	Andrey Zonov <andrey zonov org>
MFC after:	1 week
2012-05-23 18:10:54 +00:00
Andriy Gapon
b6062382be vm_pager_object_lookup: small performance optimization
do not needlessly lock an object if its handle doesn't match

Reviewed by:	kib, alc
MFC after:	1 week
2012-05-23 12:51:49 +00:00
Andrew Turner
c415e17250 Fix booting on ARM.
In PHYS_TO_VM_PAGE() when VM_PHYSSEG_DENSE is set the check if we are past
the end of vm_page_array was incorrect causing it to return NULL. This
value is then used in vm_phys_add_page causing a data abort.

Reviewed by:	alc, kib, imp
Tested by:	stas
2012-05-22 07:04:23 +00:00
Nathan Whitehorn
ccc4a5c761 Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies
range operations like pmap_remove() and pmap_protect() as well as allowing
simple operations like pmap_extract() not to involve any global state.
This substantially reduces lock coverages for the global table lock and
improves concurrency.
2012-05-20 14:33:28 +00:00
Konstantin Belousov
df2f557df6 Do not double-reference the found vm object in cdev_pager_lookup().
vm_pager_object_lookup() already referenced the object.

Note that there is no in-tree consumers of cdev_pager_lookup(). The
only known user of the function is i915 gem driver, which is not yet
imported. This should make the KPI change minor.

Submitted by:	avg
MFC after:	1 week
2012-05-18 10:23:47 +00:00
Konstantin Belousov
b7ac5a8571 Add new pager type, OBJT_MGTDEVICE. It provides the device pager
which carries fictitous managed pages. In particular, the consumers of
the new object type can remove all mappings of the device page with
pmap_remove_all().

The range of physical addresses used for fake page allocation shall be
registered with vm_phys_fictitious_reg_range() interface to allow the
PHYS_TO_VM_PAGE() to work in pmap.

Most likely, only i386 and amd64 pmaps can handle fictitious managed
pages right now.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:49:58 +00:00
Konstantin Belousov
b6de32bd9b Add a facility to register a range of physical addresses to be used
for allocation of fictitious pages, for which PHYS_TO_VM_PAGE()
returns proper fictitious vm_page_t. The range should be de-registered
after consumer stopped using it.

De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate
over registered ranges.

A hash container might be developed instead of range registration
interface, and fake pages could be put automatically into the hash,
were PHYS_TO_VM_PAGE() could look them up later. This should be
considered before the MFC of the commit is done.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:42:56 +00:00
Konstantin Belousov
e461aae747 Split the code from vm_page_getfake() to initialize the fake page struct
vm_page into new interface vm_page_initfake(). Handle the case of fake
page re-initialization with changed memattr.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:34:22 +00:00
Konstantin Belousov
116c213502 Assert that the page passed to vm_page_putfake() is unmanaged.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:27:51 +00:00
Konstantin Belousov
7900f95d88 Assert that fictitious or unmanaged pages do not appear on
active/inactive lists.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:24:46 +00:00
Konstantin Belousov
13a0b7bcc4 Commit the change forgotten in r235356.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:10:18 +00:00
Konstantin Belousov
0c26bb71f6 Make the vm_page_array_size long. Remove redundand zero initialization
for vm_page_array_size and nearby variablees.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:03:06 +00:00
Alan Cox
13458803f4 Give vm_fault()'s sequential access optimization a makeover.
There are two aspects to the sequential access optimization: (1) read ahead
of pages that are expected to be accessed in the near future and (2) unmap
and cache behind of pages that are not expected to be accessed again.  This
revision changes both aspects.

The read ahead optimization is now more effective.  It starts with the same
initial read window as before, but arithmetically grows the window on
sequential page faults.  This can yield increased read bandwidth.  For
example, on one of my machines, a program using mmap() to read a file that
is several times larger than the machine's physical memory takes about 17%
less time to complete.

The unmap and cache behind optimization is now more selectively applied.
The read ahead window must grow to its maximum size before unmap and cache
behind is performed.  This significantly reduces the number of times that
pages are unmapped and cached only to be reactivated a short time later.

The unmap and cache behind optimization now clears each page's referenced
flag.  Previously, in the case of dirty pages, if the containing file was
still mapped at the time that the page daemon examined the dirty pages,
they would be reactivated.

From a stylistic standpoint, this revision also cleanly separates the
implementation of the read ahead and unmap/cache behind optimizations.

Glanced at:	kib
MFC after:	2 weeks
2012-05-10 15:16:42 +00:00
Nathan Whitehorn
0b852c03eb Avoid a lock order reversal in pmap_extract_and_hold() from relocking
the page. This PMAP requires an additional lock besides the PMAP lock
in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release.

Suggested by:	kib
MFC after:	4 days
2012-04-22 17:58:30 +00:00
Konstantin Belousov
1472f4f4b9 When MAP_STACK mapping is created, the map entry is created only to
cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent
call to vm_map_wire() to wire the whole stack region fails due to
VM_MAP_WIRE_NOHOLES flag.

Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack.

Reported and tested by:	Sushanth Rai <sushanth_rai yahoo com>
Reviewed by:	alc
MFC after:	1 week
2012-04-21 18:36:53 +00:00