Commit Graph

457 Commits

Author SHA1 Message Date
Alan Cox
c68c35372e Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to
use superpage reservations.  So, for the first time, kernel virtual memory
that is allocated by contigmalloc(), kmem_alloc_attr(), and
kmem_alloc_contig() can be promoted to superpages.  In fact, even a series
of small contigmalloc() allocations may collectively result in a promoted
superpage.

Eliminate some duplication of code in vm_reserv_alloc_page().

Change the type of vm_reserv_reclaim_contig()'s first parameter in order
that it be consistent with other vm_*_contig() functions.

Tested by:	marius (sparc64)
2011-12-05 18:29:25 +00:00
Konstantin Belousov
dc874f9881 Rename vm_page_set_valid() to vm_page_set_valid_range().
The vm_page_set_valid() is the most reasonable name for the m->valid
accessor.

Reviewed by:	attilio, alc
2011-11-30 17:39:00 +00:00
Konstantin Belousov
cf1911a9ad Hide the internals of vm_page_lock(9) from the loadable modules.
Since the address of vm_page lock mutex depends on the kernel options,
it is easy for module to get out of sync with the kernel.

No vm_page_lockptr() accessor is provided for modules. It can be added
later if needed, unless proper KPI is developed to serve the needs.

Reviewed by:	attilio, alc
MFC after:	3 weeks
2011-11-29 13:07:32 +00:00
Alan Cox
5ff276b7f4 Eliminate end-of-line white space. 2011-11-17 06:54:49 +00:00
Alan Cox
fbd80bd047 Refactor the code that performs physically contiguous memory allocation,
yielding a new public interface, vm_page_alloc_contig().  This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig().  For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space.  It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages.  Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary.  That said, at present, this new function can't be applied to all
types of vm objects.  However, that restriction will be eliminated in the
coming weeks.

From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't.  Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations.  Now, vm_page_alloc_contig() is responsible
for these things.

Reviewed by:	kib
Discussed with:	jhb
2011-11-16 16:46:09 +00:00
Alan Cox
c835bd16a8 Wake up the page daemon in vm_page_alloc_freelist() if it couldn't
allocate the requested page because too few pages are cached or free.

Document the VM_ALLOC_COUNT() option to vm_page_alloc() and
vm_page_alloc_freelist().

Make style changes to vm_page_alloc() and vm_page_alloc_freelist(),
such as using a variable name that more closely corresponds to the
comments.
2011-11-06 02:03:27 +00:00
Konstantin Belousov
561cc9fcb5 Provide typedefs for the type of bit mask for the page bits.
Use the defined types instead of int when manipulating masks.
Supposedly, it could fix support for 32KB page size in the
machine-independend VM layer.

Reviewed by:	alc
MFC after:	2 weeks
2011-11-05 08:20:32 +00:00
Alan Cox
8393768074 Add support for VM_ALLOC_WIRED and VM_ALLOC_ZERO to vm_page_alloc_freelist()
and use these new options in the mips pmap.

Wake up the page daemon in vm_page_alloc_freelist() if the number of free
and cached pages becomes too low.

Tidy up vm_page_alloc_init().  In particular, add a comment about an
important restriction on its use.

Tested by:	jchandra@
2011-11-02 05:42:51 +00:00
Alan Cox
125b695b6e Tidy up the comment at the head of vm_page_alloc, and mention that the
returned page has the flag VPO_BUSY set.
2011-10-27 17:29:19 +00:00
Alan Cox
9c60ca3238 Speed up vm_page_cache() and vm_page_remove() by checking for a few
common cases that can be handled in constant time.  The insight being
that a page's parent in the vm object's tree is very often its
predecessor or successor in the vm object's ordered memq.

Tested by:	jhb
MFC after:	10 days
2011-10-25 16:35:08 +00:00
Konstantin Belousov
17514c1bd9 Style nit.
Submitted by:	jhb
MFC after:	2 weeks
2011-09-29 00:44:34 +00:00
Konstantin Belousov
2042bb377a Fix grammar.
Submitted by:	bf
MFC after:	2 weeks
2011-09-28 16:12:15 +00:00
Konstantin Belousov
abb9b935ca Use the trick of performing the atomic operation on the contained aligned
word to handle the dirty mask updates in vm_page_clear_dirty_mask().
Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold()
the sole purpose of which was to protect dirty on architectures which
does not provide short or byte-wide atomics.

Reviewed by:	alc, attilio
Tested by:	flo (sparc64)
MFC after:	2 weeks
2011-09-28 14:57:50 +00:00
Konstantin Belousov
3407fefef6 Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
Konstantin Belousov
d98d0ce27a - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
  lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
  VPO_UNMANAGED. As a consequence, pmap code now can use use just
  VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by:	alc
Tested by:	pho (x86, previous version), marius (sparc64),
    marcel (arm, ia64, powerpc), ray (mips)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (bz)
2011-08-09 21:01:36 +00:00
Alan Cox
1bfec3dfb6 Revert to using the page queues lock in vm_page_clear_dirty_mask() on
MIPS.  (At present, although atomic_clear_char() is defined by atomic.h
on MIPS, it is not actually implemented by support.S.)
2011-06-23 05:23:59 +00:00
Alan Cox
3c76db4c64 Precisely document the synchronization rules for the page's dirty field.
(Saying that the lock on the object that the page belongs to must be held
only represents one aspect of the rules.)

Eliminate the use of the page queues lock for atomically performing read-
modify-write operations on the dirty field when the underlying architecture
supports atomic operations on char and short types.

Document the fact that 32KB pages aren't really supported.

Reviewed by:	attilio, kib
2011-06-19 19:13:24 +00:00
Konstantin Belousov
3b1025d200 Assert that page is VPO_BUSY or page owner object is locked in
vm_page_undirty(). The assert is not precise due to VPO_BUSY owner
to tracked, so assertion does not catch the case when VPO_BUSY is
owned by other thread.

Reviewed by:	alc
2011-06-11 20:15:19 +00:00
Alan Cox
10cf256074 Eliminate duplication of the fake page code and zone by the device and sg
pagers.

Reviewed by:	jhb
2011-03-11 07:07:48 +00:00
Alan Cox
e6ffa21488 Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
Alan Cox
d7b20e4b45 Retire VFS_BIO_DEBUG. Convert those checks that were still valid into
KASSERT()s and eliminate the rest.

Replace excessive printf()s and a panic() in bufdone_finish() with a
KASSERT() in vm_page_io_finish().

Reviewed by:	kib
2011-02-12 01:00:00 +00:00
Alan Cox
3d05198e23 Release the free page queues lock earlier in vm_page_alloc().
Discussed with:	kib@
2011-01-30 23:55:48 +00:00
Sergey Kandaurov
4053b05b91 Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by:	perryh pluto.rain.com (previous version)
Reviewed by:	jhb
Approved by:	kib (mentor)
Tested by:	universe
2011-01-21 10:26:26 +00:00
Alan Cox
4c6a2e7a1f Shift responsibility for synchronizing access to the page's act_count
field to the object's lock.

Reviewed by:	kib@
2011-01-16 18:01:39 +00:00
Alan Cox
9648f3447d Clean up the start of vm_page_alloc(). In particular, eliminate an
assertion that is no longer required.  Long ago, calls to vm_page_alloc()
from an interrupt handler had to specify VM_ALLOC_INTERRUPT so that
vm_page_alloc() would not attempt to reclaim a PQ_CACHE page from another vm
object.  Today, with the synchronization on a vm object's collection of
PQ_CACHE pages, this is no longer an issue.  In fact, VM_ALLOC_INTERRUPT now
reclaims PQ_CACHE pages just like VM_ALLOC_{NORMAL,SYSTEM}.

MFC after:	3 weeks
2011-01-16 17:33:34 +00:00
Alan Cox
27772ddf45 Eliminate a redundant alignment directive on the page locks array. 2011-01-09 04:34:02 +00:00
Alan Cox
ce8a13bdb9 Eliminate the counting of vm_page_pa_tryrelock calls. We really don't
need it anymore.  Moreover, its implementation had a type mismatch, a
long is not necessarily an uint64_t.  (This mismatch was hidden by
casting.)  Move the remaining two counters up a level in the sysctl
hierarchy.  There is no reason for them to be under the vm.pmap node.

Reviewed by:	kib
2011-01-08 22:45:22 +00:00
Alan Cox
17f6a17bf7 Release the page lock early in vm_pageout_clean(). There is no reason to
hold this lock until the end of the function.

With the aforementioned change to vm_pageout_clean(), page locks don't need
to support recursive (MTX_RECURSE) or duplicate (MTX_DUPOK) acquisitions.

Reviewed by:	kib
2011-01-03 00:41:56 +00:00
Konstantin Belousov
3280870dca Move the increment of vm object generation count into
vm_object_set_writeable_dirty().

Fix an issue where restart of the scan in vm_object_page_clean() did
not removed write permissions for newly added pages or, if the mapping
for some already scanned page changed to writeable due to fault.
Merge the two loops in vm_object_page_clean(), doing the remove of
write permission and cleaning in the same loop. The restart of the
loop then correctly downgrade writeable mappings.

Fix an issue where a second caller to msync() might actually return
before the first caller had actually completed flushing the
pages. Clear the OBJ_MIGHTBEDIRTY flag after the cleaning loop, not
before.

Calls to pmap_is_modified() are not needed after pmap_remove_write()
there.

Proposed, reviewed and tested by:	alc
MFC after:	1 week
2010-12-29 12:53:53 +00:00
Alan Cox
8c22654d7e Implement and use a single optimized function for unholding a set of pages.
Reviewed by:	kib@
2010-12-17 22:41:22 +00:00
Jayachandran C.
48772ca4aa Revert the vm/vm_page.c change in r216317.
This adds back changes in r216141, which was reverted by the above
check in.
2010-12-09 07:39:06 +00:00
Jayachandran C.
aa93efedd8 swi_vm() for mips. 2010-12-09 06:54:06 +00:00
Warner Losh
6f1a8765be To make minidumps work properly on mips for memory that's direct
mapped and entered via vm_page_setup, keep track of it like we do
for amd64.

# A separate commit will be made to move this to a capability-based ifdef
# rather than arch-based ifdef.

Submitted by:	alc@
MFC after:	1 week
2010-12-03 04:39:48 +00:00
Alan Cox
05cb58f669 Correct an error in the allocation of the vm_page_dump array in
vm_page_startup().  Specifically, the dump_avail array should be used
instead of the phys_avail array to calculate the size of vm_page_dump.  For
example, the pages for the message buffer are allocated prior to
vm_page_startup() by subtracting them from the last entry in the phys_avail
array, but the first thing that vm_page_startup() does after creating the
vm_page_dump array is to set the bits corresponding to the message buffer
pages in that array.  However, these bits might not actually exist in the
array, because the size of the array is determined by the current value in
the last entry of the phys_avail array.  In general, the only reason why
this doesn't always result in an out-of-bounds array access is that the size
of the vm_page_dump array is rounded up to the next page boundary.  This
change eliminates that dependence on rounding (and luck).

MFC after:	6 weeks
2010-12-01 03:35:19 +00:00
Jayachandran C.
aa54636620 Fix issue noted by alc while reviewing r215938:
The current implementation of vm_page_alloc_freelist() does not handle
order > 0 correctly. Remove order parameter to the function and use it
only for order 0 pages.

Submitted by:	alc
2010-11-28 05:51:31 +00:00
Alan Cox
00f8bffc22 Reduce the amount of detail printed by vm_page_free_toq() when it panics.
Reviewed by:	kib
2010-11-19 17:49:08 +00:00
Konstantin Belousov
4166faaee0 Only increment object generation count when inserting the page into
object page list.  The only use of object generation count now is a
restart of the scan in vm_object_page_clean(), which makes sense to do
on the page addition. Page removals do not affect the dirtiness of the
object, as well as manipulations with the shadow chain.

Suggested and reviewed by:	alc
MFC after:    1 week
2010-11-18 20:46:28 +00:00
Oleksandr Tymoshenko
903ba3da86 - Add minidump support for FreeBSD/mips 2010-11-07 03:09:02 +00:00
Andriy Gapon
a9b89cf1c1 vm_page.c: include opt_msgbuf.h for MSGBUF_SIZE use in vm_page_startup
vm_page_startup uses MSGBUF_SIZE value for adding msgbuf pages to minidump.
If opt_msgbuf.h is not included and MSGBUF_SIZE is overriden in kernel
config, then not all msgbuf pages will be dumped.  And most importantly,
struct msgbuf itself will not be included.  Thus the dump would look
corrupted/incomplete to tools like kgdb, dmesg, etc that try to access
struct msgbuf as one of the first things they do when working on a crash
dump.

MFC after:	5 days
2010-09-03 10:40:53 +00:00
Jayachandran C.
49ca10d40c Redo the page table page allocation on MIPS, as suggested by
alc@.

The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.

This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards  will introduce
a race condtion(noted by alc@).

The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the  function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().

Reviewed by:	alc
2010-07-21 09:27:00 +00:00
Alan Cox
b99348e5ea Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently,
the maintenance of vm_pageout_deficit can be localized to just two places:
vm_page_alloc() and vm_pageout_scan().

This change also corrects an off-by-one error in the maintenance of
vm_pageout_deficit.  Historically, the buffer cache functions, allocbuf()
and vm_hold_load_pages(), have not taken into account that vm_page_alloc()
already increments vm_pageout_deficit by one.

Reviewed by:	kib
2010-07-09 19:38:30 +00:00
Konstantin Belousov
1d9e77f6bf Make VM_ALLOC_RETRY flag mandatory for vm_page_grab(). Assert that the
flag is always provided, and unconditionally retry after sleep for the
busy page or failed allocation.

The intent is to remove VM_ALLOC_RETRY eventually.

Proposed and reviewed by:	alc
2010-07-08 08:37:51 +00:00
Konstantin Belousov
5f195aa32e Add the ability for the allocflag argument of the vm_page_grab() to
specify the increment of vm_pageout_deficit when sleeping due to page
shortage. Then, in allocbuf(), the code to allocate pages when extending
vmio buffer can be replaced by a call to vm_page_grab().

Suggested and reviewed by:	alc
MFC after:	2 weeks
2010-07-05 21:13:32 +00:00
Konstantin Belousov
b382c10a57 Introduce a helper function vm_page_find_least(). Use it in several places,
which inline the function.

Reviewed by:	alc
Tested by:	pho
MFC after:	1 week
2010-07-04 11:13:33 +00:00
Alan Cox
b64400a03f Improve the comment and man page for vm_page_alloc(). Specifically,
document one of the optional flags; clarify which of the flags are
optional (and which are not), and remove mention of a restriction on
the reclamation of cached pages that no longer holds since version 7.

MFC after:	1 week
2010-07-03 18:25:37 +00:00
Alan Cox
9cf5198832 With the demise of page coloring, the page queue macros no longer serve any
useful purpose.  Eliminate them.

Reviewed by:	kib
2010-07-02 15:02:51 +00:00
Alan Cox
91b4f42767 Introduce vm_page_next() and vm_page_prev(), and use them in
vm_pageout_clean().  When iterating over a range of pages, these functions
can be cheaper than vm_page_lookup() because their implementation takes
advantage of the vm_object's memq being ordered.

Reviewed by:	kib@
MFC after:	3 weeks
2010-06-21 23:27:24 +00:00
Alan Cox
9ee2165f5d Eliminate checks for a page having a NULL object in vm_pageout_scan()
and vm_pageout_page_stats().  These checks were recently introduced by
the first page locking commit, r207410, but they are not needed.  At
the same time, eliminate some redundant accesses to the page's object
field.  (These accesses should have neen eliminated by r207410.)

Make the assertion in vm_page_flag_set() stricter.  Specifically, only
managed pages should have PG_WRITEABLE set.

Add a comment documenting an assertion to vm_page_flag_clear().

It has long been the case that fictitious pages have their wire count
permanently set to one.  Add comments to vm_page_wire() and
vm_page_unwire() documenting this.  Add assertions to these functions
as well.

Update the comment describing vm_page_unwire().  Much of the old
comment had little to do with vm_page_unwire(), but a lot to do with
_vm_page_deactivate().  Move relevant parts of the old comment to
_vm_page_deactivate().

Only pages that belong to an object can be paged out.  Therefore, it
is pointless for vm_page_unwire() to acquire the page queues lock and
enqueue such pages in one of the paging queues.  Generally speaking,
such pages are immediately freed after the call to vm_page_unwire().
Previously, it was the call to vm_page_free() that reacquired the page
queues lock and removed these pages from the paging queues.  Now, we
will never acquire the page queues lock for this case.  (It is also
worth noting that since both vm_page_unwire() and vm_page_free()
occurred with the page locked, the page daemon never saw the page with
its object field set to NULL.)

Change the panic with vm_page_unwire() to provide a more precise message.

Reviewed by:	kib@
2010-06-14 19:54:19 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
Konstantin Belousov
2bbfbc3fe2 Add assertion and comment in vm_page_flag_set() describing the expectations
when the PG_WRITEABLE flag is set.

Reviewed by:	alc
2010-06-03 10:11:45 +00:00