365 Commits

Author SHA1 Message Date
alc
3c35e9380c Defer setting either PG_CACHED or PG_FREE until after the free page
queues lock is acquired.  Otherwise, the state of a reservation's
pages' flags and its population count can be inconsistent.  That could
result in a page being freed twice.

Reported by:	kris
2008-01-02 04:43:47 +00:00
alc
4565fa1697 Add the superpage reservation system. This is "part 2 of 2" of the
machine-independent support for superpages.  (The earlier part was
the rewrite of the physical memory allocator.)  The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.

Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64.  (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
2007-12-29 19:53:04 +00:00
alc
4518d14d23 Modify vm_phys_unfree_page() so that it no longer requires the given
page to be in the free lists.  Instead, it now returns TRUE if it
removed the page from the free lists and FALSE if the page was not
in the free lists.

This change is required to support superpage reservations.  Specifically,
once reservations are introduced, a cached page can either be in the
free lists or a reservation.
2007-12-20 22:45:54 +00:00
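
The contract change described in 4518d14d23 can be illustrated with a toy model. This is a minimal sketch, not the kernel code: `toy_page`, its `in_free_list` field, and `toy_unfree_page()` are hypothetical names invented here, and the real vm_phys_unfree_page() works on the buddy queues rather than a single flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a physical page; not the kernel's struct vm_page. */
struct toy_page {
	bool in_free_list;	/* true while the page sits in a free list */
};

/*
 * Try to take a page out of the free lists.  Returns true if the page
 * was in the free lists and has been removed, and false if it was not
 * there (for example, because it is held by a reservation).  The caller
 * no longer has to guarantee that the page is free beforehand.
 */
static bool
toy_unfree_page(struct toy_page *m)
{
	if (!m->in_free_list)
		return (false);
	m->in_free_list = false;
	return (true);
}
```

A caller that gets false back would then have to look for the page elsewhere, which in the commit's terms means in a reservation.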
alc
3c2abd13fb Eliminate redundant code from vm_page_startup(). 2007-12-19 05:47:50 +00:00
alc
5929e7ecb6 Simplify vm_page_free_toq(). 2007-12-11 21:20:34 +00:00
alc
cf47268b02 Correct a comment. 2007-12-02 07:43:42 +00:00
alc
018efe29f9 When reactivating a cached page, reset the page's pool to the default
pool.  (Not doing this before was a performance pessimization but not
a cause for panic.)
2007-11-21 23:22:10 +00:00
kib
8cd6397d8a The intent of freeing the (zeroed) page in vm_page_cache() for the
default object, rather than caching it, was to have
vm_pager_has_page(object, pindex, ...) == FALSE imply that there is
no cached page in object at pindex. This avoids explicit checks for
cached pages in vm_object_backing_scan().

For now, we need the same bandaid for the swap object; otherwise, both
vm_page_lookup() and the pager can report that there is no page at
offset while the page is stored in the cache. Also, this fixes another
instance of the KASSERT("object type is incompatible") failure in
vm_page_cache_transfer().

Reported and tested by:	Peter Holm
Reviewed by:	alc
MFC after:	3 days
2007-11-05 10:25:12 +00:00
alc
acb713befa Change vm_page_cache_transfer() such that it does not transfer pages
that would have an offset beyond the end of the target object.  Such
pages should remain in the source object.

MFC after:	3 days
Diagnosed and reviewed by:	Kostik Belousov
Reported and tested by:		Peter Holm
2007-10-27 00:09:30 +00:00
alc
d53c0afe54 In the rare case that vm_page_cache() actually frees the given page,
it must first ensure that the page is no longer mapped.  This is
trivially accomplished by calling pmap_remove_all() a little earlier
in vm_page_cache().  While I'm in the neighborhood, make a related
panic message a little more useful.

Approved by:	re (kensmith)
Reported by:	Peter Holm and Konstantin Belousov
Reviewed by:	Konstantin Belousov
2007-10-08 18:01:38 +00:00
alc
19c4fce2e3 Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is
a consequence of sparc64/sparc64/vm_machdep.c revision 1.76.  It occurs
when uma_small_free() frees a page.  The solution has two parts: (1) Mark
pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED.  (2) Defer the lock
assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested.
This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
flags, i.e., they do not change state between the time that a page is
allocated and freed.

Approved by:	re (kensmith)
PR:		116794
2007-10-07 18:03:03 +00:00
alc
9d3ffe57ce Correct an error of omission in the reimplementation of the page
cache: vm_object_page_remove() should convert any cached pages that
fall within the specified range to free pages.  Otherwise, there could
be a problem if a file is first truncated and then regrown.
Specifically, some old data from prior to the truncation might reappear.

Generalize vm_page_cache_free() to support the conversion of either a
subset or the entirety of an object's cached pages.

Reported by: tegge
Reviewed by: tegge
Approved by: re (kensmith)
2007-09-27 04:21:59 +00:00
alc
d1bce06c64 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
alc
215153274b Add a counter for the total number of pages cached and support for
reporting the value of this counter in the program "vmstat".

Approved by:	re (rwatson)
2007-07-27 20:01:22 +00:00
alc
dc39b85c98 Eliminate two unused functions: vm_phys_alloc_pages() and
vm_phys_free_pages().  Rename vm_phys_alloc_pages_locked() to
vm_phys_alloc_pages() and vm_phys_free_pages_locked() to
vm_phys_free_pages().  Add comments regarding the need for the free page
queues lock to be held by callers to these functions.  No functional
changes.

Approved by:	re (hrs)
2007-07-14 21:21:17 +00:00
alc
3b36c8e7eb Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in
vm_page_cowfault().  Initially, if vm_page_cowfault() sleeps, the given
page is wired, preventing it from being recycled.  However, when
transmission of the page completes, the page is unwired and returned to
the page queues.  At that point, the page is not in any special state
that prevents it from being recycled.  Consequently, vm_page_cowfault()
should verify that the page is still held by the same vm object before
retrying the replacement of the page.  Note: The containing object is,
however, safe from being recycled by virtue of having a non-zero
paging-in-progress count.

While I'm here, add some assertions and comments.

Approved by: re (rwatson)
MFC After: 3 weeks
2007-07-10 18:41:34 +00:00
mjacob
a7dcde4629 Don't declare inline a function which isn't. 2007-06-17 04:19:05 +00:00
alc
011d4e557f If attempting to cache a "busy" page, panic instead of printing a diagnostic
message and returning.
2007-06-16 21:07:51 +00:00
alc
a8415c5a0d Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist.  First and
foremost, this allocator is required to support the implementation of
superpages.  As a side effect, it enables a more robust implementation
of contigmalloc(9).  Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages.  Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space.  The performance benefits vary.  In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring.  The reason is that
superpages have much the same effect.  The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages.  I hope this is temporary.  On i386, this is
a slight pessimization.  However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects.  I speculate
that this is true in general of machines with a direct map.

Approved by:	re
2007-06-16 04:57:06 +00:00
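
For readers unfamiliar with the term in a8415c5a0d, the core address arithmetic of a binary buddy system can be sketched as follows. This shows only the textbook technique, not the "twist" the commit describes and not the actual vm_phys code; `TOY_PAGE_SHIFT`, `toy_buddy_of()`, and `toy_merged_start()` are hypothetical names, and 4 KB base pages are an assumption.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumption: 4 KB base pages */

/*
 * Physical address of the buddy of the naturally aligned block of
 * 2^order pages that starts at pa.  The two buddies differ only in the
 * address bit that corresponds to the block size.
 */
static uint64_t
toy_buddy_of(uint64_t pa, int order)
{
	return (pa ^ ((uint64_t)1 << (TOY_PAGE_SHIFT + order)));
}

/*
 * Start address of the block of order + 1 that results from merging a
 * free block of the given order with its free buddy.
 */
static uint64_t
toy_merged_start(uint64_t pa, int order)
{
	return (pa & ~(((uint64_t)1 << (TOY_PAGE_SHIFT + order + 1)) - 1));
}
```

Because merged blocks are always naturally aligned, repeated merging is what makes large contiguous allocations, such as those needed for superpages, possible.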
attilio
e9fc4edc44 Optimize vmmeter locking.
In particular:
- Add an explanatory table for the locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some useless comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
2007-06-10 21:59:14 +00:00
attilio
9bd4fdf7ce Do proper "locking" for the missing vmmeter parts.
Now, we no longer assume sched_lock protection for some of them and use the
distributed loads method for vmmeter (distributed across CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
jeff
953418f0d5 - rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.
Suggested by:	julian@
Contributed by:	attilio@
2007-05-20 22:33:42 +00:00
jeff
e1996cb960 - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
alc
b34f6f7ab1 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
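
The time-for-space trade-off described in b34f6f7ab1 can be sketched with two lookup routines. This is an illustration under assumptions, not the kernel's PHYS_TO_VM_PAGE(): `toy_page`, `toy_seg`, `toy_lookup_dense()`, and `toy_lookup_sparse()` are hypothetical names, and the segment layout is invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumption: 4 KB pages */

/* Hypothetical per-page structure. */
struct toy_page {
	uint64_t phys_addr;
};

/* Hypothetical descriptor for one contiguous range of physical memory. */
struct toy_seg {
	uint64_t start;			/* first byte of the segment */
	uint64_t end;			/* one past the last byte */
	struct toy_page *first_page;	/* page structure for 'start' */
};

/*
 * Dense: one array covers the whole address range, holes included, so
 * the lookup is a single index computation.
 */
static struct toy_page *
toy_lookup_dense(struct toy_page *array, uint64_t first_pa, uint64_t pa)
{
	return (&array[(pa - first_pa) >> TOY_PAGE_SHIFT]);
}

/*
 * Sparse: page structures exist only for populated segments, so the
 * lookup must first find the segment that contains pa.  Returns NULL
 * when pa falls into a hole.
 */
static struct toy_page *
toy_lookup_sparse(const struct toy_seg *segs, size_t nsegs, uint64_t pa)
{
	for (size_t i = 0; i < nsegs; i++) {
		if (pa >= segs[i].start && pa < segs[i].end)
			return (&segs[i].first_page[
			    (pa - segs[i].start) >> TOY_PAGE_SHIFT]);
	}
	return (NULL);
}
```

The sparse variant spends a short search per lookup but needs no page structures for the gaps between segments, which matters when physical memory is scattered widely across the address space.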
alc
573a964db6 Change the way that unmanaged pages are created. Specifically,
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage().  This allows for the elimination of some uses of
the page queues lock.

Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS.  This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().

Remove vm_page_unmanage().  It is no longer used.
2007-02-25 06:14:58 +00:00
alc
c0ed1b65cf Enable vm_page_free() and vm_page_free_zero() to be called on some pages
without the page queues lock being held, specifically, pages that are not
contained in a vm object and not a member of a page queue.
2007-02-18 05:54:42 +00:00
alc
7913035c29 Remove a stale comment. Add punctuation to a nearby comment. 2007-02-17 19:37:00 +00:00
alc
3e7d1b7ebd Relax the page queue lock assertions in vm_page_remove() and
vm_page_free_toq() to account for recent changes that allow
vm_page_free_toq() to be called on some pages without the page queues lock
being held, specifically, pages that are not contained in a vm object and
not a member of a page queue.  (Examples of such pages include page table
pages, pv entry pages, and uma small alloc pages.)
2007-02-15 05:43:38 +00:00
alc
0250171bf6 Avoid the unnecessary acquisition of the free page queues lock when a page
is actually being added to the hold queue, not the free queue.  At the same
time, avoid unnecessary tests to wake up threads waiting for free memory
and the idle thread that zeroes free pages.  (These tests will be performed
later when the page finally moves from the hold queue to the free queue.)
2007-02-14 07:05:55 +00:00
alc
909177e227 Use the free page queue mutex instead of the page queue mutex to
synchronize sleeping and waking of the zero idle thread.
2007-02-11 05:18:40 +00:00
alc
2eb15b506b Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the
vm page queue free mutex instead of the vm page queue mutex.
2007-02-07 06:37:30 +00:00
alc
4881bd38e2 Change the free page queue lock from a spin mutex to a default (blocking)
mutex.  With the demise of Alpha support, there is no longer a reason for
it to be a spin mutex.
2007-02-05 06:02:55 +00:00
kmacy
d0a44b7b7c Remove the requirement that phys_avail be sorted in ascending order
by explicitly finding the lowest and highest addresses when calculating
the size of the vm_pages array.

Reviewed by: alc
2006-12-08 08:44:47 +00:00
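
The idea in d0a44b7b7c, scanning for the extremes instead of relying on sort order, can be sketched as below. This is a hedged illustration rather than the code in vm_page_startup(): `toy_phys_range()` is a hypothetical name, and the layout assumed here is an array of (start, end) pairs terminated by a pair whose end is 0.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Find the lowest start and highest end among the (start, end) pairs in
 * 'avail', which may appear in any order.  The list is assumed to end
 * with a pair whose end address is 0 and to hold at least one range.
 */
static void
toy_phys_range(const uint64_t *avail, uint64_t *low, uint64_t *high)
{
	*low = UINT64_MAX;
	*high = 0;
	for (size_t i = 0; avail[i + 1] != 0; i += 2) {
		if (avail[i] < *low)
			*low = avail[i];
		if (avail[i + 1] > *high)
			*high = avail[i + 1];
	}
}
```

With the two extremes known, the span that the page array has to cover follows from their difference, whatever order the ranges were listed in.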
alc
375373276d I misplaced the assertion that was added to vm_page_startup() in the
previous change.  Correct its placement.
2006-11-08 19:11:54 +00:00
alc
763826feff Simplify the construction of the free queues in vm_page_startup(). Add
an assertion to test a hypothesis concerning other redundant computation
in vm_page_startup().
2006-11-08 18:43:47 +00:00
alc
5d9c66a3f8 The page queues lock is no longer required by vm_page_busy() or
vm_page_wakeup().  Reduce or eliminate its use accordingly.
2006-10-22 21:18:48 +00:00
alc
cbcb760109 Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
kensmith
0c209e1877 Fix two minor style(9) nits in v1.313 which were noticed during an
MFC review.  alc@ will be MFCing V1.313 plus style fix to RELENG_6.
2006-09-29 00:20:56 +00:00
alc
f2ccfe9525 Refactor vm_page_sleep_if_busy() so that the test for a busy page is
inlined and a procedure call is made in the rare case, i.e., when it is
necessary to sleep.  In this case, inlining the test actually makes the
kernel smaller.
2006-08-27 19:50:13 +00:00
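
The pattern described in f2ccfe9525, an inlined test with an out-of-line slow path, looks roughly like this. It is a sketch under assumptions: `toy_page`, `toy_page_sleep()`, and `toy_page_sleep_if_busy()` are hypothetical names, and the slow path only pretends to sleep so that the example stays self-contained.

```c
#include <stdbool.h>

/* Hypothetical page with a busy flag. */
struct toy_page {
	bool busy;
	int sleeps;	/* counts slow-path calls, for illustration only */
};

/*
 * Out-of-line slow path.  A real kernel would block here until the
 * page's owner wakes the sleeper; this stand-in just records the call
 * and pretends that the owner released the page.
 */
static void
toy_page_sleep(struct toy_page *m)
{
	m->sleeps++;
	m->busy = false;
}

/*
 * Inlined test.  In the common case the page is not busy, so the caller
 * pays for one comparison and no procedure call.  Returns true if the
 * caller slept and must revalidate its state.
 */
static inline bool
toy_page_sleep_if_busy(struct toy_page *m)
{
	if (m->busy) {
		toy_page_sleep(m);
		return (true);
	}
	return (false);
}
```

Keeping the rarely executed body out of line means each call site expands to little more than a test and a branch, which is consistent with the commit's remark that inlining the test made the kernel smaller.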
alc
ce3ad47700 Page flags are reset on (re)allocation. There is no need to clear any
flags except for PG_ZERO in vm_page_free_toq().
2006-08-21 00:34:31 +00:00
alc
cc1f2c465b Reimplement the page's NOSYNC flag as an object-synchronized instead of a
page queues-synchronized flag.  Reduce the scope of the page queues lock in
vm_fault() accordingly.

Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the
scope of the page queues lock.  Reviewed by: tegge
Additionally, eliminate an unnecessary dereference in computing the
argument that is passed to vm_object_set_writeable_dirty().
2006-08-13 00:11:09 +00:00
alc
b787cad1e0 Ensure that the page's new field for object-synchronized flags is always
initialized to zero.

Call vm_page_sleep_if_busy() instead of duplicating its implementation in
vm_page_grab().
2006-08-11 17:18:58 +00:00
alc
bc546843d7 Change vm_page_cowfault() so that it doesn't allocate a pre-busied page. 2006-08-10 04:48:29 +00:00
alc
b98eae58a6 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
alc
84c8fb9bd2 Change vm_page_sleep_if_busy() so that it no longer requires the caller to
hold the page queues lock.
2006-08-06 00:15:40 +00:00
alc
cbc0dafbb2 When sleeping on a busy page, use the lock from the containing object
rather than the global page queues lock.
2006-08-03 23:56:11 +00:00
alc
a152234cf9 Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
alc
d0e4b9565d Eliminate OBJ_WRITEABLE. It hasn't been used in a long time. 2006-07-21 06:40:29 +00:00
jhb
d4be78a6fa Move the code to handle the vm.blacklist tunable up a layer into
vm_page_startup().  As a result, we now look up the tunable only once
instead of once for every physical page of memory in the
system.  This cuts a delay of about 1 second out of boot on x86
systems.  The delay is apparently much larger and more noticeable on sun4v.

Reported by:	kmacy
MFC after:	1 week
2006-06-23 16:44:24 +00:00