a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks
configuring machine-dependent memory attributes...":
Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.
Eliminate pointless code from pmap_cache_bits() on amd64.
Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.
Approved by: re (kib)
dependent memory attributes:
Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.
Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.
Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:
kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.
vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.
Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.
Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.
In collaboration with: jhb
Approved by: re (kib)
following changes:
Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect
what this function actually does. Suggested by: tegge
Introduce a new version of vfs_page_set_valid() that does no more than
what the function's name implies. Specifically, it does not update
the page's dirty mask, and thus it does not require the page queues
lock to be held.
Update two of the three callers to the old vfs_page_set_valid() to
call vfs_page_set_validclean() instead because they actually require
the page's dirty mask to be cleared.
Introduce vm_page_set_valid().
Reviewed by: tegge
of the counter, that may happen when too many sendfile(2) calls are
being executed with this vnode [1].
To keep the size of the struct vm_page and offsets of the fields
accessed by out-of-tree modules, swap the types and locations
of the wire_count and cow fields. Add safety checks to detect cow
overflow and force fallback to the normal copy code for zero-copy
sockets. [2]
Reported by: Anton Yuzhaninov <citrin citrin ru> [1]
Suggested by: alc [2]
Reviewed by: alc
MFC after: 2 weeks
work. (Moreover, I don't believe that they have ever worked as intended.)
The explanation is fairly simple. Both MADV_DONTNEED and MADV_FREE perform
vm_page_dontneed() on each page within the range given to madvise(). This
function moves the page to the inactive queue. Specifically, if the page is
clean, it is moved to the head of the inactive queue where it is first in
line for processing by the page daemon. On the other hand, if it is dirty,
it is placed at the tail. Let's further examine the case in which the page
is clean. Recall that the page is at the head of the line for processing by
the page daemon. The expectation of vm_page_dontneed()'s author was that
the page would be transferred from the inactive queue to the cache queue by
the page daemon. (Once the page is in the cache queue, it is, in effect,
free, that is, it can be reallocated to a new vm object by vm_page_alloc()
if it isn't reactivated quickly enough by a user of the old vm object.) The
trouble is that nowhere in the execution of either MADV_DONTNEED or
MADV_FREE is either the machine-independent reference flag (PG_REFERENCED)
or the reference bit in any page table entry (PTE) mapping the page cleared.
Consequently, the immediate reaction of the page daemon is to reactivate the
page because it is referenced. In effect, the madvise() was for naught.
The case in which the page was dirty is not too different. Instead of being
laundered, the page is reactivated.
Note: The essential difference between MADV_DONTNEED and MADV_FREE is
that MADV_FREE clears a page's dirty field. So, MADV_FREE is always
executing the clean case above.
This revision changes vm_page_dontneed() to clear both the machine-
independent reference flag (PG_REFERENCED) and the reference bit in all PTEs
mapping the page.
MFC after: 6 weeks
contigmalloc(9) as a last resort to steal pages from an inactive,
partially-used superpage reservation.
Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and
refactor it so that a separate subroutine is responsible for breaking
the selected reservation. This subroutine is also used by
vm_reserv_reclaim_contig().
vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c. Today, vm/vm_pageq.c
has withered to the point that it contains only four short functions,
two of which are only used by vm/vm_page.c. Since I can't foresee any
reason for vm/vm_pageq.c to grow, it is time to fold the remaining
contents of vm/vm_pageq.c back into vm/vm_page.c.
Add some comments. Rename one of the functions, vm_pageq_enqueue(),
that is now static within vm/vm_page.c to vm_page_enqueue().
Eliminate PQ_MAXCOUNT as it no longer serves any purpose.
queues lock is acquired. Otherwise, the state of a reservation's
pages' flags and its population count can be inconsistent. That could
result in a page being freed twice.
Reported by: kris
machine-independent support for superpages. (The earlier part was
the rewrite of the physical memory allocator.) The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.
Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64. (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
page to be in the free lists. Instead, it now returns TRUE if it
removed the page from the free lists and FALSE if the page was not
in the free lists.
This change is required to support superpage reservations. Specifically,
once reservations are introduced, a cached page can either be in the
free lists or a reservation.
default object rather than cache it was to have
vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is
no cached page in object at pindex. This allows to avoid explicit
checks for cached pages in vm_object_backing_scan().
For now, we need the same bandaid for the swap object, otherwise both
the vm_page_lookup() and the pager can report that there is no page at
offset, while page is stored in the cache. Also, this fixes another
instance of the KASSERT("object type is incompatible") failure in the
vm_page_cache_transfer().
Reported and tested by: Peter Holm
Reviewed by: alc
MFC after: 3 days
that would have an offset beyond the end of the target object. Such
pages should remain in the source object.
MFC after: 3 days
Diagnosed and reviewed by: Kostik Belousov
Reported and tested by: Peter Holm
it must first ensure that the page is no longer mapped. This is
trivially accomplished by calling pmap_remove_all() a little earlier
in vm_page_cache(). While I'm in the neighborbood, make a related
panic message a little more useful.
Approved by: re (kensmith)
Reported by: Peter Holm and Konstantin Belousov
Reviewed by: Konstantin Belousov
a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs
when uma_small_free() frees a page. The solution has two parts: (1) Mark
pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock
assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested.
This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
flags, i.e., they do not change state between the time that a page is
allocated and freed.
Approved by: re (kensmith)
PR: 116794
cache: vm_object_page_remove() should convert any cached pages that
fall with the specified range to free pages. Otherwise, there could
be a problem if a file is first truncated and then regrown.
Specifically, some old data from prior to the truncation might reappear.
Generalize vm_page_cache_free() to support the conversion of either a
subset or the entirety of an object's cached pages.
Reported by: tegge
Reviewed by: tegge
Approved by: re (kensmith)
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to
vm_phys_alloc_pages() and vm_phys_free_pages_locked() to
vm_phys_free_pages(). Add comments regarding the need for the free page
queues lock to be held by callers to these functions. No functional
changes.
Approved by: re (hrs)
vm_page_cowfault(). Initially, if vm_page_cowfault() sleeps, the given
page is wired, preventing it from being recycled. However, when
transmission of the page completes, the page is unwired and returned to
the page queues. At that point, the page is not in any special state
that prevents it from being recycled. Consequently, vm_page_cowfault()
should verify that the page is still held by the same vm object before
retrying the replacement of the page. Note: The containing object is,
however, safe from being recycled by virtue of having a non-zero
paging-in-progress count.
While I'm here, add some assertions and comments.
Approved by: re (rwatson)
MFC After: 3 weeks
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
In particular:
- Add an explicative table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some unuseful comments
Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).
Reviewed by: alc, bde
Approved by: jeff (mentor)
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory. The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used. Previously,
only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.
This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.
Discussed with: kmacy, marius, Nathan Whitehorn
PR: 112194
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage(). This allows for the elimination of some uses of
the page queues lock.
Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS. This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().
Remove vm_page_unmanage(). It is no longer used.
vm_page_free_toq() to account for recent changes that allow
vm_page_free_toq() to be called on some pages without the page queues lock
being held, specifically, pages that are not contained in a vm object and
not a member of a page queue. (Examples of such pages include page table
pages, pv entry pages, and uma small alloc pages.)
is actually being added to the hold queue, not the free queue. At the same
time, avoid unnecessary tests to wake up threads waiting for free memory
and the idle thread that zeroes free pages. (These tests will be performed
later when the page finally moves from the hold queue to the free queue.)
inlined and a procedure call is made in the rare case, i.e., when it is
necessary to sleep. In this case, inlining the test actually makes the
kernel smaller.
page queues-synchronized flag. Reduce the scope of the page queues lock in
vm_fault() accordingly.
Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the
scope of the page queues lock. Reviewed by: tegge
Additionally, eliminate an unnecessary dereference in computing the
argument that is passed to vm_object_set_writeable_dirty().
synchronized by the lock on the object containing the page.
Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.
Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().
Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.
The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.
Reviewed by: marcel@ (ia64)
vm_page_startup(). As a result, we now only lookup the tunable once
instead of looking it up once for every physical page of memory in the
system. This cuts out about a 1 second or so delay in boot on x86
systems. The delay is much larger and more noticable on sun4v apparently.
Reported by: kmacy
MFC after: 1 week
These pages are allocated from the direct map, and were not previous
tracked. This included the vm_page_array and the early UMA bootstrap
pages.
Reviewed by: peter
via the debug.minidump sysctl and tunable.
Traditional dumps store all physical memory. This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm. These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.
Minidumps invert the process. Instead of dumping physical memory in
in order to guarantee that all of kvm's backing is dumped, minidumps
instead dump only memory that is actively mapped into kvm.
amd64 has a direct map region that things like UMA use. Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump. Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.
Dumps are a custom format. At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into a 4K page mappings), then the sparse physical pages
according to the bitmap. libkvm can now conveniently access the kvm
page table entries.
Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump. While this is a best case, I expect minidumps
to be in the 100MB-500MB range. Obviously, never larger than physical
memory of course.
minidumps are on by default. It would want be necessary to turn them off
if it was necessary to debug corrupt kernel page table management as that
would mess up minidumps as well.
Both minidumps and regular dumps are supported on the same machine.
and it has not plenty of free pages it tries to free pages in the cache queue.
Unfortunately freeing a cached page requires the locking of the object that
owns the page. However in the context of allocating pages we may not be able
to lock the object and thus can only TRY to lock the object. If the locking try
fails the cache page can not be freed and is activated to move it out of the way
so that we may try to free other cache pages.
If all pages in the cache belong to objects that are currently locked the
cache queue can be emptied without freeing a single page. This scenario caused
two problems:
1) vm_page_alloc always failed allocation when it tried freeing pages from
the cache queue and failed to do so. However if there are more than
cnt.v_interrupt_free_min pages on the free list it should return pages
when requested with priority VM_ALLOC_SYSTEM. Failure to do so can cause
resource exhaustion deadlocks.
2) Threads than need to allocate pages spend a lot of time cleaning up the
page queue without really getting anything done while the pagedaemon
needs to work overtime to refill the cache.
This change fixes the first problem. (1)
Reviewed by: tegge@
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)
MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contains
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)
Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.
Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)
by the zero-copy sockets method, and written to before the transmission
completes, we need to destroy all of the existing mappings to the page,
not just the one that we fault on. Otherwise, the mappings will no longer
be to the same page and changes made through one of the mappings will not
be visible through the others.
Observed by: tegge
If a copy-on-write fault occurs on the page, the new copy should inherit
a part of the original page's wire count.
Submitted by: tegge
MFC after: 1 week
is inserted.
- In vm_page_remove() drop the backing vnode when the last page
is removed.
- Don't check the vnode to see if it must be reclaimed on every
call to vm_page_free_toq() as we only check it now when it is
actually required. This saves us two lock operations per call.
Sponsored by: Isilon Systems, Inc.
queue and (possibly) unlocking the containing object from
vm_page_alloc() to vm_page_select_cache(). Recent optimizations to
vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and
pmap_enter_quick() have resulted in panic()s because vm_page_alloc()
mistakenly unlocked objects that had not been locked by
vm_page_select_cache().
Reported by: Peter Holm and Kris Kennaway
need for most calls to vm_page_busy(). Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.
This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set. Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition. Typically,
this is when it is undergoing I/O.
that indicates that the caller does not want a page with its busy flag set.
In many places, the global page queues lock is acquired and released just
to clear the busy flag on a just allocated page. Both the allocation of
the page and the clearing of the busy flag occur while the containing vm
object is locked. So, the busy flag might as well never be set.
errors are in rarely executed paths.
1. Each time the retry_alloc path is taken, the PG_BUSY must be set again.
Otherwise vm_page_remove() panics.
2. There is no need to set PG_BUSY on the newly allocated page before
freeing it. The page already has PG_BUSY set by vm_page_alloc().
Setting it again could cause an assertion failure.
MFC after: 2 weeks
vm_page_io_finish(). The motivation being to transition synchronization of
the vm_page's busy field from the global page queues lock to the per-object
lock.
and which takes a M_WAITOK/M_NOWAIT flag argument.
Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.
Problem uncovered by: tegge
- Enable recursion on the page queues lock. This allows calls to
vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
queues lock held. Such calls are made to allocate page table pages
and pv entries.
- The previous change enables a partial reversion of vm/vm_page.c
revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
- Add partial locking to pmap_copy(). (As a side-effect, pmap_copy()
should now be faster on i386 SMP because it no longer generates IPIs
for TLB shootdown on the other processors.)
- Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now,
all changes to a user-level pmap on alpha, amd64, and i386 are performed
with appropriate locking.)
being incomplete, it currently has to know how to drop and pick back
up the vm_object's mutex if it has to sleep and drop the page queue
mutex. The problem with this is that if the page is busy, while we
are sleeping, the page can be freed and object disappear. When trying
to lock m->object, we'd get a stale or NULL pointer and crash.
The object is now cached, but this makes the assumption that
the object is referenced in some manner and will not itself
disappear while it is unlocked. Since this only happens if
the object is locked, I had to remove an assumption earlier in
contigmalloc() that reversed the order of locking the object and
doing vm_page_sleep_if_busy(), not the normal order.
allocated as "no object" pages. Similar changes were made to the amd64
and i386 pmap last year. The primary reason being that maintaining
a pte object leads to lock order violations. A secondary reason being
that the pte object is redundant, i.e., the page table itself can be
used to lookup page table pages. (Historical note: The pte object
predates our ability to allocate "no object" pages. Thus, the pte
object was a necessary evil.)
- Unconditionally check the vm object lock's status in vm_page_remove().
Previously, this assertion could not be made on Alpha due to its use
of a pte object.
being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious
pages but unwiring uses PHYS_TO_VM_PAGE(). The resulting panic
reported an unexpected wired count. Rather than attempting to fix
PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of
fictitious pages. Specifically, fictitious pages will never be
completely unwired. Therefore, we can keep a fictitious page's wired
count forever set to one and thereby avoid the use of
PHYS_TO_VM_PAGE() when we know that we're working with a fictitious
page, just not which one.
In collaboration with: green@, tegge@
PR: kern/29915
Some of the conditions that caused vm_page_select_cache() to deactivate a
page were wrong. For example, deactivating an unmanaged or wired page is a
nop. Thus, if vm_page_select_cache() had ever encountered an unmanaged or
wired page, it would have looped forever. Now, we assert that the page is
neither unmanaged nor wired.
caller to vm_page_grab(). Although this gives VM_ALLOC_ZERO a
different meaning for vm_page_grab() than for vm_page_alloc(), I feel
such change is necessary to accomplish other goals. Specifically, I
want to make the PG_ZERO flag immutable between the time it is
allocated by vm_page_alloc() and freed by vm_page_free() or
vm_page_free_zero() to avoid locking overheads. Once we gave up on
the ability to automatically recognize a zeroed page upon entry to
vm_page_free(), the ability to mutate the PG_ZERO flag became useless.
Instead, I would like to say that "Once a page becomes valid, its
PG_ZERO flag must be ignored."
vm_page_free() is called. The problem with holding this lock is that it is
a spin lock and vm_page_free() may attempt the acquisition of a different
default-type lock.
could result in a panic "vm_page_cache: caching a dirty page, ...":
Access to the page must be restricted or removed before calling
vm_page_cache(). This race condition is identical in nature to that
which was addressed by vm_pageout.c's revision 1.251.
- Simplify the code surrounding the fix to this same race condition
in vm_pageout.c's revision 1.251. There should be no behavioral
change. Reviewed by: tegge
MFC after: 7 days
free pages queue. This is presently needed by contigmalloc1().
- Move a sanity check against attempted double allocation of two pages
to the same vm object offset from vm_page_alloc() to vm_page_insert().
This provides better protection because double allocation could occur
through a direct call to vm_page_insert(), such as that by
vm_page_rename().
- Modify contigmalloc1() to hold the mutex synchronizing access to the
free pages queue while it scans vm_page_array in search of free pages.
- Correct a potential leak of pages by contigmalloc1() that I introduced
in revision 1.20: We must convert all cache queue pages to free pages
before we begin removing free pages from the free queue. Otherwise,
if we have to restart the scan because we are unable to acquire the
vm object lock that is necessary to convert a cache queue page to a
free page, we leak those free pages already removed from the free queue.
vm object hasn't changed, the desired page will be at or near the root
of the vm object's splay tree, making vm_page_lookup() cheap. (The only
lock required for vm_page_lookup() is already held.) If, however, the
vm object has changed and retry was requested, eliminating the generation
check also eliminates a pointless acquisition and release of the page
queues lock.
This guard page would have trapped the problems with the MFC of the PAE
support to RELENG_4 at an earlier point in the sequence of events.
Submitted by: tegge
vm_pageout_scan(). Rationale: I don't like leaving a busy page in the
cache queue with neither the vm object nor the vm page queues lock held.
- Assert that the page is active in vm_pageout_page_stats().
pmap_copy_page() et al. to accept a vm_page_t rather than a physical
address. Also, this change will facilitate locking access to the vm page's
valid field.
color in vm_page_alloc(). (This also has small performance benefits.)
- Eliminate vm_page_select_free(); vm_page_alloc() might as well
call vm_pageq_find() directly.
releasing the lock only if we are about to sleep (e.g., vm_pager_get_pages()
or vm_pager_has_pages()). If we sleep, we have marked the vm object with
the paging-in-progress flag.
- Remove the Giant required from vm_page_free_toq(). (Any locking
errors will be caught by vm_page_remove().)
This remedies a panic that occurred when kmem_malloc(NOWAIT) performed
without Giant failed to allocate the necessary pages.
Reported by: phk
called without Giant; and obj_alloc() in turn calls vm_page_alloc()
without Giant. This causes an assertion failure in vm_page_alloc().
Fortunately, obj_alloc() is now MPSAFE. So, we need only clean up
some assertions.
- Weaken the assertion in vm_page_lookup() to require Giant only
if the vm_object isn't locked.
- Remove an assertion from vm_page_alloc() that duplicates a check
performed in vm_page_lookup().
In collaboration with: gallatin, jake, jeff
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.
Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.
Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)
The objective being to eliminate some cases of page queues locking.
(See, for example, vm/vm_fault.c revision 1.160.)
Reviewed by: tegge
(Also, pointed out by tegge that I changed vm_fault.c before changing
vm_page.c. Oops.)
requests when the number of free pages is below the reserved threshold.
Previously, VM_ALLOC_ZERO was only honored when the number of free pages
was above the reserved threshold. Honoring it in all cases generally
makes sense, does no harm, and simplifies the code.