Commit Graph

447 Commits

Alan Cox
2446e4f02c Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist.  First and
foremost, this allocator is required to support the implementation of
superpages.  As a side effect, it enables a more robust implementation
of contigmalloc(9).  Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages.  Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space.  The performance benefits vary.  In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring.  The reason is that
superpages have much the same effect.  The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages.  I hope this is temporary.  On i386, this is
a slight pessimization.  However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects.  I speculate
that this is true in general of machines with a direct map.

Approved by:	re
2007-06-16 04:57:06 +00:00
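
A minimal sketch of the binary buddy arithmetic such an allocator is built
on (illustrative only, not the committed code): the buddy of a 2^order-page
block at physical address pa is found by flipping a single address bit.

    #include <stdio.h>

    #define PAGE_SHIFT 12

    /* Physical address of the buddy of the block at pa with the given order. */
    static unsigned long
    buddy_of(unsigned long pa, int order)
    {
            return (pa ^ (1UL << (PAGE_SHIFT + order)));
    }

    int
    main(void)
    {
            /* The 8KB (order 1) block at 0x4000 pairs with the one at 0x6000. */
            printf("buddy: 0x%lx\n", buddy_of(0x4000UL, 1));
            return (0);
    }
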
Attilio Rao
393a081d42 Optimize vmmeter locking.
In particular:
- Add an explanatory table for the locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some useless comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
2007-06-10 21:59:14 +00:00
Attilio Rao
b4b7081961 Do proper "locking" for the remaining vmmeter members.
Now, we no longer assume sched_lock protection for some of them and use the
distributed loads method for vmmeter (distributed across CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
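
A small self-contained sketch of the "distributed loads" idea (hypothetical
names, not the committed code): each CPU bumps its own slot locklessly, and
readers sum the slots, accepting a slightly stale total.

    #include <stdio.h>

    #define MAXCPU 4

    static unsigned long pcpu_v_swtch[MAXCPU];      /* one slot per CPU */

    /* Runs on the given CPU; the slot is CPU-private, so no lock is needed. */
    static void
    count_switch(int cpu)
    {
            pcpu_v_swtch[cpu]++;
    }

    /* Readers sum all slots; the result may be momentarily stale. */
    static unsigned long
    read_v_swtch(void)
    {
            unsigned long sum = 0;

            for (int cpu = 0; cpu < MAXCPU; cpu++)
                    sum += pcpu_v_swtch[cpu];
            return (sum);
    }

    int
    main(void)
    {
            count_switch(0);
            count_switch(2);
            printf("v_swtch = %lu\n", read_v_swtch());
            return (0);
    }
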
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Jeff Roberson
80b200da28 - rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.
Suggested by:	julian@
Contributed by:	attilio@
2007-05-20 22:33:42 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  These can be used to abstract away pcpu details, but all counters
   now use atomics as well.  This means the sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
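
One plausible shape for such macros (a sketch, not necessarily the committed
definitions), assuming a global struct vmmeter named cnt and the kernel's
atomic(9) primitives:

    #define VMCNT_SET(member, val)  atomic_store_rel_int(&cnt.member, (val))
    #define VMCNT_GET(member)       (cnt.member)
    #define VMCNT_ADD(member, val)  atomic_add_int(&cnt.member, (val))
    #define VMCNT_SUB(member, val)  atomic_subtract_int(&cnt.member, (val))
    #define VMCNT_PTR(member)       (&cnt.member)

With counters updated through atomics, the switch path no longer needs
sched_lock just to keep, e.g., cnt.v_swtch consistent.
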
Alan Cox
04a18977c8 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
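
A simplified, self-contained model of the two lookups (the segment table and
its names are hypothetical; the committed code differs in detail): dense
indexes one big array and wastes entries on holes, while sparse pays a short
search to avoid a vm_page for every hole in the physical address space.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12

    struct vm_page { uint64_t pa; };

    /* VM_PHYSSEG_DENSE: one array spans the whole range; holes waste entries. */
    static struct vm_page dense_array[1024];
    static uint64_t first_pa;

    static struct vm_page *
    dense_lookup(uint64_t pa)
    {
            return (&dense_array[(pa - first_pa) >> PAGE_SHIFT]);
    }

    /* VM_PHYSSEG_SPARSE: a short segment table; holes cost a search instead. */
    struct seg { uint64_t start, end; struct vm_page *pages; };
    static struct vm_page seg_pages[16];
    static struct seg segs[] = { { 0x100000, 0x110000, seg_pages } };

    static struct vm_page *
    sparse_lookup(uint64_t pa)
    {
            for (size_t i = 0; i < sizeof(segs) / sizeof(segs[0]); i++)
                    if (pa >= segs[i].start && pa < segs[i].end)
                            return (&segs[i].pages[
                                (pa - segs[i].start) >> PAGE_SHIFT]);
            return (NULL);
    }

    int
    main(void)
    {
            printf("dense:  %p\n", (void *)dense_lookup(0x3000));
            printf("sparse: %p\n", (void *)sparse_lookup(0x104000));
            return (0);
    }
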
Alan Cox
9f5c801b94 Change the way that unmanaged pages are created. Specifically,
immediately flag any page that is allocated to an OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage().  This allows for the elimination of some uses of
the page queues lock.

Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS.  This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().

Remove vm_page_unmanage().  It is no longer used.
2007-02-25 06:14:58 +00:00
Alan Cox
711585d087 Enable vm_page_free() and vm_page_free_zero() to be called on some pages
without the page queues lock being held, specifically, pages that are not
contained in a vm object and not a member of a page queue.
2007-02-18 05:54:42 +00:00
Alan Cox
ba000fb2c1 Remove a stale comment. Add punctuation to a nearby comment. 2007-02-17 19:37:00 +00:00
Alan Cox
d3d029bd62 Relax the page queue lock assertions in vm_page_remove() and
vm_page_free_toq() to account for recent changes that allow
vm_page_free_toq() to be called on some pages without the page queues lock
being held, specifically, pages that are not contained in a vm object and
not a member of a page queue.  (Examples of such pages include page table
pages, pv entry pages, and uma small alloc pages.)
2007-02-15 05:43:38 +00:00
Alan Cox
7d60988bad Avoid the unnecessary acquisition of the free page queues lock when a page
is actually being added to the hold queue, not the free queue.  At the same
time, avoid unnecessary tests to wake up threads waiting for free memory
and the idle thread that zeroes free pages.  (These tests will be performed
later when the page finally moves from the hold queue to the free queue.)
2007-02-14 07:05:55 +00:00
Alan Cox
5351a2488a Use the free page queue mutex instead of the page queue mutex to
synchronize sleeping and waking of the zero idle thread.
2007-02-11 05:18:40 +00:00
Alan Cox
e9f995d824 Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the
vm page queue free mutex instead of the vm page queue mutex.
2007-02-07 06:37:30 +00:00
Alan Cox
3ae3919d0b Change the free page queue lock from a spin mutex to a default (blocking)
mutex.  With the demise of Alpha support, there is no longer a reason for
it to be a spin mutex.
2007-02-05 06:02:55 +00:00
Kip Macy
35d10226b7 Remove the requirement that phys_avail be sorted in ascending order
by explicitly finding the lowest and highest addresses when calculating
the size of the vm_page array.

Reviewed by:	alc
2006-12-08 08:44:47 +00:00
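
A sketch of the idea (not the committed code): phys_avail is a
zero-terminated array of (start, end) pairs, and instead of assuming
ascending order, every pair is scanned for the lowest start and highest end.

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t phys_avail[] = {
            0x40000000, 0x80000000,         /* deliberately out of order */
            0x00001000, 0x0009f000,
            0, 0                            /* terminator */
    };

    int
    main(void)
    {
            uint64_t low = UINT64_MAX, high = 0;

            for (int i = 0; phys_avail[i + 1] != 0; i += 2) {
                    if (phys_avail[i] < low)
                            low = phys_avail[i];
                    if (phys_avail[i + 1] > high)
                            high = phys_avail[i + 1];
            }
            printf("low 0x%jx high 0x%jx\n", (uintmax_t)low, (uintmax_t)high);
            return (0);
    }
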
Alan Cox
49c3b92531 I misplaced the assertion that was added to vm_page_startup() in the
previous change.  Correct its placement.
2006-11-08 19:11:54 +00:00
Alan Cox
9ad3296a25 Simplify the construction of the free queues in vm_page_startup(). Add
an assertion to test a hypothesis concerning other redundant computation
in vm_page_startup().
2006-11-08 18:43:47 +00:00
Alan Cox
2a53696fb8 The page queues lock is no longer required by vm_page_busy() or
vm_page_wakeup().  Reduce or eliminate its use accordingly.
2006-10-22 21:18:48 +00:00
Alan Cox
9af80719db Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
Ken Smith
a9a5d47c85 Fix two minor style(9) nits in v1.313 which were noticed during an
MFC review.  alc@ will be MFCing V1.313 plus style fix to RELENG_6.
2006-09-29 00:20:56 +00:00
Alan Cox
eb4bbba83a Refactor vm_page_sleep_if_busy() so that the test for a busy page is
inlined and a procedure call is made in the rare case, i.e., when it is
necessary to sleep.  In this case, inlining the test actually makes the
kernel smaller.
2006-08-27 19:50:13 +00:00
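
The shape of the refactoring (a sketch with a hypothetical helper name; flag
usage matches the era's PG_BUSY flag and m->busy count): the common not-busy
case is decided inline, so only the rare sleeping path costs a function call.

    /* Out-of-line helper: sleeps until the page is woken up (rare path). */
    void _vm_page_sleep(vm_page_t m, const char *msg);

    static __inline int
    vm_page_sleep_if_busy(vm_page_t m, int also_m_busy, const char *msg)
    {
            if ((m->flags & PG_BUSY) || (also_m_busy && m->busy)) {
                    _vm_page_sleep(m, msg);         /* rare path */
                    return (TRUE);
            }
            return (FALSE);
    }
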
Alan Cox
4f9d17d8ab Page flags are reset on (re)allocation. There is no need to clear any
flags except for PG_ZERO in vm_page_free_toq().
2006-08-21 00:34:31 +00:00
Alan Cox
b146f9e5d2 Reimplement the page's NOSYNC flag as an object-synchronized instead of a
page queues-synchronized flag.  Reduce the scope of the page queues lock in
vm_fault() accordingly.

Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the
scope of the page queues lock.  Additionally, eliminate an unnecessary
dereference in computing the argument that is passed to
vm_object_set_writeable_dirty().

Reviewed by: tegge
2006-08-13 00:11:09 +00:00
Alan Cox
25017df472 Ensure that the page's new field for object-synchronized flags is always
initialized to zero.

Call vm_page_sleep_if_busy() instead of duplicating its implementation in
vm_page_grab().
2006-08-11 17:18:58 +00:00
Alan Cox
75db2abb2e Change vm_page_cowfault() so that it doesn't allocate a pre-busied page. 2006-08-10 04:48:29 +00:00
Alan Cox
5786be7cc7 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
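
A sketch of the new arrangement (flag values illustrative; the surrounding
code is not the committed version): the new field sits beside the
page-queues-locked flags but is only touched with the object's lock held.

    struct vm_page {
            /* ... */
            u_short flags;          /* synchronized by the page queues lock */
            u_char  oflags;         /* synchronized by the object's lock */
    };

    #define VPO_SWAPINPROG  0x01    /* swap I/O in progress on the page */
    #define VPO_WANTED      0x02    /* someone waits for the page */

    static void
    mark_wanted(vm_page_t m)
    {
            VM_OBJECT_LOCK_ASSERT(m->object, MA_OWNED);
            m->oflags |= VPO_WANTED;        /* no page queues lock needed */
    }
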
Alan Cox
e74814b66a Change vm_page_sleep_if_busy() so that it no longer requires the caller to
hold the page queues lock.
2006-08-06 00:15:40 +00:00
Alan Cox
91449ce98c When sleeping on a busy page, use the lock from the containing object
rather than the global page queues lock.
2006-08-03 23:56:11 +00:00
Alan Cox
78985e424a Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
Alan Cox
af51d7bf57 Eliminate OBJ_WRITEABLE. It hasn't been used in a long time. 2006-07-21 06:40:29 +00:00
John Baldwin
9bdaa43379 Move the code to handle the vm.blacklist tunable up a layer into
vm_page_startup().  As a result, we now only look up the tunable once
instead of looking it up once for every physical page of memory in the
system.  This cuts out about a one-second delay in boot on x86
systems.  The delay is much larger and more noticeable on sun4v apparently.

Reported by:	kmacy
MFC after:	1 week
2006-06-23 16:44:24 +00:00
Paul Saab
4cbb1c1aaa Fix minidumps to include pages allocated via pmap_map on amd64.
These pages are allocated from the direct map, and were not previously
tracked.  This included the vm_page_array and the early UMA bootstrap
pages.

Reviewed by:	peter
2006-05-31 22:55:23 +00:00
Peter Wemm
c0345a84aa Introduce minidumps. Full physical memory crash dumps are still available
via the debug.minidump sysctl and tunable.

Traditional dumps store all physical memory.  This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm.  These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.

Minidumps invert the process.  Instead of dumping physical memory
in order to guarantee that all of kvm's backing is dumped, minidumps
instead dump only memory that is actively mapped into kvm.

amd64 has a direct map region that things like UMA use.  Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump.  Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.

Dumps are a custom format.  At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into a 4K page mappings), then the sparse physical pages
according to the bitmap.  libkvm can now conveniently access the kvm
page table entries.

Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump.  While this is a best case, I expect minidumps
to be in the 100MB-500MB range.  Obviously, they will never be larger
than physical memory.

minidumps are on by default.  It would only be necessary to turn them off
when debugging corrupt kernel page table management, as that would mess
up minidumps as well.

Both minidumps and regular dumps are supported on the same machine.
2006-04-21 04:24:50 +00:00
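
A self-contained sketch of the bitmap bookkeeping behind dump_add_page() and
dump_drop_page() (simplified; the committed code sizes the bitmap from the
real memory range and sets bits atomically):

    #include <stdint.h>

    #define PAGE_SHIFT      12
    #define MODEL_PAGES     (1UL << 20)     /* model 4GB of physical memory */

    static uint8_t vm_page_dump[MODEL_PAGES / 8];   /* one bit per page */

    /* Mark the physical page at pa for inclusion in a minidump. */
    static void
    dump_add_page(uint64_t pa)
    {
            uint64_t idx = pa >> PAGE_SHIFT;

            vm_page_dump[idx / 8] |= (uint8_t)(1U << (idx % 8));
    }

    /* Remove the physical page at pa from the minidump set. */
    static void
    dump_drop_page(uint64_t pa)
    {
            uint64_t idx = pa >> PAGE_SHIFT;

            vm_page_dump[idx / 8] &= (uint8_t)~(1U << (idx % 8));
    }

The dump writer then walks this bitmap and emits only the marked pages,
which is what keeps the resulting file sparse.
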
Warner Losh
62a59e8f0d Remove leading __ from __(inline|const|signed|volatile). They are
obsolete.  This should reduce diffs to NetBSD as well.
2006-03-08 06:31:46 +00:00
Stephan Uphoff
224409590d When the VM needs to allocate physical memory pages (for non-interrupt use)
and does not have plenty of free pages, it tries to free pages in the cache queue.
Unfortunately, freeing a cached page requires locking the object that
owns the page. However, in the context of allocating pages we may not be able
to lock the object and thus can only TRY to lock the object. If the locking try
fails, the cached page cannot be freed and is activated to move it out of the way
so that we may try to free other cached pages.

If all pages in the cache belong to objects that are currently locked, the
cache queue can be emptied without freeing a single page. This scenario caused
two problems:

    1)  vm_page_alloc always failed the allocation when it tried and failed
        to free pages from the cache queue. However, if there are more than
        cnt.v_interrupt_free_min pages on the free list, it should return pages
        when requested with priority VM_ALLOC_SYSTEM. Failure to do so can cause
        resource exhaustion deadlocks.

    2)  Threads that need to allocate pages spend a lot of time cleaning up the
        page queue without really getting anything done, while the pagedaemon
        needs to work overtime to refill the cache.

This change fixes the first problem (1).

Reviewed by:	tegge@
2006-02-15 22:29:53 +00:00
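
A sketch of the allocation-class policy the fix restores (hypothetical
helper; thresholds named after the vmmeter fields): a VM_ALLOC_SYSTEM
request must still succeed from the free list when more than
cnt.v_interrupt_free_min pages remain, even if no cached page could be freed.

    static int
    may_allocate(int req_class, int free_count, int cache_count,
        int interrupt_free_min, int free_reserved)
    {
            switch (req_class) {
            case VM_ALLOC_INTERRUPT:
                    return (free_count > 0);
            case VM_ALLOC_SYSTEM:
                    return (free_count > interrupt_free_min);
            default:        /* VM_ALLOC_NORMAL */
                    return (free_count + cache_count > free_reserved);
            }
    }
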
Alan Cox
6c237adcea Change #if defined(DIAGNOSTIC) to KASSERT. 2006-01-31 19:06:51 +00:00
Alan Cox
fc3c1bc471 In vm_page_set_invalid() invalidate all of the page's mappings as soon as
any part of the page's contents is invalidated.

Submitted by: tegge
2006-01-24 07:21:38 +00:00
Alexander Leidinger
ef39c05baa MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows trying different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contain
   cache size detection code; every other arch just comes with a dummy
   function (this results in the use of default values, as was the case
   without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
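
A sketch of the arithmetic such autotuning rests on (illustrative, not the
committed code): with a physically indexed cache, the number of useful page
colors is the cache size per way divided by the page size.

    #include <stdio.h>

    #define PAGE_SIZE 4096

    int
    main(void)
    {
            long cache_size = 1024 * 1024;  /* detected L2 size: 1MB */
            int associativity = 8;          /* 8-way set associative */
            int colors = (int)(cache_size / associativity / PAGE_SIZE);

            printf("page colors: %d\n", colors);    /* prints 32 */
            return (0);
    }
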
Alan Cox
984922d761 Assert that the page that is given to vm_page_free_toq() does not have any
managed mappings.
2005-12-13 19:59:09 +00:00
Alan Cox
7e9d944218 If a physical page is mapped by two or more virtual addresses, transmitted
by the zero-copy sockets method, and written to before the transmission
completes, we need to destroy all of the existing mappings to the page,
not just the one that we fault on.  Otherwise, the mappings will no longer
be to the same page and changes made through one of the mappings will not
be visible through the others.

Observed by: tegge
2005-11-08 06:33:21 +00:00
Alan Cox
674b706ea0 Consider the zero-copy transmission of a page that was wired by mlock(2).
If a copy-on-write fault occurs on the page, the new copy should inherit
a part of the original page's wire count.

Submitted by: tegge
MFC after: 1 week
2005-11-01 04:30:21 +00:00
Dag-Erling Smørgrav
3803b26bae As alc pointed out to me, vm_page.c 1.305 was incomplete: uma_startup()
still uses the constant UMA_BOOT_PAGES.  Change it to accept boot_pages
as an additional argument.

MFC after:	2 weeks
2005-10-08 21:03:54 +00:00
Dag-Erling Smørgrav
cfa22bcc4c Introduce the vm.boot_pages tunable and sysctl, which controls the number
of pages reserved to bootstrap the kernel memory allocator.

MFC after:	2 weeks
2005-08-12 12:24:19 +00:00
Jeff Roberson
761dbeb66f - In vm_page_insert() hold the backing vnode when the first page
is inserted.
 - In vm_page_remove() drop the backing vnode when the last page
   is removed.
 - Don't check the vnode to see if it must be reclaimed on every
   call to vm_page_free_toq() as we only check it now when it is
   actually required.  This saves us two lock operations per call.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 14:14:09 +00:00
Alan Cox
46fbc58202 Transfer responsibility for freeing the page taken from the cache
queue and (possibly) unlocking the containing object from
vm_page_alloc() to vm_page_select_cache().  Recent optimizations to
vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and
pmap_enter_quick() have resulted in panic()s because vm_page_alloc()
mistakenly unlocked objects that had not been locked by
vm_page_select_cache().

Reported by: Peter Holm and Kris Kennaway
2005-01-07 05:02:19 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Alan Cox
0869d38ba6 Assert that page allocations during an interrupt specify
VM_ALLOC_INTERRUPT.

Assert that pages removed from the cache queue are not busy.
2004-12-31 19:50:45 +00:00
Alan Cox
7aa2190c8e Access to the page's busy field is (now) synchronized by the containing
object's lock.  Therefore, the assertion that the page queues lock is held
can be removed from vm_page_io_start().
2004-12-29 04:18:22 +00:00
Alan Cox
40198b3c04 Assert that the vm object is locked on entry to vm_page_sleep_if_busy();
remove some unneeded code.
2004-12-26 21:46:44 +00:00
Alan Cox
d19ef81437 The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy().  Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set.  Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition.  Typically,
this is when it is undergoing I/O.
2004-11-03 20:17:31 +00:00
Alan Cox
f4d49654ae Assert that the containing vm object is locked in vm_page_cache() and
vm_page_try_to_cache().
2004-10-28 05:26:21 +00:00
Alan Cox
63bb7041cc Assert that the containing vm object is locked in vm_page_flash(). 2004-10-25 19:52:44 +00:00
Alan Cox
75d0533847 Assert that the containing vm object is locked in vm_page_busy() and
vm_page_wakeup().
2004-10-24 23:53:47 +00:00
Alan Cox
0f9f9bcb53 Introduce VM_ALLOC_NOBUSY, an option to vm_page_alloc() and vm_page_grab()
that indicates that the caller does not want a page with its busy flag set.
In many places, the global page queues lock is acquired and released just
to clear the busy flag on a just allocated page.  Both the allocation of
the page and the clearing of the busy flag occur while the containing vm
object is locked.  So, the busy flag might as well never be set.
2004-10-24 06:15:36 +00:00
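
A hedged before/after sketch of the pattern the flag removes (locking calls
per the era's API; error handling omitted):

    /* Before: clear the pre-set busy flag by hand, under the queues lock. */
    static vm_page_t
    alloc_unbusied_old(vm_object_t object, vm_pindex_t pindex)
    {
            vm_page_t m;

            m = vm_page_alloc(object, pindex, VM_ALLOC_NORMAL);
            vm_page_lock_queues();
            vm_page_wakeup(m);              /* clear PG_BUSY, wake waiters */
            vm_page_unlock_queues();
            return (m);
    }

    /* After: ask for the page without the busy flag in the first place. */
    static vm_page_t
    alloc_unbusied_new(vm_object_t object, vm_pindex_t pindex)
    {
            return (vm_page_alloc(object, pindex,
                VM_ALLOC_NORMAL | VM_ALLOC_NOBUSY));
    }
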
Alan Cox
1e96d2a217 Correct two errors in PG_BUSY management by vm_page_cowfault(). Both
errors are in rarely executed paths.
1. Each time the retry_alloc path is taken, the PG_BUSY must be set again.
   Otherwise vm_page_remove() panics.
2. There is no need to set PG_BUSY on the newly allocated page before
   freeing it.  The page already has PG_BUSY set by vm_page_alloc().
   Setting it again could cause an assertion failure.

MFC after: 2 weeks
2004-10-18 08:11:59 +00:00
Alan Cox
36aeb90e34 Assert that the containing object is locked in vm_page_io_start() and
vm_page_io_finish().  The motivation being to transition synchronization of
the vm_page's busy field from the global page queues lock to the per-object
lock.
2004-10-17 22:33:40 +00:00
Poul-Henning Kamp
7ce1979be6 Add a new function isa_dma_init() which returns an errno when it fails
and which takes a M_WAITOK/M_NOWAIT flag argument.

Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.

Problem uncovered by:	tegge
2004-09-15 12:09:50 +00:00
Alan Cox
a087914310 Advance the state of pmap locking on alpha, amd64, and i386.
- Enable recursion on the page queues lock.  This allows calls to
   vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
   queues lock held.  Such calls are made to allocate page table pages
   and pv entries.
 - The previous change enables a partial reversion of vm/vm_page.c
   revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
   now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
 - Add partial locking to pmap_copy().  (As a side-effect, pmap_copy()
   should now be faster on i386 SMP because it no longer generates IPIs
   for TLB shootdown on the other processors.)
 - Complete the locking of pmap_enter() and pmap_enter_quick().  (As of now,
   all changes to a user-level pmap on alpha, amd64, and i386 are performed
   with appropriate locking.)
2004-07-29 18:56:31 +00:00
Brian Feldman
d951b75210 Fix a race in vm_page_sleep_if_busy(). Due to vm_object locking
being incomplete, it currently has to know how to drop and pick back
up the vm_object's mutex if it has to sleep and drop the page queue
mutex.  The problem with this is that if the page is busy, while we
are sleeping, the page can be freed and the object can disappear.  When trying
to lock m->object, we'd get a stale or NULL pointer and crash.

The object is now cached, but this makes the assumption that
the object is referenced in some manner and will not itself
disappear while it is unlocked.  Since this only happens if
the object is locked, I had to remove an assumption earlier in
contigmalloc() that reversed the order of locking the object and
doing vm_page_sleep_if_busy(), not the normal order.
2004-07-21 23:56:09 +00:00
Alan Cox
e832aafc51 - Eliminate the pte object from the pmap. Instead, page table pages are
allocated as "no object" pages.  Similar changes were made to the amd64
   and i386 pmap last year.  The primary reason being that maintaining
   a pte object leads to lock order violations.  A secondary reason being
   that the pte object is redundant, i.e., the page table itself can be
   used to lookup page table pages.  (Historical note: The pte object
   predates our ability to allocate "no object" pages.  Thus, the pte
   object was a necessary evil.)
 - Unconditionally check the vm object lock's status in vm_page_remove().
   Previously, this assertion could not be made on Alpha due to its use
   of a pte object.
2004-07-19 18:12:04 +00:00
Alan Cox
790bdd0f2e Increase the scope of the page queues lock in vm_page_alloc() to cover
a diagnostic check that accesses the cache queue count.
2004-07-10 22:12:49 +00:00
Alan Cox
0a2df4773c Remove spl() calls. Update comments to reflect the removal of spl() calls.
Remove '\n' from panic() format strings.  Remove some blank lines.
2004-06-19 04:19:47 +00:00
Alan Cox
d45f21f31a Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not
accessible through an object.  Thus, PG_BUSY serves no purpose.
2004-06-17 06:16:58 +00:00
Alan Cox
4be14af9cf To date, unwiring a fictitious page has produced a panic. The reason
being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious
pages but unwiring uses PHYS_TO_VM_PAGE().  The resulting panic
reported an unexpected wired count.  Rather than attempting to fix
PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of
fictitious pages.  Specifically, fictitious pages will never be
completely unwired.  Therefore, we can keep a fictitious page's wired
count forever set to one and thereby avoid the use of
PHYS_TO_VM_PAGE() when we know that we're working with a fictitious
page, just not which one.

In collaboration with: green@, tegge@
PR: kern/29915
2004-05-22 04:53:51 +00:00
Alan Cox
1bb816d3d1 Restructure vm_page_select_cache() so that adding assertions is easy.
Some of the conditions that caused vm_page_select_cache() to deactivate a
page were wrong.  For example, deactivating an unmanaged or wired page is a
nop.  Thus, if vm_page_select_cache() had ever encountered an unmanaged or
wired page, it would have looped forever.  Now, we assert that the page is
neither unmanaged nor wired.
2004-05-12 04:27:18 +00:00
Alan Cox
3f39cca96b Cache queue pages are not mapped. Thus, the pmap_remove_all() by
vm_page_alloc() is unnecessary.
2004-05-09 01:00:15 +00:00
Alan Cox
2ec91846fd Update the comment describing vm_page_grab() to reflect the previous
revision and correct some of its style errors.
2004-04-24 21:36:23 +00:00
Alan Cox
7ef6ba5d27 Push down the responsibility for zeroing a physical page from the
caller to vm_page_grab().  Although this gives VM_ALLOC_ZERO a
different meaning for vm_page_grab() than for vm_page_alloc(), I feel
such change is necessary to accomplish other goals.  Specifically, I
want to make the PG_ZERO flag immutable between the time it is
allocated by vm_page_alloc() and freed by vm_page_free() or
vm_page_free_zero() to avoid locking overheads.  Once we gave up on
the ability to automatically recognize a zeroed page upon entry to
vm_page_free(), the ability to mutate the PG_ZERO flag became useless.
Instead, I would like to say that "Once a page becomes valid, its
PG_ZERO flag must be ignored."
2004-04-24 20:53:55 +00:00
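
A sketch of the new division of labor (the body is hypothetical and elides
the miss path): the caller states intent with VM_ALLOC_ZERO, and
vm_page_grab() performs zeroing work only when it is actually needed.

    vm_page_t
    vm_page_grab(vm_object_t object, vm_pindex_t pindex, int allocflags)
    {
            vm_page_t m;

            m = vm_page_lookup(object, pindex);     /* simplified: assume a hit */
            if ((allocflags & VM_ALLOC_ZERO) != 0 &&
                (m->flags & PG_ZERO) == 0)
                    pmap_zero_page(m);      /* skip pages already zeroed */
            return (m);
    }
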
Warner Losh
05eb3785e7 Remove advertising clause from University of California Regents' license,
per letter dated July 22, 1999.

Approved by: core
2004-04-06 20:15:37 +00:00
Alan Cox
889eb0fc62 Eliminate unused arguments from vm_page_startup(). 2004-04-04 23:33:36 +00:00
Alan Cox
ca3b447732 Modify contigmalloc1() so that the free page queues lock is not held when
vm_page_free() is called.  The problem with holding this lock is that it is
a spin lock and vm_page_free() may attempt the acquisition of a different
default-type lock.
2004-03-02 08:25:58 +00:00
Alan Cox
0f75a97722 - Correct a long-standing race condition in vm_page_try_to_free() that
could result in a dirty page being unintentionally freed.
 - Simplify the dirty page check in vm_page_dontneed().

Reviewed by:	tegge
MFC after:	7 days
2004-02-19 07:43:55 +00:00
Alan Cox
84d98bf699 - Correct a long-standing race condition in vm_page_try_to_cache() that
could result in a panic "vm_page_cache: caching a dirty page, ...":
   Access to the page must be restricted or removed before calling
   vm_page_cache().  This race condition is identical in nature to that
   which was addressed by vm_pageout.c's revision 1.251.
 - Simplify the code surrounding the fix to this same race condition
   in vm_pageout.c's revision 1.251.  There should be no behavioral
   change.

Reviewed by:	tegge

MFC after:	7 days
2004-02-14 08:54:37 +00:00
Alan Cox
65bae14d77 - Enable recursive acquisition of the mutex synchronizing access to the
free pages queue.  This is presently needed by contigmalloc1().
 - Move a sanity check against attempted double allocation of two pages
   to the same vm object offset from vm_page_alloc() to vm_page_insert().
   This provides better protection because double allocation could occur
   through a direct call to vm_page_insert(), such as that by
   vm_page_rename().
 - Modify contigmalloc1() to hold the mutex synchronizing access to the
   free pages queue while it scans vm_page_array in search of free pages.
 - Correct a potential leak of pages by contigmalloc1() that I introduced
   in revision 1.20: We must convert all cache queue pages to free pages
   before we begin removing free pages from the free queue.  Otherwise,
   if we have to restart the scan because we are unable to acquire the
   vm object lock that is necessary to convert a cache queue page to a
   free page, we leak those free pages already removed from the free queue.
2004-01-08 20:48:26 +00:00
Alan Cox
4804edb44f In vm_page_lookup() check the root of the vm object's splay tree for the
desired page before calling vm_page_splay().
2003-12-31 19:02:01 +00:00
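
The shape of the optimization (a sketch close to, but not verbatim, the
committed code): check the splay root before paying for a full splay.

    vm_page_t
    vm_page_lookup(vm_object_t object, vm_pindex_t pindex)
    {
            vm_page_t m;

            /* The root is often the wanted page; splay only on a root miss. */
            if ((m = object->root) != NULL && m->pindex != pindex) {
                    m = vm_page_splay(pindex, m);
                    if ((object->root = m)->pindex != pindex)
                            m = NULL;
            }
            return (m);
    }
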
Alan Cox
bcdaad7fe7 Simplify vm_page_grab(): Don't bother with the generation check. If the
vm object hasn't changed, the desired page will be at or near the root
of the vm object's splay tree, making vm_page_lookup() cheap.  (The only
lock required for vm_page_lookup() is already held.)  If, however, the
vm object has changed and retry was requested, eliminating the generation
check also eliminates a pointless acquisition and release of the page
queues lock.
2003-12-31 01:44:45 +00:00
Alan Cox
9582cd94cb - Create an unmapped guard page to trap access to vm_page_array[-1].
This guard page would have trapped the problems with the MFC of the PAE
   support to RELENG_4 at an earlier point in the sequence of events.

Submitted by:	tegge
2003-12-22 02:04:08 +00:00
Alan Cox
de33beddd5 - Additional vm object locking in vm_object_split()
- New vm object locking assertions in vm_page_insert() and
   vm_object_set_writeable_dirty()
2003-11-01 04:54:23 +00:00
Alan Cox
ab42316c2f - Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from
vm_pageout_scan().  Rationale: I don't like leaving a busy page in the
   cache queue with neither the vm object nor the vm page queues lock held.
 - Assert that the page is active in vm_pageout_page_stats().
2003-10-22 18:41:32 +00:00
Alan Cox
0d42c05ff4 - Assert that the containing vm object is locked in
vm_page_set_validclean().  (This function reads and modifies the
   vm page's valid field, which is synchronized by the lock on the
   containing vm object.)
2003-10-21 19:36:51 +00:00
Alan Cox
fee181a696 - Remove some long unused code. 2003-10-20 18:57:01 +00:00
Alan Cox
669890eaeb Retire vm_page_copy(). Its reason for being ended when peter@ modified
pmap_copy_page() et al. to accept a vm_page_t rather than a physical
address.  Also, this change will facilitate locking access to the vm page's
valid field.
2003-10-08 05:35:12 +00:00
Alan Cox
5a3970febf Assert that the containing vm object's lock is held in
vm_page_set_invalid().
2003-10-05 06:58:07 +00:00
Alan Cox
874f526de6 Assert that the containing vm object's lock is held in
vm_page_zero_invalid().
2003-10-04 21:56:27 +00:00
Alan Cox
bf0da100d6 - Extend the scope the vm object lock to cover calls to
vm_page_is_valid().
 - Assert that the lock on the containing vm object is held in
   vm_page_is_valid().
2003-10-04 19:23:29 +00:00
Alan Cox
50028aa7d2 In vm_page_remove(), assert that the vm object is locked, unless an Alpha.
(The Alpha still requires updates to its pmap.)
2003-09-28 04:50:48 +00:00
Alan Cox
95aad59a53 Initialize the page's pindex field even for VM_ALLOC_NOOBJ allocations.
(This field is useful for implementing sanity checks even if the page does
not belong to an object.)
2003-09-22 00:56:13 +00:00
Alan Cox
2370c6d40c Recent pmap changes permit the use of a more precise locking assertion
in vm_page_lookup().
2003-08-28 23:23:04 +00:00
Alan Cox
529e15ed69 Held pages, just like wired pages, should not be added to the cache queues.
Submitted by:	tegge
2003-08-23 20:29:29 +00:00
Alan Cox
b7ad744dc5 Hold the page queues lock when performing vm_page_clear_dirty() and
vm_page_set_invalid().
2003-08-23 18:11:53 +00:00
Alan Cox
0f132ba697 Assert that the vm object's lock is held on entry to vm_page_grab(); remove
code from this function that was needed when vm object locking was
incomplete.
2003-08-21 20:59:07 +00:00
Alan Cox
891c1d4bd3 Assert that the vm object lock is held in vm_page_alloc(). 2003-08-20 20:24:29 +00:00
Alan Cox
c53e8c5654 Modify vm_page_alloc() and vm_page_select_cache() to allow the page that
is returned by vm_page_select_cache() to belong to the object that is
already locked by the caller to vm_page_alloc().
2003-07-01 07:33:41 +00:00
Alan Cox
baaaadf125 - Use an int rather than a vm_pindex_t to represent the desired page
color in vm_page_alloc().  (This also has small performance benefits.)
 - Eliminate vm_page_select_free(); vm_page_alloc() might as well
   call vm_pageq_find() directly.
2003-06-28 07:58:10 +00:00
Alan Cox
9f2b1758c3 vm_page_select_cache() enforces a number of conditions on the returned
page.  Add the ability to lock the containing object to those conditions.
2003-06-26 15:44:03 +00:00
Alan Cox
f29ba63ec9 Maintain a lock on the vm object of interest throughout vm_fault(),
releasing the lock only if we are about to sleep (e.g., vm_pager_get_pages()
or vm_pager_has_pages()).  If we sleep, we have marked the vm object with
the paging-in-progress flag.
2003-06-22 21:35:41 +00:00
Alan Cox
37681d8642 Assert that the vm object is locked in vm_page_try_to_free(). 2003-06-19 01:50:14 +00:00
David E. O'Brien
874651b13c Use __FBSDID(). 2003-06-11 23:50:51 +00:00
Alan Cox
36d1fdf5a2 Teach vm_page_grab() how to handle the vm object's lock. 2003-06-07 23:22:04 +00:00
Alan Cox
5299887de5 - Relax the Giant required in vm_page_remove().
- Remove the Giant required from vm_page_free_toq().  (Any locking
   errors will be caught by vm_page_remove().)

This remedies a panic that occurred when kmem_malloc(NOWAIT) performed
without Giant failed to allocate the necessary pages.

Reported by:	phk
2003-04-25 06:35:05 +00:00
Alan Cox
2e9d00a15d Revision 1.246 should have also included
- Weaken the assertion in vm_page_insert() to require Giant only if the
   vm_object isn't locked.

Reported by:	 "Ilmar S. Habibulin" <ilmar@watson.org>
2003-04-22 14:26:02 +00:00
Alan Cox
03d4c1e644 Revision 1.52 of vm/uma_core.c has led to UMA's obj_alloc() being
called without Giant; and obj_alloc() in turn calls vm_page_alloc()
without Giant.  This causes an assertion failure in vm_page_alloc().
Fortunately, obj_alloc() is now MPSAFE.  So, we need only clean up
some assertions.

 - Weaken the assertion in vm_page_lookup() to require Giant only
   if the vm_object isn't locked.
 - Remove an assertion from vm_page_alloc() that duplicates a check
   performed in vm_page_lookup().

In collaboration with:	gallatin, jake, jeff
2003-04-22 05:36:14 +00:00
John Baldwin
d8fed0f0f2 - Kill the pv_flags member of the alpha mdpage since it stopped being used
in rev 1.61 of pmap.c.
- Now that pmap_page_is_free() is empty and since it is just a hack for
  the Alpha pmap, remove it.
2003-04-10 18:42:06 +00:00
Jake Burkholder
227f9a1c58 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses are larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
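
The essence of the change (illustrative; the real definitions live in each
architecture's machine/_types.h):

    #ifdef PAE
    typedef uint64_t vm_paddr_t;    /* physical addresses can exceed 4GB */
    #else
    typedef uint32_t vm_paddr_t;    /* same width as vm_offset_t */
    #endif
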
Maxime Henrion
dab392a4d4 Remove an empty comment. 2003-03-19 00:34:43 +00:00
Jake Burkholder
9f77ba59c5 Subtract the memory that backs the vm_page structures from phys_avail
after mapping it.  This makes it possible to determine if a physical
page has a backing vm_page or not.
2003-03-17 03:16:00 +00:00
Alan Cox
1a1e9f41e5 Teach vm_page_sleep_if_busy() to release the vm_object lock before sleeping. 2003-03-01 19:16:32 +00:00
Alan Cox
3fa24ec9f1 In vm_page_dirty(), assert that the page is not in the free queue(s). 2003-02-24 17:30:45 +00:00
Alan Cox
e6f2748cbc - Convert the tsleep()s in vm_wait() and vm_waitpfault() to msleep()s
with the page queue lock.
 - Assert that the page queue lock is held in vm_page_free_wakeup().
2003-02-01 21:18:16 +00:00
Alan Cox
28ec30cd9f - Hold the page queues lock around vm_page_hold().
- Assert that the page queues lock rather than Giant is held in
   vm_page_hold().
2003-01-20 09:24:03 +00:00
Alan Cox
b0ef8c5fe4 - Update vm_pageout_deficit using atomic operations. It's a simple
counter outside the scope of existing locks.
 - Eliminate a redundant clearing of vm_pageout_deficit.
2003-01-14 06:57:03 +00:00
Alan Cox
a15700fe32 Make vm_page_alloc() return PG_ZERO only if VM_ALLOC_ZERO is specified.
The objective being to eliminate some cases of page queues locking.
(See, for example, vm/vm_fault.c revision 1.160.)

Reviewed by:	tegge

(Also, pointed out by tegge that I changed vm_fault.c before changing
vm_page.c.  Oops.)
2003-01-12 23:32:46 +00:00
Alan Cox
b5dc830507 In vm_page_alloc(), fuse two if statements that are conditioned on the same
expression.
2003-01-11 20:07:17 +00:00
Alan Cox
9a032278bd In vm_page_alloc(), honor VM_ALLOC_ZERO for system and interrupt class
requests when the number of free pages is below the reserved threshold.
Previously, VM_ALLOC_ZERO was only honored when the number of free pages
was above the reserved threshold.  Honoring it in all cases generally
makes sense, does no harm, and simplifies the code.
2003-01-08 19:58:42 +00:00
Alan Cox
6c4952c7b4 Use atomic add and subtract to update the global wired page count,
cnt.v_wire_count.
2003-01-05 01:31:45 +00:00
Alan Cox
009f3e7a1e Refine the assertions in vm_page_alloc(). 2003-01-04 19:07:13 +00:00
Alan Cox
d61e1287a4 Update the assertions in vm_page_insert() and vm_page_lookup() to reflect
locking of the kmem_object.
2003-01-01 19:45:36 +00:00
Alan Cox
a28cc55e5b Reduce the number of times that we acquire and release the page queues
lock by making vm_page_rename()'s caller, rather than vm_page_rename(),
responsible for acquiring it.
2002-12-29 07:17:06 +00:00
Alan Cox
2ee5fea7d3 Assert that the page queues lock rather than Giant is held in
vm_page_flag_clear().
2002-12-28 22:49:37 +00:00
Alan Cox
24c9ad6bed - Remove vm_page_sleep_busy(). The transition to vm_page_sleep_if_busy(),
which incorporates page queue and field locking, is complete.
 - Assert that the page queue lock rather than Giant is held in
   vm_page_flag_set().
2002-12-19 07:23:46 +00:00
Alan Cox
495bedfbd0 Assert that the page queues lock is held in vm_page_unhold(),
vm_page_remove(), and vm_page_free_toq().
2002-12-15 00:06:02 +00:00
Alan Cox
178949e021 Hold the page queues/flags lock when calling vm_page_set_validclean().
Approved by:	re
2002-11-23 19:10:31 +00:00
Alan Cox
a12cc0e489 Remove vm_page_protect(). Instead, use pmap_page_protect() directly. 2002-11-18 04:05:22 +00:00
Alan Cox
4fec79bef8 Now that pmap_remove_all() is exported by our pmap implementations
use it directly.
2002-11-16 07:44:25 +00:00
Alan Cox
d154fb4fe6 When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than
indirectly through vm_page_protect().  The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.

Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
2002-11-10 07:12:04 +00:00
Alan Cox
1f7c5f98d7 In vm_page_remove(), avoid calling vm_page_splay() if the object's memq
is empty.
2002-11-09 08:27:42 +00:00
Alan Cox
ada2a050be Export the function vm_page_splay(). 2002-11-04 19:21:39 +00:00
Alan Cox
c71f01affe - Remove the memory allocation for the object/offset hash table
because it's no longer used.  (See revision 1.215.)
 - Fix a harmless bug: the number of vm_page structures allocated wasn't
   properly adjusted when uma_bootstrap() was introduced.  Consequently,
   we were allocating 30 unused vm_page structures.
 - Wrap a long line.
2002-11-03 22:20:42 +00:00
Alan Cox
02af9de6fc Remove the vm page buckets mutex. As of revision 1.215 of vm/vm_page.c,
it is unused.
2002-11-02 22:39:30 +00:00
Jeff Roberson
026aa839a4 - Add a new flag to vm_page_alloc, VM_ALLOC_NOOBJ. This tells
vm_page_alloc not to insert this page into an object.  The pindex is
   still used for colorization.
 - Rework vm_page_select_* to accept a color instead of an object and
pindex to work with VM_ALLOC_NOOBJ.
 - Document other VM_ALLOC_ flags.

Reviewed by:	peter, jake
2002-11-01 00:59:03 +00:00
Alan Cox
f3b676f0ad o Reinline vm_page_undirty(), reducing the kernel size. (This reverts
a part of vm_page.h revision 1.87 and vm_page.c revision 1.167.)
2002-10-20 19:57:55 +00:00
Alan Cox
f4ecdf056e Complete the page queues locking needed for the page-based copy-
on-write (COW) mechanism.  (This mechanism is used by the zero-copy
TCP/IP implementation.)
 - Extend the scope of the page queues lock in vm_fault()
   to cover vm_page_cowfault().
 - Modify vm_page_cowfault() to release the page queues lock
   if it sleeps.
2002-10-19 18:34:39 +00:00
Matthew Dillon
b86ec922be Replace the vm_page hash table with a per-vmobject splay tree. There should
be no major change in performance from this change at this time but this
will allow other work to progress:  Giant lock removal around VM system
in favor of per-object mutexes, ranged fsyncs, more optimal COMMIT rpc's for
NFS, partial filesystem syncs by the syncer, more optimal object flushing,
etc.  Note that the buffer cache is already using a similar splay tree
mechanism.

Note that a good chunk of the old hash table code is still in the tree.
Alan or I will remove it prior to the release if the new code does not
introduce unsolvable bugs, else we can revert more easily.

Submitted by:	alc	(this is Alan's code)
Approved by:	re
2002-10-18 17:24:30 +00:00
Alan Cox
8a59b15cd4 o Synchronize updates to struct vm_page::cow with the page queues lock. 2002-09-02 04:04:12 +00:00
Alan Cox
fff6062ab6 o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since
pmap_zero_page() and pmap_zero_page_area() were modified to accept
   a struct vm_page * instead of a physical address, vm_page_zero_fill()
   and vm_page_zero_fill_area() have served no purpose.
2002-08-25 00:22:31 +00:00
Alan Cox
60582cbe6d o Assert that the page queues lock is held in vm_page_activate(). 2002-08-11 00:21:40 +00:00
Alan Cox
db44450b11 o Remove the setting and clearing of the PG_MAPPED flag. (This flag is
obsolete.)
2002-08-10 07:11:16 +00:00
Alan Cox
06ec58b740 o Use pmap_page_is_mapped() in vm_page_protect() rather than the PG_MAPPED
flag.  (This is the only place in the entire kernel where the PG_MAPPED
   flag is tested.  It will be removed soon.)
2002-08-08 19:12:36 +00:00
Alan Cox
24c28f1ad6 o Acquire the page queues lock before checking the page's busy status
in vm_page_grab().  Also, replace the nearby tsleep() with an msleep()
   on the page queues lock.
2002-08-04 19:05:20 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Alan Cox
aa9b1d9412 o Remove the setting of PG_MAPPED from vm_page_wire() and
vm_page_alloc(VM_ALLOC_WIRED).
2002-08-03 01:29:52 +00:00
Alan Cox
1e7ce68ff4 o Lock page queue accesses in nwfs and smbfs.
o Assert that the page queues lock is held in vm_page_deactivate().
2002-08-02 05:23:58 +00:00
Alan Cox
46086ddf91 o Acquire the page queues lock before calling vm_page_io_finish().
o Assert that the page queues lock is held in vm_page_io_finish().
2002-08-01 17:57:42 +00:00
Alan Cox
67c1fae92e o Lock page accesses by vm_page_io_start() with the page queues lock.
o Assert that the page queues lock is held in vm_page_io_start().
2002-07-31 07:27:08 +00:00
Alan Cox
e5f8bd9418 o Introduce vm_page_sleep_if_busy() as an eventual replacement for
vm_page_sleep_busy().  vm_page_sleep_if_busy() uses the page
   queues lock.
2002-07-29 19:41:22 +00:00
Alan Cox
2c071f61f0 o Modify vm_page_grab() to accept VM_ALLOC_WIRED. 2002-07-28 23:46:19 +00:00
Alan Cox
2999e9faca o Lock page queue accesses by vm_page_dontneed().
o Assert that the page queue lock is held in vm_page_dontneed().
2002-07-23 04:39:48 +00:00
Alan Cox
40eab1e944 o Lock page queue accesses by vm_page_try_to_cache(). (The accesses
in kern/vfs_bio.c are already locked.)
 o Assert that the page queues lock is held in vm_page_try_to_cache().
2002-07-20 20:58:46 +00:00
Alan Cox
d82efd2956 o Assert that the page queues lock is held in vm_page_try_to_free(). 2002-07-20 20:12:57 +00:00