Commit Graph

618 Commits

Author SHA1 Message Date
jchandra
10dfd55de4 Redo the page table page allocation on MIPS, as suggested by
alc@.

The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.

This also fixes a race condition introduced by the UMA based page table
page allocation code. Dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards  will introduce
a race condtion(noted by alc@).

The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in 32bit kernel). Normal page allocations will first
try the HIGHMEM freelist and then the default(direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the  function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().

Reviewed by:	alc
2010-07-21 09:27:00 +00:00
alc
7c09dc242c Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently,
the maintenance of vm_pageout_deficit can be localized to just two places:
vm_page_alloc() and vm_pageout_scan().

This change also corrects an off-by-one error in the maintenance of
vm_pageout_deficit.  Historically, the buffer cache functions, allocbuf()
and vm_hold_load_pages(), have not taken into account that vm_page_alloc()
already increments vm_pageout_deficit by one.

Reviewed by:	kib
2010-07-09 19:38:30 +00:00
kib
3cf9fcd59a Make VM_ALLOC_RETRY flag mandatory for vm_page_grab(). Assert that the
flag is always provided, and unconditionally retry after sleep for the
busy page or failed allocation.

The intent is to remove VM_ALLOC_RETRY eventually.

Proposed and reviewed by:	alc
2010-07-08 08:37:51 +00:00
kib
15a394fbba Add the ability for the allocflag argument of the vm_page_grab() to
specify the increment of vm_pageout_deficit when sleeping due to page
shortage. Then, in allocbuf(), the code to allocate pages when extending
vmio buffer can be replaced by a call to vm_page_grab().

Suggested and reviewed by:	alc
MFC after:	2 weeks
2010-07-05 21:13:32 +00:00
kib
56b6a703a5 Introduce a helper function vm_page_find_least(). Use it in several places,
which inline the function.

Reviewed by:	alc
Tested by:	pho
MFC after:	1 week
2010-07-04 11:13:33 +00:00
alc
8c92592cd0 Improve the comment and man page for vm_page_alloc(). Specifically,
document one of the optional flags; clarify which of the flags are
optional (and which are not), and remove mention of a restriction on
the reclamation of cached pages that no longer holds since version 7.

MFC after:	1 week
2010-07-03 18:25:37 +00:00
alc
50ab2ca4b1 With the demise of page coloring, the page queue macros no longer serve any
useful purpose.  Eliminate them.

Reviewed by:	kib
2010-07-02 15:02:51 +00:00
alc
3832173dc1 Introduce vm_page_next() and vm_page_prev(), and use them in
vm_pageout_clean().  When iterating over a range of pages, these functions
can be cheaper than vm_page_lookup() because their implementation takes
advantage of the vm_object's memq being ordered.

Reviewed by:	kib@
MFC after:	3 weeks
2010-06-21 23:27:24 +00:00
alc
ddca4dfc3b Eliminate checks for a page having a NULL object in vm_pageout_scan()
and vm_pageout_page_stats().  These checks were recently introduced by
the first page locking commit, r207410, but they are not needed.  At
the same time, eliminate some redundant accesses to the page's object
field.  (These accesses should have neen eliminated by r207410.)

Make the assertion in vm_page_flag_set() stricter.  Specifically, only
managed pages should have PG_WRITEABLE set.

Add a comment documenting an assertion to vm_page_flag_clear().

It has long been the case that fictitious pages have their wire count
permanently set to one.  Add comments to vm_page_wire() and
vm_page_unwire() documenting this.  Add assertions to these functions
as well.

Update the comment describing vm_page_unwire().  Much of the old
comment had little to do with vm_page_unwire(), but a lot to do with
_vm_page_deactivate().  Move relevant parts of the old comment to
_vm_page_deactivate().

Only pages that belong to an object can be paged out.  Therefore, it
is pointless for vm_page_unwire() to acquire the page queues lock and
enqueue such pages in one of the paging queues.  Generally speaking,
such pages are immediately freed after the call to vm_page_unwire().
Previously, it was the call to vm_page_free() that reacquired the page
queues lock and removed these pages from the paging queues.  Now, we
will never acquire the page queues lock for this case.  (It is also
worth noting that since both vm_page_unwire() and vm_page_free()
occurred with the page locked, the page daemon never saw the page with
its object field set to NULL.)

Change the panic with vm_page_unwire() to provide a more precise message.

Reviewed by:	kib@
2010-06-14 19:54:19 +00:00
alc
7c212e010d Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
kib
fb3a28316e Add assertion and comment in vm_page_flag_set() describing the expectations
when the PG_WRITEABLE flag is set.

Reviewed by:	alc
2010-06-03 10:11:45 +00:00
alc
518ce17afa Maintain the pretense that we support 32KB pages for the sake of the ia64
LINT build.
2010-06-03 02:24:53 +00:00
alc
24ac89cf14 Minimize the use of the page queues lock for synchronizing access to the
page's dirty field.  With the exception of one case, access to this field
is now synchronized by the object lock.
2010-06-02 15:46:37 +00:00
alc
3f1d4b057c Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
alc
54739180f5 Eliminate the acquisition and release of the page queues lock from
vfs_busy_pages().  It is no longer needed.

Submitted by:	kib
2010-05-25 02:26:25 +00:00
alc
32b13ee957 Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
alc
f8bed5b288 The page queues lock is no longer required by vm_page_set_invalid(), so
eliminate it.

Assert that the object containing the page is locked in
vm_page_test_dirty().  Perform some style clean up while I'm here.

Reviewed by:	kib
2010-05-18 16:40:29 +00:00
alc
f6c07c5b87 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
alc
862dd3e326 Correct an error of omission in r202897: Now that amd64 uses the direct map
to access the message buffer, we must explicitly request that the underlying
physical pages are included in a crash dump.

Reported by:	Benjamin Kaduk
2010-05-16 19:25:56 +00:00
alc
684507e744 Push down the acquisition of the page queues lock into vm_pageq_remove().
(This eliminates a surprising number of page queues lock acquisitions by
vm_fault() because the page's queue is PQ_NONE and thus the page queues
lock is not needed to remove the page from a queue.)
2010-05-09 16:55:42 +00:00
alc
59b934ef40 Minimize the scope of the page queues lock in vm_fault(). 2010-05-08 21:35:51 +00:00
alc
40b44f9713 Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
alc
3c8033e013 Push down the page queues lock into vm_page_activate(). 2010-05-07 15:49:43 +00:00
alc
ebbcafea24 Push down the page queues lock into vm_page_deactivate(). Eliminate an
incorrect comment.
2010-05-07 04:14:07 +00:00
alc
4b98b3d320 Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free().  Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.
2010-05-06 16:39:43 +00:00
alc
5c7ca3ee73 Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held.  (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements.  It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib
2010-05-05 18:16:06 +00:00
alc
ea7b6345be Push down the acquisition of the page queues lock into vm_page_unwire().
Update the comment describing which lock should be held on entry to
vm_page_wire().

Reviewed by:	kib
2010-05-05 03:45:46 +00:00
alc
c9aaa1e2a2 Add page locking to the vm_page_cow* functions.
Push down the acquisition and release of the page queues lock into
vm_page_wire().

Reviewed by:	kib
2010-05-04 15:55:41 +00:00
alc
d84ce0b37b Add lock assertions. 2010-05-04 05:55:19 +00:00
alc
33bb944de8 Acquire the page lock around vm_page_wire() in vm_page_grab().
Assert that the page lock is held in vm_page_wire().
2010-05-03 17:55:32 +00:00
alc
dda80c9587 Assert that the page queues lock is held in vm_page_remove() and
vm_page_unwire() only if the page is managed, i.e., pageable.
2010-05-03 07:00:50 +00:00
alc
fbe19e5e15 Add page lock assertions where we access the page's hold_count. 2010-05-02 23:33:10 +00:00
alc
f35e97166b It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(),
to unconditionally set PG_REFERENCED on a page before sleeping.  In many
cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by
the page daemon, before the caller to vm_page_sleep() is reawakened.
Instead, we now explicitly set PG_REFERENCED in those cases where having
the page persist until the caller is awakened is clearly desirable.  Note,
however, that setting PG_REFERENCED on the page is still only a hint,
and not a guarantee that the page should persist.
2010-05-02 17:33:46 +00:00
kmacy
c53b44ae51 don't allow unsynchronized free in vm_page_unhold 2010-04-30 02:46:49 +00:00
kmacy
1dc1263413 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
alc
6976d9abe9 vm_reserv_alloc_page() should never be called on an OBJT_SG object, just as
it is never called on an OBJT_DEVICE object.  (This change should have been
included in r195840.)

Reported by:	dougb@, avg@
MFC after:	3 days
2010-04-05 06:23:31 +00:00
kib
75f11bce71 Update comment for vm_page_alloc(9), listing all acceptable flags [1].
Note that the function does not sleep, it can block.

Submitted by:	Giovanni Trematerra <giovanni.trematerra gmail com> [1]
MFC after:	3 days
2010-02-27 17:09:28 +00:00
alc
dce82c729a Align and pad the page queue and free page queue locks so that the linker
can't possibly place them together within the same cache line.

MFC after:	3 weeks
2009-10-04 18:53:10 +00:00
jhb
44220d7e1e Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
alc
40432bac3b An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc().  It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by:	re (kib)
2009-07-18 01:50:05 +00:00
alc
ea60573817 Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
alc
7b05ffed76 Strive for greater consistency among the places that implement real,
fictious, and contiguous page allocation.  Eliminate unnecessary
reinitialization of a page's fields.
2009-06-21 00:21:33 +00:00
alc
e4bf0af67f Add assertions in two places where a page's valid or dirty bits are changed. 2009-05-30 22:06:58 +00:00
alc
82da6bfdea Eliminate page queues locking from bufdone_finish() through the
following changes:

Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect
what this function actually does.  Suggested by: tegge

Introduce a new version of vfs_page_set_valid() that does no more than
what the function's name implies.  Specifically, it does not update
the page's dirty mask, and thus it does not require the page queues
lock to be held.

Update two of the three callers to the old vfs_page_set_valid() to
call vfs_page_set_validclean() instead because they actually require
the page's dirty mask to be cleared.

Introduce vm_page_set_valid().

Reviewed by:	tegge
2009-05-13 05:39:39 +00:00
kib
ac1b596fda Extend the struct vm_page wire_count to u_int to avoid the overflow
of the counter, that may happen when too many sendfile(2) calls are
being executed with this vnode [1].

To keep the size of the struct vm_page and offsets of the fields
accessed by out-of-tree modules, swap the types and locations
of the wire_count and cow fields. Add safety checks to detect cow
overflow and force fallback to the normal copy code for zero-copy
sockets. [2]

Reported by:	Anton Yuzhaninov <citrin citrin ru> [1]
Suggested by:	alc [2]
Reviewed by:	alc
MFC after:	2 weeks
2009-01-03 13:24:08 +00:00
raj
032a270ba5 Support kernel crash mini dumps on ARM architecture.
Obtained from:	Juniper Networks, Semihalf
2008-11-06 16:20:27 +00:00
emaste
a173e0f84d Move CTASSERT from header file to source file, per implementation note now
in the CTASSERT man page.
2008-09-26 18:44:40 +00:00
kmacy
de0a83a794 Work around differences in page allocation for initial page tables on xen
MFC after:	1 month
2008-08-17 23:40:29 +00:00
alc
25f7299f0f Essentially, neither madvise(..., MADV_DONTNEED) nor madvise(..., MADV_FREE)
work.  (Moreover, I don't believe that they have ever worked as intended.)
The explanation is fairly simple.  Both MADV_DONTNEED and MADV_FREE perform
vm_page_dontneed() on each page within the range given to madvise().  This
function moves the page to the inactive queue.  Specifically, if the page is
clean, it is moved to the head of the inactive queue where it is first in
line for processing by the page daemon.  On the other hand, if it is dirty,
it is placed at the tail.  Let's further examine the case in which the page
is clean.  Recall that the page is at the head of the line for processing by
the page daemon.  The expectation of vm_page_dontneed()'s author was that
the page would be transferred from the inactive queue to the cache queue by
the page daemon.  (Once the page is in the cache queue, it is, in effect,
free, that is, it can be reallocated to a new vm object by vm_page_alloc()
if it isn't reactivated quickly enough by a user of the old vm object.)  The
trouble is that nowhere in the execution of either MADV_DONTNEED or
MADV_FREE is either the machine-independent reference flag (PG_REFERENCED)
or the reference bit in any page table entry (PTE) mapping the page cleared.
Consequently, the immediate reaction of the page daemon is to reactivate the
page because it is referenced.  In effect, the madvise() was for naught.
The case in which the page was dirty is not too different.  Instead of being
laundered, the page is reactivated.

Note: The essential difference between MADV_DONTNEED and MADV_FREE is
that MADV_FREE clears a page's dirty field.  So, MADV_FREE is always
executing the clean case above.

This revision changes vm_page_dontneed() to clear both the machine-
independent reference flag (PG_REFERENCED) and the reference bit in all PTEs
mapping the page.

MFC after:	6 weeks
2008-06-06 18:38:43 +00:00
alc
47efda9e75 Don't call vm_reserv_alloc_page() on device-backed objects. Otherwise, the
system may panic because there is no reservation structure corresponding to
the physical address of the device memory.

Reported by: Giorgos Keramidas
2008-05-15 18:52:31 +00:00
alc
2f4904816f Introduce vm_reserv_reclaim_contig(). This function is used by
contigmalloc(9) as a last resort to steal pages from an inactive,
partially-used superpage reservation.

Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and
refactor it so that a separate subroutine is responsible for breaking
the selected reservation.  This subroutine is also used by
vm_reserv_reclaim_contig().
2008-04-06 18:09:28 +00:00
alc
caedbf233d Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent
migration to vm/vm_page.c.
2008-03-19 20:24:35 +00:00
alc
4e9b2a2931 Almost seven years ago, vm/vm_page.c was split into three parts:
vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c.  Today, vm/vm_pageq.c
has withered to the point that it contains only four short functions,
two of which are only used by vm/vm_page.c.  Since I can't foresee any
reason for vm/vm_pageq.c to grow, it is time to fold the remaining
contents of vm/vm_pageq.c back into vm/vm_page.c.

Add some comments.  Rename one of the functions, vm_pageq_enqueue(),
that is now static within vm/vm_page.c to vm_page_enqueue().
Eliminate PQ_MAXCOUNT as it no longer serves any purpose.
2008-03-18 06:52:15 +00:00
alc
3c35e9380c Defer setting either PG_CACHED or PG_FREE until after the free page
queues lock is acquired.  Otherwise, the state of a reservation's
pages' flags and its population count can be inconsistent.  That could
result in a page being freed twice.

Reported by:	kris
2008-01-02 04:43:47 +00:00
alc
4565fa1697 Add the superpage reservation system. This is "part 2 of 2" of the
machine-independent support for superpages.  (The earlier part was
the rewrite of the physical memory allocator.)  The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.

Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64.  (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
2007-12-29 19:53:04 +00:00
alc
4518d14d23 Modify vm_phys_unfree_page() so that it no longer requires the given
page to be in the free lists.  Instead, it now returns TRUE if it
removed the page from the free lists and FALSE if the page was not
in the free lists.

This change is required to support superpage reservations.  Specifically,
once reservations are introduced, a cached page can either be in the
free lists or a reservation.
2007-12-20 22:45:54 +00:00
alc
3c2abd13fb Eliminate redundant code from vm_page_startup(). 2007-12-19 05:47:50 +00:00
alc
5929e7ecb6 Simplify vm_page_free_toq(). 2007-12-11 21:20:34 +00:00
alc
cf47268b02 Correct a comment. 2007-12-02 07:43:42 +00:00
alc
018efe29f9 When reactivating a cached page, reset the page's pool to the default
pool.  (Not doing this before was a performance pessimization but not
a cause for panic.)
2007-11-21 23:22:10 +00:00
kib
8cd6397d8a The intent of the freeing the (zeroed) page in vm_page_cache() for
default object rather than cache it was to have
vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is
no cached page in object at pindex. This allows to avoid explicit
checks for cached pages in vm_object_backing_scan().

For now, we need the same bandaid for the swap object, otherwise both
the vm_page_lookup() and the pager can report that there is no page at
offset, while page is stored in the cache. Also, this fixes another
instance of the KASSERT("object type is incompatible") failure in the
vm_page_cache_transfer().

Reported and tested by:	Peter Holm
Reviewed by:	alc
MFC after:	3 days
2007-11-05 10:25:12 +00:00
alc
acb713befa Change vm_page_cache_transfer() such that it does not transfer pages
that would have an offset beyond the end of the target object.  Such
pages should remain in the source object.

MFC after:	3 days
Diagnosed and reviewed by:	Kostik Belousov
Reported and tested by:		Peter Holm
2007-10-27 00:09:30 +00:00
alc
d53c0afe54 In the rare case that vm_page_cache() actually frees the given page,
it must first ensure that the page is no longer mapped.  This is
trivially accomplished by calling pmap_remove_all() a little earlier
in vm_page_cache().  While I'm in the neighborbood, make a related
panic message a little more useful.

Approved by:	re (kensmith)
Reported by:	Peter Holm and Konstantin Belousov
Reviewed by:	Konstantin Belousov
2007-10-08 18:01:38 +00:00
alc
19c4fce2e3 Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is
a consequence of sparc64/sparc64/vm_machdep.c revision 1.76.  It occurs
when uma_small_free() frees a page.  The solution has two parts: (1) Mark
pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED.  (2) Defer the lock
assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested.
This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
flags, i.e., they do not change state between the time that a page is
allocated and freed.

Approved by:	re (kensmith)
PR:		116794
2007-10-07 18:03:03 +00:00
alc
9d3ffe57ce Correct an error of omission in the reimplementation of the page
cache: vm_object_page_remove() should convert any cached pages that
fall with the specified range to free pages.  Otherwise, there could
be a problem if a file is first truncated and then regrown.
Specifically, some old data from prior to the truncation might reappear.

Generalize vm_page_cache_free() to support the conversion of either a
subset or the entirety of an object's cached pages.

Reported by: tegge
Reviewed by: tegge
Approved by: re (kensmith)
2007-09-27 04:21:59 +00:00
alc
d1bce06c64 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
alc
215153274b Add a counter for the total number of pages cached and support for
reporting the value of this counter in the program "vmstat".

Approved by:	re (rwatson)
2007-07-27 20:01:22 +00:00
alc
dc39b85c98 Eliminate two unused functions: vm_phys_alloc_pages() and
vm_phys_free_pages().  Rename vm_phys_alloc_pages_locked() to
vm_phys_alloc_pages() and vm_phys_free_pages_locked() to
vm_phys_free_pages().  Add comments regarding the need for the free page
queues lock to be held by callers to these functions.  No functional
changes.

Approved by:	re (hrs)
2007-07-14 21:21:17 +00:00
alc
3b36c8e7eb Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in
vm_page_cowfault().  Initially, if vm_page_cowfault() sleeps, the given
page is wired, preventing it from being recycled.  However, when
transmission of the page completes, the page is unwired and returned to
the page queues.  At that point, the page is not in any special state
that prevents it from being recycled.  Consequently, vm_page_cowfault()
should verify that the page is still held by the same vm object before
retrying the replacement of the page.  Note: The containing object is,
however, safe from being recycled by virtue of having a non-zero
paging-in-progress count.

While I'm here, add some assertions and comments.

Approved by: re (rwatson)
MFC After: 3 weeks
2007-07-10 18:41:34 +00:00
mjacob
a7dcde4629 Don't declare inline a function which isn't. 2007-06-17 04:19:05 +00:00
alc
011d4e557f If attempting to cache a "busy", panic instead of printing a diagnostic
message and returning.
2007-06-16 21:07:51 +00:00
alc
a8415c5a0d Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist.  First and
foremost, this allocator is required to support the implementation of
superpages.  As a side effect, it enables a more robust implementation
of contigmalloc(9).  Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages.  Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space.  The performance benefits vary.  In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring.  The reason is that
superpages have much the same effect.  The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages.  I hope this is temporary.  On i386, this is
a slight pessimization.  However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects.  I speculate
that this is true in general of machines with a direct map.

Approved by:	re
2007-06-16 04:57:06 +00:00
attilio
e9fc4edc44 Optimize vmmeter locking.
In particular:
- Add an explicative table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some unuseful comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
2007-06-10 21:59:14 +00:00
attilio
9bd4fdf7ce Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
jeff
953418f0d5 - rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.
Suggested by:	julian@
Contributed by:	attilio@
2007-05-20 22:33:42 +00:00
jeff
e1996cb960 - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
alc
b34f6f7ab1 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
alc
573a964db6 Change the way that unmanaged pages are created. Specifically,
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage().  This allows for the elimination of some uses of
the page queues lock.

Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS.  This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().

Remove vm_page_unmanage().  It is no longer used.
2007-02-25 06:14:58 +00:00
alc
c0ed1b65cf Enable vm_page_free() and vm_page_free_zero() to be called on some pages
without the page queues lock being held, specifically, pages that are not
contained in a vm object and not a member of a page queue.
2007-02-18 05:54:42 +00:00
alc
7913035c29 Remove a stale comment. Add punctuation to a nearby comment. 2007-02-17 19:37:00 +00:00
alc
3e7d1b7ebd Relax the page queue lock assertions in vm_page_remove() and
vm_page_free_toq() to account for recent changes that allow
vm_page_free_toq() to be called on some pages without the page queues lock
being held, specifically, pages that are not contained in a vm object and
not a member of a page queue.  (Examples of such pages include page table
pages, pv entry pages, and uma small alloc pages.)
2007-02-15 05:43:38 +00:00
alc
0250171bf6 Avoid the unnecessary acquisition of the free page queues lock when a page
is actually being added to the hold queue, not the free queue.  At the same
time, avoid unnecessary tests to wake up threads waiting for free memory
and the idle thread that zeroes free pages.  (These tests will be performed
later when the page finally moves from the hold queue to the free queue.)
2007-02-14 07:05:55 +00:00
alc
909177e227 Use the free page queue mutex instead of the page queue mutex to
synchronize sleeping and waking of the zero idle thread.
2007-02-11 05:18:40 +00:00
alc
2eb15b506b Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the
vm page queue free mutex instead of the vm page queue mutex.
2007-02-07 06:37:30 +00:00
alc
4881bd38e2 Change the free page queue lock from a spin mutex to a default (blocking)
mutex.  With the demise of Alpha support, there is no longer a reason for
it to be a spin mutex.
2007-02-05 06:02:55 +00:00
kmacy
d0a44b7b7c Remove the requirement that phys_avail be sorted in ascending order
by explicitly finding the lowest and highest addresses when calculating
the size of the vm_pages array

Reviewed by :alc
2006-12-08 08:44:47 +00:00
alc
375373276d I misplaced the assertion that was added to vm_page_startup() in the
previous change.  Correct its placement.
2006-11-08 19:11:54 +00:00
alc
763826feff Simplify the construction of the free queues in vm_page_startup(). Add
an assertion to test a hypothesis concerning other redundant computation
in vm_page_startup().
2006-11-08 18:43:47 +00:00
alc
5d9c66a3f8 The page queues lock is no longer required by vm_page_busy() or
vm_page_wakeup().  Reduce or eliminate its use accordingly.
2006-10-22 21:18:48 +00:00
alc
cbcb760109 Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
kensmith
0c209e1877 Fix two minor style(9) nits in v1.313 which were noticed during an
MFC review.  alc@ will be MFCing V1.313 plus style fix to RELENG_6.
2006-09-29 00:20:56 +00:00
alc
f2ccfe9525 Refactor vm_page_sleep_if_busy() so that the test for a busy page is
inlined and a procedure call is made in the rare case, i.e., when it is
necessary to sleep.  In this case, inlining the test actually makes the
kernel smaller.
2006-08-27 19:50:13 +00:00
alc
ce3ad47700 Page flags are reset on (re)allocation. There is no need to clear any
flags except for PG_ZERO in vm_page_free_toq().
2006-08-21 00:34:31 +00:00
alc
cc1f2c465b Reimplement the page's NOSYNC flag as an object-synchronized instead of a
page queues-synchronized flag.  Reduce the scope of the page queues lock in
vm_fault() accordingly.

Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the
scope of the page queues lock.  Reviewed by: tegge
Additionally, eliminate an unnecessary dereference in computing the
argument that is passed to vm_object_set_writeable_dirty().
2006-08-13 00:11:09 +00:00
alc
b787cad1e0 Ensure that the page's new field for object-synchronized flags is always
initialized to zero.

Call vm_page_sleep_if_busy() instead of duplicating its implementation in
vm_page_grab().
2006-08-11 17:18:58 +00:00
alc
bc546843d7 Change vm_page_cowfault() so that it doesn't allocate a pre-busied page. 2006-08-10 04:48:29 +00:00
alc
b98eae58a6 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
alc
84c8fb9bd2 Change vm_page_sleep_if_busy() so that it no longer requires the caller to
hold the page queues lock.
2006-08-06 00:15:40 +00:00
alc
cbc0dafbb2 When sleeping on a busy page, use the lock from the containing object
rather than the global page queues lock.
2006-08-03 23:56:11 +00:00
alc
a152234cf9 Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
alc
d0e4b9565d Eliminate OBJ_WRITEABLE. It hasn't been used in a long time. 2006-07-21 06:40:29 +00:00
jhb
d4be78a6fa Move the code to handle the vm.blacklist tunable up a layer into
vm_page_startup().  As a result, we now only lookup the tunable once
instead of looking it up once for every physical page of memory in the
system.  This cuts out about a 1 second or so delay in boot on x86
systems.  The delay is much larger and more noticable on sun4v apparently.

Reported by:	kmacy
MFC after:	1 week
2006-06-23 16:44:24 +00:00
ps
b2b51c6092 Fix minidumps to include pages allocated via pmap_map on amd64.
These pages are allocated from the direct map, and were not previous
tracked.  This included the vm_page_array and the early UMA bootstrap
pages.

Reviewed by:	peter
2006-05-31 22:55:23 +00:00
peter
dbae6322e8 Introduce minidumps. Full physical memory crash dumps are still available
via the debug.minidump sysctl and tunable.

Traditional dumps store all physical memory.  This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm.  These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.

Minidumps invert the process.  Instead of dumping physical memory in
in order to guarantee that all of kvm's backing is dumped, minidumps
instead dump only memory that is actively mapped into kvm.

amd64 has a direct map region that things like UMA use.  Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump.  Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.

Dumps are a custom format.  At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into a 4K page mappings), then the sparse physical pages
according to the bitmap.  libkvm can now conveniently access the kvm
page table entries.

Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump.  While this is a best case, I expect minidumps
to be in the 100MB-500MB range.  Obviously, never larger than physical
memory of course.

minidumps are on by default.  It would want be necessary to turn them off
if it was necessary to debug corrupt kernel page table management as that
would mess up minidumps as well.

Both minidumps and regular dumps are supported on the same machine.
2006-04-21 04:24:50 +00:00
imp
f1947eff71 Remove leading __ from __(inline|const|signed|volatile). They are
obsolete.  This should reduce diffs to NetBSD as well.
2006-03-08 06:31:46 +00:00
ups
ad011f6a36 When the VM needs to allocated physical memory pages (for non interrupt use)
and it has not plenty of free pages it tries to free pages in the cache queue.
Unfortunately freeing a cached page requires the locking of the object that
owns the page. However in the context of allocating pages we may not be able
to lock the object and thus can only TRY to lock the object. If the locking try
fails the cache page can not be freed and is activated to move it out of the way
so that we may try to free other cache pages.

If all pages in the cache belong to objects that are currently locked the
cache queue can be emptied without freeing a single page. This scenario caused
two problems:

    1)  vm_page_alloc always failed allocation when it tried freeing pages from
        the cache queue and failed to do so. However if there are more than
        cnt.v_interrupt_free_min pages on the free list it should return pages
        when requested with priority VM_ALLOC_SYSTEM. Failure to do so can cause
        resource exhaustion deadlocks.

    2)  Threads than need to allocate pages spend a lot of time cleaning up the
        page queue without really getting anything done while the pagedaemon
         needs to work overtime to refill the cache.

This change fixes the first problem. (1)

Reviewed by:	tegge@
2006-02-15 22:29:53 +00:00
alc
2283c4343f Change #if defined(DIAGNOSTIC) to KASSERT. 2006-01-31 19:06:51 +00:00
alc
0ba6836a1e In vm_page_set_invalid() invalidate all of the page's mappings as soon as
any part of the page's contents is invalidated.

Submitted by: tegge
2006-01-24 07:21:38 +00:00
netchild
507a9b3e93 MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
alc
9ecb89a9e0 Assert that the page that is given to vm_page_free_toq() does not have any
managed mappings.
2005-12-13 19:59:09 +00:00
alc
bf6d4253ee If a physical page is mapped by two or more virtual addresses, transmitted
by the zero-copy sockets method, and written to before the transmission
completes, we need to destroy all of the existing mappings to the page,
not just the one that we fault on.  Otherwise, the mappings will no longer
be to the same page and changes made through one of the mappings will not
be visible through the others.

Observed by: tegge
2005-11-08 06:33:21 +00:00
alc
e3ab6622c5 Consider the zero-copy transmission of a page that was wired by mlock(2).
If a copy-on-write fault occurs on the page, the new copy should inherit
a part of the original page's wire count.

Submitted by: tegge
MFC after: 1 week
2005-11-01 04:30:21 +00:00
des
c66d4a275d As alc pointed out to me, vm_page.c 1.305 was incomplete: uma_startup()
still uses the constant UMA_BOOT_PAGES.  Change it to accept boot_pages
as an additional argument.

MFC after:	2 weeks
2005-10-08 21:03:54 +00:00
des
1722544f93 Introduce the vm.boot_pages tunable and sysctl, which controls the number
of pages reserved to bootstrap the kernel memory allocator.

MFC after:	2 weeks
2005-08-12 12:24:19 +00:00
jeff
d1b34cc38c - In vm_page_insert() hold the backing vnode when the first page
is inserted.
 - In vm_page_remove() drop the backing vnode when the last page
   is removed.
 - Don't check the vnode to see if it must be reclaimed on every
   call to vm_page_free_toq() as we only check it now when it is
   actually required.  This saves us two lock operations per call.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 14:14:09 +00:00
alc
403229a01e Transfer responsibility for freeing the page taken from the cache
queue and (possibly) unlocking the containing object from
vm_page_alloc() to vm_page_select_cache().  Recent optimizations to
vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and
pmap_enter_quick() have resulted in panic()s because vm_page_alloc()
mistakenly unlocked objects that had not been locked by
vm_page_select_cache().

Reported by: Peter Holm and Kris Kennaway
2005-01-07 05:02:19 +00:00
imp
f0bf889d0d /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
alc
db7aa8b882 Assert that page allocations during an interrupt specify
VM_ALLOC_INTERRUPT.

Assert that pages removed from the cache queue are not busy.
2004-12-31 19:50:45 +00:00
alc
ec1a7d072a Access to the page's busy field is (now) synchronized by the containing
object's lock.  Therefore, the assertion that the page queues lock is held
can be removed from vm_page_io_start().
2004-12-29 04:18:22 +00:00
alc
5c1258faf6 Assert that the vm object is locked on entry to vm_page_sleep_if_busy();
remove some unneeded code.
2004-12-26 21:46:44 +00:00
alc
25b80a64b9 The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy().  Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set.  Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition.  Typically,
this is when it is undergoing I/O.
2004-11-03 20:17:31 +00:00
alc
f73575dddd Assert that the containing vm object is locked in vm_page_cache() and
vm_page_try_to_cache().
2004-10-28 05:26:21 +00:00
alc
50d63268a9 Assert that the containing vm object is locked in vm_page_flash(). 2004-10-25 19:52:44 +00:00
alc
774f792bae Assert that the containing vm object is locked in vm_page_busy() and
vm_page_wakeup().
2004-10-24 23:53:47 +00:00
alc
faeb949021 Introduce VM_ALLOC_NOBUSY, an option to vm_page_alloc() and vm_page_grab()
that indicates that the caller does not want a page with its busy flag set.
In many places, the global page queues lock is acquired and released just
to clear the busy flag on a just allocated page.  Both the allocation of
the page and the clearing of the busy flag occur while the containing vm
object is locked.  So, the busy flag might as well never be set.
2004-10-24 06:15:36 +00:00
alc
a1afafd0da Correct two errors in PG_BUSY management by vm_page_cowfault(). Both
errors are in rarely executed paths.
1. Each time the retry_alloc path is taken, the PG_BUSY must be set again.
   Otherwise vm_page_remove() panics.
2. There is no need to set PG_BUSY on the newly allocated page before
   freeing it.  The page already has PG_BUSY set by vm_page_alloc().
   Setting it again could cause an assertion failure.

MFC after: 2 weeks
2004-10-18 08:11:59 +00:00
alc
b36ed839e4 Assert that the containing object is locked in vm_page_io_start() and
vm_page_io_finish().  The motivation being to transition synchronization of
the vm_page's busy field from the global page queues lock to the per-object
lock.
2004-10-17 22:33:40 +00:00
phk
1795816cf7 Add new a function isa_dma_init() which returns an errno when it fails
and which takes a M_WAITOK/M_NOWAIT flag argument.

Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.

Problem uncovered by:	tegge
2004-09-15 12:09:50 +00:00
alc
ff80a7c7f2 Advance the state of pmap locking on alpha, amd64, and i386.
- Enable recursion on the page queues lock.  This allows calls to
   vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
   queues lock held.  Such calls are made to allocate page table pages
   and pv entries.
 - The previous change enables a partial reversion of vm/vm_page.c
   revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
   now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
 - Add partial locking to pmap_copy().  (As a side-effect, pmap_copy()
   should now be faster on i386 SMP because it no longer generates IPIs
   for TLB shootdown on the other processors.)
 - Complete the locking of pmap_enter() and pmap_enter_quick().  (As of now,
   all changes to a user-level pmap on alpha, amd64, and i386 are performed
   with appropriate locking.)
2004-07-29 18:56:31 +00:00
green
76cf7ed107 Fix a race in vm_page_sleep_if_busy(). Due to vm_object locking
being incomplete, it currently has to know how to drop and pick back
up the vm_object's mutex if it has to sleep and drop the page queue
mutex.  The problem with this is that if the page is busy, while we
are sleeping, the page can be freed and object disappear.  When trying
to lock m->object, we'd get a stale or NULL pointer and crash.

The object is now cached, but this makes the assumption that
the object is referenced in some manner and will not itself
disappear while it is unlocked.  Since this only happens if
the object is locked, I had to remove an assumption earlier in
contigmalloc() that reversed the order of locking the object and
doing vm_page_sleep_if_busy(), not the normal order.
2004-07-21 23:56:09 +00:00
alc
c700a8a66f - Eliminate the pte object from the pmap. Instead, page table pages are
allocated as "no object" pages.  Similar changes were made to the amd64
   and i386 pmap last year.  The primary reason being that maintaining
   a pte object leads to lock order violations.  A secondary reason being
   that the pte object is redundant, i.e., the page table itself can be
   used to lookup page table pages.  (Historical note: The pte object
   predates our ability to allocate "no object" pages.  Thus, the pte
   object was a necessary evil.)
 - Unconditionally check the vm object lock's status in vm_page_remove().
   Previously, this assertion could not be made on Alpha due to its use
   of a pte object.
2004-07-19 18:12:04 +00:00
alc
a7935ebbec Increase the scope of the page queues lock in vm_page_alloc() to cover
a diagnostic check that accesses the cache queue count.
2004-07-10 22:12:49 +00:00
alc
6edd95aba5 Remove spl() calls. Update comments to reflect the removal of spl() calls.
Remove '\n' from panic() format strings.  Remove some blank lines.
2004-06-19 04:19:47 +00:00
alc
4c2e464ea2 Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not
accessible through an object.  Thus, PG_BUSY serves no purpose.
2004-06-17 06:16:58 +00:00
alc
5d0912f6d8 To date, unwiring a fictitious page has produced a panic. The reason
being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious
pages but unwiring uses PHYS_TO_VM_PAGE().  The resulting panic
reported an unexpected wired count.  Rather than attempting to fix
PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of
fictitious pages.  Specifically, fictitious pages will never be
completely unwired.  Therefore, we can keep a fictitious page's wired
count forever set to one and thereby avoid the use of
PHYS_TO_VM_PAGE() when we know that we're working with a fictitious
page, just not which one.

In collaboration with: green@, tegge@
PR: kern/29915
2004-05-22 04:53:51 +00:00
alc
625c8dfb26 Restructure vm_page_select_cache() so that adding assertions is easy.
Some of the conditions that caused vm_page_select_cache() to deactivate a
page were wrong.  For example, deactivating an unmanaged or wired page is a
nop.  Thus, if vm_page_select_cache() had ever encountered an unmanaged or
wired page, it would have looped forever.  Now, we assert that the page is
neither unmanaged nor wired.
2004-05-12 04:27:18 +00:00
alc
0049080776 Cache queue pages are not mapped. Thus, the pmap_remove_all() by
vm_page_alloc() is unnecessary.
2004-05-09 01:00:15 +00:00
alc
800747333a Update the comment describing vm_page_grab() to reflect the previous
revision and correct some of its style errors.
2004-04-24 21:36:23 +00:00
alc
106fdfcb2b Push down the responsibility for zeroing a physical page from the
caller to vm_page_grab().  Although this gives VM_ALLOC_ZERO a
different meaning for vm_page_grab() than for vm_page_alloc(), I feel
such change is necessary to accomplish other goals.  Specifically, I
want to make the PG_ZERO flag immutable between the time it is
allocated by vm_page_alloc() and freed by vm_page_free() or
vm_page_free_zero() to avoid locking overheads.  Once we gave up on
the ability to automatically recognize a zeroed page upon entry to
vm_page_free(), the ability to mutate the PG_ZERO flag became useless.
Instead, I would like to say that "Once a page becomes valid, its
PG_ZERO flag must be ignored."
2004-04-24 20:53:55 +00:00
imp
04c9462c02 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-06 20:15:37 +00:00
alc
e9f0dd4665 Eliminate unused arguments from vm_page_startup(). 2004-04-04 23:33:36 +00:00
alc
163a9eabd1 Modify contigmalloc1() so that the free page queues lock is not held when
vm_page_free() is called.  The problem with holding this lock is that it is
a spin lock and vm_page_free() may attempt the acquisition of a different
default-type lock.
2004-03-02 08:25:58 +00:00
alc
4583be830a - Correct a long-standing race condition in vm_page_try_to_free() that
could result in a dirty page being unintentionally freed.
 - Simplify the dirty page check in vm_page_dontneed().

Reviewed by:	tegge
MFC after:	7 days
2004-02-19 07:43:55 +00:00
alc
86e8aa4dc7 - Correct a long-standing race condition in vm_page_try_to_cache() that
could result in a panic "vm_page_cache: caching a dirty page, ...":
   Access to the page must be restricted or removed before calling
   vm_page_cache().  This race condition is identical in nature to that
   which was addressed by vm_pageout.c's revision 1.251.
 - Simplify the code surrounding the fix to this same race condition
   in vm_pageout.c's revision 1.251.  There should be no behavioral
   change.  Reviewed by: tegge

MFC after:	7 days
2004-02-14 08:54:37 +00:00
alc
9f7878e05a - Enable recursive acquisition of the mutex synchronizing access to the
free pages queue.  This is presently needed by contigmalloc1().
 - Move a sanity check against attempted double allocation of two pages
   to the same vm object offset from vm_page_alloc() to vm_page_insert().
   This provides better protection because double allocation could occur
   through a direct call to vm_page_insert(), such as that by
   vm_page_rename().
 - Modify contigmalloc1() to hold the mutex synchronizing access to the
   free pages queue while it scans vm_page_array in search of free pages.
 - Correct a potential leak of pages by contigmalloc1() that I introduced
   in revision 1.20: We must convert all cache queue pages to free pages
   before we begin removing free pages from the free queue.  Otherwise,
   if we have to restart the scan because we are unable to acquire the
   vm object lock that is necessary to convert a cache queue page to a
   free page, we leak those free pages already removed from the free queue.
2004-01-08 20:48:26 +00:00
alc
6d0a5af017 In vm_page_lookup() check the root of the vm object's splay tree for the
desired page before calling vm_page_splay().
2003-12-31 19:02:01 +00:00
alc
e11aa2c75c Simplify vm_page_grab(): Don't bother with the generation check. If the
vm object hasn't changed, the desired page will be at or near the root
of the vm object's splay tree, making vm_page_lookup() cheap.  (The only
lock required for vm_page_lookup() is already held.)  If, however, the
vm object has changed and retry was requested, eliminating the generation
check also eliminates a pointless acquisition and release of the page
queues lock.
2003-12-31 01:44:45 +00:00
alc
dbc67551d3 - Create an unmapped guard page to trap access to vm_page_array[-1].
This guard page would have trapped the problems with the MFC of the PAE
   support to RELENG_4 at an earlier point in the sequence of events.

Submitted by:	tegge
2003-12-22 02:04:08 +00:00
alc
8ded4dfb69 - Additional vm object locking in vm_object_split()
- New vm object locking assertions in vm_page_insert() and
   vm_object_set_writeable_dirty()
2003-11-01 04:54:23 +00:00
alc
794553172b - Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from
vm_pageout_scan().  Rationale: I don't like leaving a busy page in the
   cache queue with neither the vm object nor the vm page queues lock held.
 - Assert that the page is active in vm_pageout_page_stats().
2003-10-22 18:41:32 +00:00
alc
35743f84b8 - Assert that the containing vm object is locked in
vm_page_set_validclean().  (This function reads and modifies the
   vm page's valid field, which is synchronized by the lock on the
   containing vm object.)
2003-10-21 19:36:51 +00:00
alc
eecac55b7d - Remove some long unused code. 2003-10-20 18:57:01 +00:00
alc
76f6c3b059 Retire vm_page_copy(). Its reason for being ended when peter@ modified
pmap_copy_page() et al. to accept a vm_page_t rather than a physical
address.  Also, this change will facilitate locking access to the vm page's
valid field.
2003-10-08 05:35:12 +00:00
alc
57538c344f Assert that the containing vm object's lock is held in
vm_page_set_invalid().
2003-10-05 06:58:07 +00:00
alc
632febaf7e Assert that the containing vm object's lock is held in
vm_page_zero_invalid().
2003-10-04 21:56:27 +00:00
alc
7c11eaebac - Extend the scope the vm object lock to cover calls to
vm_page_is_valid().
 - Assert that the lock on the containing vm object is held in
   vm_page_is_valid().
2003-10-04 19:23:29 +00:00
alc
8b797c473e In vm_page_remove(), assert that the vm object is locked, unless an Alpha.
(The Alpha still requires updates to its pmap.)
2003-09-28 04:50:48 +00:00
alc
601ceb70ef Initialize the page's pindex field even for VM_ALLOC_NOOBJ allocations.
(This field is useful for implementing sanity checks even if the page does
not belong to an object.)
2003-09-22 00:56:13 +00:00
alc
de18724542 Recent pmap changes permit the use of a more precise locking assertion
in vm_page_lookup().
2003-08-28 23:23:04 +00:00
alc
5b4e761019 Held pages, just like wired pages, should not be added to the cache queues.
Submitted by:	tegge
2003-08-23 20:29:29 +00:00
alc
5d6f66de90 Hold the page queues lock when performing vm_page_clear_dirty() and
vm_page_set_invalid().
2003-08-23 18:11:53 +00:00
alc
9e89497b7d Assert that the vm object's lock is held on entry to vm_page_grab(); remove
code from this function that was needed when vm object locking was
incomplete.
2003-08-21 20:59:07 +00:00
alc
f8ecd895b9 Assert that the vm object lock is held in vm_page_alloc(). 2003-08-20 20:24:29 +00:00
alc
7db05daaf9 Modify vm_page_alloc() and vm_page_select_cache() to allow the page that
is returned by vm_page_select_cache() to belong to the object that is
already locked by the caller to vm_page_alloc().
2003-07-01 07:33:41 +00:00
alc
1eef33b705 - Use an int rather than a vm_pindex_t to represent the desired page
color in vm_page_alloc().  (This also has small performance benefits.)
 - Eliminate vm_page_select_free(); vm_page_alloc() might as well
   call vm_pageq_find() directly.
2003-06-28 07:58:10 +00:00
alc
0e0026d70f vm_page_select_cache() enforces a number of conditions on the returned
page.  Add the ability to lock the containing object to those conditions.
2003-06-26 15:44:03 +00:00
alc
fa54a6610e Maintain a lock on the vm object of interest throughout vm_fault(),
releasing the lock only if we are about to sleep (e.g., vm_pager_get_pages()
or vm_pager_has_pages()).  If we sleep, we have marked the vm object with
the paging-in-progress flag.
2003-06-22 21:35:41 +00:00
alc
cb7655e83c Assert that the vm object is locked in vm_page_try_to_free(). 2003-06-19 01:50:14 +00:00
obrien
b0678d7a44 Use __FBSDID(). 2003-06-11 23:50:51 +00:00
alc
e448f32b52 Teach vm_page_grab() how to handle the vm object's lock. 2003-06-07 23:22:04 +00:00
alc
93ce6848ad - Relax the Giant required in vm_page_remove().
- Remove the Giant required from vm_page_free_toq().  (Any locking
   errors will be caught by vm_page_remove().)

This remedies a panic that occurred when kmem_malloc(NOWAIT) performed
without Giant failed to allocate the necessary pages.

Reported by:	phk
2003-04-25 06:35:05 +00:00
alc
eb6d5ae625 Revision 1.246 should have also included
- Weaken the assertion in vm_page_insert() to require Giant only if the
   vm_object isn't locked.

Reported by:	 "Ilmar S. Habibulin" <ilmar@watson.org>
2003-04-22 14:26:02 +00:00
alc
3b5c40ed83 Revision 1.52 of vm/uma_core.c has led to UMA's obj_alloc() being
called without Giant; and obj_alloc() in turn calls vm_page_alloc()
without Giant.  This causes an assertion failure in vm_page_alloc().
Fortunately, obj_alloc() is now MPSAFE.  So, we need only clean up
some assertions.

 - Weaken the assertion in vm_page_lookup() to require Giant only
   if the vm_object isn't locked.
 - Remove an assertion from vm_page_alloc() that duplicates a check
   performed in vm_page_lookup().

In collaboration with:	gallatin, jake, jeff
2003-04-22 05:36:14 +00:00
jhb
526c3912c0 - Kill the pv_flags member of the alpha mdpage since it stop being used
in rev 1.61 of pmap.c.
- Now that pmap_page_is_free() is empty and since it is just a hack for
  the Alpha pmap, remove it.
2003-04-10 18:42:06 +00:00
jake
783ae539c3 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
mux
aeac636bfd Remove an empty comment. 2003-03-19 00:34:43 +00:00
jake
17dd501c20 Subtract the memory that backs the vm_page structures from phys_avail
after mapping it.  This makes it possible to determine if a physical
page has a backing vm_page or not.
2003-03-17 03:16:00 +00:00
alc
1d9375957c Teach vm_page_sleep_if_busy() to release the vm_object lock before sleeping. 2003-03-01 19:16:32 +00:00
alc
92c44b2895 In vm_page_dirty(), assert that the page is not in the free queue(s). 2003-02-24 17:30:45 +00:00
alc
cbb6b35209 - Convert the tsleep()s in vm_wait() and vm_waitpfault() to msleep()s
with the page queue lock.
 - Assert that the page queue lock is held in vm_page_free_wakeup().
2003-02-01 21:18:16 +00:00
alc
43a8628b32 - Hold the page queues lock around vm_page_hold().
- Assert that the page queues lock rather than Giant is held in
   vm_page_hold().
2003-01-20 09:24:03 +00:00
alc
c7ca47fcc7 - Update vm_pageout_deficit using atomic operations. It's a simple
counter outside the scope of existing locks.
 - Eliminate a redundant clearing of vm_pageout_deficit.
2003-01-14 06:57:03 +00:00
alc
fe540b81bb Make vm_page_alloc() return PG_ZERO only if VM_ALLOC_ZERO is specified.
The objective being to eliminate some cases of page queues locking.
(See, for example, vm/vm_fault.c revision 1.160.)

Reviewed by:	tegge

(Also, pointed out by tegge that I changed vm_fault.c before changing
vm_page.c.  Oops.)
2003-01-12 23:32:46 +00:00
alc
d833c390ba In vm_page_alloc(), fuse two if statements that are conditioned on the same
expression.
2003-01-11 20:07:17 +00:00
alc
03f7564d9e In vm_page_alloc(), honor VM_ALLOC_ZERO for system and interrupt class
requests when the number of free pages is below the reserved threshold.
Previously, VM_ALLOC_ZERO was only honored when the number of free pages
was above the reserved threshold.  Honoring it in all cases generally
makes sense, does no harm, and simplifies the code.
2003-01-08 19:58:42 +00:00
alc
e9e799ad84 Use atomic add and subtract to update the global wired page count,
cnt.v_wire_count.
2003-01-05 01:31:45 +00:00
alc
d5b9d0271f Refine the assertions in vm_page_alloc(). 2003-01-04 19:07:13 +00:00
alc
0912d72a88 Update the assertions in vm_page_insert() and vm_page_lookup() to reflect
locking of the kmem_object.
2003-01-01 19:45:36 +00:00
alc
3f894298eb Reduce the number of times that we acquire and release the page queues
lock by making vm_page_rename()'s caller, rather than vm_page_rename(),
responsible for acquiring it.
2002-12-29 07:17:06 +00:00
alc
e053499669 Assert that the page queues lock rather than Giant is held in
vm_page_flag_clear().
2002-12-28 22:49:37 +00:00
alc
6bff4b9a47 - Remove vm_page_sleep_busy(). The transition to vm_page_sleep_if_busy(),
which incorporates page queue and field locking, is complete.
 - Assert that the page queue lock rather than Giant is held in
   vm_page_flag_set().
2002-12-19 07:23:46 +00:00
alc
339c5b2dab Assert that the page queues lock is held in vm_page_unhold(),
vm_page_remove(), and vm_page_free_toq().
2002-12-15 00:06:02 +00:00
alc
cd298c7231 Hold the page queues/flags lock when calling vm_page_set_validclean().
Approved by:	re
2002-11-23 19:10:31 +00:00
alc
631af658fa Remove vm_page_protect(). Instead, use pmap_page_protect() directly. 2002-11-18 04:05:22 +00:00
alc
5e336b1d19 Now that pmap_remove_all() is exported by our pmap implementations
use it directly.
2002-11-16 07:44:25 +00:00
alc
fc8a5bc419 When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than
indirectly through vm_page_protect().  The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.

Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
2002-11-10 07:12:04 +00:00
alc
5f7be03df2 In vm_page_remove(), avoid calling vm_page_splay() if the object's memq
is empty.
2002-11-09 08:27:42 +00:00
alc
a4cefee670 Export the function vm_page_splay(). 2002-11-04 19:21:39 +00:00
alc
50217c83e5 - Remove the memory allocation for the object/offset hash table
because it's no longer used.  (See revision 1.215.)
 - Fix a harmless bug: the number of vm_page structures allocated wasn't
   properly adjusted when uma_bootstrap() was introduced.  Consequently,
   we were allocating 30 unused vm_page structures.
 - Wrap a long line.
2002-11-03 22:20:42 +00:00
alc
039aae167e Remove the vm page buckets mutex. As of revision 1.215 of vm/vm_page.c,
it is unused.
2002-11-02 22:39:30 +00:00
jeff
827f7f802d - Add a new flag to vm_page_alloc, VM_ALLOC_NOOBJ. This tells
vm_page_alloc not to insert this page into an object.  The pindex is
   still used for colorization.
 - Rework vm_page_select_* to accept a color instead of an object and
   pindex to work with VM_PAGE_NOOBJ.
 - Document other VM_ALLOC_ flags.

Reviewed by:	peter, jake
2002-11-01 00:59:03 +00:00
alc
8bb7766bd6 o Reinline vm_page_undirty(), reducing the kernel size. (This reverts
a part of vm_page.h revision 1.87 and vm_page.c revision 1.167.)
2002-10-20 19:57:55 +00:00
alc
22918c79b0 Complete the page queues locking needed for the page-based copy-
on-write (COW) mechanism.  (This mechanism is used by the zero-copy
TCP/IP implementation.)
 - Extend the scope of the page queues lock in vm_fault()
   to cover vm_page_cowfault().
 - Modify vm_page_cowfault() to release the page queues lock
   if it sleeps.
2002-10-19 18:34:39 +00:00
dillon
277583f7f8 Replace the vm_page hash table with a per-vmobject splay tree. There should
be no major change in performance from this change at this time but this
will allow other work to progress:  Giant lock removal around VM system
in favor of per-object mutexes, ranged fsyncs, more optimal COMMIT rpc's for
NFS, partial filesystem syncs by the syncer, more optimal object flushing,
etc.  Note that the buffer cache is already using a similar splay tree
mechanism.

Note that a good chunk of the old hash table code is still in the tree.
Alan or I will remove it prior to the release if the new code does not
introduce unsolvable bugs, else we can revert more easily.

Submitted by:	alc	(this is Alan's code)
Approved by:	re
2002-10-18 17:24:30 +00:00
alc
0f13b0caca o Synchronize updates to struct vm_page::cow with the page queues lock. 2002-09-02 04:04:12 +00:00
alc
cdcc7b3446 o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since
pmap_zero_page() and pmap_zero_page_area() were modified to accept
   a struct vm_page * instead of a physical address, vm_page_zero_fill()
   and vm_page_zero_fill_area() have served no purpose.
2002-08-25 00:22:31 +00:00
alc
ef9bd6a7a5 o Assert that the page queues lock is held in vm_page_activate(). 2002-08-11 00:21:40 +00:00
alc
5b6c90a737 o Remove the setting and clearing of the PG_MAPPED flag. (This flag is
obsolete.)
2002-08-10 07:11:16 +00:00
alc
170fb8b34d o Use pmap_page_is_mapped() in vm_page_protect() rather than the PG_MAPPED
flag.  (This is the only place in the entire kernel where the PG_MAPPED
   flag is tested.  It will be removed soon.)
2002-08-08 19:12:36 +00:00
alc
93773deb7b o Acquire the page queues lock before checking the page's busy status
in vm_page_grab().  Also, replace the nearby tsleep() with an msleep()
   on the page queues lock.
2002-08-04 19:05:20 +00:00
jeff
02517b6731 - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
alc
2fc518e7d0 o Remove the setting of PG_MAPPED from vm_page_wire() and
vm_page_alloc(VM_ALLOC_WIRED).
2002-08-03 01:29:52 +00:00
alc
69425db61b o Lock page queue accesses in nwfs and smbfs.
o Assert that the page queues lock is held in vm_page_deactivate().
2002-08-02 05:23:58 +00:00
alc
88e0310b1e o Acquire the page queues lock before calling vm_page_io_finish().
o Assert that the page queues lock is held in vm_page_io_finish().
2002-08-01 17:57:42 +00:00
alc
b53d53a590 o Lock page accesses by vm_page_io_start() with the page queues lock.
o Assert that the page queues lock is held in vm_page_io_start().
2002-07-31 07:27:08 +00:00
alc
478c0ba990 o Introduce vm_page_sleep_if_busy() as an eventual replacement for
vm_page_sleep_busy().  vm_page_sleep_if_busy() uses the page
   queues lock.
2002-07-29 19:41:22 +00:00
alc
d0cb048dcb o Modify vm_page_grab() to accept VM_ALLOC_WIRED. 2002-07-28 23:46:19 +00:00
alc
b5578712f7 o Lock page queue accesses by vm_page_dontneed().
o Assert that the page queue lock is held in vm_page_dontneed().
2002-07-23 04:39:48 +00:00
alc
d9f77c5b73 o Lock page queue accesses by vm_page_try_to_cache(). (The accesses
in kern/vfs_bio.c are already locked.)
 o Assert that the page queues lock is held in vm_page_try_to_cache().
2002-07-20 20:58:46 +00:00
alc
4de6bb3a4d o Assert that the page queues lock is held in vm_page_try_to_free(). 2002-07-20 20:12:57 +00:00
alc
a01b1feba6 o Lock page queue accesses by vm_page_cache() in vm_fault() and
vm_pageout_scan().  (The others are already locked.)
 o Assert that the page queues lock is held in vm_page_cache().
2002-07-20 19:34:21 +00:00
alc
083a6fe2b0 o Duplicate an odd side-effect of vm_page_wire() in vm_page_allocate()
when VM_ALLOC_WIRED is specified: set the PG_MAPPED bit in flags.
 o In both vm_page_wire() and vm_page_allocate() add a comment saying
   that setting PG_MAPPED does not belong there.
2002-07-19 03:33:04 +00:00
alc
bf14f2641b o Introduce an argument, VM_ALLOC_WIRED, that requests vm_page_alloc()
to return a wired page.
 o Use VM_ALLOC_WIRED within Alpha's pmap_growkernel().  Also, because
   Alpha's pmap_growkernel() calls vm_page_alloc() from within a critical
   section, specify VM_ALLOC_INTERRUPT instead of VM_ALLOC_SYSTEM.  (Only
   VM_ALLOC_INTERRUPT is implemented entirely with a spin mutex.)
 o Assert that the page queues mutex is held in vm_page_wire()
   on Alpha, just like the other platforms.
2002-07-18 04:08:10 +00:00
alc
ec8a106e8a o Lock page queue accesses by vm_page_wire() that aren't
within a critical section.
 o Assert that the page queues lock is held in vm_page_wire()
   unless an Alpha.
2002-07-14 23:51:55 +00:00
alc
5258ed77bb o Lock page queue accesses by vm_page_unmanage().
o Assert that the page queues lock is held in vm_page_unmanage().
2002-07-13 23:55:30 +00:00
alc
828e129a10 o Complete the locking of page queue accesses by vm_page_unwire().
o Assert that the page queues lock is held in vm_page_unwire().
 o Make vm_page_lock_queues() and vm_page_unlock_queues() visible
   to kernel loadable modules.
2002-07-13 20:55:21 +00:00
gallatin
a1719914d4 Remove bogus vm_page_wakeup() in vm_page_cowfault() that will cause panics
in the zero-copy send path if a process attempts to write to a page
which is still in flight.

reviewed by: ken
2002-07-05 23:33:27 +00:00
alc
29ede8a9d4 o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free
queue lock (revision 1.33 of vm/vm_page.c removed them).
 o Make the free queue lock a spin lock because it's sometimes acquired
   inside of a critical section.
2002-07-04 22:07:37 +00:00
ken
0d3a835f3f At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
jeff
7083248242 Turn VM_ALLOC_ZERO into a flag.
Submitted by:	tegge
Reviewed by:	dillon
2002-06-25 22:01:12 +00:00
alc
0e84366ae7 o Convert the vm_page buckets mutex to a spin lock. (This resolves
an issue on the Alpha platform found by jeff@.)
 o Simplify vm_page_lookup().

Reviewed by:	jhb
2002-04-30 21:24:47 +00:00
peter
0feba1c376 We do not necessarily need to map/unmap pages to zero parts of them.
On systems where physical memory is also direct mapped (alpha, sparc,
ia64 etc) this is slightly harmful.
2002-04-28 00:15:48 +00:00
alc
796c529836 o Control access to the vm_page_buckets with a mutex.
o Fix some style(9) bugs.
2002-04-26 22:44:15 +00:00
peter
3d8c7d4cab Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page().  This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap.  (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances.  Leaving this to pmap itself gives more flexibilitly.)

Reviewed by:	jake
Tested on:	i386, ia64 and (I believe) sparc64. (my alpha was hosed)
2002-04-15 16:00:03 +00:00
jhb
db9aa81e23 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
jake
8d96cdfea3 Fix a long standing 32bit-ism. Don't assume that the size of a chunk of
memory in phys_avail will fit in 'int', use vm_size_t.  This fixes booting
on sparc64 machines with more than 2 gigs of ram.

Thanks to Jan Chrillesen for providing me with access to a 4 gig machine.
2002-04-03 06:57:52 +00:00
jeff
2923687da3 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
eivind
0799ec54b1 - Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
  while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by:	alc
2002-03-10 21:52:48 +00:00
alc
42c47fb891 o Create vm_pageq_enqueue() to encapsulate code that is duplicated time
and again in vm_page.c and vm_pageq.c.
 o Delete unusused prototypes.  (Mainly a result of the earlier renaming
   of various functions from vm_page_*() to vm_pageq_*().)
2002-03-04 18:55:26 +00:00
tegge
5f4060fe3b Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold
count that would otherwise be on one of the free queues.  This eliminates a
panic when broken programs unmap memory that still has pending IO from raw
devices.

Reviewed by:	dillon, alc
2002-02-19 23:19:30 +00:00
silby
5e0801a04b Add one more comment to the OOM changes so that future readers of
the code may better understand the code.

Suggested by:	dillon
MFC after:	1 week
2002-02-19 18:50:49 +00:00
silby
d2e8b2531b Changes to make the OOM killer much more effective:
- Allow the OOM killer to target processes currently locked in
  memory.  These very often are the ones doing the memory hogging.
- Drop the wakeup priority of processes currently sleeping while
  waiting for their page fault to complete.  In order for the OOM
  killer to work well, the killed process and other system processes
  waiting on memory must be allowed to wakeup first.

Reviewed by:	dillon
MFC after:	1 week
2002-02-19 18:34:02 +00:00
dillon
cd4d323ad3 This fixes a large number of bugs in our NFS client side code. A recent
commit by Kirk also fixed a softupdates bug that could easily be triggered
by server side NFS.

	* An edge case with shared R+W mmap()'s and truncate whereby
	  the system would inappropriately clear the dirty bits on
	  still-dirty data.  (applicable to all filesystems)

	  THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
	  see vm/vm_page.c line 1641

	* The straddle case for VM pages and buffer cache buffers when
	  truncating.  (applicable to NFS client side)

	* Possible SMP database corruption due to vm_pager_unmap_page()
	  not clearing the TLB for the other cpu's.  (applicable to NFS
	  client side but could effect all filesystems).  Note: not
	  considered serious since the corruption occurs beyond the file
	  EOF.

	* When flusing a dirty buffer due to B_CACHE getting cleared,
	  we were accidently setting B_CACHE again (that is, bwrite() sets
	  B_CACHE), when we really want it to stay clear after the write
	  is complete.  This resulted in a corrupt buffer.  (applicable
	  to all filesystems but probably only triggered by NFS)

	* We have to call vtruncbuf() when ftruncate()ing to remove
	  any buffer cache buffers.  This is still tentitive, I may
	  be able to remove it due to the second bug fix.  (applicable
	  to NFS client side)

	* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
	  to set n_size before calling nfs_vinvalbuf or the NFS code
	  may recursively vnode_pager_setsize() to the original value
	  before the truncate.  This is what was causing the user mmap
	  bus faults in the nfs tester program.  (applicable to NFS
	  client side)

	* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made
	  by Kirk).

Testing program written by: Avadis Tevanian, Jr.
Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step')
MFC after:	1 week
2001-12-14 01:16:57 +00:00
dillon
f883ef447a Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a
real effect.

Optimize vfs_msync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  Improves looping case by 500%.

Optimize ffs_sync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  This makes a couple of assumptions,
which I believe are ok, in regards to vnode stability when the mount list
mutex is held.  Improves looping case by 500%.

(more optimization work is needed on top of these fixes)

MFC after:	1 week
2001-10-26 00:08:05 +00:00
peter
e0dbc46fb0 Implement idle zeroing of pages. I've been tinkering with this
on and off since John Dyson left his work-in-progress.

It is off by default for now.  sysctl vm.zeroidle_enable=1 to turn it on.

There are some hacks here to deal with the present lack of preemption - we
yield after doing a small number of pages since we wont preempt otherwise.

This is basically Matt's algorithm [with hysteresis] with an idle process
to call it in a similar way it used to be called from the idle loop.

I cleaned up the includes a fair bit here too.
2001-08-25 05:00:44 +00:00
dillon
a11f076b5c KASSERT if vm_page_t->wire_count overflows. 2001-08-22 04:01:56 +00:00
jhb
2ff1c253cd - Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
  The only caller outside of M_ASLEEP was the ata driver, which called both
  asleep() and await() with spl-raised, so there was no need for the
  asleep() and await() pair.  M_ASLEEP was unused.

Reviewed by:	jasone, peter
2001-08-10 06:37:05 +00:00
jake
b4050e8494 Oops. Last commit to vm_object.c should have got these files too.
Remove the use of atomic ops to manipulate vm_object and vm_page flags.
Giant is required here, so they are superfluous.

Discussed with:	dillon
2001-07-31 04:09:52 +00:00
assar
20642ac3f7 make vm_page_select_cache static
Requested by:	bde
2001-07-23 12:34:31 +00:00