Commit Graph

307 Commits

Author SHA1 Message Date
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Alan Cox
26f4eea53f The bits set in a page's dirty mask are a subset of the bits set in its
valid mask.  Consequently, there is no need to perform a bit-wise and of
the page's dirty and valid masks in order to determine which parts of a
page are dirty and valid.

Eliminate an unnecessary #include.
2009-06-24 04:45:03 +00:00
Alan Cox
b78ddb0b8a Revise vm_pageout_scan()'s handling of partially dirty pages. Specifically,
rather than unconditionally making partially dirty pages fully dirty, only
make partially dirty pages fully dirty if the pmap says that the page has
been modified.

(This change is also a small optimization.  It eliminate an unnecessary call
to pmap_is_modified() on pages that are mapped read only.)

Suggested by:	tegge
2009-05-28 06:52:14 +00:00
Alan Cox
47916d0c37 Eliminate a pointless call to pmap_clear_reference() from vm_pageout_scan().
If the page belongs to an object with a reference count of zero, then it
can't have any managed mappings on which to clear a reference bit.
2009-05-17 20:40:41 +00:00
Konstantin Belousov
7981aa2431 Use the acquired reference to the vmspace instead of direct dereferencing
of p->p_vmspace in a place where it was missed in r191277.

Noted by:  pluknet gmail com
2009-04-28 11:45:36 +00:00
Konstantin Belousov
6bed074cd2 In both pageout oom handler and vm_daemon, acquire the reference to
the vmspace of the examined process instead of directly accessing its
vmspace, that may change. Also, as an optimization, check for P_INEXEC
flag before examining the process.

Reported and tested by:	pho (previous version)
Reviewed by:	alc
MFC after:	3 week
2009-04-19 20:53:47 +00:00
Alan Cox
f4b0c119c0 Calling pmap_clear_modify() after calling pmap_remove_write() is pointless.
The latter function already clears the modified status from each of the
page's mappings.
2009-04-19 07:18:08 +00:00
Konstantin Belousov
6129343d5d Instead of forcing vn_start_write() to reset mp back to NULL for the
failed calls with non-NULL vp, explicitely clear mp after failure.

Tested by:	stass
Reviewed by:	tegge
PR:		123768
MFC after:	1 week
2008-11-16 21:57:54 +00:00
Konstantin Belousov
2025d69ba7 Move the code for doing out-of-memory grass from vm_pageout_scan()
into the separate function vm_pageout_oom(). Supply a parameter for
vm_pageout_oom() describing a reason for the call.

Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone
is exhausted.

Reviewed by:	alc
Tested by:	pho, jhb
MFC after:	2 weeks
2008-09-29 19:45:12 +00:00
Alan Cox
8d28bf04e2 Prevent an integer overflow in vm_pageout_page_stats() on machines with a
large number of physical pages.

PR:		126158
Submitted by:	Dmitry Tejblum
MFC after:	3 days
2008-09-21 18:01:34 +00:00
Tom Rhodes
6bd9cb1c81 Fill in a few sysctl descriptions.
Reviewed by:	alc, Matt Dillon <dillon@apollo.backplane.com>
Approved by:	alc
2008-08-03 14:26:15 +00:00
Alan Cox
e5b006ffca Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent
migration to vm/vm_page.c.
2008-03-19 20:24:35 +00:00
Jeff Roberson
374ae2a393 - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
 - Reflect these changes in the proc.h documentation and consumers throughout
   the kernel.  This is a substantial reduction in locking cost for these
   fields and was made possible by recent changes to threading support.
2008-03-19 06:19:01 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Alan Cox
da31e3aa04 Make contigmalloc(9)'s page laundering more robust. Specifically, use
vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better
handle a lock-ordering problem.  Consequently, trylock's failure on the
page's containing object no longer implies that the page cannot be
laundered.

MFC after: 6 weeks
2007-11-25 20:37:29 +00:00
Alan Cox
5dfc28704d Add a read/write sysctl for reconfiguring the maximum number of physical
pages that can be wired.

Submitted by:	Eugene Grosbein
PR:		114654
MFC after:	6 weeks
2007-11-23 00:30:19 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Jeff Roberson
b61ce5b0e6 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
Alan Cox
4cd457233b Correct an assertion in vm_pageout_flush(). Specifically, if a page's
status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have
already been recycled, i.e., freed and reallocated to a new purpose;
thus, asserting that such pages cannot be written is inappropriate.

Reported by: kris
Submitted by: tegge
Approved by: re (kensmith)
MFC after: 1 week
2007-09-15 18:30:28 +00:00
Alan Cox
14137dc045 In the previous revision, when I replaced the unconditional acquisition
of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate
the acquisition of the vnode interlock before releasing the vm object's
lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is
performed.  Unfortunately, this allows the vnode to be recycled between
the release of the vm object's lock and the vget() on the vnode.

In this revision, I prevent the vnode from being recycled by acquiring
another reference to the vm object and underlying vnode before releasing
the vm object's lock.

This change also addresses another preexisting but trivial problem.  By
acquiring another reference to the vm object, I also prevent the vm
object from being recycled.  Previously, the "vnodes skipped" counter
could be wrong because if it examined a recycled vm object.

Reported by:	kib
Reviewed by:	kib
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-07-02 06:56:37 +00:00
Alan Cox
97824da382 Eliminate the use of Giant from vm_daemon(). Replace the unconditional
use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT().

Approved by:	re (kensmith)
MFC after:	3 weeks
2007-06-26 18:24:05 +00:00
Alan Cox
9e897b1bc6 Eliminate unnecessary checks from vm_pageout_clean(): The page that is
passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because
it came from the inactive queue and PG_UNMANAGED pages are not in any
page queue.  Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS
objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED.
So, if the page that is passed to vm_pageout_clean() is not
PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its
neighbors from the same object cannot themselves be PG_UNMANAGED.

Reviewed by:	tegge
2007-06-18 02:04:38 +00:00
Alan Cox
2446e4f02c Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist.  First and
foremost, this allocator is required to support the implementation of
superpages.  As a side effect, it enables a more robust implementation
of contigmalloc(9).  Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages.  Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space.  The performance benefits vary.  In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring.  The reason is that
superpages have much the same effect.  The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages.  I hope this is temporary.  On i386, this is
a slight pessimization.  However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects.  I speculate
that this is true in general of machines with a direct map.

Approved by:	re
2007-06-16 04:57:06 +00:00
Alan Cox
d076fbea58 Eliminate dead code: We have not performed pageouts on the kernel object
in this millenium.
2007-06-13 06:10:10 +00:00
Attilio Rao
393a081d42 Optimize vmmeter locking.
In particular:
- Add an explicative table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some unuseful comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
2007-06-10 21:59:14 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Attilio Rao
b4b7081961 Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Alan Cox
e9f995d824 Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the
vm page queue free mutex instead of the vm page queue mutex.
2007-02-07 06:37:30 +00:00
Xin LI
f67af5c918 Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 15:05:52 +00:00
Alan Cox
9af80719db Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
Alan Cox
78985e424a Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
Tor Egge
625e6c0af4 Expand scope of marker to reduce the number of page queue scan restarts. 2006-02-17 21:02:39 +00:00
Tor Egge
db27dcc0f0 Check return value from nonblocking call to vn_start_write(). 2006-02-17 18:22:19 +00:00
Alan Cox
3b7db47d7e Remove an unnecessary call to pmap_remove_all(). The given page is not
mapped because its contents are invalid.
2006-02-04 22:37:10 +00:00
Alan Cox
997e1c252b Use the new macros abstracting the page coloring/queues implementation.
(There are no functional changes.)
2006-01-27 07:28:51 +00:00
Alexander Leidinger
ef39c05baa MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
Alan Cox
7a35a21e7b Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon.  Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails.  Previously, reclamation
destroyed mappings to both inactive and active pages.  get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs.  The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time.  This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge
2005-11-09 08:19:21 +00:00
Tor Egge
8dbca793a9 Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE
due to the vm object being locked.

When a process writes large amounts of data to a file, the vm object associated
with that file can contain most of the physical pages on the machine.  If the
process is preempted while holding the lock on the vm object, pagedaemon would
be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE
to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to
other vm objects.

Temporarily unlock the page queues lock while locking vm objects to avoid lock
order violation.  Detect and handle relevant page queue changes.

This change depends on both the lock portion of struct vm_object and normal
struct vm_page being type stable.

Reviewed by:	alc
2005-08-10 00:17:36 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Alan Cox
df2e33bf42 Revise the part of vm_pageout_scan() that moves pages from the cache
queue to the free queue.  With this change, if a page from the cache
queue belongs to a locked object, it is simply skipped over rather
than moved to the inactive queue.
2005-01-06 20:22:36 +00:00
Alan Cox
34d9e6fdae During traversal of the inactive queue, try locking the page's containing
object before accessing the page's flags or the object's reference count.
2004-11-05 06:24:05 +00:00
Alan Cox
d19ef81437 The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy().  Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set.  Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition.  Typically,
this is when it is undergoing I/O.
2004-11-03 20:17:31 +00:00
Alan Cox
b86e6ec007 During traversal of the active queue by vm_pageout_page_stats(), try
locking the page's containing object before accessing the page's flags.
2004-10-30 23:30:53 +00:00
Alan Cox
4b8a5c4095 Add an assignment statement that I omitted from the previous revision. 2004-10-30 07:09:46 +00:00
Alan Cox
b08abf6cc0 During traversal of the active queue, try locking the page's containing
object before accessing the page's flags or the object's reference count.
If the trylock fails, handle the page as though it is busy.
2004-10-27 18:29:17 +00:00
Alan Cox
3e36afbe27 Remove the GIANT_REQUIRED preceding pmap_remove() in
vm_pageout_map_deactivate_pages().
2004-07-18 04:38:11 +00:00
Alan Cox
3d2e54c317 Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove().  In general, they require the lock in
order to modify a page's pv list or flags.  In some cases, however,
pmap_protect() can avoid acquiring the lock.
2004-07-15 18:00:43 +00:00
Alan Cox
26354d4c08 Remove an unused and unimplemented sysctl. (For the record, it was marked
as unimplemented in revision 1.129 nearly six years ago.)
2004-07-12 17:45:37 +00:00