Commit Graph

304 Commits

Author SHA1 Message Date
alc
5ae8238125 Eliminate a pointless call to pmap_clear_reference() from vm_pageout_scan().
If the page belongs to an object with a reference count of zero, then it
can't have any managed mappings on which to clear a reference bit.
2009-05-17 20:40:41 +00:00
kib
83725d69bc Use the acquired reference to the vmspace instead of direct dereferencing
of p->p_vmspace in a place where it was missed in r191277.

Noted by:  pluknet gmail com
2009-04-28 11:45:36 +00:00
kib
e215ab3b02 In both pageout oom handler and vm_daemon, acquire the reference to
the vmspace of the examined process instead of directly accessing its
vmspace, that may change. Also, as an optimization, check for P_INEXEC
flag before examining the process.

Reported and tested by:	pho (previous version)
Reviewed by:	alc
MFC after:	3 week
2009-04-19 20:53:47 +00:00
alc
82fe2fe125 Calling pmap_clear_modify() after calling pmap_remove_write() is pointless.
The latter function already clears the modified status from each of the
page's mappings.
2009-04-19 07:18:08 +00:00
kib
4edc1f9451 Instead of forcing vn_start_write() to reset mp back to NULL for the
failed calls with non-NULL vp, explicitely clear mp after failure.

Tested by:	stass
Reviewed by:	tegge
PR:		123768
MFC after:	1 week
2008-11-16 21:57:54 +00:00
kib
de9e891748 Move the code for doing out-of-memory grass from vm_pageout_scan()
into the separate function vm_pageout_oom(). Supply a parameter for
vm_pageout_oom() describing a reason for the call.

Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone
is exhausted.

Reviewed by:	alc
Tested by:	pho, jhb
MFC after:	2 weeks
2008-09-29 19:45:12 +00:00
alc
409b7f1ce0 Prevent an integer overflow in vm_pageout_page_stats() on machines with a
large number of physical pages.

PR:		126158
Submitted by:	Dmitry Tejblum
MFC after:	3 days
2008-09-21 18:01:34 +00:00
trhodes
f37865f7f0 Fill in a few sysctl descriptions.
Reviewed by:	alc, Matt Dillon <dillon@apollo.backplane.com>
Approved by:	alc
2008-08-03 14:26:15 +00:00
alc
caedbf233d Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent
migration to vm/vm_page.c.
2008-03-19 20:24:35 +00:00
jeff
46f09d5bc3 - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
 - Reflect these changes in the proc.h documentation and consumers throughout
   the kernel.  This is a substantial reduction in locking cost for these
   fields and was made possible by recent changes to threading support.
2008-03-19 06:19:01 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
alc
625e38eddc Make contigmalloc(9)'s page laundering more robust. Specifically, use
vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better
handle a lock-ordering problem.  Consequently, trylock's failure on the
page's containing object no longer implies that the page cannot be
laundered.

MFC after: 6 weeks
2007-11-25 20:37:29 +00:00
alc
35af042efc Add a read/write sysctl for reconfiguring the maximum number of physical
pages that can be wired.

Submitted by:	Eugene Grosbein
PR:		114654
MFC after:	6 weeks
2007-11-23 00:30:19 +00:00
alc
d1bce06c64 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
jeff
3fc0f8b973 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
alc
0188378655 Correct an assertion in vm_pageout_flush(). Specifically, if a page's
status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have
already been recycled, i.e., freed and reallocated to a new purpose;
thus, asserting that such pages cannot be written is inappropriate.

Reported by: kris
Submitted by: tegge
Approved by: re (kensmith)
MFC after: 1 week
2007-09-15 18:30:28 +00:00
alc
ab67a07868 In the previous revision, when I replaced the unconditional acquisition
of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate
the acquisition of the vnode interlock before releasing the vm object's
lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is
performed.  Unfortunately, this allows the vnode to be recycled between
the release of the vm object's lock and the vget() on the vnode.

In this revision, I prevent the vnode from being recycled by acquiring
another reference to the vm object and underlying vnode before releasing
the vm object's lock.

This change also addresses another preexisting but trivial problem.  By
acquiring another reference to the vm object, I also prevent the vm
object from being recycled.  Previously, the "vnodes skipped" counter
could be wrong because if it examined a recycled vm object.

Reported by:	kib
Reviewed by:	kib
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-07-02 06:56:37 +00:00
alc
c6438ef575 Eliminate the use of Giant from vm_daemon(). Replace the unconditional
use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT().

Approved by:	re (kensmith)
MFC after:	3 weeks
2007-06-26 18:24:05 +00:00
alc
c7ee2c66ef Eliminate unnecessary checks from vm_pageout_clean(): The page that is
passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because
it came from the inactive queue and PG_UNMANAGED pages are not in any
page queue.  Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS
objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED.
So, if the page that is passed to vm_pageout_clean() is not
PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its
neighbors from the same object cannot themselves be PG_UNMANAGED.

Reviewed by:	tegge
2007-06-18 02:04:38 +00:00
alc
a8415c5a0d Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist.  First and
foremost, this allocator is required to support the implementation of
superpages.  As a side effect, it enables a more robust implementation
of contigmalloc(9).  Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages.  Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space.  The performance benefits vary.  In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring.  The reason is that
superpages have much the same effect.  The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages.  I hope this is temporary.  On i386, this is
a slight pessimization.  However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects.  I speculate
that this is true in general of machines with a direct map.

Approved by:	re
2007-06-16 04:57:06 +00:00
alc
0630c4e3a4 Eliminate dead code: We have not performed pageouts on the kernel object
in this millenium.
2007-06-13 06:10:10 +00:00
attilio
e9fc4edc44 Optimize vmmeter locking.
In particular:
- Add an explicative table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some unuseful comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
2007-06-10 21:59:14 +00:00
jeff
91d1501790 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
attilio
9bd4fdf7ce Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
jeff
e1996cb960 - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
alc
2eb15b506b Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the
vm page queue free mutex instead of the vm page queue mutex.
2007-02-07 06:37:30 +00:00
delphij
9856d14ea1 Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 15:05:52 +00:00
alc
cbcb760109 Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
alc
a152234cf9 Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
tegge
d0ed011fe8 Expand scope of marker to reduce the number of page queue scan restarts. 2006-02-17 21:02:39 +00:00
tegge
3cf2c022c2 Check return value from nonblocking call to vn_start_write(). 2006-02-17 18:22:19 +00:00
alc
83970835a0 Remove an unnecessary call to pmap_remove_all(). The given page is not
mapped because its contents are invalid.
2006-02-04 22:37:10 +00:00
alc
d2566b1431 Use the new macros abstracting the page coloring/queues implementation.
(There are no functional changes.)
2006-01-27 07:28:51 +00:00
netchild
507a9b3e93 MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
alc
8852c8f9e2 Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon.  Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails.  Previously, reclamation
destroyed mappings to both inactive and active pages.  get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs.  The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time.  This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge
2005-11-09 08:19:21 +00:00
tegge
a73af1b81d Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE
due to the vm object being locked.

When a process writes large amounts of data to a file, the vm object associated
with that file can contain most of the physical pages on the machine.  If the
process is preempted while holding the lock on the vm object, pagedaemon would
be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE
to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to
other vm objects.

Temporarily unlock the page queues lock while locking vm objects to avoid lock
order violation.  Detect and handle relevant page queue changes.

This change depends on both the lock portion of struct vm_object and normal
struct vm_page being type stable.

Reviewed by:	alc
2005-08-10 00:17:36 +00:00
imp
f0bf889d0d /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
alc
437d364a03 Revise the part of vm_pageout_scan() that moves pages from the cache
queue to the free queue.  With this change, if a page from the cache
queue belongs to a locked object, it is simply skipped over rather
than moved to the inactive queue.
2005-01-06 20:22:36 +00:00
alc
cc2178b9c8 During traversal of the inactive queue, try locking the page's containing
object before accessing the page's flags or the object's reference count.
2004-11-05 06:24:05 +00:00
alc
25b80a64b9 The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy().  Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set.  Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition.  Typically,
this is when it is undergoing I/O.
2004-11-03 20:17:31 +00:00
alc
17432a99e5 During traversal of the active queue by vm_pageout_page_stats(), try
locking the page's containing object before accessing the page's flags.
2004-10-30 23:30:53 +00:00
alc
c823d3b356 Add an assignment statement that I omitted from the previous revision. 2004-10-30 07:09:46 +00:00
alc
ce02afb500 During traversal of the active queue, try locking the page's containing
object before accessing the page's flags or the object's reference count.
If the trylock fails, handle the page as though it is busy.
2004-10-27 18:29:17 +00:00
alc
c2ace5ec03 Remove the GIANT_REQUIRED preceding pmap_remove() in
vm_pageout_map_deactivate_pages().
2004-07-18 04:38:11 +00:00
alc
123cfa6b64 Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove().  In general, they require the lock in
order to modify a page's pv list or flags.  In some cases, however,
pmap_protect() can avoid acquiring the lock.
2004-07-15 18:00:43 +00:00
alc
f591650a79 Remove an unused and unimplemented sysctl. (For the record, it was marked
as unimplemented in revision 1.129 nearly six years ago.)
2004-07-12 17:45:37 +00:00
alc
a96a1d58f7 Call vm_pageout_page_stats() with the page queues lock held. 2004-06-24 04:08:43 +00:00
alc
a035bc6c10 Remove spl calls. 2004-06-24 03:13:30 +00:00
julian
6c9d81ae0d Nice, is a property of a process as a whole..
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.
2004-06-16 00:26:31 +00:00