Commit Graph

257 Commits

Author SHA1 Message Date
Konstantin Belousov
4a0dfeb299 MFC r198505:
When protection of wired read-only mapping is changed to read-write,
install new shadow object behind the map entry and copy the pages
from the underlying objects to it. This makes the mprotect(2) call to
actually perform the requested operation instead of silently do nothing
and return success, that causes SIGSEGV on later write access to the
mapping.

Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
correctly when src_entry == dst_entry.
2009-11-17 18:38:00 +00:00
Konstantin Belousov
82bccc6465 MFC r198476 (by alc):
Simplify the inner loop of vm_fault_copy_entry().

Approved by:	alc
2009-11-17 18:31:09 +00:00
Alan Cox
38c15033c7 MFC r198472
Eliminate an unnecessary check from vm_fault_prefault().
2009-10-31 18:18:32 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Konstantin Belousov
121fd46175 When forking a vm space that has wired map entries, do not forget to
charge the objects created by vm_fault_copy_entry. The object charge
was set, but reserve not incremented.

Reported by:	Greg Rivers <gcr+freebsd-current tharned org>
Reviewed by:	alc (previous version)
Approved by:	re (kensmith)
2009-07-03 22:17:37 +00:00
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
Alan Cox
0a2e596a93 Eliminate unnecessary obfuscation when testing a page's valid bits. 2009-06-07 19:38:26 +00:00
Alan Cox
d7d9cfed36 Eliminate an incorrect comment. 2009-05-07 05:44:13 +00:00
Alan Cox
78cfe1f7bd Eliminate an archaic band-aid. The immediately preceding comment already
explains why the band-aid is unnecessary.

Suggested by:	tegge
2009-04-26 20:54:57 +00:00
Alan Cox
f9855e177d Allow valid pages to be mapped for read access when they have a non-zero
busy count.  Only mappings that allow write access should be prevented by
a non-zero busy count.

(The prohibition on mapping pages for read access when they have a non-
zero busy count originated in revision 1.202 of i386/i386/pmap.c when
this code was a part of the pmap.)

Reviewed by:	tegge
2009-04-19 00:34:34 +00:00
Alan Cox
5758fe7185 Prior to r188331 a map entry's last read offset was only updated by a hard
fault.  In r188331 this update was relocated because of synchronization
changes to a place where it would occur on both hard and soft faults.  This
change again restricts the update to hard faults.
2009-02-25 07:52:53 +00:00
Alan Cox
c722e407dc Avoid some cases of unnecessary page queues locking by vm_fault's delete-
behind heuristic.
2009-02-09 06:23:21 +00:00
Alan Cox
7b54b1a9f5 Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by a
redundant assertion in vm_fault().

Reviewed by:	kib
2009-02-08 22:17:24 +00:00
Konstantin Belousov
2fada4c2b3 Remove no longer valid comment.
Submitted by:	alc
2009-02-08 21:20:13 +00:00
Konstantin Belousov
d2bf64c309 Do not sleep for vnode lock while holding map lock in vm_fault. Try to
acquire vnode lock for OBJT_VNODE object after map lock is dropped.
Because we have the busy page(s) in the object, sleeping there would
result in deadlock with vnode resize. Try to get lock without sleeping,
and, if the attempt failed, drop the state, lock the vnode, and restart
the fault handler from the start with already locked vnode.

Because the vnode_pager_lock() function is inlined in vm_fault(),
axe it.

Based on suggestion by:	alc
Reviewed by:	tegge, alc
Tested by:	pho
2009-02-08 20:23:46 +00:00
Konstantin Belousov
0d0be82a5d Style. 2009-02-08 19:37:01 +00:00
Alan Cox
ec96dca788 Simplify the inner loop of vm_fault()'s delete-behind heuristic.
Instead of checking each page for PG_UNMANAGED, perform a one-time
check whether the object is OBJT_PHYS.  (PG_UNMANAGED pages only
belong to OBJT_PHYS objects.)
2008-03-16 17:37:19 +00:00
Alan Cox
593e717ec9 Eliminate an unnecessary test from vm_fault's delete-behind heuristic.
Specifically, since the delete-behind heuristic is never applied to a
device-backed object, there is no point in checking whether each of the
object's pages is fictitious.  (Only device-backed objects have
fictitious pages.)
2008-03-09 06:08:58 +00:00
Alan Cox
eb2a051720 Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.
2008-01-03 07:34:34 +00:00
Alan Cox
f8a47341fe Add the superpage reservation system. This is "part 2 of 2" of the
machine-independent support for superpages.  (The earlier part was
the rewrite of the physical memory allocator.)  The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.

Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64.  (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
2007-12-29 19:53:04 +00:00
Konstantin Belousov
4ab8ab9285 Do not dereference NULL pointer.
Reported by:	Peter Holm
Reviewed by:	alc
Approved by:	re (kensmith)
2007-10-08 20:09:53 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Alan Cox
806453645a Two changes to vm_fault_additional_pages():
1. Rewrite the backward scan.  Specifically, reverse the order in which
   pages are allocated so that upon failure it is never necessary to
   free pages that were just allocated.  Moreover, any allocated pages
   can be put to use.  This makes the backward scan behave just like the
   forward scan.

2. Eliminate an explicit, unsynchronized check for low memory before
   calling vm_page_alloc().  It serves no useful purpose.  It is, in
   effect, optimizing the uncommon case at the expense of the common
   case.

Approved by:	re (hrs)
MFC after:	3 weeks
2007-07-20 06:55:11 +00:00
Alan Cox
d1974c0df1 Eliminate the special case handling of OBJT_DEVICE objects in
vm_fault_additional_pages() that was introduced in revision 1.47.  Then
as now, it is unnecessary because dev_pager_haspage() returns zero for
both the number of pages to read ahead and read behind, producing the
same exact behavior by vm_fault_additional_pages() as the special case
handling.

Approved by: re (rwatson)
2007-07-08 19:42:52 +00:00
Alan Cox
65ea29a690 When a cached page is reactivated in vm_fault(), update the counter that
tracks the total number of reactivated pages.  (We have not been
counting reactivations by vm_fault() since revision 1.46.)

Correct a comment in vm_fault_additional_pages().

Approved by:	re (kensmith)
MFC after:	1 week
2007-07-06 21:25:21 +00:00
Matt Jacob
9dae729081 Initialize reqpage to zero. 2007-06-17 04:14:27 +00:00
Attilio Rao
6759608248 Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
Jeff Roberson
1c4bcd050a - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Alan Cox
cf4682ae23 Eliminate the reactivation of cached pages in vm_fault_prefault() and
vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED).  With
the exception of calls to vm_map_pmap_enter() from
madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter()
are both used to create speculative mappings.  Thus, always
reactivating cached pages is a mistake.  In principle, cached pages
should only be reactivated by an actual access.  Otherwise, the
following misbehavior can occur.  On a hard fault for a text page the
clustering algorithm fetches not only the required page but also
several of the adjacent pages.  Now, suppose that one or more of the
adjacent pages are never accessed.  Ultimately, these unused pages
become cached pages through the efforts of the page daemon.  However,
the next activation of the executable reactivates and maps these
unused pages.  Consequently, they are never replaced.  In effect, they
become pinned in memory.
2007-05-22 04:45:59 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Pawel Jakub Dawidek
fcdd9721e4 Fix a problem for file systems that don't implement VOP_BMAP() operation.
The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(),
which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (eg.
EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before'
and 'after' arguments, so we have some accidental values there. This bascially
was causing this condition to be meet:

	if ((rahead + rbehind) >
	    ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) {
		pagedaemon_wakeup();
		[...]
	}

(we have some random values in rahead and rbehind variables)

I'm not entirely sure this is the right fix, maybe we should just return FALSE
in vnode_pager_haspage() when VOP_BMAP() fails?

alc@ knows about this problem, maybe he will be able to come up with a better
fix if this is not the right one.
2007-04-05 20:49:46 +00:00
Alan Cox
768131d293 vm_page_busy() no longer requires the page queues lock to be held. Reduce
the scope of the page queues lock in vm_fault() accordingly.
2007-03-23 06:11:25 +00:00
Alan Cox
d8810d894d Use PCPU_LAZY_INC() to update page fault statistics. 2007-03-05 18:55:14 +00:00
Alan Cox
44b8bd66f9 Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller.  (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)
2006-11-12 21:48:34 +00:00
Alan Cox
66bdd5d619 The page queues lock is no longer required by vm_page_wakeup(). 2006-10-23 05:27:31 +00:00
Alan Cox
9af80719db Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
Alan Cox
9fea8cad08 Eliminate unnecessary PG_BUSY tests. They originally served a purpose
that is now handled by vm object locking.
2006-10-21 21:02:04 +00:00
Alan Cox
b146f9e5d2 Reimplement the page's NOSYNC flag as an object-synchronized instead of a
page queues-synchronized flag.  Reduce the scope of the page queues lock in
vm_fault() accordingly.

Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the
scope of the page queues lock.  Reviewed by: tegge
Additionally, eliminate an unnecessary dereference in computing the
argument that is passed to vm_object_set_writeable_dirty().
2006-08-13 00:11:09 +00:00
Alan Cox
e7e56b2889 Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().
2006-08-06 00:17:17 +00:00
Alan Cox
2cf139527c Retire debug.mpsafevm. None of the architectures supported in CVS require
it any longer.
2006-07-21 23:22:49 +00:00
Stephan Uphoff
2053c12705 Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps
2006-06-15 01:01:06 +00:00
Alan Cox
8f8790a76d Simplify the implementation of vm_fault_additional_pages() based upon the
object's memq being ordered.  Specifically, replace repeated calls to
vm_page_lookup() by two simple constant-time operations.

Reviewed by: tegge
2006-05-13 20:05:44 +00:00
Warner Losh
62a59e8f0d Remove leading __ from __(inline|const|signed|volatile). They are
obsolete.  This should reduce diffs to NetBSD as well.
2006-03-08 06:31:46 +00:00
Tor Egge
44ed341759 Adjust old comment (present in rev 1.1) to match changes in rev 1.82.
PR:	kern/92509
Submitted by:   "Bryan Venteicher" <bryanv@daemoninthecloset.org>
2006-02-02 21:55:38 +00:00
Alan Cox
82eedee4a4 Use the new macros abstracting the page coloring/queues implementation.
(There are no functional changes.)
2006-01-27 08:35:32 +00:00
Alexander Leidinger
ef39c05baa MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
Tor Egge
b898bb1be3 Don't access fs->first_object after dropping reference to it.
The result could be a missed or extra giant unlock.

Reviewed by:	alc
2005-12-20 12:27:59 +00:00
Alan Cox
05406e6f33 Remove unneeded calls to pmap_remove_all(). The given page is not mapped.
Reviewed by: tegge
2005-12-11 22:06:57 +00:00
Alan Cox
57b5187b16 Eliminate an incorrect cast. 2005-09-07 01:42:30 +00:00