Commit Graph

265 Commits

Author SHA1 Message Date
alc
5eaca2d838 Properly synchronize the previous change. 2009-11-28 00:50:09 +00:00
alc
a9520143df Support the new VM_PROT_COPY option on wired pages. The effect of which
is that a debugger can now set a breakpoint in a program that uses mlock(2)
on its text segment or mlockall(2) on its entire address space.
2009-11-27 22:08:29 +00:00
alc
dcb93e6c95 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
alc
2d9252d6c7 Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has
represented a write access that is allowed to override write protection.
Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into
text pages.  Text pages are not just write protected but they are also
copy-on-write.  VM_PROT_OVERRIDE_WRITE overrides the write protection on the
text page and triggers the replication of the page so that the breakpoint
will be written to a private copy.  However, here is where things become
confused.  It is the debugger, not the process being debugged that requires
write access to the copied page.  Nonetheless, the copied page is being
mapped into the process with write access enabled.  In other words, once the
debugger sets a breakpoint within a text page, the program can write to its
private copy of that text page.  Whereas prior to setting the breakpoint, a
SIGSEGV would have occurred upon a write access.  VM_PROT_COPY addresses
this problem.  The combination of VM_PROT_READ and VM_PROT_COPY forces the
replication of a copy-on-write page even though the access is only for read.
Moreover, the replicated page is only mapped into the process with read
access, and not write access.

Reviewed by:	kib
MFC after:	4 weeks
2009-11-26 05:16:07 +00:00
alc
ca67dc4da4 Simplify both the invocation and the implementation of vm_fault() for wiring
pages.

(Note: Claims made in the comments about the handling of breakpoints in
wired pages have been false for roughly a decade.  This and another bug
involving breakpoints will be fixed in coming changes.)

Reviewed by:	kib
2009-11-18 18:05:54 +00:00
alc
4ba0c7ba2f Eliminate an unnecessary #include. (This #include should have been removed
in r188331 when vnode_pager_lock() was eliminated.)
2009-11-04 03:12:56 +00:00
alc
4d317ded24 Eliminate a bit of hackery from vm_fault(). The operations that this
hackery sought to prevent are now properly supported by vm_map_protect().
(See r198505.)

Reviewed by:	kib
2009-11-03 17:15:15 +00:00
alc
b6a248b75a Correct an error in vm_fault_copy_entry() that has existed since the first
version of this file.  When a process forks, any wired pages are immediately
copied because copy-on-write is not supported for wired pages.  In other
words, the child process is given its own private copy of each wired page
from its parent's address space.  Unfortunately, to date, these copied pages
have been mapped into the child's address space with the wrong permissions,
typically VM_PROT_ALL.  This change corrects the permissions.

Reviewed by:	kib
2009-10-31 17:39:56 +00:00
kib
feb999713b When protection of wired read-only mapping is changed to read-write,
install new shadow object behind the map entry and copy the pages
from the underlying objects to it. This makes the mprotect(2) call to
actually perform the requested operation instead of silently do nothing
and return success, that causes SIGSEGV on later write access to the
mapping.

Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
correctly when src_entry == dst_entry.

Reviewed by:	alc
MFC after:	3 weeks
2009-10-27 10:15:58 +00:00
alc
d4f827eb7a Simplify the inner loop of vm_fault_copy_entry().
Reviewed by:	kib
2009-10-26 00:01:52 +00:00
alc
9911b79277 Eliminate an unnecessary check from vm_fault_prefault(). 2009-10-25 17:30:50 +00:00
jhb
44220d7e1e Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
kib
af8ce5a988 When forking a vm space that has wired map entries, do not forget to
charge the objects created by vm_fault_copy_entry. The object charge
was set, but reserve not incremented.

Reported by:	Greg Rivers <gcr+freebsd-current tharned org>
Reviewed by:	alc (previous version)
Approved by:	re (kensmith)
2009-07-03 22:17:37 +00:00
kib
fa686c638e Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
alc
919e3cbf28 Eliminate unnecessary obfuscation when testing a page's valid bits. 2009-06-07 19:38:26 +00:00
alc
30d072f507 Eliminate an incorrect comment. 2009-05-07 05:44:13 +00:00
alc
b9963f1636 Eliminate an archaic band-aid. The immediately preceding comment already
explains why the band-aid is unnecessary.

Suggested by:	tegge
2009-04-26 20:54:57 +00:00
alc
b13621e4e2 Allow valid pages to be mapped for read access when they have a non-zero
busy count.  Only mappings that allow write access should be prevented by
a non-zero busy count.

(The prohibition on mapping pages for read access when they have a non-
zero busy count originated in revision 1.202 of i386/i386/pmap.c when
this code was a part of the pmap.)

Reviewed by:	tegge
2009-04-19 00:34:34 +00:00
alc
8a8f5251fa Prior to r188331 a map entry's last read offset was only updated by a hard
fault.  In r188331 this update was relocated because of synchronization
changes to a place where it would occur on both hard and soft faults.  This
change again restricts the update to hard faults.
2009-02-25 07:52:53 +00:00
alc
e3d0161279 Avoid some cases of unnecessary page queues locking by vm_fault's delete-
behind heuristic.
2009-02-09 06:23:21 +00:00
alc
9513bac196 Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by a
redundant assertion in vm_fault().

Reviewed by:	kib
2009-02-08 22:17:24 +00:00
kib
9988f9e959 Remove no longer valid comment.
Submitted by:	alc
2009-02-08 21:20:13 +00:00
kib
379838428b Do not sleep for vnode lock while holding map lock in vm_fault. Try to
acquire vnode lock for OBJT_VNODE object after map lock is dropped.
Because we have the busy page(s) in the object, sleeping there would
result in deadlock with vnode resize. Try to get lock without sleeping,
and, if the attempt failed, drop the state, lock the vnode, and restart
the fault handler from the start with already locked vnode.

Because the vnode_pager_lock() function is inlined in vm_fault(),
axe it.

Based on suggestion by:	alc
Reviewed by:	tegge, alc
Tested by:	pho
2009-02-08 20:23:46 +00:00
kib
b798264c6d Style. 2009-02-08 19:37:01 +00:00
alc
0de51cf047 Simplify the inner loop of vm_fault()'s delete-behind heuristic.
Instead of checking each page for PG_UNMANAGED, perform a one-time
check whether the object is OBJT_PHYS.  (PG_UNMANAGED pages only
belong to OBJT_PHYS objects.)
2008-03-16 17:37:19 +00:00
alc
160b9af7de Eliminate an unnecessary test from vm_fault's delete-behind heuristic.
Specifically, since the delete-behind heuristic is never applied to a
device-backed object, there is no point in checking whether each of the
object's pages is fictitious.  (Only device-backed objects have
fictitious pages.)
2008-03-09 06:08:58 +00:00
alc
545d26e30b Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.
2008-01-03 07:34:34 +00:00
alc
4565fa1697 Add the superpage reservation system. This is "part 2 of 2" of the
machine-independent support for superpages.  (The earlier part was
the rewrite of the physical memory allocator.)  The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.

Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64.  (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
2007-12-29 19:53:04 +00:00
kib
53bbfe99fb Do not dereference NULL pointer.
Reported by:	Peter Holm
Reviewed by:	alc
Approved by:	re (kensmith)
2007-10-08 20:09:53 +00:00
alc
d1bce06c64 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
alc
8765bda351 Two changes to vm_fault_additional_pages():
1. Rewrite the backward scan.  Specifically, reverse the order in which
   pages are allocated so that upon failure it is never necessary to
   free pages that were just allocated.  Moreover, any allocated pages
   can be put to use.  This makes the backward scan behave just like the
   forward scan.

2. Eliminate an explicit, unsynchronized check for low memory before
   calling vm_page_alloc().  It serves no useful purpose.  It is, in
   effect, optimizing the uncommon case at the expense of the common
   case.

Approved by:	re (hrs)
MFC after:	3 weeks
2007-07-20 06:55:11 +00:00
alc
f58d26b291 Eliminate the special case handling of OBJT_DEVICE objects in
vm_fault_additional_pages() that was introduced in revision 1.47.  Then
as now, it is unnecessary because dev_pager_haspage() returns zero for
both the number of pages to read ahead and read behind, producing the
same exact behavior by vm_fault_additional_pages() as the special case
handling.

Approved by: re (rwatson)
2007-07-08 19:42:52 +00:00
alc
0985df88fc When a cached page is reactivated in vm_fault(), update the counter that
tracks the total number of reactivated pages.  (We have not been
counting reactivations by vm_fault() since revision 1.46.)

Correct a comment in vm_fault_additional_pages().

Approved by:	re (kensmith)
MFC after:	1 week
2007-07-06 21:25:21 +00:00
mjacob
cd7cf5829b Initialize reqpage to zero. 2007-06-17 04:14:27 +00:00
attilio
e333d0ff0e Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
jeff
a7a8bac81f - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
alc
08b2128056 Eliminate the reactivation of cached pages in vm_fault_prefault() and
vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED).  With
the exception of calls to vm_map_pmap_enter() from
madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter()
are both used to create speculative mappings.  Thus, always
reactivating cached pages is a mistake.  In principle, cached pages
should only be reactivated by an actual access.  Otherwise, the
following misbehavior can occur.  On a hard fault for a text page the
clustering algorithm fetches not only the required page but also
several of the adjacent pages.  Now, suppose that one or more of the
adjacent pages are never accessed.  Ultimately, these unused pages
become cached pages through the efforts of the page daemon.  However,
the next activation of the executable reactivates and maps these
unused pages.  Consequently, they are never replaced.  In effect, they
become pinned in memory.
2007-05-22 04:45:59 +00:00
jeff
e1996cb960 - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
pjd
b6f1c3fccc Fix a problem for file systems that don't implement VOP_BMAP() operation.
The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(),
which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (eg.
EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before'
and 'after' arguments, so we have some accidental values there. This bascially
was causing this condition to be meet:

	if ((rahead + rbehind) >
	    ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) {
		pagedaemon_wakeup();
		[...]
	}

(we have some random values in rahead and rbehind variables)

I'm not entirely sure this is the right fix, maybe we should just return FALSE
in vnode_pager_haspage() when VOP_BMAP() fails?

alc@ knows about this problem, maybe he will be able to come up with a better
fix if this is not the right one.
2007-04-05 20:49:46 +00:00
alc
584a970755 vm_page_busy() no longer requires the page queues lock to be held. Reduce
the scope of the page queues lock in vm_fault() accordingly.
2007-03-23 06:11:25 +00:00
alc
2210798637 Use PCPU_LAZY_INC() to update page fault statistics. 2007-03-05 18:55:14 +00:00
alc
6093953d36 Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller.  (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)
2006-11-12 21:48:34 +00:00
alc
f395a2d02c The page queues lock is no longer required by vm_page_wakeup(). 2006-10-23 05:27:31 +00:00
alc
cbcb760109 Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
alc
7d7a43f1b4 Eliminate unnecessary PG_BUSY tests. They originally served a purpose
that is now handled by vm object locking.
2006-10-21 21:02:04 +00:00
alc
cc1f2c465b Reimplement the page's NOSYNC flag as an object-synchronized instead of a
page queues-synchronized flag.  Reduce the scope of the page queues lock in
vm_fault() accordingly.

Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the
scope of the page queues lock.  Reviewed by: tegge
Additionally, eliminate an unnecessary dereference in computing the
argument that is passed to vm_object_set_writeable_dirty().
2006-08-13 00:11:09 +00:00
alc
bc6eabeb88 Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().
2006-08-06 00:17:17 +00:00
alc
b5b274360a Retire debug.mpsafevm. None of the architectures supported in CVS require
it any longer.
2006-07-21 23:22:49 +00:00
ups
b3a7439a45 Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps
2006-06-15 01:01:06 +00:00