Commit Graph

298 Commits

Author SHA1 Message Date
kib
187a8c5cd6 Calculate the count of per-process cow faults. Export the count to
userspace using the obscure spare int field in struct kinfo_proc.

Submitted by:	Andrey Zonov <andrey zonov org>
MFC after:	1 week
2012-05-23 18:10:54 +00:00
alc
323d529ebe Give vm_fault()'s sequential access optimization a makeover.
There are two aspects to the sequential access optimization: (1) read ahead
of pages that are expected to be accessed in the near future and (2) unmap
and cache behind of pages that are not expected to be accessed again.  This
revision changes both aspects.

The read ahead optimization is now more effective.  It starts with the same
initial read window as before, but arithmetically grows the window on
sequential page faults.  This can yield increased read bandwidth.  For
example, on one of my machines, a program using mmap() to read a file that
is several times larger than the machine's physical memory takes about 17%
less time to complete.

The unmap and cache behind optimization is now more selectively applied.
The read ahead window must grow to its maximum size before unmap and cache
behind is performed.  This significantly reduces the number of times that
pages are unmapped and cached only to be reactivated a short time later.

The unmap and cache behind optimization now clears each page's referenced
flag.  Previously, in the case of dirty pages, if the containing file was
still mapped at the time that the page daemon examined the dirty pages,
they would be reactivated.

From a stylistic standpoint, this revision also cleanly separates the
implementation of the read ahead and unmap/cache behind optimizations.

Glanced at:	kib
MFC after:	2 weeks
2012-05-10 15:16:42 +00:00
jhb
5829de48d9 Add new ktrace records for the start and end of VM faults. This gives
a pair of records similar to syscall entry and return that a user can
use to determine how long page faults take.  The new ktrace records are
enabled via the 'p' trace type, and are enabled in the default set of
trace points.

Reviewed by:	kib
MFC after:	2 weeks
2012-04-05 17:13:14 +00:00
alc
e02fd6b842 Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB.  In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB.  This is exactly as
recommended in AMD's documentation.  In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry.  In short, this saves us from
having to perform potentially costly TLB flushes.  In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry.  Usually, such spurious page faults are handled
by vm_fault() without incident.  However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault().  This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with:	kib
Tested by:		gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after:	1 week
2012-03-22 04:52:51 +00:00
kib
e84b0ecd81 Use the trick of performing the atomic operation on the contained aligned
word to handle the dirty mask updates in vm_page_clear_dirty_mask().
Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold()
the sole purpose of which was to protect dirty on architectures which
does not provide short or byte-wide atomics.

Reviewed by:	alc, attilio
Tested by:	flo (sparc64)
MFC after:	2 weeks
2011-09-28 14:57:50 +00:00
kib
a9d505a22a Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
kib
61e3fec296 Add a facility to disable processing page faults. When activated,
uiomove generates EFAULT if any accessed address is not mapped, as
opposed to handling the fault.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc (previous version)
2011-07-09 15:21:10 +00:00
alc
e7ea911039 Revert to using the page queues lock in vm_page_clear_dirty_mask() on
MIPS.  (At present, although atomic_clear_char() is defined by atomic.h
on MIPS, it is not actually implemented by support.S.)
2011-06-23 05:23:59 +00:00
alc
95eeb54f18 Precisely document the synchronization rules for the page's dirty field.
(Saying that the lock on the object that the page belongs to must be held
only represents one aspect of the rules.)

Eliminate the use of the page queues lock for atomically performing read-
modify-write operations on the dirty field when the underlying architecture
supports atomic operations on char and short types.

Document the fact that 32KB pages aren't really supported.

Reviewed by:	attilio, kib
2011-06-19 19:13:24 +00:00
kib
eeb1ebf124 Handle the corner case in vm_fault_quick_hold_pages().
If supplied length is zero, and user address is invalid, function
might return -1, due to the truncation and rounding of the address.
The callers interpret the situation as EFAULT. Instead of handling
the zero length in caller, filter it in vm_fault_quick_hold_pages().

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
2011-03-25 16:38:10 +00:00
alc
01330416e0 For some time now, the kernel and kmem objects have been ordinary
OBJT_PHYS objects.  Thus, there is no need for handling them specially
in vm_fault().  In fact, this special case handling would have led to
an assertion failure just before the call to pmap_enter().

Reviewed by:	kib@
MFC after:	6 weeks
2011-01-15 19:21:28 +00:00
alc
99000f1878 Correct a typo in vm_fault_quick_hold_pages().
Reported by:	Bartosz Stec
2010-12-28 20:02:30 +00:00
alc
d835374cc6 Retire vm_fault_quick(). It's no longer used.
Reviewed by:	kib@
2010-12-25 23:54:50 +00:00
alc
971b02b7bc Introduce and use a new VM interface for temporarily pinning pages. This
new interface replaces the combined use of vm_fault_quick() and
pmap_extract_and_hold() throughout the kernel.

In collaboration with:	kib@
2010-12-25 21:26:56 +00:00
alc
be5201b0d1 Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race
condition in proc_rwmem() and to (2) simplify the implementation of the
cxgb driver's vm_fault_hold_user_pages().  Specifically, in proc_rwmem()
the requested read or write could fail because the targeted page could be
reclaimed between the calls to vm_fault() and vm_page_hold().

In collaboration with:	kib@
MFC after:	6 weeks
2010-12-20 22:49:31 +00:00
trasz
e5fb69509c Replace pointer to "struct uidinfo" with pointer to "struct ucred"
in "struct vm_object".  This is required to make it possible to account
for per-jail swap usage.

Reviewed by:	kib@
Tested by:	pho@
Sponsored by:	FreeBSD Foundation
2010-12-02 17:37:16 +00:00
alc
3e4ac0b6ee Use vm_page_prev() instead of vm_page_lookup() in the implementation of
vm_fault()'s automatic delete-behind heuristic.
vm_page_prev() is typically faster.
2010-07-02 19:59:18 +00:00
kib
598e4abcd1 When waiting for the busy page, do not unlock the object unless unlock
cannot be avoided.

Reviewed by:	alc
MFC after:	1 week
2010-05-20 08:51:01 +00:00
alc
684507e744 Push down the acquisition of the page queues lock into vm_pageq_remove().
(This eliminates a surprising number of page queues lock acquisitions by
vm_fault() because the page's queue is PQ_NONE and thus the page queues
lock is not needed to remove the page from a queue.)
2010-05-09 16:55:42 +00:00
alc
59b934ef40 Minimize the scope of the page queues lock in vm_fault(). 2010-05-08 21:35:51 +00:00
alc
40b44f9713 Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
alc
3c8033e013 Push down the page queues lock into vm_page_activate(). 2010-05-07 15:49:43 +00:00
alc
ebbcafea24 Push down the page queues lock into vm_page_deactivate(). Eliminate an
incorrect comment.
2010-05-07 04:14:07 +00:00
alc
fecc56fac1 Eliminate page queues locking around most calls to vm_page_free(). 2010-05-06 18:58:32 +00:00
alc
5c7ca3ee73 Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held.  (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements.  It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib
2010-05-05 18:16:06 +00:00
alc
ea7b6345be Push down the acquisition of the page queues lock into vm_page_unwire().
Update the comment describing which lock should be held on entry to
vm_page_wire().

Reviewed by:	kib
2010-05-05 03:45:46 +00:00
alc
c9aaa1e2a2 Add page locking to the vm_page_cow* functions.
Push down the acquisition and release of the page queues lock into
vm_page_wire().

Reviewed by:	kib
2010-05-04 15:55:41 +00:00
alc
97e3b0ccbf Simplify vm_fault(). The introduction of the new page lock renders a bit of
cleverness by vm_fault() to avoid repeatedly releasing and reacquiring the
page queues lock pointless.

Reviewed by:	kib, kmacy
2010-05-02 20:24:25 +00:00
alc
f35e97166b It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(),
to unconditionally set PG_REFERENCED on a page before sleeping.  In many
cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by
the page daemon, before the caller to vm_page_sleep() is reawakened.
Instead, we now explicitly set PG_REFERENCED in those cases where having
the page persist until the caller is awakened is clearly desirable.  Note,
however, that setting PG_REFERENCED on the page is still only a hint,
and not a guarantee that the page should persist.
2010-05-02 17:33:46 +00:00
kib
96ebc710b9 Unlock page lock instead of recursively locking it. 2010-04-30 16:20:14 +00:00
kmacy
1dc1263413 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
alc
bf8b583320 Setting PG_REFERENCED on a page at the end of vm_fault() is redundant since
the page table entry's accessed bit is either preset by the immediately
preceding call to pmap_enter() or by hardware (or software) upon return
from vm_fault() when the faulting access is restarted.
2010-04-28 06:34:47 +00:00
kib
47feb6893a When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.

This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	4 weeks
2010-04-06 10:43:01 +00:00
alc
5eaca2d838 Properly synchronize the previous change. 2009-11-28 00:50:09 +00:00
alc
a9520143df Support the new VM_PROT_COPY option on wired pages. The effect of which
is that a debugger can now set a breakpoint in a program that uses mlock(2)
on its text segment or mlockall(2) on its entire address space.
2009-11-27 22:08:29 +00:00
alc
dcb93e6c95 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
alc
2d9252d6c7 Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has
represented a write access that is allowed to override write protection.
Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into
text pages.  Text pages are not just write protected but they are also
copy-on-write.  VM_PROT_OVERRIDE_WRITE overrides the write protection on the
text page and triggers the replication of the page so that the breakpoint
will be written to a private copy.  However, here is where things become
confused.  It is the debugger, not the process being debugged that requires
write access to the copied page.  Nonetheless, the copied page is being
mapped into the process with write access enabled.  In other words, once the
debugger sets a breakpoint within a text page, the program can write to its
private copy of that text page.  Whereas prior to setting the breakpoint, a
SIGSEGV would have occurred upon a write access.  VM_PROT_COPY addresses
this problem.  The combination of VM_PROT_READ and VM_PROT_COPY forces the
replication of a copy-on-write page even though the access is only for read.
Moreover, the replicated page is only mapped into the process with read
access, and not write access.

Reviewed by:	kib
MFC after:	4 weeks
2009-11-26 05:16:07 +00:00
alc
ca67dc4da4 Simplify both the invocation and the implementation of vm_fault() for wiring
pages.

(Note: Claims made in the comments about the handling of breakpoints in
wired pages have been false for roughly a decade.  This and another bug
involving breakpoints will be fixed in coming changes.)

Reviewed by:	kib
2009-11-18 18:05:54 +00:00
alc
4ba0c7ba2f Eliminate an unnecessary #include. (This #include should have been removed
in r188331 when vnode_pager_lock() was eliminated.)
2009-11-04 03:12:56 +00:00
alc
4d317ded24 Eliminate a bit of hackery from vm_fault(). The operations that this
hackery sought to prevent are now properly supported by vm_map_protect().
(See r198505.)

Reviewed by:	kib
2009-11-03 17:15:15 +00:00
alc
b6a248b75a Correct an error in vm_fault_copy_entry() that has existed since the first
version of this file.  When a process forks, any wired pages are immediately
copied because copy-on-write is not supported for wired pages.  In other
words, the child process is given its own private copy of each wired page
from its parent's address space.  Unfortunately, to date, these copied pages
have been mapped into the child's address space with the wrong permissions,
typically VM_PROT_ALL.  This change corrects the permissions.

Reviewed by:	kib
2009-10-31 17:39:56 +00:00
kib
feb999713b When protection of wired read-only mapping is changed to read-write,
install new shadow object behind the map entry and copy the pages
from the underlying objects to it. This makes the mprotect(2) call to
actually perform the requested operation instead of silently do nothing
and return success, that causes SIGSEGV on later write access to the
mapping.

Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
correctly when src_entry == dst_entry.

Reviewed by:	alc
MFC after:	3 weeks
2009-10-27 10:15:58 +00:00
alc
d4f827eb7a Simplify the inner loop of vm_fault_copy_entry().
Reviewed by:	kib
2009-10-26 00:01:52 +00:00
alc
9911b79277 Eliminate an unnecessary check from vm_fault_prefault(). 2009-10-25 17:30:50 +00:00
jhb
44220d7e1e Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
kib
af8ce5a988 When forking a vm space that has wired map entries, do not forget to
charge the objects created by vm_fault_copy_entry. The object charge
was set, but reserve not incremented.

Reported by:	Greg Rivers <gcr+freebsd-current tharned org>
Reviewed by:	alc (previous version)
Approved by:	re (kensmith)
2009-07-03 22:17:37 +00:00
kib
fa686c638e Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
alc
919e3cbf28 Eliminate unnecessary obfuscation when testing a page's valid bits. 2009-06-07 19:38:26 +00:00
alc
30d072f507 Eliminate an incorrect comment. 2009-05-07 05:44:13 +00:00
alc
b9963f1636 Eliminate an archaic band-aid. The immediately preceding comment already
explains why the band-aid is unnecessary.

Suggested by:	tegge
2009-04-26 20:54:57 +00:00