2677 Commits

Author SHA1 Message Date
kib
b1ef5079fa In the swap pager, do not free the non-requested pages from the run if they
are wired. Kstack pages are wired; this change prepares the swap pager for
handling long runs of kstack pages.

Noted and reviewed by:	alc
Tested by:	pho
MFC after:	2 weeks
2010-04-29 09:57:25 +00:00
alc
bf8b583320 Setting PG_REFERENCED on a page at the end of vm_fault() is redundant since
the page table entry's accessed bit is either preset by the immediately
preceding call to pmap_enter() or by hardware (or software) upon return
from vm_fault() when the faulting access is restarted.
2010-04-28 06:34:47 +00:00
alc
e3ebeca68e Change vm_object_madvise() so that it checks whether the page is invalid
or unmanaged before acquiring the page queues lock.  Neither of these
tests requires that lock.  Moreover, a better way of testing whether the page
is unmanaged is to test the type of vm object.  This avoids a pointless
vm_page_lookup().

MFC after:	3 weeks
2010-04-28 04:57:32 +00:00
alc
0a905b1db9 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
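
Editor's sketch (not from the commit): how a mincore()-style check can report
the referenced bit non-destructively once pmap_is_referenced() exists. The
helper and flag handling are illustrative only.

#include <sys/param.h>
#include <sys/mman.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_page.h>

/*
 * Illustrative helper, not the real mincore() loop: report whether a
 * resident page has been referenced without clearing the reference
 * bits.  pmap_ts_referenced() would test-and-clear them, skewing the
 * page daemon's activity accounting and possibly forcing a superpage
 * demotion just to clear one 4KB page's bit.
 */
static int
mincore_page_status(vm_page_t m)
{
	int mincoreinfo;

	mincoreinfo = MINCORE_INCORE;
	if (pmap_is_referenced(m))
		mincoreinfo |= MINCORE_REFERENCED;
	return (mincoreinfo);
}
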
alc
53e61968c0 Eliminate an unnecessary call to pmap_remove_all(). If a page belongs to
an object whose reference count is zero, then that page cannot possibly
be mapped.
2010-04-20 04:16:39 +00:00
alc
f8993e9243 vm_thread_swapout() can safely dirty the page before rather than after
acquiring the page queues lock.
2010-04-19 00:18:14 +00:00
jmallett
4f9a815abe o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the
address space for an address as aligned by the new pmap_align_tlb()
   function, which is for constraints imposed by the TLB. [1]
o) Add a kmem_alloc_nofault_space() function, which acts like
   kmem_alloc_nofault() but allows the caller to specify which find-space
   option to use. [1]
o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the
   kernel stack address on MIPS. [1]
o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on
   an odd boundary within the TLB, so that they are suitable for insertion as
   wired entries and do not have to share a TLB entry with another mapping,
   assuming they are appropriately-sized.
o) Eliminate md_realstack now that the kstack will be appropriately-aligned on
   MIPS.
o) Increase the number of guard pages to 2 so that we retain the proper
   alignment of the kstack address.

Reviewed by:	[1] alc
X-MFC-after:	Making sure alc has not come up with a better interface.
2010-04-18 22:32:07 +00:00
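
Editor's sketch of how these pieces fit together for the MIPS kstack
allocation; the signature of kmem_alloc_nofault_space() is assumed from the
description above, and KSTACK_PAGES/KSTACK_GUARD_PAGES come from
machine/param.h.

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_param.h>
#include <vm/vm_kern.h>
#include <vm/vm_extern.h>
#include <vm/vm_map.h>

/*
 * Sketch: reserve a TLB-aligned kernel-stack address range.  The
 * VMFS_TLB_ALIGNED_SPACE find-space option makes the kernel map search
 * honor pmap_align_tlb(), so the range does not start on an odd TLB
 * boundary and can occupy wired TLB entries of its own.
 */
static vm_offset_t
mips_kstack_reserve(void)
{
	return (kmem_alloc_nofault_space(kernel_map,
	    (KSTACK_PAGES + KSTACK_GUARD_PAGES) * PAGE_SIZE,
	    VMFS_TLB_ALIGNED_SPACE));
}
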
alc
03f963758c Remove a nonsensical test from vm_pageout_clean(). A page can't be in the
inactive queue and have a non-zero wire count.

Reviewed by:	kib
MFC after:	3 weeks
2010-04-18 21:29:28 +00:00
alc
7558b73168 There is no justification for vm_object_split() setting PG_REFERENCED on a
page that it is going to sleep on.  Eliminate it.

MFC after:	3 weeks
2010-04-18 17:50:09 +00:00
alc
ae8ea7c783 In vm_object_madvise() setting PG_REFERENCED on a page before sleeping on
that page only makes sense if the advice is MADV_WILLNEED.  In that case,
the intention is to activate the page, so discouraging the page daemon
from reclaiming the page makes sense.  In contrast, in the other cases,
MADV_DONTNEED and MADV_FREE, it makes no sense whatsoever to discourage
the page daemon from reclaiming the page by setting PG_REFERENCED.

Wrap a nearby line.

Discussed with:	kib
MFC after:	3 weeks
2010-04-17 21:14:37 +00:00
alc
7075c32b12 In vm_object_backing_scan(), setting PG_REFERENCED on a page before
sleeping on that page is nonsensical.  Doing so reduces the likelihood
that the page daemon will reclaim the page before the thread waiting in
vm_object_backing_scan() is reawakened.  However, it does not guarantee
that the page is not reclaimed, so vm_object_backing_scan() restarts
after reawakening.  More importantly, this muddles the meaning of
PG_REFERENCED.  There is no reason to believe that the caller of
vm_object_backing_scan() is going to use (i.e., access) the contents of
the page.  There is especially no reason to believe that an access is
more likely because vm_object_backing_scan() had to sleep on the page.

Discussed with:	kib
MFC after:	3 weeks
2010-04-17 18:35:07 +00:00
alc
06e8a2d9cc Setting PG_REFERENCED on the requested page in swap_pager_getpages() is
either redundant or harmful, depending on the caller.  For example, when
called by vm_fault(), it is redundant.  However, when called by
vm_thread_swapin(), it is harmful.  Specifically, if the thread is later
swapped out, having PG_REFERENCED set on its stack pages leads the page
daemon to reactivate these stack pages and delay their reclamation.

Reviewed by:	kib
MFC after:	3 weeks
2010-04-17 17:02:17 +00:00
alc
93c04293ef Simplify vm_thread_swapin(). 2010-04-13 06:48:37 +00:00
alc
89e5d72c2b Initialize the virtual memory-related resource limits in a single place.
Previously, one of these limits was initialized in two places to a
different value in each place.  Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures.  (Currently, this
error is masked by login.conf's default settings.)

Make vm_thread_swapin() and vm_thread_swapout() static.

Submitted by:	bde (an earlier version)
Reviewed by:	kib
2010-04-11 16:26:07 +00:00
alc
2366ade4c5 Introduce the function kmem_alloc_attr(), which allocates kernel virtual
memory with the specified physical attributes.  In particular, like
kmem_alloc_contig(), the caller can specify the physical address range
from which the physical pages are allocated and the memory attributes
(i.e., cache behavior) for these physical pages.  However, in contrast to
kmem_alloc_contig() or contigmalloc(), the physical pages that are
allocated by kmem_alloc_attr() are not necessarily physically contiguous.
This function is needed by DRM and VirtualBox.

Correct an error in the prototype for kmem_malloc().  The third argument
had the wrong type.

Tested by:	rnoland
MFC after:	3 days
2010-04-09 02:39:20 +00:00
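
Editor's sketch of a caller; the parameter order (map, size, malloc flags,
physical range, memory attribute) is assumed from the description, and
write-combining is chosen only as a plausible attribute for a DRM-style
buffer.

#include <sys/param.h>
#include <sys/malloc.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_kern.h>
#include <vm/vm_extern.h>

/*
 * Sketch: allocate kernel virtual memory backed by pages below 4GB with
 * write-combining cache behavior.  Unlike kmem_alloc_contig(), the
 * backing pages need not be physically contiguous.
 */
static vm_offset_t
drm_scratch_alloc(vm_size_t size)
{
	return (kmem_alloc_attr(kernel_map, size, M_WAITOK | M_ZERO,
	    0, (vm_paddr_t)0xffffffff, VM_MEMATTR_WRITE_COMBINING));
}
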
joel
e05c4d0c19 Start copyright notice with /*- 2010-04-07 16:29:10 +00:00
kib
47feb6893a When OOM searches for a process to kill, ignore the processes already
killed by OOM. When the killed process waits for a page allocation, try to
satisfy the request as quickly as possible.

This removes the often-encountered deadlock where OOM continuously
selects the same victim process, which sleeps uninterruptibly waiting
for a page. The killed process may still sleep if a page cannot be
obtained immediately, but testing has shown that the system has a much
higher chance of surviving an OOM situation with the patch.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	4 weeks
2010-04-06 10:43:01 +00:00
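
Editor's sketch of the selection idea only; how "already killed by OOM" is
recorded is an assumption here (the P_KILLED() test is a guess, not
necessarily the committed mechanism).

#include <sys/param.h>
#include <sys/proc.h>

/*
 * Sketch: processes the OOM scan should skip.  System processes were
 * already exempt; the new idea is to also skip a victim that a previous
 * OOM pass has killed but that has not exited yet.  The P_KILLED()
 * test is an assumption about how that state is recorded.
 */
static int
vm_oom_skip(struct proc *p)
{
	return ((p->p_flag & P_SYSTEM) != 0 || P_KILLED(p));
}
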
alc
6976d9abe9 vm_reserv_alloc_page() should never be called on an OBJT_SG object, just as
it is never called on an OBJT_DEVICE object.  (This change should have been
included in r195840.)

Reported by:	dougb@, avg@
MFC after:	3 days
2010-04-05 06:23:31 +00:00
alc
7530e331f2 Make _vm_map_init() the one place where the vm map's pmap field is
initialized.

Reviewed by:	kib
2010-04-03 19:07:05 +00:00
alc
d9ce618d9b Re-enable the call to pmap_release() by vmspace_dofree(). The accounting
problem that is described in the comment has been addressed.

Submitted by:	kib
Tested by:	pho (a few months ago)
MFC after:	6 weeks
2010-04-03 16:20:22 +00:00
jhb
399c01844a Reject attempts to create a MAP_ANON mapping with a non-zero offset.
PR:		kern/71258
Submitted by:	Alexander Best
MFC after:	2 weeks
2010-03-23 21:08:07 +00:00
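
The check itself is tiny; an illustrative version (the helper name is mine,
not the kernel's):

#include <sys/types.h>
#include <sys/mman.h>
#include <errno.h>

/*
 * Illustrative check: an anonymous mapping has no backing file, so a
 * non-zero offset is meaningless and is now rejected with EINVAL.
 */
static int
mmap_check_anon(int flags, off_t pos)
{
	if ((flags & MAP_ANON) != 0 && pos != 0)
		return (EINVAL);
	return (0);
}
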
kmacy
122090fb7e - enable alignment on amd64 only
- only align pcpu caches and the volatile portion of uma_zone
2010-03-22 22:39:32 +00:00
kmacy
377858b3ae Turn r205266 into a no-op until the problem can be properly diagnosed 2010-03-18 20:30:25 +00:00
kmacy
4e6ab892f5 Cache-line align various structures and move volatile counters so that
they do not share a cache line with (mostly) immutable state

Reviewed by:	jeff@
MFC after:	7 days
2010-03-17 21:18:28 +00:00
kib
75f11bce71 Update the comment for vm_page_alloc(9), listing all acceptable flags [1].
Note that the function does not sleep, but it can block.

Submitted by:	Giovanni Trematerra <giovanni.trematerra gmail com> [1]
MFC after:	3 days
2010-02-27 17:09:28 +00:00
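
Editor's example of a typical call, matching the flag classes the updated
comment describes (an allocation class plus modifier bits); not taken from
the commit itself.

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_object.h>
#include <vm/vm_page.h>

/*
 * Sketch: allocate a wired, preferably pre-zeroed page for an object.
 * VM_ALLOC_NORMAL is the allocation class; VM_ALLOC_WIRED and
 * VM_ALLOC_ZERO are modifiers.  The object must be locked.  The call
 * does not sleep: on shortage it returns NULL and the caller decides
 * whether to wait and retry.
 */
static vm_page_t
obj_page_grab(vm_object_t object, vm_pindex_t pindex)
{
	return (vm_page_alloc(object, pindex,
	    VM_ALLOC_NORMAL | VM_ALLOC_WIRED | VM_ALLOC_ZERO));
}
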
kib
a638c3a888 Remove write-only variable.
MFC after:	3 days
2010-02-22 16:00:56 +00:00
alc
eeebe8a449 Align the start of the clean submap to a superpage boundary. Although
no superpage mappings are created within the clean submap, aligning the
start of the clean submap helps to prevent interference with kmem_alloc()'s
use of superpages.
2010-02-21 22:23:13 +00:00
kib
baeb94977a The MAP_ENTRY_NEEDS_COPY flag belongs to protoeflags; the cow variable
uses a different namespace.

Reported by:	Jonathan Anderson <jonathan.anderson cl cam ac uk>
MFC after:	3 days
2010-01-29 19:25:45 +00:00
kib
ea1469181d Referencing a vnode-backed vm object increments the vnode reference
count, and dereferencing it decrements the count. If the referenced object
is deallocated, its type is reset to OBJT_DEAD. Consequently, all
vnode references that are owned by object references are never released.
vunref() the vnode the appropriate number of times in the OBJT_VNODE
vm object deallocation code to prevent the leak.

Add an assertion to vm_pageout() to make sure that we never take a
reference on the vnode and then fail to execute the code that releases it.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	3 weeks
2010-01-17 21:26:14 +00:00
rnoland
3dc3ad8568 Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.
This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by:	jhb@
MFC after:	Not in this lifetime...
2009-12-29 21:51:28 +00:00
antoine
bfd388c026 (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
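
For reference, the correct idiom; the argument names the head variable and
is purely documentary, which is why the mistaken usages were
binary-identical.

#include <sys/queue.h>

struct foo {
	int		value;
	LIST_ENTRY(foo)	link;
};

/*
 * Correct usage: LIST_HEAD_INITIALIZER() takes the head variable, not
 * the structure tag.  The macro expands to { NULL }, so the argument is
 * documentation only.
 */
static LIST_HEAD(foo_head, foo) foo_list = LIST_HEAD_INITIALIZER(foo_list);
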
kib
b79e14054c The VI_OBJDIRTY vnode flag mirrors the state of the OBJ_MIGHTBEDIRTY vm
object flag. Besides being redundant, the need to update both the vnode
and object flags causes more acquisitions of the vnode interlock.
OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.

Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
vnode-backed vm objects.

Suggested and reviewed by:	alc
Tested by:	pho
MFC after:	3 weeks
2009-12-21 12:29:38 +00:00
antoine
646097b80a Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros.
MFC after:	1 month
2009-12-05 17:45:56 +00:00
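
An illustration of why the trailing ";" matters, using generic macros rather
than the UMA ones: with the semicolon baked into the macro, an unbraced
if/else at the call site no longer compiles.

#include <sys/queue.h>

struct item {
	LIST_ENTRY(item) link;
};

/* Old pattern: statement-like macro with a stray trailing ";". */
#define	ITEM_REMOVE_BAD(it)	LIST_REMOVE((it), link);
/* Fixed pattern: the caller supplies the ";", as with any statement. */
#define	ITEM_REMOVE_GOOD(it)	LIST_REMOVE((it), link)

static void
drop(struct item *it, int found)
{
	/*
	 * With ITEM_REMOVE_BAD this would expand to "...;;", leaving the
	 * "else" below without a matching "if" and breaking the build.
	 */
	if (found)
		ITEM_REMOVE_GOOD(it);
	else
		(void)it;
}
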
alc
5eaca2d838 Properly synchronize the previous change. 2009-11-28 00:50:09 +00:00
alc
a9520143df Support the new VM_PROT_COPY option on wired pages. The effect is that
a debugger can now set a breakpoint in a program that uses mlock(2)
on its text segment or mlockall(2) on its entire address space.
2009-11-27 22:08:29 +00:00
alc
dcb93e6c95 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
alc
2d9252d6c7 Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has
represented a write access that is allowed to override write protection.
Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into
text pages.  Text pages are not just write protected but they are also
copy-on-write.  VM_PROT_OVERRIDE_WRITE overrides the write protection on the
text page and triggers the replication of the page so that the breakpoint
will be written to a private copy.  However, here is where things become
confused.  It is the debugger, not the process being debugged that requires
write access to the copied page.  Nonetheless, the copied page is being
mapped into the process with write access enabled.  In other words, once the
debugger sets a breakpoint within a text page, the program can write to its
private copy of that text page, whereas prior to setting the breakpoint a
SIGSEGV would have occurred upon a write access.  VM_PROT_COPY addresses
this problem.  The combination of VM_PROT_READ and VM_PROT_COPY forces the
replication of a copy-on-write page even though the access is only for read.
Moreover, the replicated page is only mapped into the process with read
access, and not write access.

Reviewed by:	kib
MFC after:	4 weeks
2009-11-26 05:16:07 +00:00
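
Editor's sketch of how a debugger-side path could use the new semantics;
this illustrates the fault type only and is not the committed proc_rwmem()
change.

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_param.h>
#include <vm/vm_map.h>
#include <vm/vm_extern.h>

/*
 * Sketch: before a breakpoint is written into a text page on behalf of
 * a debugger, fault the page with VM_PROT_READ | VM_PROT_COPY.  This
 * forces replication of the copy-on-write page while the replica stays
 * read-only in the target process; the debugger then writes through its
 * own kernel mapping of the copy.
 */
static int
force_cow_copy(vm_map_t map, vm_offset_t va)
{
	return (vm_fault(map, trunc_page(va),
	    VM_PROT_READ | VM_PROT_COPY, VM_FAULT_NORMAL));
}
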
alc
ca67dc4da4 Simplify both the invocation and the implementation of vm_fault() for wiring
pages.

(Note: Claims made in the comments about the handling of breakpoints in
wired pages have been false for roughly a decade.  This and another bug
involving breakpoints will be fixed in coming changes.)

Reviewed by:	kib
2009-11-18 18:05:54 +00:00
alc
4ba0c7ba2f Eliminate an unnecessary #include. (This #include should have been removed
in r188331 when vnode_pager_lock() was eliminated.)
2009-11-04 03:12:56 +00:00
alc
4d317ded24 Eliminate a bit of hackery from vm_fault(). The operations that this
hackery sought to prevent are now properly supported by vm_map_protect().
(See r198505.)

Reviewed by:	kib
2009-11-03 17:15:15 +00:00
attilio
1c940ef4f4 Split P_NOLOAD into a per-thread flag (TDF_NOLOAD).
This improvement aims at avoiding further cache misses in
scheduler-specific functions which need to keep track of average thread
running time, and further locking in the places that set this flag.

Reported by:	jeff (originally), kris (currently)
Reviewed by:	jhb
Tested by:	Giuseppe Cocomazzi <sbudella at email dot it>
2009-11-03 16:46:52 +00:00
alc
9c1e3b8d87 Avoid pointless calls to pmap_protect().
Reviewed by:	kib
2009-11-02 17:45:39 +00:00
ivoras
346b77e39e Add sysctl documentation strings. The descriptions are derived
from tuning(7). One of the descriptions references tuning(7) because
it is too complex to adequately describe here (it is not a simple
boolean sysctl), so users should be referred to that page instead.

Reviewed by:	alc, kib
Approved by:	gnn (mentor)
2009-11-02 16:56:59 +00:00
alc
b6a248b75a Correct an error in vm_fault_copy_entry() that has existed since the first
version of this file.  When a process forks, any wired pages are immediately
copied because copy-on-write is not supported for wired pages.  In other
words, the child process is given its own private copy of each wired page
from its parent's address space.  Unfortunately, to date, these copied pages
have been mapped into the child's address space with the wrong permissions,
typically VM_PROT_ALL.  This change corrects the permissions.

Reviewed by:	kib
2009-10-31 17:39:56 +00:00
kib
feb999713b When the protection of a wired read-only mapping is changed to
read-write, install a new shadow object behind the map entry and copy the
pages from the underlying objects to it. This makes the mprotect(2) call
actually perform the requested operation instead of silently doing nothing
and returning success, which caused a SIGSEGV on a later write access to
the mapping.

Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
correctly when src_entry == dst_entry.

Reviewed by:	alc
MFC after:	3 weeks
2009-10-27 10:15:58 +00:00
alc
d4f827eb7a Simplify the inner loop of vm_fault_copy_entry().
Reviewed by:	kib
2009-10-26 00:01:52 +00:00
alc
9911b79277 Eliminate an unnecessary check from vm_fault_prefault(). 2009-10-25 17:30:50 +00:00
marcel
51bb720939 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    which translates the vm_map_t argument to a pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementations. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
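
Given the description above, the machine-independent wrapper is presumably
little more than the following sketch (vm_map_pmap() is the standard
accessor for a map's pmap).

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_map.h>
#include <vm/vm_extern.h>

/*
 * Sketch of the wrapper: translate the vm_map_t into its pmap and let
 * the machine-dependent layer make the I-cache coherent for the range
 * that has just been written.
 */
void
vm_sync_icache(vm_map_t map, vm_offset_t va, vm_size_t sz)
{
	pmap_sync_icache(vm_map_pmap(map), va, sz);
}
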
kib
04ed7ad878 Remove spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA).
Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when the per-uid limit is
actually exceeded.

Both changes aim at calling priv_check(9) only for the cases when
privilege is actually exercised by the process.

Reported and tested by:	rwatson
Reviewed by:	alc
MFC after:	3 days
2009-10-18 12:55:39 +00:00
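
Editor's sketch of the resulting pattern: do the accounting first and call
priv_check(9) only when the limit would actually be exceeded. The accounting
helper is hypothetical.

#include <sys/param.h>
#include <sys/proc.h>
#include <sys/priv.h>
#include <sys/ucred.h>

/* Hypothetical stand-in for the real per-uid swap accounting check. */
static int
swap_uid_limit_exceeded(struct ucred *cred)
{
	(void)cred;
	return (0);
}

/*
 * Sketch: the privilege is exercised (and audited) only when the
 * per-uid swap reservation limit is actually exceeded.
 */
static int
swap_reserve_priv(struct thread *td)
{
	if (!swap_uid_limit_exceeded(td->td_ucred))
		return (0);
	return (priv_check(td, PRIV_VM_SWAP_NORLIMIT));
}
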
alc
dce82c729a Align and pad the page queue and free page queue locks so that the linker
can't possibly place them together within the same cache line.

MFC after:	3 weeks
2009-10-04 18:53:10 +00:00
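
Editor's sketch of the technique with illustrative names: wrapping each hot
lock in a structure padded and aligned to CACHE_LINE_SIZE guarantees that no
two of them can end up in the same cache line.

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/*
 * Sketch: give each hot lock its own cache line so that two locks can
 * never share a line and cause false sharing between otherwise
 * unrelated CPUs.  The type and variable names are illustrative.
 */
struct mtx_cacheline {
	struct mtx	m;
} __aligned(CACHE_LINE_SIZE);

static struct mtx_cacheline page_queue_lock_storage;
static struct mtx_cacheline free_queue_lock_storage;
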