2833 Commits

Author SHA1 Message Date
Kip Macy
e8f263195d - don't check hold_count without the page lock held
- don't leak the page lock if m->object is NULL
  (assuming that that check will in fact even be valid when m->object is protected by the page lock)
2010-04-30 19:40:37 +00:00
Konstantin Belousov
e20e8c1558 Unlock page lock instead of recursively locking it. 2010-04-30 16:20:14 +00:00
Kip Macy
6d74d042e3 don't allow unsynchronized free in vm_page_unhold 2010-04-30 02:46:49 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Alan Cox
82bfb965d1 Simplify the inner loop of vm_pageout_object_deactivate_pages(). Rather
than checking each page for PG_UNMANAGED, check the vm object's type.
Only OBJT_PHYS can have unmanaged pages.  Eliminate a pointless counter.
The vm object is locked, that lock is never released by the inner loop,
and the set of pages contained by the vm object is not changed by the
inner loop.  Therefore, the counter serves no purpose.
2010-04-29 16:18:45 +00:00
Konstantin Belousov
6fb8c0c117 When doing kstack swapin, read as much pages in one run as possible.
Suggested and reviewed by:	alc (previous version)
Tested by:	pho
MFC after:	2 weeks
2010-04-29 09:59:16 +00:00
Konstantin Belousov
e86a87e97e In swap pager, do not free the non-requested pages from the run if they are
wired. Kstack pages are wired, this change prepares swap pager for handling
of long runs of kstack pages.

Noted and reviewed by:	alc
Tested by:	pho
MFC after:	2 weeks
2010-04-29 09:57:25 +00:00
Alan Cox
77d6d85393 Setting PG_REFERENCED on a page at the end of vm_fault() is redundant since
the page table entry's accessed bit is either preset by the immediately
preceding call to pmap_enter() or by hardware (or software) upon return
from vm_fault() when the faulting access is restarted.
2010-04-28 06:34:47 +00:00
Alan Cox
6a2a3d7338 Change vm_object_madvise() so that it checks whether the page is invalid
or unmanaged before acquiring the page queues lock.  Neither of these
tests require that lock.  Moreover, a better way of testing if the page
is unmanaged is to test the type of vm object.  This avoids a pointless
vm_page_lookup().

MFC after:	3 weeks
2010-04-28 04:57:32 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
Alan Cox
5d4a7b7945 Eliminate an unnecessary call to pmap_remove_all(). If a page belongs to
an object whose reference count is zero, then that page cannot possibly
be mapped.
2010-04-20 04:16:39 +00:00
Alan Cox
7b7d5b6c58 vm_thread_swapout() can safely dirty the page before rather than after
acquiring the page queues lock.
2010-04-19 00:18:14 +00:00
Juli Mallett
ca596a25f0 o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the
address space for an address as aligned by the new pmap_align_tlb()
   function, which is for constraints imposed by the TLB. [1]
o) Add a kmem_alloc_nofault_space() function, which acts like
   kmem_alloc_nofault() but allows the caller to specify which find-space
   option to use. [1]
o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the
   kernel stack address on MIPS. [1]
o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on
   an odd boundary within the TLB, so that they are suitable for insertion as
   wired entries and do not have to share a TLB entry with another mapping,
   assuming they are appropriately-sized.
o) Eliminate md_realstack now that the kstack will be appropriately-aligned on
   MIPS.
o) Increase the number of guard pages to 2 so that we retain the proper
   alignment of the kstack address.

Reviewed by:	[1] alc
X-MFC-after:	Making sure alc has not come up with a better interface.
2010-04-18 22:32:07 +00:00
Alan Cox
b28889a2fc Remove a nonsensical test from vm_pageout_clean(). A page can't be in the
inactive queue and have a non-zero wire count.

Reviewed by:	kib
MFC after:	3 weeks
2010-04-18 21:29:28 +00:00
Alan Cox
4b9dd5d537 There is no justification for vm_object_split() setting PG_REFERENCED on a
page that it is going to sleep on.  Eliminate it.

MFC after:	3 weeks
2010-04-18 17:50:09 +00:00
Alan Cox
b11b56b55b In vm_object_madvise() setting PG_REFERENCED on a page before sleeping on
that page only makes sense if the advice is MADV_WILLNEED.  In that case,
the intention is to activate the page, so discouraging the page daemon
from reclaiming the page makes sense.  In contrast, in the other cases,
MADV_DONTNEED and MADV_FREE, it makes no sense whatsoever to discourage
the page daemon from reclaiming the page by setting PG_REFERENCED.

Wrap a nearby line.

Discussed with:	kib
MFC after:	3 weeks
2010-04-17 21:14:37 +00:00
Alan Cox
aefea7f519 In vm_object_backing_scan(), setting PG_REFERENCED on a page before
sleeping on that page is nonsensical.  Doing so reduces the likelihood
that the page daemon will reclaim the page before the thread waiting in
vm_object_backing_scan() is reawakened.  However, it does not guarantee
that the page is not reclaimed, so vm_object_backing_scan() restarts
after reawakening.  More importantly, this muddles the meaning of
PG_REFERENCED.  There is no reason to believe that the caller of
vm_object_backing_scan() is going to use (i.e., access) the contents of
the page.  There is especially no reason to believe that an access is
more likely because vm_object_backing_scan() had to sleep on the page.

Discussed with:	kib
MFC after:	3 weeks
2010-04-17 18:35:07 +00:00
Alan Cox
0b6ace4743 Setting PG_REFERENCED on the requested page in swap_pager_getpages() is
either redundant or harmful, depending on the caller.  For example, when
called by vm_fault(), it is redundant.  However, when called by
vm_thread_swapin(), it is harmful.  Specifically, if the thread is later
swapped out, having PG_REFERENCED set on its stack pages leads the page
daemon to reactivate these stack pages and delay their reclamation.

Reviewed by:	kib
MFC after:	3 weeks
2010-04-17 17:02:17 +00:00
Alan Cox
6c68c971cb Simplify vm_thread_swapin(). 2010-04-13 06:48:37 +00:00
Alan Cox
ac45ee97c9 Initialize the virtual memory-related resource limits in a single place.
Previously, one of these limits was initialized in two places to a
different value in each place.  Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures.  (Currently, this
error is masked by login.conf's default settings.)

Make vm_thread_swapin() and vm_thread_swapout() static.

Submitted by:	bde (an earlier version)
Reviewed by:	kib
2010-04-11 16:26:07 +00:00
Alan Cox
8ef9d880e6 Introduce the function kmem_alloc_attr(), which allocates kernel virtual
memory with the specified physical attributes.  In particular, like
kmem_alloc_contig(), the caller can specify the physical address range
from which the physical pages are allocated and the memory attributes
(i.e., cache behavior) for these physical pages.  However, in contrast to
kmem_alloc_contig() or contigmalloc(), the physical pages that are
allocated by kmem_alloc_attr() are not necessarily physically contiguous.
This function is needed by DRM and VirtualBox.

Correct an error in the prototype for kmem_malloc().  The third argument
had the wrong type.

Tested by:	rnoland
MFC after:	3 days
2010-04-09 02:39:20 +00:00
Joel Dahl
c0587701ad Start copyright notice with /*- 2010-04-07 16:29:10 +00:00
Konstantin Belousov
3f1c4c4f31 When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.

This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	4 weeks
2010-04-06 10:43:01 +00:00
Alan Cox
f6d00b38c7 vm_reserv_alloc_page() should never be called on an OBJT_SG object, just as
it is never called on an OBJT_DEVICE object.  (This change should have been
included in r195840.)

Reported by:	dougb@, avg@
MFC after:	3 days
2010-04-05 06:23:31 +00:00
Alan Cox
92351f162e Make _vm_map_init() the one place where the vm map's pmap field is
initialized.

Reviewed by:	kib
2010-04-03 19:07:05 +00:00
Alan Cox
0ef12795b5 Re-enable the call to pmap_release() by vmspace_dofree(). The accounting
problem that is described in the comment has been addressed.

Submitted by:	kib
Tested by:	pho (a few months ago)
MFC after:	6 weeks
2010-04-03 16:20:22 +00:00
John Baldwin
5711bf30da Reject attempts to create a MAP_ANON mapping with a non-zero offset.
PR:		kern/71258
Submitted by:	Alexander Best
MFC after:	2 weeks
2010-03-23 21:08:07 +00:00
Kip Macy
1a23373ceb - enable alignment on amd64 only
- only align pcpu caches and the volatile portion of uma_zone
2010-03-22 22:39:32 +00:00
Kip Macy
6b4391d786 turn 205266 in to a no-op until the problem can be properly diagnosed 2010-03-18 20:30:25 +00:00
Kip Macy
5e4bb93cca Cache line align various structures and move volatile counters to
not share a cache line with (mostly) immutable state

Reviewed by:	jeff@
MFC after:	7 days
2010-03-17 21:18:28 +00:00
Konstantin Belousov
ddb16cfc32 Update comment for vm_page_alloc(9), listing all acceptable flags [1].
Note that the function does not sleep, it can block.

Submitted by:	Giovanni Trematerra <giovanni.trematerra gmail com> [1]
MFC after:	3 days
2010-02-27 17:09:28 +00:00
Konstantin Belousov
d7de6e2cb6 Remove write-only variable.
MFC after:	3 days
2010-02-22 16:00:56 +00:00
Alan Cox
fd776d18a0 Align the start of the clean submap to a superpage boundary. Although
no superpage mappings are created within the clean submap, aligning the
start of the clean submap helps to prevent interference with kmem_alloc()'s
use of superpages.
2010-02-21 22:23:13 +00:00
Konstantin Belousov
41c2274481 The MAP_ENTRY_NEEDS_COPY flag belongs to protoeflags, cow variable
uses different namespace.

Reported by:	Jonathan Anderson <jonathan.anderson cl cam ac uk>
MFC after:	3 days
2010-01-29 19:25:45 +00:00
Konstantin Belousov
b9f180d1de When a vnode-backed vm object is referenced, it increments the vnode
reference count, and decrements it on dereference. If referenced object
is deallocated, object type is reset to OBJT_DEAD. Consequently, all
vnode references that are owned by object references are never released.
vunref() the vnode in vm object deallocation code for OBJT_VNODE
appropriate number of times to prevent leak.

Add an assertion to the vm_pageout() to make sure that we never get
reference on the vnode but then do not execute code to release it.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	3 weeks
2010-01-17 21:26:14 +00:00
Robert Noland
cfd7bacef2 Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.
This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by:	jhb@
MFC after:	Not in this lifetime...
2009-12-29 21:51:28 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Konstantin Belousov
49e3050e6c VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object
flag. Besides providing the redundand information, need to update both
vnode and object flags causes more acquisition of vnode interlock.
OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.

Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
vnode-backed vm objects.

Suggested and reviewed by:	alc
Tested by:	pho
MFC after:	3 weeks
2009-12-21 12:29:38 +00:00
Antoine Brodin
4e2d83fc4a Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros.
MFC after:	1 month
2009-12-05 17:45:56 +00:00
Alan Cox
79f6ebe233 Properly synchronize the previous change. 2009-11-28 00:50:09 +00:00
Alan Cox
d8778512cf Support the new VM_PROT_COPY option on wired pages. The effect of which
is that a debugger can now set a breakpoint in a program that uses mlock(2)
on its text segment or mlockall(2) on its entire address space.
2009-11-27 22:08:29 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Alan Cox
a6d42a0d62 Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has
represented a write access that is allowed to override write protection.
Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into
text pages.  Text pages are not just write protected but they are also
copy-on-write.  VM_PROT_OVERRIDE_WRITE overrides the write protection on the
text page and triggers the replication of the page so that the breakpoint
will be written to a private copy.  However, here is where things become
confused.  It is the debugger, not the process being debugged that requires
write access to the copied page.  Nonetheless, the copied page is being
mapped into the process with write access enabled.  In other words, once the
debugger sets a breakpoint within a text page, the program can write to its
private copy of that text page.  Whereas prior to setting the breakpoint, a
SIGSEGV would have occurred upon a write access.  VM_PROT_COPY addresses
this problem.  The combination of VM_PROT_READ and VM_PROT_COPY forces the
replication of a copy-on-write page even though the access is only for read.
Moreover, the replicated page is only mapped into the process with read
access, and not write access.

Reviewed by:	kib
MFC after:	4 weeks
2009-11-26 05:16:07 +00:00
Alan Cox
2db65ab46e Simplify both the invocation and the implementation of vm_fault() for wiring
pages.

(Note: Claims made in the comments about the handling of breakpoints in
wired pages have been false for roughly a decade.  This and another bug
involving breakpoints will be fixed in coming changes.)

Reviewed by:	kib
2009-11-18 18:05:54 +00:00
Alan Cox
2dd02f4773 Eliminate an unnecessary #include. (This #include should have been removed
in r188331 when vnode_pager_lock() was eliminated.)
2009-11-04 03:12:56 +00:00
Alan Cox
86684848b6 Eliminate a bit of hackery from vm_fault(). The operations that this
hackery sought to prevent are now properly supported by vm_map_protect().
(See r198505.)

Reviewed by:	kib
2009-11-03 17:15:15 +00:00
Attilio Rao
1b9d701fee Split P_NOLOAD into a per-thread flag (TDF_NOLOAD).
This improvements aims for avoiding further cache-misses in scheduler
specific functions which need to keep track of average thread running
time and further locking in places setting for this flag.

Reported by:	jeff (originally), kris (currently)
Reviewed by:	jhb
Tested by:	Giuseppe Cocomazzi <sbudella at email dot it>
2009-11-03 16:46:52 +00:00
Alan Cox
2fafce9e4c Avoid pointless calls to pmap_protect().
Reviewed by:	kib
2009-11-02 17:45:39 +00:00
Ivan Voras
8a9c731f13 Add sysctl documentation strings. The descriptions are derived
from tuning(7). One of the descriptions references tuning(7) because
it is too complex to adequatly describe here (it is not a simple
boolean sysctl) and users should be warned to that.

Reviewed by:	alc, kib
Approved by:	gnn (mentor)
2009-11-02 16:56:59 +00:00
Alan Cox
e4ed417a35 Correct an error in vm_fault_copy_entry() that has existed since the first
version of this file.  When a process forks, any wired pages are immediately
copied because copy-on-write is not supported for wired pages.  In other
words, the child process is given its own private copy of each wired page
from its parent's address space.  Unfortunately, to date, these copied pages
have been mapped into the child's address space with the wrong permissions,
typically VM_PROT_ALL.  This change corrects the permissions.

Reviewed by:	kib
2009-10-31 17:39:56 +00:00