that page only makes sense if the advice is MADV_WILLNEED. In that case,
the intention is to activate the page, so discouraging the page daemon
from reclaiming the page makes sense. In contrast, in the other cases,
MADV_DONTNEED and MADV_FREE, it makes no sense whatsoever to discourage
the page daemon from reclaiming the page by setting PG_REFERENCED.
Wrap a nearby line.
Discussed with: kib
MFC after: 3 weeks
sleeping on that page is nonsensical. Doing so reduces the likelihood
that the page daemon will reclaim the page before the thread waiting in
vm_object_backing_scan() is reawakened. However, it does not guarantee
that the page is not reclaimed, so vm_object_backing_scan() restarts
after reawakening. More importantly, this muddles the meaning of
PG_REFERENCED. There is no reason to believe that the caller of
vm_object_backing_scan() is going to use (i.e., access) the contents of
the page. There is especially no reason to believe that an access is
more likely because vm_object_backing_scan() had to sleep on the page.
Discussed with: kib
MFC after: 3 weeks
either redundant or harmful, depending on the caller. For example, when
called by vm_fault(), it is redundant. However, when called by
vm_thread_swapin(), it is harmful. Specifically, if the thread is later
swapped out, having PG_REFERENCED set on its stack pages leads the page
daemon to reactivate these stack pages and delay their reclamation.
Reviewed by: kib
MFC after: 3 weeks
Previously, one of these limits was initialized in two places to a
different value in each place. Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures. (Currently, this
error is masked by login.conf's default settings.)
Make vm_thread_swapin() and vm_thread_swapout() static.
Submitted by: bde (an earlier version)
Reviewed by: kib
memory with the specified physical attributes. In particular, like
kmem_alloc_contig(), the caller can specify the physical address range
from which the physical pages are allocated and the memory attributes
(i.e., cache behavior) for these physical pages. However, in contrast to
kmem_alloc_contig() or contigmalloc(), the physical pages that are
allocated by kmem_alloc_attr() are not necessarily physically contiguous.
This function is needed by DRM and VirtualBox.
Correct an error in the prototype for kmem_malloc(). The third argument
had the wrong type.
Tested by: rnoland
MFC after: 3 days
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.
This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.
In collaboration with: pho
Reviewed by: alc
MFC after: 4 weeks
no superpage mappings are created within the clean submap, aligning the
start of the clean submap helps to prevent interference with kmem_alloc()'s
use of superpages.
reference count, and decrements it on dereference. If referenced object
is deallocated, object type is reset to OBJT_DEAD. Consequently, all
vnode references that are owned by object references are never released.
vunref() the vnode in vm object deallocation code for OBJT_VNODE
appropriate number of times to prevent leak.
Add an assertion to the vm_pageout() to make sure that we never get
reference on the vnode but then do not execute code to release it.
In collaboration with: pho
Reviewed by: alc
MFC after: 3 weeks
This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.
Purge d_mmap2().
All driver modules will need to be rebuilt since D_VERSION is also
bumped.
Reviewed by: jhb@
MFC after: Not in this lifetime...
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month
flag. Besides providing the redundand information, need to update both
vnode and object flags causes more acquisition of vnode interlock.
OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.
Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
vnode-backed vm objects.
Suggested and reviewed by: alc
Tested by: pho
MFC after: 3 weeks
represented a write access that is allowed to override write protection.
Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into
text pages. Text pages are not just write protected but they are also
copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the
text page and triggers the replication of the page so that the breakpoint
will be written to a private copy. However, here is where things become
confused. It is the debugger, not the process being debugged that requires
write access to the copied page. Nonetheless, the copied page is being
mapped into the process with write access enabled. In other words, once the
debugger sets a breakpoint within a text page, the program can write to its
private copy of that text page. Whereas prior to setting the breakpoint, a
SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses
this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the
replication of a copy-on-write page even though the access is only for read.
Moreover, the replicated page is only mapped into the process with read
access, and not write access.
Reviewed by: kib
MFC after: 4 weeks
pages.
(Note: Claims made in the comments about the handling of breakpoints in
wired pages have been false for roughly a decade. This and another bug
involving breakpoints will be fixed in coming changes.)
Reviewed by: kib
This improvements aims for avoiding further cache-misses in scheduler
specific functions which need to keep track of average thread running
time and further locking in places setting for this flag.
Reported by: jeff (originally), kris (currently)
Reviewed by: jhb
Tested by: Giuseppe Cocomazzi <sbudella at email dot it>
from tuning(7). One of the descriptions references tuning(7) because
it is too complex to adequatly describe here (it is not a simple
boolean sysctl) and users should be warned to that.
Reviewed by: alc, kib
Approved by: gnn (mentor)
version of this file. When a process forks, any wired pages are immediately
copied because copy-on-write is not supported for wired pages. In other
words, the child process is given its own private copy of each wired page
from its parent's address space. Unfortunately, to date, these copied pages
have been mapped into the child's address space with the wrong permissions,
typically VM_PROT_ALL. This change corrects the permissions.
Reviewed by: kib
install new shadow object behind the map entry and copy the pages
from the underlying objects to it. This makes the mprotect(2) call to
actually perform the requested operation instead of silently do nothing
and return success, that causes SIGSEGV on later write access to the
mapping.
Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
correctly when src_entry == dst_entry.
Reviewed by: alc
MFC after: 3 weeks
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).
The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when per-uid limit is
actually exceed.
Both changes aim at calling priv_check(9) only for the cases when
privilege is actually exercised by the process.
Reported and tested by: rwatson
Reviewed by: alc
MFC after: 3 days
This is done to make it harder to exploit kernel NULL pointer security
vulnerabilities. While this of course does not fix vulnerabilities,
it does mitigate their impact.
Note that this may break some applications, most likely emulators or
similar, which for one reason or another require mapping memory at
zero.
This restriction can be disabled with the security.bsd.mmap_zero
sysctl variable.
Discussed with: rwatson, bz
Tested by: bz (Wine), simon (VirtualBox)
Submitted by: jhb
of the linked object is zero-length. More old code assumes that mmap
of zero length returns success.
For a.out and pre-8 ELF binaries, allow the mmap of zero length.
Reported by: tegge
Reviewed by: tegge, alc, jhb
MFC after: 3 days
Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that requested stack size is equial to the
actual, and reallocate the stack if sizes differ [1].
This fixes the bug introduced by r173361 that was committed several days
after r173004 and consisted of kthread_add(9) ignoring the non-default
kernel stack size.
Also, r173361 removed the caching of the kernel stacks for a non-first
thread in the process. Introduce separate kernel stack cache that keeps
some limited amount of preallocated kernel stacks to lower the latency
of thread allocation. Add vm_lowmem handler to prune the cache on
low memory condition. This way, system with reasonable amount of the
threads get lower latency of thread creation, while still not exhausting
significant portion of KVA for unused kstacks.
Submitted by: peter [1]
Discussed with: jhb, julian, peter
Reviewed by: jhb
Tested by: pho (and retested according to new test scenarious)
MFC after: 1 week
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that requested stack size is equial to the
actual, and reallocate the stack if sizes differ [1].
This fixes the bug introduced by r173361 that was committed several days
after r173004 and consisted of kthread_add(9) ignoring the non-default
kernel stack size.
Also, r173361 removed the caching of the kernel stacks for a non-first
thread in the process. Introduce separate kernel stack cache that keeps
some limited amount of preallocated kernel stacks to lower the latency
of thread allocation. Add vm_lowmem handler to prune the cache on
low memory condition. This way, system with reasonable amount of the
threads get lower latency of thread creation, while still not exhausting
significant portion of KVA for unused kstacks.
Submitted by: peter [1]
Discussed with: jhb, julian, peter
Reviewed by: jhb
Tested by: pho
MFC after: 1 week
accidentally lost at one point during the PAT development. Without this
fix vm_pager_get_pages() was zeroing each of the pages.
Submitted by: czander @ NVidia
MFC after: 3 days