allocate the requested page because too few pages are cached or free.
Document the VM_ALLOC_COUNT() option to vm_page_alloc() and
vm_page_alloc_freelist().
Make style changes to vm_page_alloc() and vm_page_alloc_freelist(),
such as using a variable name that more closely corresponds to the
comments.
Use the defined types instead of int when manipulating masks.
Supposedly, it could fix support for 32KB page size in the
machine-independend VM layer.
Reviewed by: alc
MFC after: 2 weeks
madvise(2) except that it operates on a file descriptor instead of a
memory region. It is currently only supported on regular files.
Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types. The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are
thus filesystem independent. Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.
The second type of hints are used to hint to the OS that data will or
will not be used. These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request. This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode. This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue. The second change adds
a new vm_object_page_cache() method. This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.
To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.
Reviewed by: jilles, kib, arch@
Approved by: re (kib)
MFC after: 1 month
and use these new options in the mips pmap.
Wake up the page daemon in vm_page_alloc_freelist() if the number of free
and cached pages becomes too low.
Tidy up vm_page_alloc_init(). In particular, add a comment about an
important restriction on its use.
Tested by: jchandra@
eliminating duplicated code in the various pmap implementations.
Micro-optimize vm_phys_free_pages().
Introduce vm_phys_free_contig(). It is fast routine for freeing an
arbitrary number of physically contiguous pages. In particular, it
doesn't require the number of pages to be a power of two.
Use "u_long" instead of "unsigned long".
Bruce Evans (bde@) has convinced me that the "boundary" parameters
to kmem_alloc_contig(), vm_phys_alloc_contig(), and
vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not
"u_long". Make this change.
more general VM system interfaces. So, their implementation can now
reside in kern_malloc.c alongside the other functions that are declared
in malloc.h.
common cases that can be handled in constant time. The insight being
that a page's parent in the vm object's tree is very often its
predecessor or successor in the vm object's ordered memq.
Tested by: jhb
MFC after: 10 days
word to handle the dirty mask updates in vm_page_clear_dirty_mask().
Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold()
the sole purpose of which was to protect dirty on architectures which
does not provide short or byte-wide atomics.
Reviewed by: alc, attilio
Tested by: flo (sparc64)
MFC after: 2 weeks
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.
Reviewed by: rwatson
Approved by: re (bz)
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.
Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)
after the conversion of the swap device size to the page size units,
not before. That lifts the limit on the usable swap partition size
from 32GB to 256GB, that is less depressing for the modern systems.
Submitted by: Alexander V. Chernikov <melifaro ipfw ru>
Reviewed by: alc
Approved by: re (bz)
MFC after: 2 weeks
kernel for FreeBSD 9.0:
Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.
Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.
In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.
Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.
Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.
Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)
configured swap devices in the Linux-compatible format.
Based on the submission by: Robert Millan <rmh debian org>
PR: kern/159281
Reviewed by: bde
Approved by: re (kensmith)
MFC after: 2 weeks
allocated the device pager for the given handle, then the object
fictitious pages list and the object membership in the global object
list still need to be initialized. Otherwise, dev_pager_dealloc() will
traverse uninitialized pointers.
Reported and tested by: pho
Reviewed by: jhb
Approved by: re (kensmith)
MFC after: 1 week
function vm_mmap_to_errno(). It is useful for the drivers that implement
mmap(2)-like functionality, to be able to return error codes consistent
with mmap(2).
Sponsored by: The FreeBSD Foundation
No objections from: alc
MFC after: 1 week
uiomove generates EFAULT if any accessed address is not mapped, as
opposed to handling the fault.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc (previous version)
won't happen before 9.0. This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.
This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.
Update all of the existing assertions on pmap_remove_all() to reflect this
change.
Reviewed by: kib
(Saying that the lock on the object that the page belongs to must be held
only represents one aspect of the rules.)
Eliminate the use of the page queues lock for atomically performing read-
modify-write operations on the dirty field when the underlying architecture
supports atomic operations on char and short types.
Document the fact that 32KB pages aren't really supported.
Reviewed by: attilio, kib
vm_page_undirty(). The assert is not precise due to VPO_BUSY owner
to tracked, so assertion does not catch the case when VPO_BUSY is
owned by other thread.
Reviewed by: alc
VM_PAGER_AGAIN to VM_PAGER_ERROR for the uwritten pages. Return
VM_PAGER_AGAIN for the partially written page. Always forward at least
one page in the loop of vm_object_page_clean().
VM_PAGER_ERROR causes the page reactivation and does not clear the
page dirty state, so the write is not lost.
The change fixes an infinite loop in vm_object_page_clean() when the
filesystem returns permanent errors for some page writes.
Reported and tested by: gavin
Reviewed by: alc, rmacklem
MFC after: 1 week
uma_startup2() was called. Thus, setting the variable "booted" to true in
uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because
any allocations made after uma_startup() but before uma_startup2() could be
satisfied by uma_small_alloc(). Now, however, some multipage allocations
are necessary before uma_startup2() just to allocate zone structures on
machines with a large number of processors. Thus, a Boolean can no longer
effectively describe the state of the UMA allocator. Instead, make "booted"
have three values to describe how far initialization has progressed. This
allows multipage allocations to continue using startup_alloc() until
uma_startup2(), but single-page allocations may begin using
uma_small_alloc() after uma_startup().
2. With the aforementioned change, only a modest increase in boot pages is
necessary to boot UMA on a large number of processors.
3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM. It has only been used between
r182028 and r204128.
Reviewed by: attilio [1], nwhitehorn [3]
Tested by: sbruno
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk. Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.
Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file. (Darn the unfamiliar development
environment).
Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.
Requested by: alc
MFC after: 1 week
MFC with: r221853
Hold the vnode around the region where object lock is dropped, until
vnode lock is acquired.
Do not drop the vnode reference for a case when the object was
deallocated during unlock. Note that in this case, VV_TEXT is cleared
by vnode_pager_dealloc().
Reported and tested by: pho
Reviewed by: alc
MFC after: 3 days
If supplied length is zero, and user address is invalid, function
might return -1, due to the truncation and rounding of the address.
The callers interpret the situation as EFAULT. Instead of handling
the zero length in caller, filter it in vm_fault_quick_hold_pages().
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
in fork to honor the locking requirements. While here, expand the scope
of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously
the code was locking the new child process (p2) after it had locked the
parent process (p1). However, when locking two processes, the safe order
is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
having the process locked to use PROC_LOCK(). Every place was already
locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading
the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.
MFC after: 1 week
which are not yet fully initialized (i.e. ones with p_state == PRS_NEW).
Without it, we could panic in _thread_lock_flags().
Note that there may be other instances of FOREACH_PROC_IN_SYSTEM() that
require similar fix.
Reported by: pho, keramida
Discussed with: kib
As it was pointed out by Alan Cox, that no longer serves its purpose with
the modern UMA allocator compared to the old one used in 4.x days.
The removal of sysctl eliminates max_proc_mmap type overflow leading to
the broken mmap(2) seen with large amount of physical memory on arches
with factually unbound KVA space (such as amd64). It was found that
slightly less than 256GB of physmem was enough to trigger the overflow.
Reviewed by: alc, kib
Approved by: avg (mentor)
MFC after: 2 months
vm_map_insert(), the kmem_back() assumption about newly inserted
entry might be broken due to interference of two factors. In the low
memory condition, when vm_page_alloc() returns NULL, supplied map is
unlocked. If another thread performs kmem_malloc() meantime, and its
map entry is placed right next to our thread map entry in the map,
both entries wire count is still 0 and entries are coalesced due to
vm_map_simplify_entry().
Mark new entry with MAP_ENTRY_IN_TRANSITION to prevent coalesce.
Fix some style issues, tighten the assertions to account for
MAP_ENTRY_IN_TRANSITION state.
Reported and tested by: pho
Reviewed by: alc
KASSERT()s and eliminate the rest.
Replace excessive printf()s and a panic() in bufdone_finish() with a
KASSERT() in vm_page_io_finish().
Reviewed by: kib
incorrectly calling vm_object_page_clean(). They are passing the length of
the range rather than the ending offset of the range.
Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the
callers.
Reviewed by: kib
MFC after: 3 weeks
MAP_STACK_* entries. (See r71983 and r74235.)
In some cases, performing this call to vm_map_simplify_entry() halves the
number of vm map entries used by the Sun JDK.
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.
Inspired by: rwatson
assertion that is no longer required. Long ago, calls to vm_page_alloc()
from an interrupt handler had to specify VM_ALLOC_INTERRUPT so that
vm_page_alloc() would not attempt to reclaim a PQ_CACHE page from another vm
object. Today, with the synchronization on a vm object's collection of
PQ_CACHE pages, this is no longer an issue. In fact, VM_ALLOC_INTERRUPT now
reclaims PQ_CACHE pages just like VM_ALLOC_{NORMAL,SYSTEM}.
MFC after: 3 weeks
OBJT_PHYS objects. Thus, there is no need for handling them specially
in vm_fault(). In fact, this special case handling would have led to
an assertion failure just before the call to pmap_enter().
Reviewed by: kib@
MFC after: 6 weeks
need it anymore. Moreover, its implementation had a type mismatch, a
long is not necessarily an uint64_t. (This mismatch was hidden by
casting.) Move the remaining two counters up a level in the sysctl
hierarchy. There is no reason for them to be under the vm.pmap node.
Reviewed by: kib
hold this lock until the end of the function.
With the aforementioned change to vm_pageout_clean(), page locks don't need
to support recursive (MTX_RECURSE) or duplicate (MTX_DUPOK) acquisitions.
Reviewed by: kib
consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY
was cleared early in vm_object_page_clean, before the cleaning pass
was done. This is no longer true after r216799.
Moreover, since OBJ_CLEANING is a flag, and not the counter, it could
be reset too prematurely when parallel vm_object_page_clean() are
performed.
Reviewed by: alc (as a part of the bigger patch)
MFC after: 1 month (after r216799 is merged)
instead skip over them. As long as a page is held, it can't be reclaimed by
contigmalloc(M_WAITOK). Moreover, a held page may be undergoing
modification, e.g., vmapbuf(), so even if the hold were released before the
completion of contigmalloc(), the page might have to be flushed again.
MFC after: 3 weeks
vm_object_set_writeable_dirty().
Fix an issue where restart of the scan in vm_object_page_clean() did
not removed write permissions for newly added pages or, if the mapping
for some already scanned page changed to writeable due to fault.
Merge the two loops in vm_object_page_clean(), doing the remove of
write permission and cleaning in the same loop. The restart of the
loop then correctly downgrade writeable mappings.
Fix an issue where a second caller to msync() might actually return
before the first caller had actually completed flushing the
pages. Clear the OBJ_MIGHTBEDIRTY flag after the cleaning loop, not
before.
Calls to pmap_is_modified() are not needed after pmap_remove_write()
there.
Proposed, reviewed and tested by: alc
MFC after: 1 week
condition in proc_rwmem() and to (2) simplify the implementation of the
cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem()
the requested read or write could fail because the targeted page could be
reclaimed between the calls to vm_fault() and vm_page_hold().
In collaboration with: kib@
MFC after: 6 weeks
vmspace_fork and vm_map_wire that would lead to "vm_fault_copy_wired: page
missing" panics. While faulting in pages for a map entry that is being
wired down, mark the containing map as busy. In vmspace_fork wait until the
map is unbusy, before we try to copy the entries.
Reviewed by: kib
MFC after: 5 days
Sponsored by: Isilon Systems, Inc.
mapped and entered via vm_page_setup, keep track of it like we do
for amd64.
# A separate commit will be made to move this to a capability-based ifdef
# rather than arch-based ifdef.
Submitted by: alc@
MFC after: 1 week
in "struct vm_object". This is required to make it possible to account
for per-jail swap usage.
Reviewed by: kib@
Tested by: pho@
Sponsored by: FreeBSD Foundation
vm_page_startup(). Specifically, the dump_avail array should be used
instead of the phys_avail array to calculate the size of vm_page_dump. For
example, the pages for the message buffer are allocated prior to
vm_page_startup() by subtracting them from the last entry in the phys_avail
array, but the first thing that vm_page_startup() does after creating the
vm_page_dump array is to set the bits corresponding to the message buffer
pages in that array. However, these bits might not actually exist in the
array, because the size of the array is determined by the current value in
the last entry of the phys_avail array. In general, the only reason why
this doesn't always result in an out-of-bounds array access is that the size
of the vm_page_dump array is rounded up to the next page boundary. This
change eliminates that dependence on rounding (and luck).
MFC after: 6 weeks
The current implementation of vm_page_alloc_freelist() does not handle
order > 0 correctly. Remove order parameter to the function and use it
only for order 0 pages.
Submitted by: alc
backing storage. Such pages might be then reused, racing with the
assert in vm_object_page_collect_flush() that verified that dirty
pages from the run (most likely, pages with VM_PAGER_AGAIN status) are
write-protected still. In fact, the page indexes for the pages that
were removed from the object page list should be ignored by
vm_object_page_clean().
Return the length of successfully written run from vm_pageout_flush(),
that is, the count of pages between requested page and first page
after requested with status VM_PAGER_AGAIN. Supply the requested page
index in the array to vm_pageout_flush(). Use the returned run length
to forward the index of next page to clean in vm_object_page_clean().
Reported by: avg
Reviewed by: alc
MFC after: 1 week
object page list. The only use of object generation count now is a
restart of the scan in vm_object_page_clean(), which makes sense to do
on the page addition. Page removals do not affect the dirtiness of the
object, as well as manipulations with the shadow chain.
Suggested and reviewed by: alc
MFC after: 1 week
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.
Reported and tested by: swell.k gmail com
MFC after: 1 week
The unmapped page separates the tip of the stack and possible adjanced
segment, making some uses of stack overflow harder. The stack growing
code refuses to expand the segment to the last page of the reseved
region when sysctl security.bsd.stack_guard_page is set to 1. The
default value for sysctl and accompanying tunable is 0.
Please note that mmap(MAP_FIXED) still can place a mapping right up to
the stack, making continuous region.
Reviewed by: alc
MFC after: 1 week
creation of large page mappings in the pmap, it can provide modest
performance benefits. In particular, for a "buildworld" on a 2x 1GHz
Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system
time by 12.6%.
Tested by: marius@
ensure that grow_amount is a multiple of the page size. Otherwise, the
kernel may crash in swap_reserve_by_uid() on HEAD and FreeBSD 8.x, and
produce a core file with a missing stack on FreeBSD 7.x.
Diagnosed and reported by: jilles
Reviewed by: kib
MFC after: 1 week
zones whose objects are larger than a page to use startup_alloc(). This
allows allocation of zone objects during early boot on machines with a large
number of CPUs since the resulting zone objects are larger than a page.
Submitted by: trema
Reviewed by: attilio
MFC after: 1 week
its value as a loop invariant. Currently this is a no-op because
'atomic_cmpset_int()' clobbers all memory on current architectures.
- Use atomic_fetchadd_int() instead of an atomic_cmpset_int() loop to drop
a reference in vmspace_free().
Reviewed by: alc
MFC after: 1 month
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
- Add uma_zone_get_cur which returns the current approximate occupancy of
a zone. This is useful for providing stats via sysctl amongst other things.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
MFC after: 2 weeks
addresses that is greater than a superpage in size but not a multiple of
the superpage size, then vm_map_find() is not always expanding the kernel
pmap to support the last few small pages being allocated. These failures
are not commonplace, so this was first noticed by someone porting FreeBSD
to a new architecture. Previously, we grew the kernel page table in
vm_map_findspace() when we found the first available virtual address.
This works most of the time because we always grow the kernel pmap or page
table by an amount that is a multiple of the superpage size. Now, instead,
we defer the call to pmap_growkernel() until we are committed to a range
of virtual addresses in vm_map_insert(). In general, there is another
reason to prefer calling pmap_growkernel() in vm_map_insert(). It makes
it possible for someone to do the equivalent of an mmap(MAP_FIXED) on the
kernel map.
Reported by: Svatopluk Kraus
Reviewed by: kib@
MFC after: 3 weeks
write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e.,
copy-on-write.
(This is a regression in the new implementation of POSIX shared memory
objects that is used by HEAD and RELENG_8. This bug does not exist in
RELENG_7's user-level, file-based implementation.)
PR: 150260
MFC after: 3 weeks
vm_map_unlock_nodefer() part of the synchronization interface for maps.
Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing
how they should be used. In particular, describe the deferred deallocations
issue with vm_map_unlock_and_wait().
Redo the implementation of vm_map_unlock_and_wait() so that it passes
along the caller's file and line information, just like the other map
locking primitives.
Reviewed by: kib
X-MFC after: r212824
on map unlock to the lock downgrade and later read unlock operation.
System map entries cannot be backed by OBJT_VNODE objects, no need to
defer deallocation for them. Map entries from user maps do not require
the owner map for deallocation, and can be accumulated in the
thread-local list for freeing when a user map is unlocked.
Move the collection of entries for deferred reclamation into
vm_map_delete(). Create helper vm_map_process_deferred(), that is
called from locations where processing is feasible. Do not process
deferred entries in vm_map_unlock_and_wait() since map_sleep_mtx is
held.
Reviewed by: alc, rstone (previous versions)
Tested by: pho
MFC after: 2 weeks
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing
NUL byte. This behaviour was preserved, though it should not be
necessary.
Reviewed by: phk (original patch)
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.
Requested by: nwhitehorn
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing NUL
byte. This behaviour was preserved, though it should not be necessary.
Reviewed by: phk
in a range must be checked when calling pmap_remove(). Calling
pmap_remove() from vm_pageout_map_deactivate_pages() with the entire range
of the map could result in attempting to demap an extraordinary number
of pages (> 10^15), so iterate through each map entry and unmap each of
them individually.
MFC after: 6 weeks
lock on the pmc-sx lock. This prevents a deadlock with
pmc_log_process_mappings, which has an exclusive lock on pmc-sx and tries
to get a read lock on a vm_map. Downgrading the vm_map_lock in munmap
allows pmc_log_process_mappings to continue, preventing the deadlock.
Without this change I could cause a deadlock on a multicore 8.1-RELEASE
system by having one thread constantly mmap'ing and then munmap'ing a
PROT_EXEC mapping in a loop while I repeatedly invoked and stopped pmcstat
in system-wide sampling mode.
Reviewed by: fabient
Approved by: emaste (mentor)
MFC after: 2 weeks
vm_page_startup uses MSGBUF_SIZE value for adding msgbuf pages to minidump.
If opt_msgbuf.h is not included and MSGBUF_SIZE is overriden in kernel
config, then not all msgbuf pages will be dumped. And most importantly,
struct msgbuf itself will not be included. Thus the dump would look
corrupted/incomplete to tools like kgdb, dmesg, etc that try to access
struct msgbuf as one of the first things they do when working on a crash
dump.
MFC after: 5 days
to uma_zone_set_max().
The UMA zone limit is not exactly set to the value supplied but
rounded up to completely fill the backing store increment (a page
normally). This can lead to surprising situations where the number
of elements allocated from UMA is higher than the supplied limit
value. The new get function reads back the effective value so that
the supplied limit value can be adjusted to the real limit.
Reviewed by: jeffr
MFC after: 1 week
case future compile-time knobs were added that it wants to use.
Also add include guards and forward declarations to vm/memguard.h.
Approved by: zml (mentor)
MFC after: 1 month
use-after-free over a longer time. Also release the backing pages of
a guarded allocation at free(9) time to reduce the overhead of using
memguard(9). Allow setting and varying the malloc type at run-time.
Add knobs to allow:
- randomly guarding memory
- adding un-backed KVA guard pages to detect underflow and overflow
- a lower limit on the size of allocations that are guarded
Reviewed by: alc
Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page)
Silence from: -arch
Approved by: zml (mentor)
MFC after: 1 month
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.
In collaboration with: pho
MFC after: 1 month
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfulling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.
Reviewed by: alc
details of the string buffer allocation in one place.
Eliminate the portion of the string buffer that was dedicated to storing
the interpreter name. The pointer to the interpreter name can simply be
made to point to the appropriate argument string.
Reviewed by: kib