vm_map_lookup{,_locked}() to vm_map_lookup_entry(). Having the fast path
in vm_map_lookup{,_locked}() limits its benefits to page faults. Moving
it to vm_map_lookup_entry() extends its benefits to other operations on
the vm map.
PowerPC/AIM. Consequently, it should not be used to determine the maximum
number of kernel map entries. Intead, use VM_MIN_KERNEL_ADDRESS, which marks
the start of the kernel map on all architectures.
Tested by: marcel@ (PowerPC/AIM)
support for VMFS_ALIGNED_SPACE, which requests the allocation of an
address range best suited to superpages. The old options TRUE and FALSE
are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no
immediate need to update all of vm_map_find(9)'s callers.
While I'm here, correct a misstatement about vm_map_find(9)'s return
values in the man page.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.
As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).
In collaboration with: Peter Holm
Reviewed by: jhb
since revision 1.1. Specifically, neither traversal of the vm map checks
whether the end of the vm map has been reached. Consequently, the first
traversal can wrap around and bogusly return an error.
This error has gone unnoticed for so long because no one had ever before
tried msync(2)ing a region above the stack.
Reported by: peter
MFC after: 1 week
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
For this, introduce vm_map_fixed() that does that for MAP_FIXED case.
Dropping the lock allowed for parallel thread to occupy the freed space.
Reported by: Tijl Coosemans <tijl ulyssis org>
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
These functions are intended to do the same actions of sx_xlock() and
sx_slock() but with the difference to perform an interruptible sleep, so
that sleep can be interrupted by external events.
In order to support these new featueres, some code renstruction is needed,
but external API won't be affected at all.
Note: use "void" cast for "int" returning functions in order to avoid tools
like Coverity prevents to whine.
Requested by: rwatson
Tested by: rwatson
Reviewed by: jhb
Approved by: jeff (mentor)
vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With
the exception of calls to vm_map_pmap_enter() from
madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter()
are both used to create speculative mappings. Thus, always
reactivating cached pages is a mistake. In principle, cached pages
should only be reactivated by an actual access. Otherwise, the
following misbehavior can occur. On a hard fault for a text page the
clustering algorithm fetches not only the required page but also
several of the adjacent pages. Now, suppose that one or more of the
adjacent pages are never accessed. Ultimately, these unused pages
become cached pages through the efforts of the page daemon. However,
the next activation of the executable reactivates and maps these
unused pages. Consequently, they are never replaced. In effect, they
become pinned in memory.
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
1) Eliminate an unnecessary check for fictitious pages. Specifically,
only device-backed objects contain fictitious pages and the object is
not device-backed.
2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to
prevent possible wrap around with extremely large maps and objects,
respectively. Observed by: tegge (last summer)
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage(). This allows for the elimination of some uses of
the page queues lock.
Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS. This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().
Remove vm_page_unmanage(). It is no longer used.
Found mapped cache page. Specifically, if cnt.v_free_count dips below
cnt.v_free_reserved after p_start has been set to a non-NULL value,
then vm_map_pmap_enter() would break out of the loop and incorrectly
call pmap_enter_object() for the remaining address range. To correct
this error, this revision truncates the address range so that
pmap_enter_object() will not map any cache pages.
In collaboration with: tegge@
Reported by: kris@
vmspace_exitfree() and vmspace_free() which could result in the same
vmspace being freed twice.
Factor out part of exit1() into new function vmspace_exit(). Attach
to vmspace0 to allow old vmspace to be freed earlier.
Add new function, vmspace_acquire_ref(), for obtaining a vmspace
reference for a vmspace belonging to another process. Avoid changing
vmspace refcount from 0 to 1 since that could also lead to the same
vmspace being freed twice.
Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().
Reviewed by: alc
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.
Remove a bogus comment from pmap_enter_quick().
Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.
Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)
"debug.mpsafevm" results in (almost) Giant-free execution of zero-fill
page faults. (Giant is held only briefly, just long enough to determine
if there is a vnode backing the faulting address.)
Also, condition the acquisition and release of Giant around calls to
pmap_remove() on "debug.mpsafevm".
The effect on performance is significant. On my dual Opteron, I see a
3.6% reduction in "buildworld" time.
- Use atomic operations to update several counters in vm_fault().
wait for system wires to disappear, do so (much more trivially) by
instead only checking for system wires of user maps and not kernel maps.
Alternative by: tor
Reviewed by: alc
algorithm built into the map entry splay tree. This replaces the
first_free hint in struct vm_map with two fields in vm_map_entry:
adj_free, the amount of free space following a map entry, and
max_free, the maximum amount of free space in the entry's subtree.
These fields make it possible to find a first-fit free region of a
given size in one pass down the tree, so O(log n) amortized using
splay trees.
This significantly reduces the overhead in vm_map_findspace() for
applications that mmap() many hundreds or thousands of regions, and
has a negligible slowdown (0.1%) on buildworld. See, for example, the
discussion of a micro-benchmark titled "Some mmap observations
compared to Linux 2.6/OpenBSD" on -hackers in late October 2003.
OpenBSD adopted this approach in March 2002, and NetBSD added it in
November 2003, both with Red-Black trees.
Submitted by: Mark W. Krentel
to avoid later changes before pmap_enter() and vm_fault_prefault()
has completed.
Simplify deadlock avoidance by not blocking on vm map relookup.
In collaboration with: alc