the "toss the largest process" emergency handling) from vm_map.c to
swap_pager.c.
The quantity calculated depends strongly on the internals of the
swap_pager and by moving it, we no longer need to expose the
internal metrics of the swap_pager to the world.
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.
This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
whether its caller has locked the vm_object. (This is a temporary
measure to bootstrap vm_object locking.)
process to kill, don't block on a map lock while holding the
process lock. Instead, skip processes whose map locks are held
and find something else to kill.
- Add vm_map_trylock_read() to support the above.
Reviewed by: alc, mike (mentor)
It's unnecessary for two reasons: (1) Giant is at present already held in
such cases and (2) our various implementations of pmap_growkernel() look to
be MP safe. (For example, for sparc64 the proof of (2) is trivial.)
dereferenced when a process exits due to the vmspace ref-count being
bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree(). This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.
Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().
MFC after: 7 days
is now synchronized by a mutex, whereas access to user maps is still
synchronized by a lockmgr()-based lock. Why? No single type of lock,
including sx locks, meets the requirements of both types of vm map.
Sometimes we sleep while holding the lock on a user map. Thus, a
a mutex isn't appropriate. On the other hand, both lockmgr()-based
and sx locks release Giant when a thread/process blocks during
contention for a lock. This could lead to a race condition in a legacy
driver (that relies on Giant for synchronization) if it attempts to
kmem_malloc() and fails to immediately obtain the lock. Fortunately,
we never sleep while holding a system map lock.
- Add a mtx_destroy() to vm_object_collapse(). (This allows a bzero()
to migrate from _vm_object_allocate() to vm_object_zinit(), where it
will be performed less often.)
resource starvation we clean-up as much of the vmspace structure as we
can when the last process using it exits. The rest of the structure
is cleaned up when it is reaped. But since exit1() decrements the ref
count it is possible for a double-free to occur if someone else, such as
the process swapout code, references and then dereferences the structure.
Additionally, the final cleanup of the structure should not occur until
the last process referencing it is reaped.
This commit solves the problem by introducing a secondary reference count,
calling 'vm_exitingcnt'. The normal reference count is decremented on exit
and vm_exitingcnt is incremented. vm_exitingcnt is decremented when the
process is reaped. When both vm_exitingcnt and vm_refcnt are 0, the
structure is freed for real.
MFC after: 3 weeks
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.
Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.
Obtained from: dfr (mostly, many tweaks from me).
Use lmin(long, long), not min(u_int, u_int). This is a problem here on
ia64 which has *way* more than 2^32 pages of KVA. 281474976710655 pages
to be precice.
_vm_map_lock_read(), and _vm_map_trylock(). Submitted by: tegge
o Remove GIANT_REQUIRED from kmem_alloc_wait() and kmem_free_wakeup().
(This clears the way for exec_map accesses to move outside of Giant.
The exec_map is not a system map.)
o Remove some premature MPSAFE comments.
Reviewed by: tegge
and kmem_free_wakeup(). Previously, kmem_free_wakeup() always
called wakeup(). In general, no one was sleeping.
o Export vm_map_unlock_and_wait() and vm_map_wakeup() from vm_map.c
for use in vm_kern.c.
of the KVA space's size in addition to the amount of physical memory
and reduce it by a factor of two.
Under the old formula, our reservation amounted to one kernel map entry
per virtual page in the KVA space on a 4GB i386.
types are not required, as the overhead is unnecessary:
o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.
release of Giant around the direct manipulation of the vm_object and
the optional call to pmap_object_init_pt().
o In vm_map_findspace(), remove GIANT_REQUIRED. Instead, acquire and
release Giant around the occasional call to pmap_growkernel().
o In vm_map_find(), remove GIANT_REQUIRED.
release of Giant.
o Reduce the scope of GIANT_REQUIRED in vm_map_insert().
These changes will enable us to remove the acquisition and release
of Giant from obreak().
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.
- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.
- Use the ZONE_VM flag on vm map entries and pv entries on x86.
vm_map_user_pageable().
o Remove vm_map_pageable() and vm_map_user_pageable().
o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were
only used by vm_map_pageable() and vm_map_user_pageable().)
Reviewed by: tegge
Submitted by: tegge
o Eliminate the "!mapentzone" check from vm_map_entry_create() and
vm_map_entry_dispose(). Reviewed by: tegge
o Fix white-space usage in vm_map_entry_create().
or user vm_maps. This implementation has two key benefits when compared
to vm_map_{user_,}pageable(): (1) it avoids a race condition through
the use of "in-transition" vm_map entries and (2) it eliminates lock
recursion on the vm_map.
Note: there is still an error case that requires clean up.
Reviewed by: tegge
o Add a stub for vm_map_wire().
Note: the description of the previous commit had an error. The in-
transition flag actually blocks the deallocation of a vm_map_entry by
vm_map_delete() and vm_map_simplify_entry().
or user vm_maps. In accordance with the standards for munlock(2),
and in contrast to vm_map_user_pageable(), this implementation does not
allow holes in the specified region. This implementation uses the
"in transition" flag described below.
o Introduce a new flag, "in transition," to the vm_map_entry.
Eventually, vm_map_delete() and vm_map_simplify_entry() will respect
this flag by deallocating in-transition vm_map_entrys, allowing
the vm_map lock to be safely released in vm_map_unwire() and (the
forthcoming) vm_map_wire().
o Modify vm_map_simplify_entry() to respect the in-transition flag.
In collaboration with: tegge
vm_map_create(), and vm_map_submap().
o Make further use of a local variable in vm_map_entry_splay()
that caches a reference to one of a vm_map_entry's children.
(This reduces code size somewhat.)
o Revert a part of revision 1.66, deinlining vmspace_pmap().
(This function is MPSAFE.)
deinlining vm_map_entry_behavior() and vm_map_entry_set_behavior()
actually increases the kernel's size.
o Make vm_map_entry_set_behavior() static and add a comment describing
its purpose.
o Remove an unnecessary initialization statement from vm_map_entry_splay().
into the vm_object layer:
o Acquire and release Giant in vm_object_shadow() and
vm_object_page_remove().
o Remove the GIANT_REQUIRED assertion preceding vm_map_delete()'s call
to vm_object_page_remove().
o Remove the acquisition and release of Giant around vm_map_lookup()'s
call to vm_object_shadow().
and vm_map_delete(). Assert GIANT_REQUIRED in vm_map_delete()
only if operating on the kernel_object or the kmem_object.
o Remove GIANT_REQUIRED from vm_map_remove().
o Remove the acquisition and release of Giant from munmap().
the last accessed datum is moved to the root of the splay tree.
Therefore, on lookups in which the hint resulted in O(1) access,
the splay tree still achieves O(1) access. In contrast, on lookups
in which the hint failed miserably, the splay tree achieves amortized
logarithmic complexity, resulting in dramatic improvements on vm_maps
with a large number of entries. For example, the execution time
for replaying an access log from www.cs.rice.edu against the thttpd
web server was reduced by 23.5% due to the large number of files
simultaneously mmap()ed by this server. (The machine in question has
enough memory to cache most of this workload.)
Nothing comes for free: At present, I see a 0.2% slowdown on "buildworld"
due to the overhead of maintaining the splay tree. I believe that
some or all of this can be eliminated through optimizations
to the code.
Developed in collaboration with: Juan E Navarro <jnavarro@cs.rice.edu>
Reviewed by: jeff
release Giant around vm_map_madvise()'s call to pmap_object_init_pt().
o Replace GIANT_REQUIRED in vm_object_madvise() with the acquisition
and release of Giant.
o Remove the acquisition and release of Giant from madvise().
vm_object_deallocate(), replacing the assertion GIANT_REQUIRED.
o Remove GIANT_REQUIRED from vm_map_protect() and vm_map_simplify_entry().
o Acquire and release Giant around vm_map_protect()'s call to pmap_protect().
Altogether, these changes eliminate the need for mprotect() to acquire
and release Giant.
mutex class. Currently this is only used for kmapentzone because kmapents
are are potentially allocated when freeing memory. This is not dangerous
though because no other allocations will be done while holding the
kmapentzone lock.
in the same style as sys/proc.h.
o Undo the de-inlining of several trivial, MPSAFE methods on the vm_map.
(Contrary to the commit message for vm_map.h revision 1.66 and vm_map.c
revision 1.206, de-inlining these methods increased the kernel's size.)
statclock can access it in the tail end of statclock_process() at an
unfortunate time. This bit me several times on an SMP alpha (UP2000)
and the problem went away with this change. I'm not sure why it doesn't
break x86 as well. Maybe it's because the clocks are much faster
on alpha (HZ=1024 by default).
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibilitly.)
Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)
best path forward now is likely to change the lockmgr locks to simple
sleep mutexes, then see if any extra contention it generates is greater
than removed overhead of managing local locking state information,
cost of extra calls into lockmgr, etc.
Additionally, making the vm_map lock a mutex and respecting it properly
will put us much closer to not needing Giant magic in vm.
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.
After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system. To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.
This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe. One more
user of lockmgr down, many more to go :)
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.
Reviewed by: alc
shared.
Also introduce vm_endcopy instead of using pointer tricks when
initializing new vmspaces.
The race occured because of how the reference was utilized:
test vmspace reference,
possibly block,
decrement reference
When sharing a vmspace between multiple processes it was possible
for two processes exiting at the same time to test the reference
count, possibly block and neither one free because they wouldn't
see the other's update.
Submitted by: green
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise(). It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.
- vm map entries are not valid after the map has been unlocked.
- An exclusive lock on the map is needed before calling
vm_map_simplify_entry().
Fix cleanup after page wiring failure to unwire all pages that had been
successfully wired before the failure was detected.
Reviewed by: dillon
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
most of these inlines had been bloated in -current far beyond their
original intent. Normalize prototypes and function declarations to be ANSI
only (half already were). And do some general cleanup.
(kernel size also reduced by 50-100K, but that isn't the prime intent)
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.