Commit Graph

366 Commits

Author SHA1 Message Date
Alan Cox
b7903e65fb Remove GIANT_REQUIRED from vmspace_exec().
Prodded by: jeff
2005-05-02 07:05:20 +00:00
Alan Cox
986b43f845 Add checks to vm_map_findspace() to test for address wrap. The conditions
where this could occur are very rare, but possible.

Submitted by: Mark W. Krentel
MFC after: 2 weeks
2005-01-18 19:50:09 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Alan Cox
1f70d62298 Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.
2004-12-23 20:16:11 +00:00
Alan Cox
85f5b24573 In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead.  To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep.  Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock.  (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)
2004-12-15 19:55:05 +00:00
Alan Cox
94ddc7076d Push Giant deep into vm_forkproc(), acquiring it only if the process has
mapped System V shared memory segments (see shmfork_myhook()) or requires
the allocation of an ldt (see vm_fault_wire()).
2004-09-03 05:11:32 +00:00
Alan Cox
c1fbc251cd - Introduce and use a new tunable "debug.mpsafevm". At present, setting
"debug.mpsafevm" results in (almost) Giant-free execution of zero-fill
   page faults.  (Giant is held only briefly, just long enough to determine
   if there is a vnode backing the faulting address.)

   Also, condition the acquisition and release of Giant around calls to
   pmap_remove() on "debug.mpsafevm".

   The effect on performance is significant.  On my dual Opteron, I see a
   3.6% reduction in "buildworld" time.

 - Use atomic operations to update several counters in vm_fault().
2004-08-16 06:16:12 +00:00
Brian Feldman
7c938963a4 Rather than bringing back all of the changes to make VM map deletion
wait for system wires to disappear, do so (much more trivially) by
instead only checking for system wires of user maps and not kernel maps.

Alternative by:	tor
Reviewed by:	alc
2004-08-16 03:11:09 +00:00
Alan Cox
6eaee3fee4 Remove spl calls. 2004-08-14 18:57:41 +00:00
Alan Cox
0164e05781 Replace the linear search in vm_map_findspace() with an O(log n)
algorithm built into the map entry splay tree.  This replaces the
first_free hint in struct vm_map with two fields in vm_map_entry:
adj_free, the amount of free space following a map entry, and
max_free, the maximum amount of free space in the entry's subtree.
These fields make it possible to find a first-fit free region of a
given size in one pass down the tree, so O(log n) amortized using
splay trees.

This significantly reduces the overhead in vm_map_findspace() for
applications that mmap() many hundreds or thousands of regions, and
has a negligible slowdown (0.1%) on buildworld.  See, for example, the
discussion of a micro-benchmark titled "Some mmap observations
compared to Linux 2.6/OpenBSD" on -hackers in late October 2003.

OpenBSD adopted this approach in March 2002, and NetBSD added it in
November 2003, both with Red-Black trees.

Submitted by: Mark W. Krentel
2004-08-13 08:06:34 +00:00
Tor Egge
19dc560756 The vm map lock is needed in vm_fault() after the page has been found,
to avoid later changes before pmap_enter() and vm_fault_prefault()
has completed.

Simplify deadlock avoidance by not blocking on vm map relookup.

In collaboration with: alc
2004-08-12 20:14:49 +00:00
Brian Feldman
c5f60ffccf Re-delete the comment from r1.352. 2004-08-12 17:22:28 +00:00
Brian Feldman
0ada205ee6 Back out all behavioral chnages. 2004-08-10 14:42:48 +00:00
Brian Feldman
9689d5e5ee Revamp VM map wiring.
* Allow no-fault wiring/unwiring to succeed for consistency;
  however, the wired count remains at zero, so it's a special case.

* Fix issues inside vm_map_wire() and vm_map_unwire() where the
  exact state of user wiring (one or zero) and system wiring
  (zero or more) could be confused; for example, system unwiring
  could succeed in removing a user wire, instead of being an
  error.

* Require all mappings to be unwired before they are deleted.
  When VM space is still wired upon deletion, it will be waited
  upon for the following unwire.  This makes vslock(9) work
  rather than allowing kernel-locked memory to be deleted
  out from underneath of its consumer as it would before.
2004-08-09 19:52:29 +00:00
Alan Cox
91491c3566 Remove a stale comment from vm_map_lookup() that pertains to share maps.
(The last vestiges of the share map code were removed in revisions 1.153
and 1.159.)
2004-08-09 18:15:46 +00:00
Alan Cox
684a62b7bf - Push down the acquisition and release of Giant into pmap_enter_quick()
on those architectures without pmap locking.
 - Eliminate the acquisition and release of Giant in vm_map_pmap_enter().
2004-08-04 22:03:16 +00:00
Brian Feldman
b23f72e98a * Add a "how" argument to uma_zone constructors and initialization functions
so that they know whether the allocation is supposed to be able to sleep
  or not.
* Allow uma_zone constructors and initialation functions to return either
  success or error.  Almost all of the ones in the tree currently return
  success unconditionally, but mbuf is a notable exception: the packet
  zone constructor wants to be able to fail if it cannot suballocate an
  mbuf cluster, and the mbuf allocators want to be able to fail in general
  in a MAC kernel if the MAC mbuf initializer fails.  This fixes the
  panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
  the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.
2004-08-02 00:18:36 +00:00
Alan Cox
9bb0e06861 - Push down the acquisition and release of Giant into pmap_protect() on
those architectures without pmap locking.
 - Eliminate the acquisition and release of Giant from vm_map_protect().

(Translation: mprotect(2) runs to completion without touching Giant on
alpha, amd64, i386 and ia64.)
2004-07-30 20:38:30 +00:00
Maxime Henrion
12c649749c Get rid of another lockmgr(9) consumer by using sx locks for the user
maps.  We always acquire the sx lock exclusively here, but we can't
use a mutex because we want to be able to sleep while holding the
lock.  This is completely equivalent to what we were doing with the
lockmgr(9) locks before.

Approved by:	alc
2004-07-30 09:10:28 +00:00
Alan Cox
1a276a3f91 - Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit().  (Giant is acquired only if the vmspace
   contains shm segments.)
 - Eliminate the acquisition of Giant from proc_rwmem().
 - Reduce the scope of Giant in exit1(), uncovering the destruction of the
   address space.
2004-07-27 03:53:41 +00:00
Alan Cox
57a21aba93 Make the code and comments for vm_object_coalesce() consistent. 2004-07-25 07:48:47 +00:00
Alan Cox
51ab6c2890 Simplify vmspace initialization. The bcopy() of fields from the old
vmspace to the new vmspace in vmspace_exec() is mostly wasted effort.  With
one exception, vm_swrss, the copied fields are immediately overwritten.
Instead, initialize these fields to zero in vmspace_alloc(), eliminating a
bcopy() from vmspace_exec() and a bzero() from vmspace_fork().
2004-07-24 07:40:35 +00:00
Peter Wemm
5476633aed Semi-gratuitous change. Move two refcount operations to their own lines
rather than be buried inside an if (expression).  And now that the if
expression is the same in both exit paths, use the same ordering.
2004-07-21 05:08:10 +00:00
Peter Wemm
3f25cbddc2 Move the initialization and teardown of pmaps to the vmspace zone's
init and fini handlers.  Our vm system removes all userland mappings at
exit prior to calling pmap_release.  It just so happens that we might
as well reuse the pmap for the next process since the userland slate
has already been wiped clean.

However.  There is a functional benefit to this as well.  For platforms
that share userland and kernel context in the same pmap, it means that
the kernel portion of a pmap remains valid after the vmspace has been
freed (process exit) and while it is in uma's cache.  This is significant
for i386 SMP systems with kernel context borrowing because it avoids
a LOT of IPIs from the pmap_lazyfix() cleanup in the usual case.

Tested on:  amd64, i386, sparc64, alpha
Glanced at by:  alc
2004-07-21 00:29:21 +00:00
Alan Cox
3d2e54c317 Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove().  In general, they require the lock in
order to modify a page's pv list or flags.  In some cases, however,
pmap_protect() can avoid acquiring the lock.
2004-07-15 18:00:43 +00:00
Andrew Gallatin
b351299ca3 Use MIN() macro rather than ulmin() inline, and fix stray tab
that snuck in with my last commit.

Submitted by: green
2004-06-28 19:58:39 +00:00
Andrew Gallatin
1dad8fe1ed Fix alpha - the use of min() on longs was loosing the high bits and
returning wrong answers, leading to strange values vm2->vm_{s,t,d}size.
2004-06-28 19:15:40 +00:00
Brian Feldman
2a7be1b6d1 Correct the tracking of various bits of the process's vmspace and vm_map
when not propogated on fork (due to minherit(2)).  Consistency checks
otherwise fail when the vm_map is freed and it appears to have not been
emptied completely, causing an INVARIANTS panic in vm_map_zdtor().

PR:		kern/68017
Submitted by:	Mark W. Krentel <krentel@dreamscape.com>
Reviewed by:	alc
2004-06-24 22:43:46 +00:00
Dag-Erling Smørgrav
b103b94801 Back out previous commit; it went to the wrong file. 2004-05-25 18:28:52 +00:00
Dag-Erling Smørgrav
9507605f93 MFS: rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE semantics but
provide a sysctl knob for reverting to old ones.
2004-05-25 16:31:49 +00:00
Alan Cox
3ffbc0cd8e Correct two error cases in vm_map_unwire():
1. Contrary to the Single Unix Specification our implementation of
   munlock(2) when performed on an unwired virtual address range has
   returned an error.  Correct this.  Note, however, that the behavior
   of "system" unwiring is unchanged, only "user" unwiring is changed.
   If "system" unwiring is performed on an unwired virtual address
   range, an error is still returned.

2. Performing an errant "system" unwiring on a virtual address range
   that was "user" (i.e., mlock(2)) but not "system" wired would
   incorrectly undo the "user" wiring instead of returning an error.
   Correct this.

Discussed with:  green@
Reviewed by:     tegge@
2004-05-25 05:51:17 +00:00
Alan Cox
4be14af9cf To date, unwiring a fictitious page has produced a panic. The reason
being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious
pages but unwiring uses PHYS_TO_VM_PAGE().  The resulting panic
reported an unexpected wired count.  Rather than attempting to fix
PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of
fictitious pages.  Specifically, fictitious pages will never be
completely unwired.  Therefore, we can keep a fictitious page's wired
count forever set to one and thereby avoid the use of
PHYS_TO_VM_PAGE() when we know that we're working with a fictitious
page, just not which one.

In collaboration with: green@, tegge@
PR: kern/29915
2004-05-22 04:53:51 +00:00
Brian Feldman
af7cd0c521 Properly remove MAP_FUTUREWIRE when a vm_map_entry gets torn down.
Previously, mlockall(2) usage would leak MAP_FUTUREWIRE of the process's
vmspace::vm_map and subsequent processes would wire all of their memory.
Coupled with a wired-page leak in vm_fault_unwire(), this would run the
system out of free pages and cause programs to randomly SIGBUS when
faulting in new pages.

(Note that this is not the fix for the latter part; pages are still
 leaked when a wired area is unmapped in some cases.)

Reviewed by:	alc
PR		kern/62930
2004-05-07 00:17:07 +00:00
Alan Cox
4da4d293df In cases where a file was resident in memory mmap(..., PROT_NONE, ...)
would actually map the file with read access enabled.  According to
http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is
an error.  Similarly, an madvise(..., MADV_WILLNEED) would enable read
access on a virtual address range that was PROT_NONE.

The solution implemented herein is (1) to pass a vm_prot_t to
vm_map_pmap_enter() describing the allowed access and (2) to make
vm_map_pmap_enter() responsible for understanding the limitations of
pmap_enter_quick().

Submitted by:	"Mark W. Krentel" <krentel@dreamscape.com>
PR:		kern/64573
2004-04-24 03:46:44 +00:00
Warner Losh
05eb3785e7 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-06 20:15:37 +00:00
Tim J. Robbins
ed0302e6a7 Do not copy vm_exitingcnt to the new vmspace in vmspace_exec(). Copying
it led to impossibly high values in the new vmspace, causing it to never
drop to 0 and be freed.
2004-03-23 08:37:34 +00:00
Alan Cox
fcffa790e9 Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha.  Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel().  Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel().  Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.
2004-03-07 21:06:48 +00:00
Alan Cox
40448065e8 Further reduce the use of Giant in vm_map_delete(): Perform pmap_remove()
on system maps, besides the kmem_map, without Giant.

In collaboration with:	tegge
2004-02-12 20:56:06 +00:00
Alan Cox
bfee999d6a - Locking for the per-process resource limits structure has eliminated
the need for Giant in vm_map_growstack().
 - Use the proc * that is passed to vm_map_growstack() rather than
   curthread->td_proc.
2004-02-05 06:33:18 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
John Baldwin
b56ef1c10d Drop the reference count on the old vmspace after fully switching the
current thread to the new vmspace.

Suggested by:	dillon
2004-02-02 23:23:48 +00:00
Alan Cox
4da9f125cc - Modify vm_object_split() to expect a locked vm object on entry and
return on a locked vm object on exit.  Remove GIANT_REQUIRED.
 - Eliminate some unnecessary local variables from vm_object_split().
2003-12-30 22:28:36 +00:00
Alan Cox
75898105c0 Minor correction to revision 1.258: Use the proc pointer that is passed to
vm_map_growstack() in the RLIMIT_VMEM check rather than curthread.
2003-12-26 21:54:45 +00:00
Alan Cox
1cd5fbd854 - Avoid a lock-order reversal between Giant and a system map mutex that
occurs when kmem_malloc() fails to allocate a sufficient number of vm
   pages.  Specifically, we avoid the lock-order reversal by not grabbing
   Giant around pmap_remove() if the map is the kmem_map.

Approved by:	re (jhb)
Reported by:	Eugene <eugene3@web.de>
2003-11-19 18:48:45 +00:00
Alan Cox
b7b7cd4421 Changes to msync(2)
- Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE
   is specified to msync(2).  This is required by the Open Group Base
   Specifications Issue 6.
 - vm_map_sync() doesn't return KERN_FAILURE.  Thus, msync(2) can't
   possibly return EIO.
 - The second major loop in vm_map_sync() handles sub maps.  Thus,
   failing on sub maps in the first major loop isn't necessary.
2003-11-14 06:55:11 +00:00
Alan Cox
d88346020b - The Open Group Base Specifications Issue 6 specifies that an munmap(2)
must return EINVAL if size is zero.  Submitted by: tegge
 - In order to avoid a race condition in multithreaded applications, the
   check and removal operations by munmap(2) must be in the same critical
   section.  To accomodate this, vm_map_check_protection() is modified to
   require its caller to obtain at least a read lock on the map.
2003-11-10 01:37:40 +00:00
Alan Cox
637315ed9c - Remove Giant from msync(2). Giant is still acquired by the lower layers
if we drop into the pmap or vnode layers.
 - Migrate the handling of zero-length msync(2)s into vm_map_sync() so that
   multithread applications can't change the map between implementing the
   zero-length hack in msync(2) and reacquiring the map lock in
   vm_map_sync().

Reviewed by:	tegge
2003-11-09 22:09:04 +00:00
Alan Cox
950f8459d4 - Rename vm_map_clean() to vm_map_sync(). This better reflects the fact
that msync(2) is its only caller.
 - Migrate the parts of the old vm_map_clean() that examined the internals
   of a vm object to a new function vm_object_sync() that is implemented in
   vm_object.c.  At the same, introduce the necessary vm object locking so
   that vm_map_sync() and vm_object_sync() can be called without Giant.

Reviewed by:	tegge
2003-11-09 05:25:35 +00:00
Alan Cox
32a89c324e - Move the implementation of OBJ_ONEMAPPING from vm_map_delete() to
vm_map_entry_delete() so that all of the vm object manipulation is
   performed in one place.
2003-11-05 05:48:22 +00:00
Marcel Moolenaar
199c91ab79 Update avail_ssize for rstacks after growing them. 2003-11-04 06:48:58 +00:00