Commit Graph

515 Commits

Author SHA1 Message Date
Alan Cox
986b43f845 Add checks to vm_map_findspace() to test for address wrap. The conditions
where this could occur are very rare, but possible.

Submitted by: Mark W. Krentel
MFC after: 2 weeks
2005-01-18 19:50:09 +00:00
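
A hedged sketch of the guard this commit describes, assuming the usual unsigned vm_offset_t arithmetic; the typedef and function are illustrative stand-ins, not the committed code:

    #include <stdint.h>

    typedef uintptr_t vm_offset_t;	/* stand-in for the kernel type */

    /*
     * A candidate region [start, start + length) wraps the address
     * space exactly when the unsigned sum overflows, i.e. when
     * start + length < start.  Returns nonzero on wrap.
     */
    static int
    addr_wraps(vm_offset_t start, vm_offset_t length)
    {
        return (start + length < start);
    }

vm_map_findspace() can then reject such a candidate instead of returning a region that silently wraps past the top of the address space.
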
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Alan Cox
1f70d62298 Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.
2004-12-23 20:16:11 +00:00
Alan Cox
85f5b24573 In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead.  To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep.  Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock.  (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)
2004-12-15 19:55:05 +00:00
Alan Cox
94ddc7076d Push Giant deep into vm_forkproc(), acquiring it only if the process has
mapped System V shared memory segments (see shmfork_myhook()) or requires
the allocation of an ldt (see vm_fault_wire()).
2004-09-03 05:11:32 +00:00
Alan Cox
c1fbc251cd - Introduce and use a new tunable "debug.mpsafevm". At present, setting
"debug.mpsafevm" results in (almost) Giant-free execution of zero-fill
   page faults.  (Giant is held only briefly, just long enough to determine
   if there is a vnode backing the faulting address.)

   Also, condition the acquisition and release of Giant around calls to
   pmap_remove() on "debug.mpsafevm".

   The effect on performance is significant.  On my dual Opteron, I see a
   3.6% reduction in "buildworld" time.

 - Use atomic operations to update several counters in vm_fault().
2004-08-16 06:16:12 +00:00
Brian Feldman
7c938963a4 Rather than bringing back all of the changes to make VM map deletion
wait for system wires to disappear, do so (much more trivially) by
instead only checking for system wires of user maps and not kernel maps.

Alternative by:	tor
Reviewed by:	alc
2004-08-16 03:11:09 +00:00
Alan Cox
6eaee3fee4 Remove spl calls. 2004-08-14 18:57:41 +00:00
Alan Cox
0164e05781 Replace the linear search in vm_map_findspace() with an O(log n)
algorithm built into the map entry splay tree.  This replaces the
first_free hint in struct vm_map with two fields in vm_map_entry:
adj_free, the amount of free space following a map entry, and
max_free, the maximum amount of free space in the entry's subtree.
These fields make it possible to find a first-fit free region of a
given size in one pass down the tree, so O(log n) amortized using
splay trees.

This significantly reduces the overhead in vm_map_findspace() for
applications that mmap() many hundreds or thousands of regions, and
has a negligible slowdown (0.1%) on buildworld.  See, for example, the
discussion of a micro-benchmark titled "Some mmap observations
compared to Linux 2.6/OpenBSD" on -hackers in late October 2003.

OpenBSD adopted this approach in March 2002, and NetBSD added it in
November 2003, both with Red-Black trees.

Submitted by: Mark W. Krentel
2004-08-13 08:06:34 +00:00
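
A hedged sketch of the one-pass first-fit descent this enables, assuming each entry caches adj_free and max_free as described (with max_free maintained as the maximum of the node's adj_free and its children's max_free); the structure and function are illustrative, not the committed code:

    #include <stddef.h>

    struct map_entry_sketch {
        struct map_entry_sketch *left, *right;	/* sorted by address */
        size_t adj_free;	/* free space immediately after this entry */
        size_t max_free;	/* largest free gap anywhere in this subtree */
    };

    /*
     * First fit in one pass: prefer the lowest-address gap, and use
     * max_free to skip whole subtrees that cannot hold the request.
     */
    static struct map_entry_sketch *
    find_first_fit(struct map_entry_sketch *root, size_t length)
    {
        struct map_entry_sketch *e = root;

        if (e == NULL || e->max_free < length)
            return (NULL);
        for (;;) {
            if (e->left != NULL && e->left->max_free >= length)
                e = e->left;	/* a lower-address gap fits */
            else if (e->adj_free >= length)
                return (e);	/* the gap after this entry fits */
            else
                e = e->right;	/* must fit here: max_free says so */
        }
    }

Splaying the returned entry to the root afterwards is what makes the whole operation amortized O(log n).
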
Tor Egge
19dc560756 The vm map lock is needed in vm_fault() after the page has been found,
to avoid later changes before pmap_enter() and vm_fault_prefault()
has completed.

Simplify deadlock avoidance by not blocking on vm map relookup.

In collaboration with: alc
2004-08-12 20:14:49 +00:00
Brian Feldman
c5f60ffccf Re-delete the comment from r1.352. 2004-08-12 17:22:28 +00:00
Brian Feldman
0ada205ee6 Back out all behavioral changes. 2004-08-10 14:42:48 +00:00
Brian Feldman
9689d5e5ee Revamp VM map wiring.
* Allow no-fault wiring/unwiring to succeed for consistency;
  however, the wired count remains at zero, so it's a special case.

* Fix issues inside vm_map_wire() and vm_map_unwire() where the
  exact state of user wiring (one or zero) and system wiring
  (zero or more) could be confused; for example, system unwiring
  could succeed in removing a user wire, instead of being an
  error.

* Require all mappings to be unwired before they are deleted.
  When VM space is still wired upon deletion, it will be waited
  upon for the following unwire.  This makes vslock(9) work
  rather than allowing kernel-locked memory to be deleted
  out from underneath its consumer as it would before.
2004-08-09 19:52:29 +00:00
Alan Cox
91491c3566 Remove a stale comment from vm_map_lookup() that pertains to share maps.
(The last vestiges of the share map code were removed in revisions 1.153
and 1.159.)
2004-08-09 18:15:46 +00:00
Alan Cox
684a62b7bf - Push down the acquisition and release of Giant into pmap_enter_quick()
on those architectures without pmap locking.
 - Eliminate the acquisition and release of Giant in vm_map_pmap_enter().
2004-08-04 22:03:16 +00:00
Brian Feldman
b23f72e98a * Add a "how" argument to uma_zone constructors and initialization functions
so that they know whether the allocation is supposed to be able to sleep
  or not.
* Allow uma_zone constructors and initialization functions to return either
  success or error.  Almost all of the ones in the tree currently return
  success unconditionally, but mbuf is a notable exception: the packet
  zone constructor wants to be able to fail if it cannot suballocate an
  mbuf cluster, and the mbuf allocators want to be able to fail in general
  in a MAC kernel if the MAC mbuf initializer fails.  This fixes the
  panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
  the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.
2004-08-02 00:18:36 +00:00
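
A hedged sketch of a constructor in the new style: it takes the caller's wait flag (the "how" argument) and returns an error instead of sleeping or panicking. The item type, its contents, and the M_TEMP usage are illustrative; the int return and flags argument are what this change introduces:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/errno.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    struct item {
        void *buf;
    };

    /* Constructor: may fail, and honors M_NOWAIT from the caller. */
    static int
    item_ctor(void *mem, int size, void *arg, int flags)
    {
        struct item *it = mem;

        it->buf = malloc(PAGE_SIZE, M_TEMP,
            (flags & M_NOWAIT) ? M_NOWAIT : M_WAITOK);
        return (it->buf == NULL ? ENOMEM : 0);
    }

With this, uma_zalloc(zone, M_NOWAIT) can return NULL when the constructor fails, which is exactly what the packet zone needs when no mbuf cluster can be suballocated.
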
Alan Cox
9bb0e06861 - Push down the acquisition and release of Giant into pmap_protect() on
those architectures without pmap locking.
 - Eliminate the acquisition and release of Giant from vm_map_protect().

(Translation: mprotect(2) runs to completion without touching Giant on
alpha, amd64, i386 and ia64.)
2004-07-30 20:38:30 +00:00
Maxime Henrion
12c649749c Get rid of another lockmgr(9) consumer by using sx locks for the user
maps.  We always acquire the sx lock exclusively here, but we can't
use a mutex because we want to be able to sleep while holding the
lock.  This is completely equivalent to what we were doing with the
lockmgr(9) locks before.

Approved by:	alc
2004-07-30 09:10:28 +00:00
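
For reference, a hedged sketch of the sx(9) calls this swaps in (the structure is an illustrative stand-in for the map): unlike a mutex, an sx lock may be held across a sleep:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    struct user_map_sketch {
        struct sx lock;
    };

    static void
    user_map_init(struct user_map_sketch *map)
    {
        sx_init(&map->lock, "user map");
    }

    static void
    user_map_lock(struct user_map_sketch *map)
    {
        sx_xlock(&map->lock);	/* always exclusive here; may sleep while held */
    }

    static void
    user_map_unlock(struct user_map_sketch *map)
    {
        sx_xunlock(&map->lock);
    }
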
Alan Cox
1a276a3f91 - Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit().  (Giant is acquired only if the vmspace
   contains shm segments.)
 - Eliminate the acquisition of Giant from proc_rwmem().
 - Reduce the scope of Giant in exit1(), uncovering the destruction of the
   address space.
2004-07-27 03:53:41 +00:00
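
A hedged sketch of the counter updates (the structure and helpers are illustrative; atomic_fetchadd_int appeared in FreeBSD slightly later, so its use here is an assumption, not the committed code):

    #include <sys/types.h>
    #include <machine/atomic.h>

    struct vmspace_rc {
        volatile u_int vm_refcnt;
    };

    static void
    vmspace_ref(struct vmspace_rc *vm)
    {
        atomic_add_int(&vm->vm_refcnt, 1);	/* no lock needed */
    }

    /* Returns nonzero when the caller dropped the last reference. */
    static int
    vmspace_unref(struct vmspace_rc *vm)
    {
        /* fetchadd returns the value before the add */
        return (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1);
    }
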
Alan Cox
57a21aba93 Make the code and comments for vm_object_coalesce() consistent. 2004-07-25 07:48:47 +00:00
Alan Cox
51ab6c2890 Simplify vmspace initialization. The bcopy() of fields from the old
vmspace to the new vmspace in vmspace_exec() is mostly wasted effort.  With
one exception, vm_swrss, the copied fields are immediately overwritten.
Instead, initialize these fields to zero in vmspace_alloc(), eliminating a
bcopy() from vmspace_exec() and a bzero() from vmspace_fork().
2004-07-24 07:40:35 +00:00
Peter Wemm
5476633aed Semi-gratuitous change. Move two refcount operations to their own lines
rather than be buried inside an if (expression).  And now that the if
expression is the same in both exit paths, use the same ordering.
2004-07-21 05:08:10 +00:00
Peter Wemm
3f25cbddc2 Move the initialization and teardown of pmaps to the vmspace zone's
init and fini handlers.  Our vm system removes all userland mappings at
exit prior to calling pmap_release.  It just so happens that we might
as well reuse the pmap for the next process since the userland slate
has already been wiped clean.

However.  There is a functional benefit to this as well.  For platforms
that share userland and kernel context in the same pmap, it means that
the kernel portion of a pmap remains valid after the vmspace has been
freed (process exit) and while it is in uma's cache.  This is significant
for i386 SMP systems with kernel context borrowing because it avoids
a LOT of IPIs from the pmap_lazyfix() cleanup in the usual case.

Tested on:  amd64, i386, sparc64, alpha
Glanced at by:  alc
2004-07-21 00:29:21 +00:00
Alan Cox
3d2e54c317 Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove().  In general, they require the lock in
order to modify a page's pv list or flags.  In some cases, however,
pmap_protect() can avoid acquiring the lock.
2004-07-15 18:00:43 +00:00
Andrew Gallatin
b351299ca3 Use MIN() macro rather than ulmin() inline, and fix stray tab
that snuck in with my last commit.

Submitted by: green
2004-06-28 19:58:39 +00:00
Andrew Gallatin
1dad8fe1ed Fix alpha - the use of min() on longs was losing the high bits and
returning wrong answers, leading to strange values in vm2->vm_{s,t,d}size.
2004-06-28 19:15:40 +00:00
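
The bug class is easy to demonstrate in a hedged userland sketch (on a platform with 64-bit long, as on alpha); min() narrows both arguments to u_int, lmin() does not:

    #include <stdio.h>

    /* Kernel-style prototypes: min() takes u_int, lmin() takes long. */
    static unsigned int min(unsigned int a, unsigned int b) { return (a < b ? a : b); }
    static long lmin(long a, long b) { return (a < b ? a : b); }

    int
    main(void)
    {
        long a = 0x100000001L;		/* bit 32 set */
        long b = 0x200000000L;		/* bit 33 set, low 32 bits zero */

        printf("min:  %#lx\n", (long)min(a, b));	/* 0: high bits lost */
        printf("lmin: %#lx\n", lmin(a, b));		/* 0x100000001 */
        return (0);
    }
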
Brian Feldman
2a7be1b6d1 Correct the tracking of various bits of the process's vmspace and vm_map
when not propagated on fork (due to minherit(2)).  Consistency checks
otherwise fail when the vm_map is freed and it appears to have not been
emptied completely, causing an INVARIANTS panic in vm_map_zdtor().

PR:		kern/68017
Submitted by:	Mark W. Krentel <krentel@dreamscape.com>
Reviewed by:	alc
2004-06-24 22:43:46 +00:00
Dag-Erling Smørgrav
b103b94801 Back out previous commit; it went to the wrong file. 2004-05-25 18:28:52 +00:00
Dag-Erling Smørgrav
9507605f93 MFS: rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE semantics but
provide a sysctl knob for reverting to old ones.
2004-05-25 16:31:49 +00:00
Alan Cox
3ffbc0cd8e Correct two error cases in vm_map_unwire():
1. Contrary to the Single Unix Specification our implementation of
   munlock(2) when performed on an unwired virtual address range has
   returned an error.  Correct this.  Note, however, that the behavior
   of "system" unwiring is unchanged, only "user" unwiring is changed.
   If "system" unwiring is performed on an unwired virtual address
   range, an error is still returned.

2. Performing an errant "system" unwiring on a virtual address range
   that was "user" (i.e., mlock(2)) but not "system" wired would
   incorrectly undo the "user" wiring instead of returning an error.
   Correct this.

Discussed with:  green@
Reviewed by:     tegge@
2004-05-25 05:51:17 +00:00
Alan Cox
4be14af9cf To date, unwiring a fictitious page has produced a panic. The reason
being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious
pages but unwiring uses PHYS_TO_VM_PAGE().  The resulting panic
reported an unexpected wired count.  Rather than attempting to fix
PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of
fictitious pages.  Specifically, fictitious pages will never be
completely unwired.  Therefore, we can keep a fictitious page's wired
count forever set to one and thereby avoid the use of
PHYS_TO_VM_PAGE() when we know that we're working with a fictitious
page, just not which one.

In collaboration with: green@, tegge@
PR: kern/29915
2004-05-22 04:53:51 +00:00
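
A hedged sketch of the resulting invariant (flag value and names are illustrative): the unwire path simply refuses to take a fictitious page's wire count below one, so it never needs PHYS_TO_VM_PAGE():

    #define	PG_FICTITIOUS_SK	0x0008	/* illustrative flag value */

    struct page_sketch {
        int flags;
        int wire_count;
    };

    static void
    page_unwire_sketch(struct page_sketch *m)
    {
        /*
         * Fictitious pages are never completely unwired: pin
         * wire_count at 1 so no caller has to map a physical
         * address back to the (wrong) vm_page.
         */
        if (m->flags & PG_FICTITIOUS_SK)
            return;
        m->wire_count--;
    }
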
Brian Feldman
af7cd0c521 Properly remove MAP_FUTUREWIRE when a vm_map_entry gets torn down.
Previously, mlockall(2) usage would leak MAP_FUTUREWIRE of the process's
vmspace::vm_map and subsequent processes would wire all of their memory.
Coupled with a wired-page leak in vm_fault_unwire(), this would run the
system out of free pages and cause programs to randomly SIGBUS when
faulting in new pages.

(Note that this is not the fix for the latter part; pages are still
 leaked when a wired area is unmapped in some cases.)

Reviewed by:	alc
PR:		kern/62930
2004-05-07 00:17:07 +00:00
Alan Cox
4da4d293df In cases where a file was resident in memory mmap(..., PROT_NONE, ...)
would actually map the file with read access enabled.  According to
http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is
an error.  Similarly, an madvise(..., MADV_WILLNEED) would enable read
access on a virtual address range that was PROT_NONE.

The solution implemented herein is (1) to pass a vm_prot_t to
vm_map_pmap_enter() describing the allowed access and (2) to make
vm_map_pmap_enter() responsible for understanding the limitations of
pmap_enter_quick().

Submitted by:	"Mark W. Krentel" <krentel@dreamscape.com>
PR:		kern/64573
2004-04-24 03:46:44 +00:00
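
A hedged sketch of fix (1): the caller now passes the permitted access, and prefaulting bails out when read access is absent, because pmap_enter_quick() can only create readable mappings. The type, flag value, and signature are simplifications of the real ones:

    typedef unsigned char vm_prot_sk_t;	/* stand-in for vm_prot_t */
    #define	VM_PROT_READ_SK	0x01

    static void
    map_pmap_enter_sketch(vm_prot_sk_t prot /* , map, addr, object, ... */)
    {
        /*
         * Prefaulting a PROT_NONE range via pmap_enter_quick()
         * would grant read access the caller explicitly denied,
         * so do nothing unless reading is allowed.
         */
        if ((prot & VM_PROT_READ_SK) == 0)
            return;
        /* ... map resident pages read-only ... */
    }
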
Warner Losh
05eb3785e7 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-06 20:15:37 +00:00
Tim J. Robbins
ed0302e6a7 Do not copy vm_exitingcnt to the new vmspace in vmspace_exec(). Copying
it led to impossibly high values in the new vmspace, causing it to never
drop to 0 and be freed.
2004-03-23 08:37:34 +00:00
Alan Cox
fcffa790e9 Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha.  Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_pinit2() served to avoid a race between pmap
initialization and pmap_growkernel().  Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel().  Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.
2004-03-07 21:06:48 +00:00
Alan Cox
40448065e8 Further reduce the use of Giant in vm_map_delete(): Perform pmap_remove()
on system maps, besides the kmem_map, without Giant.

In collaboration with:	tegge
2004-02-12 20:56:06 +00:00
Alan Cox
bfee999d6a - Locking for the per-process resource limits structure has eliminated
the need for Giant in vm_map_growstack().
 - Use the proc * that is passed to vm_map_growstack() rather than
   curthread->td_proc.
2004-02-05 06:33:18 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that it is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't use the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
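
A hedged usage sketch of the new API with the proc-based signatures from this change (rlim_t lim_cur(struct proc *, int)); the helper function itself is illustrative:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/proc.h>
    #include <sys/resourcevar.h>

    static int
    datasize_ok(struct proc *p, rlim_t newsize)
    {
        rlim_t cur;

        /*
         * The proc lock keeps p->p_limit from being replaced while
         * we read it; the plimit itself is copy-on-write, so no
         * further lock is needed for the read.
         */
        PROC_LOCK(p);
        cur = lim_cur(p, RLIMIT_DATA);
        PROC_UNLOCK(p);
        return (newsize <= cur);
    }
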
John Baldwin
b56ef1c10d Drop the reference count on the old vmspace after fully switching the
current thread to the new vmspace.

Suggested by:	dillon
2004-02-02 23:23:48 +00:00
Alan Cox
4da9f125cc - Modify vm_object_split() to expect a locked vm object on entry and
return on a locked vm object on exit.  Remove GIANT_REQUIRED.
 - Eliminate some unnecessary local variables from vm_object_split().
2003-12-30 22:28:36 +00:00
Alan Cox
75898105c0 Minor correction to revision 1.258: Use the proc pointer that is passed to
vm_map_growstack() in the RLIMIT_VMEM check rather than curthread.
2003-12-26 21:54:45 +00:00
Alan Cox
1cd5fbd854 - Avoid a lock-order reversal between Giant and a system map mutex that
occurs when kmem_malloc() fails to allocate a sufficient number of vm
   pages.  Specifically, we avoid the lock-order reversal by not grabbing
   Giant around pmap_remove() if the map is the kmem_map.

Approved by:	re (jhb)
Reported by:	Eugene <eugene3@web.de>
2003-11-19 18:48:45 +00:00
Alan Cox
b7b7cd4421 Changes to msync(2)
- Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE
   is specified to msync(2).  This is required by the Open Group Base
   Specifications Issue 6.
 - vm_map_sync() doesn't return KERN_FAILURE.  Thus, msync(2) can't
   possibly return EIO.
 - The second major loop in vm_map_sync() handles sub maps.  Thus,
   failing on sub maps in the first major loop isn't necessary.
2003-11-14 06:55:11 +00:00
Alan Cox
d88346020b - The Open Group Base Specifications Issue 6 specifies that an munmap(2)
must return EINVAL if size is zero.  Submitted by: tegge
 - In order to avoid a race condition in multithreaded applications, the
   check and removal operations by munmap(2) must be in the same critical
   section.  To accommodate this, vm_map_check_protection() is modified to
   require its caller to obtain at least a read lock on the map.
2003-11-10 01:37:40 +00:00
Alan Cox
637315ed9c - Remove Giant from msync(2). Giant is still acquired by the lower layers
if we drop into the pmap or vnode layers.
 - Migrate the handling of zero-length msync(2)s into vm_map_sync() so that
   multithread applications can't change the map between implementing the
   zero-length hack in msync(2) and reacquiring the map lock in
   vm_map_sync().

Reviewed by:	tegge
2003-11-09 22:09:04 +00:00
Alan Cox
950f8459d4 - Rename vm_map_clean() to vm_map_sync(). This better reflects the fact
that msync(2) is its only caller.
 - Migrate the parts of the old vm_map_clean() that examined the internals
   of a vm object to a new function vm_object_sync() that is implemented in
   vm_object.c.  At the same, introduce the necessary vm object locking so
   that vm_map_sync() and vm_object_sync() can be called without Giant.

Reviewed by:	tegge
2003-11-09 05:25:35 +00:00
Alan Cox
32a89c324e - Move the implementation of OBJ_ONEMAPPING from vm_map_delete() to
vm_map_entry_delete() so that all of the vm object manipulation is
   performed in one place.
2003-11-05 05:48:22 +00:00
Marcel Moolenaar
199c91ab79 Update avail_ssize for rstacks after growing them. 2003-11-04 06:48:58 +00:00
Dag-Erling Smørgrav
a86fa82659 Whitespace cleanup. 2003-11-03 16:14:45 +00:00
Alan Cox
a89c6258bb - Increase the scope of the source object lock in vm_map_copy_entry(). 2003-11-03 00:59:54 +00:00
Alan Cox
b921a12b3b - Introduce and use vm_object_reference_locked(). Unlike
vm_object_reference(), this function must not be used to reanimate dead
   vm objects.  This restriction simplifies locking.

Reviewed by:	tegge
2003-11-02 21:30:10 +00:00
Marcel Moolenaar
08667f6dc1 Fix two bugs introduced with the rstack functionality and specific to
the rstack functionality:
1. Fix a KASSERT that tests for the address to be above the upward
   growable stack. Typically for rstack, the faulting address can be
   identical to the record end of the upward growable entry, and
   very likely is on ia64. The KASSERT tested for greater than, not
   greater equal, so whenever the register stack had to be grown
   the assertion fired.
2. When we grow the upward growable stack entry and adjust the
   underlying object, don't forget to adjust the size of the VM map.
   Not doing so would trigger an assert in vm_map_zdtor().

Pointy hat: marcel (for not testing with INVARIANTS).
2003-10-31 07:29:28 +00:00
Alan Cox
cbef13d877 Corrections to revision 1.305
- Specifying VM_MAP_WIRE_HOLESOK should not assume that the start
   address is the beginning of the map.  Instead, move to the first
   entry after the start address.
 - The implementation of VM_MAP_WIRE_HOLESOK was incomplete.  This
   caused the failure of mlockall(2) in some circumstances.
2003-10-18 18:48:17 +00:00
Bruce M Simpson
2bc7dd5661 Move pmap_resident_count() from the MD pmap.h to the MI pmap.h.
Add a definition of pmap_wired_count().
Add a definition of vmspace_wired_count().

Reviewed by:	truckman
Discussed with:	peter
2003-10-06 01:47:12 +00:00
Marcel Moolenaar
fd75d71049 Part 2 of implementing rstacks: add the ability to create rstacks and
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.

Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. We need to include or cover the faulting
address.

Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.

Tested on: alpha, i386, ia64, sparc64
2003-09-27 22:28:14 +00:00
Mike Silbersack
3fde38df46 Adjust the kmapentzone limit so that it takes into account the size of
maxproc and maxfiles, as procs, pipes, and other structures cause allocations
from kmapentzone.

Submitted by:	tegge
2003-09-23 18:56:54 +00:00
Alan Cox
6c527f260e Change the handling of the kernel and kmem objects in vm_map_delete(): In
order to use "unmanaged" pages in the kmem object, vm_map_delete() must
unconditionally perform pmap_remove().  Otherwise, sparc64 has problems.

Tested by:	jake
2003-09-23 04:28:04 +00:00
Marcel Moolenaar
b21a0008ba Introduce MAP_ENTRY_GROWS_DOWN and MAP_ENTRY_GROWS_UP to allow for
growable (stack) entries that not only grow down, but also grow up.
Have vm_map_growstack() take these flags into account when growing
an entry.

This is the first step in adding support for upward growable stacks.
It is a required feature on ia64 to support the register stack (or
rstack as I like to call it -- it also means reverse stack). We do
not currently create rstacks, so the upward growing is not exercised
and the change should be a functional no-op.

Reviewed by: alc
2003-08-30 21:25:23 +00:00
Alan Cox
5402d8ec23 Remove GIANT_REQUIRED from vmspace_alloc(). 2003-08-13 19:23:51 +00:00
Bruce M Simpson
abd498aa71 Add the mlockall() and munlockall() system calls.
- All those diffs to syscalls.master for each architecture *are*
   necessary. This needed clarification; the stub code generation for
   mlockall() was disabled, which would prevent applications from
   linking to this API (suggested by mux)
 - Giant has been quashed. It is no longer held by the code, as
   the required locking has been pushed down within vm_map.c.
 - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
   to express their intention explicitly.
 - Inspected at the vmstat, top and vm pager sysctl stats level.
   Paging-in activity is occurring correctly, using a test harness.
 - The RES size for a process may appear to be greater than its SIZE.
   This is believed to be due to mappings of the same shared library
   page being wired twice. Further exploration is needed.
 - Believed to back out of allocations and locks correctly
   (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).

PR:             kern/43426, standards/54223
Reviewed by:    jake, alc
Approved by:    jake (mentor)
MFC after:	2 weeks
2003-08-11 07:14:08 +00:00
Poul-Henning Kamp
ec38b344cb Move the implementation of the vmspace_swap_count() (used only in
the "toss the largest process" emergency handling) from vm_map.c to
swap_pager.c.

The quantity calculated depends strongly on the internals of the
swap_pager and by moving it, we no longer need to expose the
internal metrics of the swap_pager to the world.
2003-07-18 10:47:58 +00:00
Alan Cox
1f78f902a8 Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults.  In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects.  Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter().  On amd64 and i386, this commit only
amounts to code rearrangement.  On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations.  (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.)  On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.
2003-07-03 20:18:02 +00:00
Alan Cox
8526ce9b64 Check the address provided to vm_map_stack() against the vm map's maximum,
returning an error if the address is too high.
2003-07-01 03:57:25 +00:00
Alan Cox
0551c08dee Introduce vm_map_pmap_enter(). Presently, this is a stub calling the MD
pmap_object_init_pt().
2003-06-29 23:32:55 +00:00
Alan Cox
23252eeabe Simple read-modify-write operations on a vm object's flags, ref_count, and
shadow_count can now rely on its mutex for synchronization.  Remove one use
of Giant from vm_map_insert().
2003-06-27 18:52:49 +00:00
Alan Cox
95018011e5 Remove a GIANT_REQUIRED on the kernel object that we no longer need. 2003-06-25 05:31:02 +00:00
David E. O'Brien
874651b13c Use __FBSDID(). 2003-06-11 23:50:51 +00:00
Alan Cox
d7fc221044 Pass the vm object to vm_object_collapse() with its lock held. 2003-06-07 02:29:17 +00:00
Alan Cox
4e73db5f40 Increase the scope of the vm_object lock in vm_map_delete(). 2003-04-30 19:18:09 +00:00
Alan Cox
8ba20a48bd Add vm_object locking to vmspace_swap_count(). 2003-04-30 00:43:17 +00:00
Alan Cox
155080d31e - Extend the scope of two existing vm_object locks to cover
swap_pager_freespace().
2003-04-26 05:30:56 +00:00
Alan Cox
b6e48e0372 - Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
   whether its caller has locked the vm_object.  (This is a temporary
   measure to bootstrap vm_object locking.)
2003-04-24 04:31:25 +00:00
Alan Cox
1d284e00b5 - Update the vm_object locking in vm_map_insert(). 2003-04-20 21:56:40 +00:00
Alan Cox
7d040e3cc5 Update vm_object locking in vm_map_delete(). 2003-04-20 04:35:47 +00:00
Alan Cox
034b3d7a6f o Update locking around vm_object_page_remove() in vm_map_clean()
to use the new macros.
 o Remove unnecessary increment and decrement of the vm_object's
   reference count in vm_map_clean().
2003-04-19 01:43:32 +00:00
Alan Cox
e2479b4fc3 Lock some manipulations of the vm object's flags. 2003-04-13 20:22:02 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a nested include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
David Schultz
72d97679ff - When the VM daemon is out of swap space and looking for a
process to kill, don't block on a map lock while holding the
  process lock.  Instead, skip processes whose map locks are held
  and find something else to kill.
- Add vm_map_trylock_read() to support the above.

Reviewed by:	alc, mike (mentor)
2003-03-12 23:13:16 +00:00
Alan Cox
09c80124a3 Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
Discussed on:	arch@
2003-03-06 03:41:02 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Alan Cox
814f5c92d7 Remove the acquisition and release of Giant around pmap_growkernel().
It's unnecessary for two reasons: (1) Giant is at present already held in
such cases and (2) our various implementations of pmap_growkernel() look to
be MP safe.  (For example, for sparc64 the proof of (2) is trivial.)
2003-02-15 20:01:09 +00:00
Alan Cox
d923c5986e Add MTX_DUPOK to the initialization of system map locks. 2003-01-25 18:45:55 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Matthew Dillon
2d5c7e4506 Close the remaining user address mapping races for physical
I/O, CAM, and AIO.  Still TODO: streamline useracc() checks.

Reviewed by:	alc, tegge
MFC after:	7 days
2003-01-20 17:46:48 +00:00
Matthew Dillon
3db161e079 It is possible for an active aio to prevent shared memory from being
dereferenced when a process exits due to the vmspace ref-count being
bumped.  Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree().  This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.

Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().

MFC after:	7 days
2003-01-13 23:04:32 +00:00
Alan Cox
a6864937e2 Lock the vm object when performing vm_object_clear_flag(). 2003-01-03 09:15:43 +00:00
Alan Cox
36daaecd04 Implement a variant locking scheme for vm maps: Access to system maps
is now synchronized by a mutex, whereas access to user maps is still
synchronized by a lockmgr()-based lock.  Why?  No single type of lock,
including sx locks, meets the requirements of both types of vm map.
Sometimes we sleep while holding the lock on a user map.  Thus, a
mutex isn't appropriate.  On the other hand, both lockmgr()-based
and sx locks release Giant when a thread/process blocks during
contention for a lock.  This could lead to a race condition in a legacy
driver (that relies on Giant for synchronization) if it attempts to
kmem_malloc() and fails to immediately obtain the lock.  Fortunately,
we never sleep while holding a system map lock.
2002-12-31 19:38:04 +00:00
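
A hedged sketch of the resulting dispatch (the structure and discriminator are illustrative, and the lockmgr() call uses the four-argument signature of this era): the mutex path is never held across a sleep, and contending on it does not drop Giant:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/proc.h>

    struct map_sketch {
        int system_map;		/* 1 for system maps, 0 for user maps */
        struct mtx system_mtx;
        struct lock lk;
    };

    static void
    map_lock_sketch(struct map_sketch *map)
    {
        if (map->system_map)
            mtx_lock(&map->system_mtx);	/* never slept on while held */
        else
            lockmgr(&map->lk, LK_EXCLUSIVE, NULL, curthread);
    }
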
Alan Cox
3a92e5d5e9 - Increment the vm_map's timestamp if _vm_map_trylock() succeeds.
- Introduce map_sleep_mtx and use it to replace Giant in
   vm_map_unlock_and_wait() and vm_map_wakeup().  (Original
   version by: tegge.)
2002-12-30 00:41:33 +00:00
Alan Cox
e3a9e1b2a8 - Remove vm_object_init2(). It is unused.
- Add a mtx_destroy() to vm_object_collapse().  (This allows a bzero()
   to migrate from _vm_object_allocate() to vm_object_zinit(), where it
   will be performed less often.)
2002-12-29 21:01:14 +00:00
Matthew Dillon
389d2b6e21 Fix a refcount race with the vmspace structure. In order to prevent
resource starvation we clean-up as much of the vmspace structure as we
can when the last process using it exits.  The rest of the structure
is cleaned up when it is reaped.  But since exit1() decrements the ref
count it is possible for a double-free to occur if someone else, such as
the process swapout code, references and then dereferences the structure.
Additionally, the final cleanup of the structure should not occur until
the last process referencing it is reaped.

This commit solves the problem by introducing a secondary reference count,
called 'vm_exitingcnt'.  The normal reference count is decremented on exit
and vm_exitingcnt is incremented.  vm_exitingcnt is decremented when the
process is reaped.  When both vm_exitingcnt and vm_refcnt are 0, the
structure is freed for real.

MFC after:	3 weeks
2002-12-15 18:50:04 +00:00
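
A hedged sketch of the two-counter scheme (names illustrative; in this era the counts are serialized by Giant, so plain arithmetic stands in for the real synchronization):

    struct vmspace_two_rc {
        int vm_refcnt;		/* live users */
        int vm_exitingcnt;	/* exited but not yet reaped */
    };

    /* exit1(): convert the live reference into an exiting reference. */
    static void
    vmspace_exit_sketch(struct vmspace_two_rc *vm)
    {
        vm->vm_refcnt--;
        vm->vm_exitingcnt++;
        /* partial cleanup may run now; the structure itself survives */
    }

    /* reap: returns nonzero when the structure may really be freed. */
    static int
    vmspace_reap_sketch(struct vmspace_two_rc *vm)
    {
        return (--vm->vm_exitingcnt == 0 && vm->vm_refcnt == 0);
    }

The final free is deferred until both counts reach zero, so a transient reference/dereference (e.g. from the process swapout code) can no longer trigger a second teardown.
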
Alan Cox
5e83956af5 Perform vm_object_lock() and vm_object_unlock() around
vm_object_page_remove().
2002-12-15 07:16:51 +00:00
Alan Cox
bc105a6797 Hold the page queues lock when calling pmap_protect(); it updates fields
of the vm_page structure.  Make the style of the pmap_protect() calls
consistent.

Approved by:	re (blanket)
2002-12-01 18:57:56 +00:00
Alan Cox
85e03a7e1e Acquire and release the page queues lock around calls to pmap_protect()
because it updates flags within the vm page.

Approved by:	re (blanket)
2002-11-25 22:00:31 +00:00
Alan Cox
f6116791a2 Fix an error case in vm_map_wire(): unwiring of an entry during cleanup
after a user wire error fails when the entry is already system wired.

Reported by:	tegge
2002-11-09 21:26:49 +00:00
Maxime Henrion
cd034a5be9 Correctly print vm_offset_t types. 2002-11-07 22:49:07 +00:00
Poul-Henning Kamp
af045176d1 Properly put macro args in ().
Spotted by:	FlexeLint.
2002-10-16 10:52:15 +00:00
Matthew N. Dodd
4a2eca23ca Modify vm_map_clean() (and thus the msync(2) system call) to support
invalidation of cached pages for objects of type OBJT_DEVICE.

Submitted by:	Christian Zander <zander@minion.de>
Approved by:	alc
2002-09-22 08:22:32 +00:00
Jake Burkholder
05ba50f522 Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types.  Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
2002-09-21 22:07:17 +00:00
Alan Cox
4eaa117956 o Use vm_object_lock() in place of Giant when manipulating a vm object
in vm_map_insert().
2002-08-24 17:52:08 +00:00
Alan Cox
ef594d3186 o Merge vm_fault_wire() and vm_fault_user_wire() by adding a new parameter,
user_wire.
2002-07-24 19:47:56 +00:00
Peter Wemm
3ebc124838 Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time.  Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment.  This is a big help for execing i386 binaries
on ia64.   The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64.  At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from:  dfr (mostly, many tweaks from me).
2002-07-20 02:56:12 +00:00
Peter Wemm
9e7c1bce60 (VM_MAX_KERNEL_ADDRESS - KERNBASE) / PAGE_SIZE may not fit in an integer.
Use lmin(long, long), not min(u_int, u_int).  This is a problem here on
ia64 which has *way* more than 2^32 pages of KVA.  281474976710655 pages
to be precise.
2002-07-18 10:28:00 +00:00
Alan Cox
93bc4879e6 o Assert GIANT_REQUIRED on system maps in _vm_map_lock(),
_vm_map_lock_read(), and _vm_map_trylock().  Submitted by: tegge
 o Remove GIANT_REQUIRED from kmem_alloc_wait() and kmem_free_wakeup().
   (This clears the way for exec_map accesses to move outside of Giant.
   The exec_map is not a system map.)
 o Remove some premature MPSAFE comments.

Reviewed by:	tegge
2002-07-12 23:20:06 +00:00
Alan Cox
9688f93163 o Add a "needs wakeup" flag to the vm_map for use by kmem_alloc_wait()
and kmem_free_wakeup().  Previously, kmem_free_wakeup() always
   called wakeup().  In general, no one was sleeping.
 o Export vm_map_unlock_and_wait() and vm_map_wakeup() from vm_map.c
   for use in vm_kern.c.
2002-07-11 02:39:24 +00:00
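
A hedged sketch of the flag protocol (names and the wait channel are illustrative): sleepers announce themselves before blocking, so the free path pays for wakeup() only when someone is actually waiting:

    #define	MAP_NEEDS_WAKEUP_SK	0x01	/* illustrative flag */

    struct map_wake_sketch {
        int flags;
    };

    static void
    map_sleep_sketch(struct map_wake_sketch *map)
    {
        map->flags |= MAP_NEEDS_WAKEUP_SK;	/* set before sleeping */
        /* ... vm_map_unlock_and_wait(): drop the map lock, msleep() ... */
    }

    static void
    map_wakeup_sketch(struct map_wake_sketch *map)
    {
        if (map->flags & MAP_NEEDS_WAKEUP_SK) {	/* rare case */
            map->flags &= ~MAP_NEEDS_WAKEUP_SK;
            /* ... wakeup(map) ... */
        }
        /* common case: no sleeper, so no wakeup() call at all */
    }
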
Alan Cox
22a97b04de o Make the reservation of KVA space for kernel map entries a function
of the KVA space's size in addition to the amount of physical memory
   and reduce it by a factor of two.

Under the old formula, our reservation amounted to one kernel map entry
per virtual page in the KVA space on a 4GB i386.
2002-07-03 19:16:37 +00:00
Ian Dowse
23f09d50bb Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

 o In the i386 pmap_protect(), `sindex' and `eindex' represent page
   indices within the 32-bit virtual address space.
 o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
   variable to store the low few bits of a vm_pindex_t that gets used
   as an array index.
 o vm_uiomove() uses `osize' and `idx' for page offsets within a
   map entry.
 o In vm_object_split(), `idx' is a page offset within a map entry.
2002-06-26 20:32:51 +00:00
Matthew Dillon
a69ac1740f Enforce RLIMIT_VMEM on growable mappings (aka the primary stack or any
MAP_STACK mapping).

Suggested by:	alc
2002-06-26 03:13:46 +00:00
Alan Cox
409748276e o In vm_map_insert(), replace GIANT_REQUIRED by the acquisition and
release of Giant around the direct manipulation of the vm_object and
   the optional call to pmap_object_init_pt().
 o In vm_map_findspace(), remove GIANT_REQUIRED.  Instead, acquire and
   release Giant around the occasional call to pmap_growkernel().
 o In vm_map_find(), remove GIANT_REQUIRED.
2002-06-22 17:47:12 +00:00
Alan Cox
27168693db o Remove GIANT_REQUIRED from vm_map_stack(). 2002-06-21 06:03:47 +00:00
Alan Cox
00e1854a1f o Replace GIANT_REQUIRED in vm_object_coalesce() by the acquisition and
release of Giant.
 o Reduce the scope of GIANT_REQUIRED in vm_map_insert().

These changes will enable us to remove the acquisition and release
of Giant from obreak().
2002-06-19 06:02:03 +00:00
Alan Cox
515630b12f o Remove LK_CANRECURSE from the vm_map lock. 2002-06-18 18:31:35 +00:00
Jeff Roberson
18aa2de5a7 - Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items.  It will not go ask the vm
  for pages.  This differs from M_NOWAIT in that it not only doesn't block, it
  doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag.  This
  tells uma that it should only allocate buckets out of the bucket cache, and
  not from the VM.  It does this by using the M_NOVM option to zalloc when
  getting a new bucket.  This is so that the VM doesn't recursively enter
  itself while trying to allocate buckets for vm_map_entry zones.  If there
  are already allocated buckets when we get here we'll still use them but
  otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.
2002-06-17 22:02:41 +00:00
Alan Cox
b49ecb86d0 o Acquire and release Giant in vm_map_wakeup() to prevent
a lost wakeup().

Reviewed by:	tegge
2002-06-17 13:27:40 +00:00
Alan Cox
1d7cf06c8c o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and
vm_map_user_pageable().
 o Remove vm_map_pageable() and vm_map_user_pageable().
 o Remove vm_map_clear_recursive() and vm_map_set_recursive().  (They were
   only used by vm_map_pageable() and vm_map_user_pageable().)

Reviewed by:	tegge
2002-06-14 18:21:01 +00:00
Alan Cox
d46e7d6bee o Acquire and release Giant in vm_map_unlock_and_wait().
Submitted by:	tegge
2002-06-12 08:15:52 +00:00
Alan Cox
28c58286ef o Properly handle a failure by vm_fault_wire() or vm_fault_user_wire()
in vm_map_wire().
 o Make two white-space changes in vm_map_wire().

Reviewed by:	tegge
2002-06-11 19:13:59 +00:00
Alan Cox
73b2bace26 o Teach vm_map_delete() to respect the "in-transition" flag
on a vm_map_entry by sleeping until the flag is cleared.

Submitted by:	tegge
2002-06-11 05:24:22 +00:00
Alan Cox
2b4a2c272d o In vm_map_entry_create(), call uma_zalloc() with M_NOWAIT on system maps.
Submitted by: tegge
 o Eliminate the "!mapentzone" check from vm_map_entry_create() and
   vm_map_entry_dispose().  Reviewed by: tegge
 o Fix white-space usage in vm_map_entry_create().
2002-06-10 06:11:45 +00:00
Alan Cox
12d7cc840f o Add vm_map_wire() for wiring contiguous regions of either kernel
or user vm_maps.  This implementation has two key benefits when compared
   to vm_map_{user_,}pageable(): (1) it avoids a race condition through
   the use of "in-transition" vm_map entries and (2) it eliminates lock
   recursion on the vm_map.

Note: there is still an error case that requires clean up.

Reviewed by:	tegge
2002-06-09 20:25:18 +00:00
Alan Cox
b2f3846aef o Simplify vm_map_unwire() by merging the second and third passes
over the caller-specified region.
2002-06-08 19:00:40 +00:00
Alan Cox
e27e17b711 o Remove an unnecessary call to vm_map_wakeup() from vm_map_unwire().
o Add a stub for vm_map_wire().

Note: the description of the previous commit had an error.  The in-
transition flag actually blocks the deallocation of a vm_map_entry by
vm_map_delete() and vm_map_simplify_entry().
2002-06-08 07:32:38 +00:00
Alan Cox
acd9a301ec o Add vm_map_unwire() for unwiring contiguous regions of either kernel
or user vm_maps.  In accordance with the standards for munlock(2),
   and in contrast to vm_map_user_pageable(), this implementation does not
   allow holes in the specified region.  This implementation uses the
   "in transition" flag described below.
 o Introduce a new flag, "in transition," to the vm_map_entry.
   Eventually, vm_map_delete() and vm_map_simplify_entry() will respect
   this flag by deallocating in-transition vm_map_entrys, allowing
   the vm_map lock to be safely released in vm_map_unwire() and (the
   forthcoming) vm_map_wire().
 o Modify vm_map_simplify_entry() to respect the in-transition flag.

In collaboration with:	tegge
2002-06-07 18:34:23 +00:00
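
A hedged sketch of the shape of the protocol on a single entry (the opaque map type and its lock functions are hypothetical scaffolding): the flag is what lets the map lock be released safely in the middle of the operation:

    #define	MAP_ENTRY_IN_TRANSITION_SK	0x0100	/* illustrative value */

    struct map_opaque;				/* hypothetical map */
    void map_lock(struct map_opaque *);
    void map_unlock(struct map_opaque *);

    struct entry_sketch {
        int eflags;
    };

    static void
    unwire_entry_sketch(struct map_opaque *map, struct entry_sketch *e)
    {
        map_lock(map);
        e->eflags |= MAP_ENTRY_IN_TRANSITION_SK;
        map_unlock(map);

        /*
         * Slow work with the map unlocked (no lock recursion):
         * vm_map_delete() and vm_map_simplify_entry() will leave
         * the flagged entry alone in the meantime.
         */

        map_lock(map);
        e->eflags &= ~MAP_ENTRY_IN_TRANSITION_SK;
        /* wake anyone who slept on the in-transition entry */
        map_unlock(map);
    }
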
Alan Cox
c5aaa06ded o Migrate vm_map_split() from vm_map.c to vm_object.c, renaming it
to vm_object_split().  Its interface should still be changed
   to resemble vm_object_shadow().
2002-06-02 23:54:09 +00:00
Alan Cox
0d78c0dce2 o Style fixes to vm_map_split(), including the elimination of one variable
declaration that shadows another.

Note: This function should really be vm_object_split(), not vm_map_split().

Reviewed by:	md5
2002-06-02 19:32:05 +00:00
Alan Cox
61c075b67f o Remove GIANT_REQUIRED from vm_map_zfini(), vm_map_zinit(),
vm_map_create(), and vm_map_submap().
 o Make further use of a local variable in vm_map_entry_splay()
   that caches a reference to one of a vm_map_entry's children.
   (This reduces code size somewhat.)
 o Revert a part of revision 1.66, deinlining vmspace_pmap().
   (This function is MPSAFE.)
2002-06-01 22:41:43 +00:00
Alan Cox
794316a866 o Revert a part of revision 1.66; contrary to what that commit message says,
deinlining vm_map_entry_behavior() and vm_map_entry_set_behavior()
   actually increases the kernel's size.
 o Make vm_map_entry_set_behavior() static and add a comment describing
   its purpose.
 o Remove an unnecessary initialization statement from vm_map_entry_splay().
2002-06-01 16:59:30 +00:00
Alan Cox
9917e01041 Further work on pushing Giant out of the vm_map layer and down
into the vm_object layer:
 o Acquire and release Giant in vm_object_shadow() and
   vm_object_page_remove().
 o Remove the GIANT_REQUIRED assertion preceding vm_map_delete()'s call
   to vm_object_page_remove().
 o Remove the acquisition and release of Giant around vm_map_lookup()'s
   call to vm_object_shadow().
2002-05-31 03:48:55 +00:00
Alan Cox
4b9fdc2bce o Acquire and release Giant around pmap operations in vm_fault_unwire()
and vm_map_delete().  Assert GIANT_REQUIRED in vm_map_delete()
   only if operating on the kernel_object or the kmem_object.
 o Remove GIANT_REQUIRED from vm_map_remove().
 o Remove the acquisition and release of Giant from munmap().
2002-05-26 04:54:56 +00:00
Alan Cox
4e94f40222 o Replace the vm_map's hint by the root of a splay tree. By design,
the last accessed datum is moved to the root of the splay tree.
   Therefore, on lookups in which the hint resulted in O(1) access,
   the splay tree still achieves O(1) access.  In contrast, on lookups
   in which the hint failed miserably, the splay tree achieves amortized
   logarithmic complexity, resulting in dramatic improvements on vm_maps
   with a large number of entries.  For example, the execution time
   for replaying an access log from www.cs.rice.edu against the thttpd
   web server was reduced by 23.5% due to the large number of files
   simultaneously mmap()ed by this server.  (The machine in question has
   enough memory to cache most of this workload.)

   Nothing comes for free: At present, I see a 0.2% slowdown on "buildworld"
   due to the overhead of maintaining the splay tree.  I believe that
   some or all of this can be eliminated through optimizations
   to the code.

Developed in collaboration with: Juan E Navarro <jnavarro@cs.rice.edu>
Reviewed by:	jeff
2002-05-24 01:33:24 +00:00
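
A hedged sketch of why the splay tree subsumes the old hint (the entry layout and the splay() helper are assumptions, and the splay rotation itself is elided): every lookup moves its answer to the root, so repeating the last lookup is O(1) just as the hint was:

    struct splay_sketch {
        struct splay_sketch *left, *right;
        unsigned long start, end;	/* entry covers [start, end) */
    };

    /* Assumed: a standard top-down splay that moves the entry nearest
     * to addr up to the root and returns the new root. */
    struct splay_sketch *splay(struct splay_sketch *root, unsigned long addr);

    static struct splay_sketch *
    lookup_sketch(struct splay_sketch **rootp, unsigned long addr)
    {
        *rootp = splay(*rootp, addr);	/* answer now cached at the root */
        if (*rootp != NULL && addr >= (*rootp)->start && addr < (*rootp)->end)
            return (*rootp);
        return (NULL);
    }
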
Alan Cox
094f6d2694 o Remove GIANT_REQUIRED from vm_map_madvise(). Instead, acquire and
release Giant around vm_map_madvise()'s call to pmap_object_init_pt().
 o Replace GIANT_REQUIRED in vm_object_madvise() with the acquisition
   and release of Giant.
 o Remove the acquisition and release of Giant from madvise().
2002-05-18 07:48:06 +00:00
Alan Cox
a47335fdb4 o Remove GIANT_REQUIRED and an excessive number of blank lines
from vm_map_inherit().  (minherit() need not acquire Giant
   anymore.)
2002-05-12 18:42:05 +00:00
Alan Cox
47c3ccc467 o Acquire and release Giant in vm_object_reference() and
vm_object_deallocate(), replacing the assertion GIANT_REQUIRED.
 o Remove GIANT_REQUIRED from vm_map_protect() and vm_map_simplify_entry().
 o Acquire and release Giant around vm_map_protect()'s call to pmap_protect().

Altogether, these changes eliminate the need for mprotect() to acquire
and release Giant.
2002-05-12 05:22:56 +00:00
Alan Cox
e86256c1f4 o Move vm_freeze_copyopts() from vm_map.{c.h} to vm_object.{c,h}. It's plainly
an operation on a vm_object and belongs in the latter place.
2002-05-06 00:12:47 +00:00
Alan Cox
c50fe92b8d o Condition the compilation of uiomoveco() and vm_uiomove()
on ENABLE_VFS_IOOPT.
 o Add a comment to the effect that this code is experimental
   support for zero-copy I/O.
2002-05-05 22:42:40 +00:00
Alan Cox
15fdd586e3 o Remove GIANT_REQUIRED from vm_map_lookup() and vm_map_lookup_done().
o Acquire and release Giant around vm_map_lookup()'s call
   to vm_object_shadow().
2002-05-05 05:36:28 +00:00
Alan Cox
8c5c5d049f o Remove GIANT_REQUIRED from vm_map_lookup_entry() and
vm_map_check_protection().
 o Call vm_map_check_protection() without Giant held in munmap().
2002-05-04 02:07:36 +00:00
Alan Cox
bc91c5107a o Change the implementation of vm_map locking to use exclusive locks
exclusively.  The interface still, however, distinguishes
   between a shared lock and an exclusive lock.
2002-05-02 17:32:27 +00:00
Alan Cox
569687d02f o Remove dead and lockmgr()-specific debugging code. 2002-05-02 02:32:09 +00:00
Jeff Roberson
28bc44195c Add a new zone flag UMA_ZONE_MTXCLASS. This puts the zone in its own
mutex class.  Currently this is only used for kmapentzone because kmapents
are potentially allocated when freeing memory.  This is not dangerous
though because no other allocations will be done while holding the
kmapentzone lock.
2002-04-29 23:45:41 +00:00
Alan Cox
780b1c0997 Pass the caller's file name and line number to the vm_map locking functions. 2002-04-28 23:12:52 +00:00
Alan Cox
d974f03c69 o Introduce and use vm_map_trylock() to replace several direct uses
of lockmgr().
 o Add missing synchronization to vmspace_swap_count(): Obtain a read lock
   on the vm_map before traversing it.
2002-04-28 06:07:54 +00:00
Alan Cox
089b073345 o Begin documenting the (existing) locking protocol on the vm_map
in the same style as sys/proc.h.
 o Undo the de-inlining of several trivial, MPSAFE methods on the vm_map.
   (Contrary to the commit message for vm_map.h revision 1.66 and vm_map.c
   revision 1.206, de-inlining these methods increased the kernel's size.)
2002-04-27 22:01:37 +00:00
Peter Wemm
334f706177 Do not free the vmspace until p->p_vmspace is set to null. Otherwise
statclock can access it in the tail end of statclock_process() at an
unfortunate time.  This bit me several times on an SMP alpha (UP2000)
and the problem went away with this change.  I'm not sure why it doesn't
break x86 as well.  Maybe it's because the clocks are much faster
on alpha (HZ=1024 by default).
2002-04-17 05:26:42 +00:00
Peter Wemm
1a87a0da66 Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page().  This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap.  (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances.  Leaving this to pmap itself gives more flexibility.)

Reviewed by:	jake
Tested on:	i386, ia64 and (I believe) sparc64. (my alpha was hosed)
2002-04-15 16:00:03 +00:00
Jeff Roberson
670d17b5c0 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 04:02:59 +00:00
Jeff Roberson
9eb6e51923 Quiet a warning introduced by UMA. This only occurs on machines where
vm_size_t != unsigned long.

Reviewed by:	phk
2002-03-19 11:49:10 +00:00
Jeff Roberson
8355f576a9 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
Brian Feldman
25adb370be Back out the modification of vm_map locks from lockmgr to sx locks. The
best path forward now is likely to change the lockmgr locks to simple
sleep mutexes, then see if any extra contention it generates is greater
than removed overhead of managing local locking state information,
cost of extra calls into lockmgr, etc.

Additionally, making the vm_map lock a mutex and respecting it properly
will put us much closer to not needing Giant magic in vm.
2002-03-18 15:08:09 +00:00
Alan Cox
2f6c16e1e8 Acquire a read lock on the map inside of vm_map_check_protection() rather
than expecting the caller to do so.  This (1) eliminates duplicated code in
kernacc() and useracc() and (2) fixes missing synchronization in munmap().
2002-03-17 03:19:31 +00:00
Brian Feldman
0e0af8ecda Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate.
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.

After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system.  To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.

This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe.  One more
user of lockmgr down, many more to go :)
2002-03-13 23:48:08 +00:00
Eivind Eklund
a128794977 - Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
  while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by:	alc
2002-03-10 21:52:48 +00:00
Matthew Dillon
8c5dffe8ca Fix a bug in the vm_map_clean() procedure. msync()ing an area of memory
that has just been mapped MAP_ANON|MAP_NOSYNC and has not yet been accessed
will panic the machine.

MFC after:	1 day
2002-03-07 03:54:56 +00:00
Alfred Perlstein
582ec34cd8 Fix a race with free'ing vmspaces at process exit when vmspaces are
shared.

Also introduce vm_endcopy instead of using pointer tricks when
initializing new vmspaces.

The race occurred because of how the reference was utilized:
  test vmspace reference,
  possibly block,
  decrement reference

When sharing a vmspace between multiple processes it was possible
for two processes exiting at the same time to test the reference
count, possibly block and neither one free because they wouldn't
see the other's update.

Submitted by: green
2002-02-05 21:23:05 +00:00
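
A hedged before/after sketch of the race (names illustrative; in this era the count is serialized by Giant rather than by atomics):

    struct vms_rc_sketch {
        int refcnt;
    };

    /* Racy order: test, possibly block, then decrement. */
    static void
    exit_vmspace_racy(struct vms_rc_sketch *vm)
    {
        if (vm->refcnt == 1) {
            /* ... free ... */
        }
        /* Two sharers both read 2 above, then both block and both
         * decrement: the count reaches 0 but neither one frees. */
        vm->refcnt--;
    }

    /* Fixed order: decrement first; only the last reference frees. */
    static void
    exit_vmspace_fixed(struct vms_rc_sketch *vm)
    {
        if (--vm->refcnt == 0) {
            /* ... exactly one exiting process frees ... */
        }
    }
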
Matthew Dillon
e302698320 Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise().  It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.
2001-10-31 03:06:33 +00:00
Tor Egge
e7673b8424 Fix locking violations during page wiring:
- vm map entries are not valid after the map has been unlocked.

 - An exclusive lock on the map is needed before calling
   vm_map_simplify_entry().

Fix cleanup after page wiring failure to unwire all pages that had been
successfully wired before the failure was detected.

Reviewed by:	dillon
2001-10-14 20:47:08 +00:00
John Baldwin
61d80e90a9 Add missing includes of sys/ktr.h. 2001-10-11 17:53:43 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Matthew Dillon
1b40f8c036 Change inlines back into mainline code in preparation for mutexing. Also,
most of these inlines had been bloated in -current far beyond their
original intent.  Normalize prototypes and function declarations to be ANSI
only (half already were).  And do some general cleanup.

(kernel size also reduced by 50-100K, but that isn't the prime intent)
2001-07-04 20:15:18 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
Bosko Milekic
08442f8a82 Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

 o Reduce contention for SMP by offering per-CPU pools and locks.
 o Better use of data cache due to per-CPU pools.
 o Much less code cache pollution due to excessively large allocation macros.
 o Framework for `grouping' objects from same page together so as to be able
   to possibly free wired-down pages back to the system if they are no longer
   needed by the network stacks.

 Additional things changed with this addition:

  - Moved some mbuf specific declarations and initializations from
    sys/conf/param.c into mbuf-specific code where they belong.
  - m_getclr() has been renamed to m_get_clrd() because the old name is really
    confusing. m_getclr() HAS been preserved though and is defined to the new
    name. No tree sweep has been done "to change the interface," as the old
    name will continue to be supported and is not deprecated. The change was
    merely done because m_getclr() sounds too much like "m_get a cluster."
  - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
    systat(1) (see TODO below).
  - Fixed systat(1) to display number of "free mbufs" based on new per-CPU
    stat structures.
  - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
    per-CPU stat structures. All infos are fetched via sysctl.

 TODO (in order of priority):

  - Re-enable mbtypes statistics in both netstat(1) and systat(1) after
    introducing an SMP friendly way to collect the mbtypes stats under the
    already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
    seems too costly for a mere stat update, especially when other locks are
    already present).
  - Optionally have systat(1) display not only "total free mbufs" but also
    "total free mbufs per CPU pool."
  - Fix minor length-fetching issues in netstat(1) related to the recently
    re-enabled option to read mbuf stats from a core file.
  - Move reference counters at least for mbuf clusters into an unused portion
    of the cluster itself, to save space and avoid allocating a separate counter.
  - Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
2001-06-22 06:35:32 +00:00
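A kernel-context usage sketch for the renamed allocator (the two-argument
form mirrors the classic m_get() signature; treat the details as
assumptions, not code from the commit):

    #include <sys/param.h>
    #include <sys/mbuf.h>

    struct mbuf *m;

    m = m_get_clrd(M_DONTWAIT, MT_DATA);    /* zeroed mbuf from the per-CPU
                                               pools; NULL under pressure */
    if (m != NULL)
            m_free(m);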
Matthew Dillon
ef6a93ef81 Cleanup the tabbing 2001-06-11 19:17:05 +00:00
Matthew Dillon
ff2b5645b5 Two fixes to the out-of-swap process termination code. First, start killing
processes a little earlier to avoid a deadlock.  Second, when calculating
the 'largest process' do not just count RSS.  Instead count the RSS + SWAP
used by the process.  Without this the code tended to kill small
inconsequential processes like, oh, sshd, rather than one of the many
'eatmem 200MB' I run on a whim :-).  This fix has been extensively tested on
-stable and somewhat tested on -current and will be MFCd in a few days.

Shamed into fixing this by: ps
2001-06-09 18:06:58 +00:00
John Baldwin
21c641b2a9 - Add lots of vm_mtx assertions.
- Add a few KTR tracepoints to track the addition and removal of
  vm_map_entry's and the creation and freeing of vmspace's.
- Adjust a few portions of code so that we update the process' vmspace
  pointer to its new vmspace before freeing the old vmspace.
2001-05-23 22:38:00 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

Faults cannot be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
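Schematically, low-level VM paths now bracket their work with the new mutex
(a sketch, not quoted from the commit; m is an assumed vm_page_t):

    mtx_lock(&vm_mtx);              /* vm_mtx does not recurse */
    vm_page_unwire(m, 0);           /* example of a low level vm op */
    mtx_unlock(&vm_mtx);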
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Alfred Perlstein
b28cb1ca07 Remove truncated part from comment 2001-04-12 21:50:03 +00:00
Matthew Dillon
b823bbd6be Fix a lock reversal problem in the VM subsystem related to threaded
programs.   There is a case during a fork() which can cause a deadlock.

From Tor -
The workaround consists of setting a flag in the vm map that
indicates that a fork is in progress and using that mark in the page
fault handling to force a revalidation failure.  The change will only
affect (pessimize) page fault handling during fork for threaded
(linuxthreads style) applications and applications using aio_*().

Submitted by: tegge
2001-03-14 06:48:53 +00:00
Matthew Dillon
1a484d28dd Temporarily remove the vm_map_simplify() call from vm_map_insert(). The
call is correct, but it interferes with the massive hack called
vm_map_growstack().  The call will be restored after our stack handling
code is fixed.

Reported by: tegge
2001-03-14 06:09:42 +00:00
Ian Dowse
d30344bdfa When creating a shadow vm_object in vmspace_fork(), only one
reference count was transferred to the new object, but both the
new and the old map entries had pointers to the new object.
Correct this by transferring the second reference.

This fixes a panic that can occur when mmap(2) is used with the
MAP_INHERIT flag.

PR:		i386/25603
Reviewed by:	dillon, alc
2001-03-09 18:25:54 +00:00
Matthew Dillon
4e71e795a1 This commit represents work mainly submitted by Tor and slightly modified
by myself.  It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used.  This code has been
heavily tested in -stable but only tested somewhat on -current.  An MFC
will occur in a few days.  My additions include the vm_map_simplify_entry()
and a minor buffer cache boundary case fix.

Make the buffer cache use a system map for buffer cache KVM rather than a
normal map.

Ensure that VM objects are not allocated for system maps.  There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.

Fix a minor boundary case, hit when the buffer cache size limit is reached,
that could result in non-optimal code.

Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map.  Previously only a simple
linear optimization was made.  (The buffer vm_map typically has only a
handful of vm_map_entry's.  This stabilizes it at that level permanently).

PR: 20609
Submitted by: (Tor Egge) tegge
2001-02-04 06:19:28 +00:00
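The anti-proliferation calls added here look roughly as follows (a sketch;
per the locking rule noted earlier in this log, the map must be exclusively
locked, and 'entry' is an assumed map entry in the buffer map):

    vm_map_lock(buffer_map);
    vm_map_simplify_entry(buffer_map, entry);   /* coalesce with adjacent
                                                   compatible entries */
    vm_map_unlock(buffer_map);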
Seigo Tanimura
21cd6e6232 - If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by repeatedly halving the entry count until
the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by:	dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by:	alfred, dillon
2000-12-13 10:01:00 +00:00
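A sketch of the halving loop described above, against the vm_zone allocator
of that era; zinit()'s exact arguments and the maxswzone name here are
assumptions:

    n = maxswzone / sizeof(struct swblock);
    do {
            swap_zone = zinit("SWAPMETA", sizeof(struct swblock), n,
                ZONE_INTERRUPT, 1);
            if (swap_zone != NULL)
                    break;          /* metadata now fits in KVM */
            n /= 2;                 /* halve the entry count and retry */
    } while (n > 0);
    if (swap_zone == NULL)
            printf("WARNING: failed to init swap zone\n");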
Tor Egge
028fe6ec24 Clear the MAP_ENTRY_USER_WIRED flag from cloned vm_map entries.
PR:		2840
2000-11-02 21:38:18 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Matthew Dillon
5f99b57c5d Fixed a bug in madvise() / MADV_WILLNEED: when the request is offset
from the base of the first map_entry, the call to pmap_object_init_pt()
uses the wrong start VA.  MFC to follow.

PR: i386/18095
2000-05-14 18:46:40 +00:00
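The failing case was a hint at a non-zero offset into a mapping, e.g.
(userland sketch; fd is an assumed open descriptor):

    #include <sys/mman.h>

    char *base = mmap(NULL, 4 * 4096, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base != MAP_FAILED)
            /* offset from the base of the first map entry: the case
               whose start VA was miscomputed before this fix */
            (void)madvise(base + 4096, 4096, MADV_WILLNEED);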
Philippe Charnier
5929bcfaba Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce
2000-03-27 20:41:17 +00:00
Philippe Charnier
956f31353c Spelling 2000-03-26 15:20:23 +00:00
Paul Saab
9730a5daab Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2).
This feature allows you to specify if mmap'd data is included in
an application's corefile.

Change the type of eflags in struct vm_map_entry from u_char to
vm_eflags_t (an unsigned int).

Reviewed by:	dillon,jdp,alfred
Approved by:	jkh
2000-02-28 04:10:35 +00:00
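Typical use of the new flags (sketch; len is assumed):

    #include <sys/mman.h>

    /* keep a scratch region out of any core file from the start */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_NOCORE, -1, 0);

    /* or toggle an existing mapping; MADV_CORE re-includes it */
    (void)madvise(p, len, MADV_NOCORE);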
Matthew Dillon
1f6889a1eb Fix null-pointer dereference crash when the system is intentionally
run out of KVM through a mmap()/fork() bomb that allocates hundreds
    of thousands of vm_map_entry structures.

    Add a panic to make the null-pointer dereference crash a little more verbose.

    Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number
    of mmap()'d spaces (discrete vm_map_entry's in the process).  The value
    defaults to around 9000 for a 128MB machine.  The test is scaled for the
    number of processes sharing a vmspace (aka linux threads).  Setting
    the value to 0 disables the feature.

PR: kern/16573
Approved by: jkh
2000-02-16 21:11:33 +00:00
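The limit can also be adjusted programmatically, not just with sysctl(8);
a sketch:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <err.h>

    int newmax = 0;     /* 0 disables the feature, per the commit */

    if (sysctlbyname("vm.max_proc_mmap", NULL, NULL,
        &newmax, sizeof(newmax)) == -1)
            err(1, "sysctlbyname");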
Matthew Dillon
ff359f84c9 Fix a deadlock between msync(..., MS_INVALIDATE) and vm_fault. The
invalidation code cannot wait for paging to complete while holding a
    vnode lock, so we don't wait.  Instead we simply allow the lower level
    code to simply block on any busy pages it encounters.  I think Yahoo
    may be the only entity in the entire world that actually uses this
    msync feature :-).

Bug reported by:  Paul Saab <paul@mu.org>
2000-01-21 20:17:01 +00:00
Matthew Dillon
4f79d873c1 Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

    This feature prevents the update daemon from gratuitously flushing
    dirty pages associated with a mapped file-backed region of memory.  The
    system pager will still page the memory as necessary and the VM system
    will still be fully coherent with the filesystem.  Modifications made
    by other means to the same area of memory, for example by write(), are
    unaffected.  The feature works on a page-granularity basis.

    MAP_NOSYNC allows one to use mmap() to share memory between processes
    without incurring any significant filesystem overhead, putting it in
    the same performance category as SysV Shared memory and anonymous memory.

Reviewed by: julian, alc, dg
1999-12-12 03:19:33 +00:00
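A usage sketch (fd and len are assumed):

    #include <sys/mman.h>

    /* shared file-backed memory without gratuitous syncer flushes */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_NOSYNC, fd, 0);

    /* MADV_AUTOSYNC restores the default flushing for a range */
    (void)madvise(p, len, MADV_AUTOSYNC);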
Alan Cox
2b71c841f5 Remove nonsensical vm_map_{clear,set}_recursive() calls
from vm_map_pageable().  At the point they are called, vm_map_pageable()
holds a read (or shared) lock on the map.  The purpose
of vm_map_{clear,set}_recursive() is to disable/enable repeated
write (or exclusive) lock requests by the same process.
1999-11-25 20:21:52 +00:00
Alan Cox
2ed14a92db Correct the following error: vm_map_pageable() on a COW'ed (post-fork)
vm_map always failed because vm_map_lookup() looked at
 "vm_map_entry->wired_count" instead of "(vm_map_entry->eflags &
 MAP_ENTRY_USER_WIRED)".  The effect was that many page
 wiring operations by sysctl were (silently) failing.
1999-11-23 06:51:28 +00:00
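Schematically, the fix is to test the flag rather than the raw count
(a sketch; wired_count also reflects system wirings and counts cloned
across fork):

    /* wrong: any wiring at all looks user-wired */
    user_wired = (entry->wired_count != 0);

    /* right: test the explicit user-wiring flag */
    user_wired = ((entry->eflags & MAP_ENTRY_USER_WIRED) != 0);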
Alan Cox
79e1e3b9b4 Remove unused #include's.
Submitted by:	phk
1999-11-07 20:03:54 +00:00
Alan Cox
1ab41ed97c The functions declared by this header file no longer exist.
Submitted by:	phk (in part)
1999-11-07 06:46:48 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial, borderline-silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
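After the follow-up change a caller reads roughly like this (kernel sketch;
the caddr_t-based signature of that era is assumed):

    if (useracc((caddr_t)uaddr, len, VM_PROT_READ | VM_PROT_WRITE) == 0)
            return (EFAULT);        /* user range not accessible */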
Matthew Dillon
b430905573 Clean up the madvise code, add a few more sanity checks.
Reviewed by:	Alan Cox <alc@cs.rice.edu>,  dg@root.com
1999-09-21 05:00:48 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Alan Cox
f7fc307ade vm_map_madvise:
A complete rewrite by dillon and myself to separate
	the implementation of behaviors that affect the vm_map_entry
	from those that affect the vm_object.

	A result of this change is that madvise(..., MADV_FREE);
	is much cheaper.
1999-08-13 17:45:34 +00:00
Alan Cox
5abfdd1eef vm_map_madvise:
Now that behaviors are stored in the vm_map_entry rather than
	the vm_object, it's no longer necessary to instantiate a vm_object
	just to hold the behavior.

Reviewed by:	dillon
1999-08-10 04:50:20 +00:00
Alan Cox
7f866e4b29 Move the memory access behavior information provided by madvise
from the vm_object to the vm_map.

Submitted by:	dillon
1999-08-01 06:05:09 +00:00
Alan Cox
d4da2dbae6 Fix the following problem:
When creating new processes (or performing exec), the new page
directory is initialized too early.  The kernel might grow before
p_vmspace is initialized for the new process.  Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR:		kern/12378
Submitted by:	tegge
Reviewed by:	dfr
1999-07-21 18:02:27 +00:00
Alan Cox
32b76dfa8a Cleanup OBJ_ONEMAPPING management.
vm_map.c:
	Don't set OBJ_ONEMAPPING on arbitrary vm objects.  Only default
	and swap type vm objects should have it set.  vm_object_deallocate
	already handles these cases.

vm_object.c:
	If OBJ_ONEMAPPING isn't already clear in vm_object_shadow,
	we are in trouble.  Instead of clearing it, make it
	an assertion that it is already clear.
1999-07-11 18:30:32 +00:00
Peter Wemm
3efc015bae Fix some int/long printf problems for the Alpha 1999-07-01 19:53:43 +00:00
Alan Cox
6389da78d5 vm_map_growstack uses vmspace::vm_ssize as though it contained
the stack size in bytes when in fact it is the stack size in pages.
1999-06-17 21:29:38 +00:00
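The distinction matters wherever the value meets a byte-denominated limit;
schematically (a sketch in the proc/rlimit idiom of that era):

    /* vm_ssize counts pages, so convert before comparing to rlimits */
    stack_bytes = ctob(vm->vm_ssize);           /* pages -> bytes */
    if (stack_bytes > p->p_rlimit[RLIMIT_STACK].rlim_cur)
            return (KERN_NO_SPACE);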
Alan Cox
29b45e9e99 vm_map_insert sometimes extends an existing vm_map entry, rather than
creating a new entry.  vm_map_stack and vm_map_growstack can panic when
a new entry isn't created.  Fixed vm_map_stack and vm_map_growstack.

Also, when extending the stack, always set the protection to VM_PROT_ALL.
1999-06-17 05:49:00 +00:00
Alan Cox
94f7e29a2a Move vm_map_stack and vm_map_growstack after the definition
of the vm_map_clip_end macro.  (The next commit will modify
vm_map_stack and vm_map_growstack to use vm_map_clip_end.)
1999-06-17 00:39:26 +00:00
Alan Cox
1fc43fd11d Remove some unused declarations and duplicate initialization. 1999-06-17 00:27:39 +00:00
Alan Cox
1c85e3df24 vm_map_protect:
The wrong vm_map_entry is used to determine if writes must not be
	allowed due to COW.
1999-06-12 23:10:38 +00:00