Commit Graph

2152 Commits

Author SHA1 Message Date
green
cf7423d4fe Remove extraneous locks on the VM free page queue mutex; it is not
meant to be recursed upon, and could cause a deadlock inside the
new contigmalloc (vm.old_contigmalloc=0) code.

Submitted by:	alc
2004-07-19 23:29:36 +00:00
alc
c700a8a66f - Eliminate the pte object from the pmap. Instead, page table pages are
allocated as "no object" pages.  Similar changes were made to the amd64
   and i386 pmap last year.  The primary reason being that maintaining
   a pte object leads to lock order violations.  A secondary reason being
   that the pte object is redundant, i.e., the page table itself can be
   used to lookup page table pages.  (Historical note: The pte object
   predates our ability to allocate "no object" pages.  Thus, the pte
   object was a necessary evil.)
 - Unconditionally check the vm object lock's status in vm_page_remove().
   Previously, this assertion could not be made on Alpha due to its use
   of a pte object.
2004-07-19 18:12:04 +00:00
green
3b66ac9138 Since breakage of malloc(9)/uma_zalloc(9) is totally non-optional in
GENERIC/for WITNESS users, make sure the sysctl to disable the behavior
is read-only and always enabled.
2004-07-19 15:05:24 +00:00
green
c4b4a8f048 Reimplement contigmalloc(9) with an algorithm which stands a greatly-
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.

The new contigmalloc operation still operates in three passes, but it
could potentially be tuned to use more or fewer.  The first pass only looks
at pages in the cache and free pages, so they would be thrown out
without having to block.  If this is not enough, the subsequent passes
page out any unwired memory.  To combat memory pressure refragmenting
the section of memory being laundered, each page is removed from the
system's free memory queue once it has been freed so that blocking
later doesn't cause the memory laundered so far to be reallocated.

The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed.  Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.

There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used.  Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month.  It is safe to switch at
run-time to see the difference it makes.

A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig().  These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.

When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats.  Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well.  This invalidates the BUGS section of the
contigmalloc(9) manpage.
2004-07-19 06:21:27 +00:00
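
The commit keeps the existing contigmalloc(9)/contigfree(9) interface.  As a
hedged illustration (not taken from the commit), a driver might request a
physically contiguous buffer roughly as follows; the 256KB size, M_DEVBUF
type, and address constraints are assumptions for the example.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>

    /*
     * Illustrative sketch: allocate a 256KB physically contiguous,
     * page-aligned buffer below 4GB, then release it.  With the new
     * allocator this memory is also tracked by malloc(9) statistics.
     */
    static void *
    example_alloc_contig_buffer(void)
    {

        return (contigmalloc(256 * 1024, M_DEVBUF, M_NOWAIT,
            0ul,                /* low: no lower address bound */
            0xfffffffful,       /* high: stay below 4GB */
            PAGE_SIZE,          /* alignment */
            0ul));              /* boundary: none */
    }

    static void
    example_free_contig_buffer(void *buf)
    {

        if (buf != NULL)
            contigfree(buf, 256 * 1024, M_DEVBUF);
    }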
alc
c2ace5ec03 Remove the GIANT_REQUIRED preceding pmap_remove() in
vm_pageout_map_deactivate_pages().
2004-07-18 04:38:11 +00:00
alc
123cfa6b64 Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove().  In general, they require the lock in
order to modify a page's pv list or flags.  In some cases, however,
pmap_protect() can avoid acquiring the lock.
2004-07-15 18:00:43 +00:00
alc
f591650a79 Remove an unused and unimplemented sysctl. (For the record, it was marked
as unimplemented in revision 1.129 nearly six years ago.)
2004-07-12 17:45:37 +00:00
alc
a7935ebbec Increase the scope of the page queues lock in vm_page_alloc() to cover
a diagnostic check that accesses the cache queue count.
2004-07-10 22:12:49 +00:00
alc
4f5d8d196f Micro-optimize vmspace for 64-bit architectures: Colocate vm_refcnt and
vm_exitingcnt so that alignment does not result in wasted space.
2004-07-06 17:35:10 +00:00
bms
f91be5fba2 Properly brucify a string by outdenting it. 2004-07-06 02:27:30 +00:00
bmilekic
df84cdbe06 Introduce debug.nosleepwithlocks sysctl, 0 by default. If set to 1
and WITNESS is not built, then force all M_WAITOK allocations to
M_NOWAIT behavior (transparently).  This is to be used temporarily
if weird deadlocks are reported because we still have code paths
that perform M_WAITOK allocations with lock(s) held, which can
lead to deadlock.  If WITNESS is compiled, then the sysctl is ignored
and we ask witness to tell us whether we have locks held, converting
to M_NOWAIT behavior only if it tells us that we do.

Note this removes the previous mbuf.h inclusion as well (only needed
by last revision), and cleans up unneeded [artificial] comparisons
to just the mbuf zones.  The problem described above has nothing to
do with previous mbuf wait behavior; it is a general problem.
2004-07-04 16:07:44 +00:00
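
A minimal sketch of the behavior described above, assuming a hypothetical
helper name; only the debug.nosleepwithlocks sysctl and the
M_WAITOK-to-M_NOWAIT downgrade are taken from the commit.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>
    #include <sys/sysctl.h>

    static int nosleepwithlocks = 0;
    SYSCTL_INT(_debug, OID_AUTO, nosleepwithlocks, CTLFLAG_RW,
        &nosleepwithlocks, 0,
        "Convert M_WAITOK allocations to M_NOWAIT when WITNESS is absent");

    /*
     * Hypothetical helper: with the sysctl set and no WITNESS compiled
     * in, transparently downgrade M_WAITOK requests to M_NOWAIT.
     */
    static int
    example_adjust_malloc_flags(int flags)
    {
    #ifndef WITNESS
        if (nosleepwithlocks && (flags & M_WAITOK)) {
            flags &= ~M_WAITOK;
            flags |= M_NOWAIT;
        }
    #endif
        return (flags);
    }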
green
77ef401fc6 Reextend the M_WAITOK-disabling-hack to all three of the mbuf-related
zones, and do it by direct comparison of uma_zone_t instead of strcmp.

The mbuf subsystem used to provide M_TRYWAIT/M_DONTWAIT semantics, but
this is mostly no longer the case.  M_WAITOK has taken over the spot
M_TRYWAIT used to have and, for mbuf allocations, may still return NULL if
the code path incorrectly holds a mutex going into the mbuf allocation
functions.

The M_WAITOK/M_NOWAIT semantics are absolute; although trying to malloc
or uma_zalloc something with a mutex held and M_WAITOK specified may
deadlock the system, an M_WAITOK allocation is absolutely required not to
return NULL, and violating that results in instability and/or security
breaches.
There is still room to add the WITNESS_WARN() to all cases so that
we are notified of the possibility of deadlocks, but it cannot change
the value of the "badness" variable and allow allocation to actually
fail except for the specialized cases which used to be M_TRYWAIT.
2004-07-04 15:59:25 +00:00
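
A small sketch of the contract spelled out above; the struct and helper are
hypothetical, but the M_WAITOK/M_NOWAIT rules are those stated in the commit.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>

    struct example_softc {
        int state;
    };

    /*
     * M_WAITOK: may sleep, so no mutexes may be held, and the result is
     * never NULL.  M_NOWAIT: never sleeps, and the caller must handle
     * failure.
     */
    static struct example_softc *
    example_softc_alloc(int how)        /* M_WAITOK or M_NOWAIT */
    {
        struct example_softc *sc;

        sc = malloc(sizeof(*sc), M_TEMP, how | M_ZERO);
        if (sc == NULL) {
            /* Reachable only for M_NOWAIT callers. */
            return (NULL);
        }
        return (sc);
    }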
green
b003469f2d Limit mbuma damage. Suddenly ALL allocations with M_WAITOK are subject
to failing -- that is, allocations via malloc(M_WAITOK) that are required
to never fail -- if WITNESS is not defined.  While everyone should be
running WITNESS, in any case, zone "Mbuf" allocations are really the only
ones that should be screwed with by this hack.

This hack is crashing people, and would continue to do so with or without
WITNESS.  Things shouldn't be allocating with M_WAITOK with locks held,
but it's not okay just to always remove M_WAITOK when !WITNESS.

Reported by:	Bernd Walter <ticso@cicely5.cicely.de>
2004-07-03 18:11:41 +00:00
jhb
696704716d Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
  determine if a thread about to be added to a run queue should be
  preempted to directly.  If it is not safe to preempt or if the new
  thread does not have a high enough priority, then the function returns
  false and sched_add() adds the thread to the run queue.  If the thread
  should be preempted to but the current thread is in a nested critical
  section, then the flag TDF_OWEPREEMPT is set and the thread is added
  to the run queue.  Otherwise, mi_switch() is called immediately and the
  thread is never added to the run queue since it is switched to directly.
  When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
  then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
  setrunqueue() now does all the correct work.  This also removes the
  do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
  supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
  chance to run if the architecture supports native preemption since
  the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
  architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
  preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by:	scottl (with his re@ hat)
2004-07-02 20:21:44 +00:00
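
A simplified, hypothetical rendering of the maybe_preempt() decision described
in the first bullet above; the field names, the critical-section test, and the
call shape are assumptions for illustration, not the committed code.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/sched.h>

    /*
     * Sketch: decide whether the newly runnable thread td should be
     * switched to directly.  Assumes the caller holds sched_lock, as
     * sched_add() does.  Lower td_priority values are better.
     */
    static int
    example_maybe_preempt(struct thread *td)
    {
        struct thread *ctd = curthread;

        if (td->td_priority >= ctd->td_priority)
            return (0);                       /* not high enough priority */
        if (ctd->td_critnest > 1) {
            ctd->td_flags |= TDF_OWEPREEMPT;  /* defer until critical_exit() */
            return (0);
        }
        mi_switch(SW_INVOL, td);              /* preempt: switch to td now */
        return (1);
    }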
jhb
1b16b181d1 - Change mi_switch() and sched_switch() to accept an optional thread to
switch to.  If a non-NULL thread pointer is passed in, then the CPU will
  switch to that thread directly rather than calling choosethread() to pick
  a thread to switch to.
- Make sched_switch() aware of idle threads and know to do
  TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
  requiring all callers of mi_switch() to know to do this if they can be
  called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
  the middle of the function prototypes and up above into their own
  section.
2004-07-02 19:09:50 +00:00
jhb
ce31015cd1 - Don't use a variable to point to the user area that we only use once.
Just use p2->p_uarea directly instead.
- Remove an old and mostly bogus assertion regarding p2->p_sigacts.
- Use RANGEOF macro ala fork1() to clean up bzero/bcopy of p_stats.
2004-07-02 03:45:07 +00:00
tegge
16dc5859f2 Initialize result->backing_object_offset before linking result onto the list of
vm objects shadowing source in vm_object_shadow().  This closes a race where
vm_object_collapse() could be called with a partially uninitialized object
argument causing symptoms that looked like hardware problems, e.g.  signal 6,
10, 11 or a /bin/sh busy-waiting for a nonexistent child process.
2004-06-28 20:26:35 +00:00
gallatin
f1f19c639c Use MIN() macro rather than ulmin() inline, and fix stray tab
that snuck in with my last commit.

Submitted by: green
2004-06-28 19:58:39 +00:00
gallatin
81e9341035 Fix alpha - the use of min() on longs was losing the high bits and
returning wrong answers, leading to strange values in vm2->vm_{s,t,d}size.
2004-06-28 19:15:40 +00:00
das
2aabb5e466 Update a stale comment. The heuristic to swap processes out based on
the number of pages already paged out was broken in rev 1.10 and
removed in rev 1.11.
2004-06-27 01:58:12 +00:00
alc
3121dcf134 Remove an unused field from the vmspace structure. 2004-06-26 19:16:35 +00:00
green
4d32bd722d Correct the tracking of various bits of the process's vmspace and vm_map
when not propagated on fork (due to minherit(2)).  Consistency checks
otherwise fail when the vm_map is freed and it appears to have not been
emptied completely, causing an INVARIANTS panic in vm_map_zdtor().

PR:		kern/68017
Submitted by:	Mark W. Krentel <krentel@dreamscape.com>
Reviewed by:	alc
2004-06-24 22:43:46 +00:00
alc
a96a1d58f7 Call vm_pageout_page_stats() with the page queues lock held. 2004-06-24 04:08:43 +00:00
alc
a035bc6c10 Remove spl calls. 2004-06-24 03:13:30 +00:00
bmilekic
7a6a2d65d4 Make uma_mtx MTX_RECURSE. Here's why:
The general UMA lock is a recursion-allowed lock because
there is a code path where, while we're still configured
to use startup_alloc() for backend page allocations, we
may end up in uma_reclaim() which calls zone_foreach(zone_drain),
which grabs uma_mtx, only to later call into startup_alloc()
because while freeing we needed to allocate a bucket.  Since
startup_alloc() also takes uma_mtx, we need to be able to
recurse on it.

This exact explanation is also added as a comment above mtx_init().

Trace showing recursion reported by: Peter Holm <peter-at-holm.cc>
2004-06-23 21:59:03 +00:00
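
For reference, a mutex that is expected to be recursed upon is initialized
with the MTX_RECURSE option; a minimal sketch follows, with an illustrative
variable name rather than the real uma_mtx declaration.

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    static struct mtx example_mtx;

    /*
     * Sketch: create a sleep mutex that may be acquired recursively by
     * the same thread, as the commit above does for uma_mtx.
     */
    static void
    example_mtx_setup(void)
    {

        mtx_init(&example_mtx, "example recursable lock", NULL,
            MTX_DEF | MTX_RECURSE);
    }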
bms
72f48d5467 In swap_pager_getpages(), bp->b_dev can be NULL, particularly for the
case of NFS mounted swap, so do not try to dereference it.

While we're here, brucify the printf() call which happens when we
time out on acquisition of vm_page_queue_mtx.

PR:		kern/67898
Submitted by:	bde (style)
2004-06-23 15:15:07 +00:00
alc
6edd95aba5 Remove spl() calls. Update comments to reflect the removal of spl() calls.
Remove '\n' from panic() format strings.  Remove some blank lines.
2004-06-19 04:19:47 +00:00
phk
40dd98a3bd Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
alc
4c2e464ea2 Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not
accessible through an object.  Thus, PG_BUSY serves no purpose.
2004-06-17 06:16:58 +00:00
phk
dfd1f7fd50 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
julian
6c9d81ae0d Nice is a property of a process as a whole.
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.
2004-06-16 00:26:31 +00:00
green
52f66d0879 Make contigmalloc() more reliable:
1. Remove a race whereby contigmalloc() would deadlock against the
   running processes in the system if they kept reinstantiating
   the memory on the active and inactive page queues that it was
   trying to flush out.  The process doing the contigmalloc() would
   sit in "swwrt" forever and the swap pager would be going at full
   force, but never get anywhere.  Instead of doing it until the
   queues are empty, launder for as many iterations as there are
   pages in the queue.
2. Do all laundering to swap synchronously; previously, the vnode
   laundering was synchronous and the swap laundering not.
3. Increase the number of launder-or-allocate passes to three, from
   two, while failing without bothering to do all the laundering on
   the third pass if allocation was not possible.  This effectively
   gives exactly two chances to launder enough contiguous memory,
   helpful with high memory churn where a lot of memory from one pass
   to the next (and during a single laundering loop) becomes dirtied
   again.

I can now reliably hot-plug hardware requiring a 256KB contigmalloc()
without having the kldload/cbb ithread sit around failing to make
progress, while running a busy X session.  Previously, it took killing
X to get contigmalloc() to get further (that is, quiescing the system),
and even then contigmalloc() returned failure.
2004-06-15 01:02:00 +00:00
phk
86602fc06c Deorbit COMPAT_SUNOS.
We inherited this from the sparc32 port of BSD4.4-Lite1.  We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.
2004-06-11 11:16:26 +00:00
bmilekic
ea4a8a094f Backout previous change, I think Julian has a better solution which
does not require type-stable refcnts here.
2004-06-09 20:50:08 +00:00
bmilekic
1edc23feaa Make the slabrefzone, the zone from which we allocate slabs with
internal reference counters, UMA_ZONE_NOFREE.  This way, those slabs
(with their ref counts) will be effectively type-stable, so using
a trick like this on the refcount is no longer dangerous:

        MEXT_REM_REF(m);
        if (atomic_cmpset_int(m->m_ext.ref_cnt, 0, 1)) {
                if (m->m_ext.ext_type == EXT_PACKET) {
                        uma_zfree(zone_pack, m);
                        return;
                } else if (m->m_ext.ext_type == EXT_CLUSTER) {
                        uma_zfree(zone_clust, m->m_ext.ext_buf);
                        m->m_ext.ext_buf = NULL;
                } else {
                        (*(m->m_ext.ext_free))(m->m_ext.ext_buf,
                            m->m_ext.ext_args);
                        if (m->m_ext.ext_type != EXT_EXTREF)
                                free(m->m_ext.ref_cnt, M_MBUF);
                }
        }
        uma_zfree(zone_mbuf, m);

Previously, a second thread hitting the above cmpset might
actually read the refcnt AFTER it has already been freed.  A very
rare occurrence.  Now we'll know that it won't be freed, though.

Spotted by: julian, pjd
2004-06-09 19:18:50 +00:00
netchild
8ec3bc4102 Remove references to L1 in the comments, according to Alan they are
historical leftovers.

Approved by:	alc
2004-06-07 19:33:05 +00:00
alc
ffc00b620f Update stale comments regarding page coloring. 2004-06-05 21:06:42 +00:00
alc
b5cd9ba03c Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to
blist.h, enabling the removal of numerous #includes from subr_blist.c.
(subr_blist.c and swap_pager.c are the only users of these definitions.)
2004-06-04 04:03:26 +00:00
bmilekic
2ad1fea4f3 Fix a comment above uma_zsecond_create(), describing its arguments.
It doesn't take 'align' and 'flags' but 'master' instead, which is
a reference to the Master Zone, containing the backing Keg.

Pointed out by: Tim Robbins (tjr)
2004-06-01 01:36:26 +00:00
bmilekic
f7574a2276 Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and looks up the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues; it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
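
A small sketch of the m_get() + m_clget() to m_getcl() conversion mentioned
above; the helper name is illustrative and the wait flag shown is the
mbuf-style M_DONTWAIT of that era.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>

    /*
     * Sketch: allocate an mbuf with a cluster in one call, drawing from
     * the secondary Packet zone, instead of pairing m_get() with
     * m_clget().  Returns NULL on failure.
     */
    static struct mbuf *
    example_alloc_packet(void)
    {

        return (m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR));
    }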
alc
ec226ee689 Remove a stale comment: PG_DIRTY and PG_FILLED were removed in
revisions 1.17 and 1.12 respectively.
2004-05-30 20:48:15 +00:00
hmp
58fffa8ca6 Correct typo: vm_page_list_find() has been called vm_pageq_find() for quite a
long time, i.e., since the cleanup of the VM Page-queues code done two
years ago.

Reviewed by:	Alan Cox <alc at freebsd.org>,
            	Matthew Dillon <dillon at backplane.com>
2004-05-30 00:42:38 +00:00
des
c1e01a1433 MFS: vm_map.c rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE
semantics but provide a sysctl knob for reverting to old ones.
2004-05-25 18:40:53 +00:00
des
9b7c776aa5 Back out previous commit; it went to the wrong file. 2004-05-25 18:28:52 +00:00
des
30a7255157 MFS: rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE semantics but
provide a sysctl knob for reverting to old ones.
2004-05-25 16:31:49 +00:00
alc
49093f70b7 Correct two error cases in vm_map_unwire():
1. Contrary to the Single Unix Specification our implementation of
   munlock(2) when performed on an unwired virtual address range has
   returned an error.  Correct this.  Note, however, that the behavior
   of "system" unwiring is unchanged, only "user" unwiring is changed.
   If "system" unwiring is performed on an unwired virtual address
   range, an error is still returned.

2. Performing an errant "system" unwiring on a virtual address range
   that was "user" (i.e., mlock(2)) but not "system" wired would
   incorrectly undo the "user" wiring instead of returning an error.
   Correct this.

Discussed with:  green@
Reviewed by:     tegge@
2004-05-25 05:51:17 +00:00
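
A userland illustration of the first correction above, assuming the program
is permitted to call munlock(2): unwiring a range that was never mlock(2)ed
is now expected to succeed.

    #include <sys/mman.h>

    #include <err.h>
    #include <unistd.h>

    int
    main(void)
    {
        size_t len = (size_t)getpagesize();
        void *p;

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        /* The range is not wired; per the specification this must not fail. */
        if (munlock(p, len) != 0)
            err(1, "munlock on an unwired range");
        return (0);
    }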
alc
5d0912f6d8 To date, unwiring a fictitious page has produced a panic. The reason
being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious
pages but unwiring uses PHYS_TO_VM_PAGE().  The resulting panic
reported an unexpected wired count.  Rather than attempting to fix
PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of
fictitious pages.  Specifically, fictitious pages will never be
completely unwired.  Therefore, we can keep a fictitious page's wired
count forever set to one and thereby avoid the use of
PHYS_TO_VM_PAGE() when we know that we're working with a fictitious
page, just not which one.

In collaboration with: green@, tegge@
PR: kern/29915
2004-05-22 04:53:51 +00:00
alc
625c8dfb26 Restructure vm_page_select_cache() so that adding assertions is easy.
Some of the conditions that caused vm_page_select_cache() to deactivate a
page were wrong.  For example, deactivating an unmanaged or wired page is a
nop.  Thus, if vm_page_select_cache() had ever encountered an unmanaged or
wired page, it would have looped forever.  Now, we assert that the page is
neither unmanaged nor wired.
2004-05-12 04:27:18 +00:00
alc
db27194e31 Cache queue pages are not mapped. Thus, the pmap_remove_all() by
vm_pageout_scan()'s loop for freeing cache queue pages is unnecessary.
2004-05-12 04:10:35 +00:00
tjr
a87b9baadb To handle orphaned character device vnodes properly in mmap(), check that
v_mount is non-null before dereferencing it. If it's null, behave as if
MNT_NOEXEC was not set on the mount that originally contained it.
2004-05-11 10:26:37 +00:00
alc
0049080776 Cache queue pages are not mapped. Thus, the pmap_remove_all() by
vm_page_alloc() is unnecessary.
2004-05-09 01:00:15 +00:00
green
129c6b32bf In r1.190, vslock() and vsunlock() were bogusly made to do a "user wire"
and a "system unwire."  Make this a "system wire" and "system unwire."

Reviewed by:	alc
2004-05-07 11:43:24 +00:00
green
41ba6d4a43 Properly remove MAP_FUTUREWIRE when a vm_map_entry gets torn down.
Previously, mlockall(2) usage would leak MAP_FUTUREWIRE of the process's
vmspace::vm_map and subsequent processes would wire all of their memory.
Coupled with a wired-page leak in vm_fault_unwire(), this would run the
system out of free pages and cause programs to randomly SIGBUS when
faulting in new pages.

(Note that this is not the fix for the latter part; pages are still
 leaked when a wired area is unmapped in some cases.)

Reviewed by:	alc
PR:		kern/62930
2004-05-07 00:17:07 +00:00
alc
b57e5e03fd Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation.  This flag's principal use is shortly after
allocation.  For such cases, clearing the flag is pointless.  The only
unusual use of PG_ZERO is in vfs_bio_clrbuf().  However, allocbuf() never
requests a prezeroed page.  So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by:	tegge@
2004-05-06 05:03:23 +00:00
alc
44440f7818 Zero the physical page only if it is invalid and not prezeroed. 2004-04-25 07:58:59 +00:00
alc
899b7d0677 Add a VM_OBJECT_LOCK_ASSERT() call. Remove splvm() and splx() calls. Move
a comment.
2004-04-24 23:23:36 +00:00
alc
800747333a Update the comment describing vm_page_grab() to reflect the previous
revision and correct some of its style errors.
2004-04-24 21:36:23 +00:00
alc
106fdfcb2b Push down the responsibility for zeroing a physical page from the
caller to vm_page_grab().  Although this gives VM_ALLOC_ZERO a
different meaning for vm_page_grab() than for vm_page_alloc(), I feel
such change is necessary to accomplish other goals.  Specifically, I
want to make the PG_ZERO flag immutable between the time it is
allocated by vm_page_alloc() and freed by vm_page_free() or
vm_page_free_zero() to avoid locking overheads.  Once we gave up on
the ability to automatically recognize a zeroed page upon entry to
vm_page_free(), the ability to mutate the PG_ZERO flag became useless.
Instead, I would like to say that "Once a page becomes valid, its
PG_ZERO flag must be ignored."
2004-04-24 20:53:55 +00:00
alc
ff037b9220 In cases where a file was resident in memory mmap(..., PROT_NONE, ...)
would actually map the file with read access enabled.  According to
http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is
an error.  Similarly, an madvise(..., MADV_WILLNEED) would enable read
access on a virtual address range that was PROT_NONE.

The solution implemented herein is (1) to pass a vm_prot_t to
vm_map_pmap_enter() describing the allowed access and (2) to make
vm_map_pmap_enter() responsible for understanding the limitations of
pmap_enter_quick().

Submitted by:	"Mark W. Krentel" <krentel@dreamscape.com>
PR:		kern/64573
2004-04-24 03:46:44 +00:00
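
A userland illustration of the behavior fixed above; the file path is only an
example.  Reading through a PROT_NONE mapping must fault even if the file's
pages are already resident.

    #include <sys/mman.h>

    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd = open("/etc/motd", O_RDONLY);   /* example file */
        volatile char *p;

        if (fd == -1)
            err(1, "open");
        p = mmap(NULL, (size_t)getpagesize(), PROT_NONE, MAP_PRIVATE,
            fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        /*
         * With the fix, this access delivers SIGSEGV instead of
         * silently reading the resident page.
         */
        printf("%c\n", p[0]);
        return (0);
    }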
alc
72331bccaf Push down Giant into vm_pager_get_pages(). The only get pages methods that
require Giant are in the device and vnode pagers.
2004-04-23 06:10:58 +00:00
alc
c380417937 - pmap_kenter_temporary() is unused by machine-independent code. Therefore,
move its declaration to the machine-dependent header file on those
   machines that use it.  In principle, only i386 should have it.
   Alpha and AMD64 should use their direct virtual-to-physical mapping.
 - Remove pmap_kenter_temporary() from ia64.  It is unused.

Approved by:	marcel@
2004-04-10 22:41:46 +00:00
alc
e8b438d9b6 The demise of vm_pager_map_page() in revision 1.93 of vm/vm_pager.c permits
the reduction of the pager map's size by 8M bytes.  In other words, eight
megabytes of largely wasted KVA are returned to the kernel map for use
elsewhere.
2004-04-08 19:08:49 +00:00
imp
04c9462c02 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-06 20:15:37 +00:00
alc
728ea9afc2 Eliminate vm_pager_map_page() and vm_pager_unmap_page() and their uses.
Use sf_buf_alloc() and sf_buf_free() instead.
2004-04-06 07:12:32 +00:00
kan
fee73bd419 Delay permission checks for VCHR vnodes until after vnode is locked in
vm_mmap_vnode function, where we can safely check for a special /dev/zero
case. Rev. 1.180 has reordered checks and introduced a regression.

Submitted by:	alc
Was broken by:	kan
2004-04-05 04:54:22 +00:00
alc
19e5eda309 Remove unused arguments from pmap_init(). 2004-04-05 00:37:50 +00:00
alc
e9f0dd4665 Eliminate unused arguments from vm_page_startup(). 2004-04-04 23:33:36 +00:00
tjr
1966edfcd2 Do not copy vm_exitingcnt to the new vmspace in vmspace_exec(). Copying
it led to impossibly high values in the new vmspace, causing it to never
drop to 0 and be freed.
2004-03-23 08:37:34 +00:00
guido
365db5dd01 When mmap-ing a file from a noexec mount, be sure not to grant the right
to mmap it PROT_EXEC. This also depends on the architecture, as some
architectures (e.g. i386) do not distinguish between read and exec pages.

Inspired by: 	http://linux.bkbits.net:8080/linux-2.4/cset@1.1267.1.85
Reviewed by:	alc
2004-03-18 20:58:51 +00:00
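
A userland sketch of the check introduced above; the path is a hypothetical
file on a noexec-mounted file system, and the refusal is only expected on
architectures that distinguish exec from read permission.

    #include <sys/mman.h>

    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd = open("/mnt/noexec/module.bin", O_RDONLY);  /* hypothetical */
        void *p;

        if (fd == -1)
            err(1, "open");
        p = mmap(NULL, (size_t)getpagesize(), PROT_READ | PROT_EXEC,
            MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED)
            warn("mmap with PROT_EXEC refused");
        else
            printf("PROT_EXEC granted\n");
        return (0);
    }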
truckman
df17b6c2c8 Make overflow/wraparound checking more robust and unbreak len=0 in
vslock(), mlock(), and munlock().

Reviewed by:	bde
2004-03-15 09:11:23 +00:00
truckman
87ef1d0222 Style(9) changes.
Pointed out by:	bde
2004-03-15 06:43:51 +00:00
truckman
b7a6af3cc9 Revert to the original vslock() and vsunlock() API with the following
exceptions:
	Retain the recently added vslock() error return.

	The type of the len argument should be size_t, not u_int.

Suggested by:	bde
2004-03-15 06:42:40 +00:00
truckman
4a8aedf7cf Remove redundant suser() check. 2004-03-15 06:36:55 +00:00
alc
4fad4b8d26 Remove GIANT_REQUIRED from contigfree(). 2004-03-13 07:09:15 +00:00
peter
2f1d89c4f5 Part 2 of rev 1.68. Update comment to match reality now that vm_endcopy
exists and we no longer copy to the end of the struct.

Forgotten by:  alfred and green
2004-03-12 00:16:48 +00:00
alc
dbdc402421 - Make the acquisition of Giant in vm_fault_unwire() conditional on the
pmap.  For the kernel pmap, Giant is not required.  In general, for
   other pmaps, Giant is required by i386's pmap_pte() implementation.
   Specifically, the use of PMAP2/PADDR2 is synchronized by Giant.
   Note: In principle, updates to the kernel pmap's wired count could be
   lost without Giant.  However, in practice, we never use the kernel
   pmap's wired count.  This will be resolved when pmap locking appears.
 - With the above change, cpu_thread_clean() and uma_large_free() need
   not acquire Giant.  (The first case is simply the revival of
   i386/i386/vm_machdep.c's revision 1.226 by peter.)
2004-03-10 04:44:43 +00:00
alc
81dacfa43e Implement a work around for the deadlock avoidance case in
vm_object_deallocate() so that it doesn't spin forever either.

Submitted by:	bde
2004-03-08 03:54:36 +00:00
alc
f13324f65b Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha.  Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_pinit2() served to avoid a race between pmap
initialization and pmap_growkernel().  Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel().  Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.
2004-03-07 21:06:48 +00:00
rwatson
fa59040dad Mark uma_callout as CALLOUT_MPSAFE, as uma_timeout can run MPSAFE.
Reviewed by:	jeff
2004-03-07 07:00:46 +00:00
truckman
367b608998 Undo the merger of mlock()/vslock and munlock()/vsunlock() and the
introduction of kern_mlock() and kern_munlock() in
        src/sys/kern/kern_sysctl.c      1.150
        src/sys/vm/vm_extern.h          1.69
        src/sys/vm/vm_glue.c            1.190
        src/sys/vm/vm_mmap.c            1.179
because different resource limits are appropriate for transient and
"permanent" page wiring requests.

Retain the kern_mlock() and kern_munlock() API in the revived
vslock() and vsunlock() functions.

Combine the best parts of each of the original sets of implementations
with further code cleanup.  Make the mlock() and vslock()
implementations as similar as possible.

Retain the RLIMIT_MEMLOCK check in mlock().  Move the most stringent
test, which can return EAGAIN, last so that requests that have no
hope of ever being satisfied will not be retried unnecessarily.

Disable the test that can return EAGAIN in the vslock() implementation
because it will cause the sysctl code to wedge.

Tested by:	Cy Schubert <Cy.Schubert AT komquats.com>
2004-03-05 22:03:11 +00:00
alc
9687ab882b In the last revision, I introduced a physical contiguity check that is both
unnecessary and wrong.  While it is necessary to verify that the page is
still free after dropping and reacquiring the free page queue lock, the
physical contiguity of the page can not change, making this check
unnecessary.  This check was wrong in that it could cause an out-of-bounds
array access.

Tested by:	rwatson
2004-03-05 04:46:32 +00:00
bde
557b43a4bb Record exactly where this file was copied from. It wasn't repo-copied so
this is not very obvious.

Fixed some style bugs (mainly missing parentheses around return values).
2004-03-04 10:18:17 +00:00
bde
f60d27291e Minor style fixes. In vm_daemon(), don't fetch the rss limit long before
it is needed.
2004-03-04 09:36:46 +00:00
alc
9204c0aad9 Remove some long unused definitions. 2004-03-04 04:26:14 +00:00
alc
163a9eabd1 Modify contigmalloc1() so that the free page queues lock is not held when
vm_page_free() is called.  The problem with holding this lock is that it is
a spin lock and vm_page_free() may attempt the acquisition of a different
default-type lock.
2004-03-02 08:25:58 +00:00
kan
9fe9a30730 Pick up a do {} while(0) cleanup by phk that was discarded accidentally in
previous revision.

Submitted by:	alc
2004-03-01 02:44:33 +00:00
kan
68d7945627 Move the code dealing with vnode out of several functions into a single
helper function vm_mmap_vnode.

Discussed with:	jeffr,alc (a while ago)
2004-02-27 22:02:15 +00:00
truckman
1de257deb3 Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
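
A hypothetical handler following the recommendation above: pass a reasonable
estimate to sysctl_wire_old_buffer() so that only the minimum amount of
memory is wired, and propagate its error return.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    static int
    example_sysctl_handler(SYSCTL_HANDLER_ARGS)
    {
        char buf[128];
        int error;

        /* Wire no more than the size of the reply we intend to copy out. */
        error = sysctl_wire_old_buffer(req, sizeof(buf));
        if (error != 0)
            return (error);
        snprintf(buf, sizeof(buf), "example reply");
        return (SYSCTL_OUT(req, buf, sizeof(buf)));
    }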
alc
2d844ba403 - Substitute bdone() and bwait() from vfs_bio.c for
swap_pager_putpages()'s buffer completion code.  Note: the only
   difference between swp_pager_sync_iodone() and bdone(), aside from
   the locking in the latter, was the unnecessary clearing of B_ASYNC.
 - Remove an unnecessary pmap_page_protect() from
   swp_pager_async_iodone().

Reviewed by:	tegge
2004-02-23 03:15:13 +00:00
alc
28e2e6f950 Correct a long-standing race condition in vm_object_page_remove() that
could result in a dirty page being unintentionally freed.

Reviewed by:	tegge
MFC after:	7 days
2004-02-22 03:36:51 +00:00
alc
ecf8be493e Eliminate the second, unnecessary call to pmap_page_protect() near the end
of vm_pageout_flush().  Instead, assert that the page is still write
protected.

Discussed with:	tegge
2004-02-21 23:32:00 +00:00
alc
4583be830a - Correct a long-standing race condition in vm_page_try_to_free() that
could result in a dirty page being unintentionally freed.
 - Simplify the dirty page check in vm_page_dontneed().

Reviewed by:	tegge
MFC after:	7 days
2004-02-19 07:43:55 +00:00
des
00e29c835f Back out previous commit due to objections. 2004-02-16 21:36:59 +00:00
des
26a38784fb Don't panic if we fail to satisfy an M_WAITOK request; return 0 instead.
The calling code will either handle that gracefully or cause a page fault.
2004-02-16 18:41:58 +00:00
alc
600caa157e Correct a long-standing race condition in vm_contig_launder() that could
result in a panic "vm_page_cache: caching a dirty page, ...": Access to the
page must be restricted or removed before calling vm_page_cache().  This
race condition is identical in nature to that which was addressed by
vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275.

MFC after:	7 days
2004-02-16 03:43:57 +00:00
alc
aae81a61cf Correct a long-standing race condition in vm_fault() that could result in a
panic "vm_page_cache: caching a dirty page, ...": Access to the page must
be restricted or removed before calling vm_page_cache().  This race
condition is identical in nature to that which was addressed by
vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275.

Reviewed by:	tegge
MFC after:	7 days
2004-02-15 00:42:26 +00:00
alc
86e8aa4dc7 - Correct a long-standing race condition in vm_page_try_to_cache() that
could result in a panic "vm_page_cache: caching a dirty page, ...":
   Access to the page must be restricted or removed before calling
   vm_page_cache().  This race condition is identical in nature to that
   which was addressed by vm_pageout.c's revision 1.251.
 - Simplify the code surrounding the fix to this same race condition
   in vm_pageout.c's revision 1.251.  There should be no behavioral
   change.  Reviewed by: tegge

MFC after:	7 days
2004-02-14 08:54:37 +00:00
phk
124977fbb6 Remove the absolute count g_access_abs() function since experience has
shown that it is not useful.

Rename the relative count g_access_rel() function to g_access(), only
the name has changed.

Change all g_access_rel() calls in our CVS tree to call g_access() instead.

Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source
code compatibility.
2004-02-12 22:42:11 +00:00
alc
81636e22b0 Further reduce the use of Giant in vm_map_delete(): Perform pmap_remove()
on system maps, besides the kmem_map, without Giant.

In collaboration with:	tegge
2004-02-12 20:56:06 +00:00
alc
4a9de29e24 Correct a long-standing race condition in the inactive queue scan. (See
the added comment for low-level details.)  The effect of this race
condition is a panic "vm_page_cache: caching a dirty page, ..."

Reviewed by:	tegge
MFC after:	7 days
2004-02-10 18:34:27 +00:00
alc
8a8d62e1aa swp_pager_async_iodone() no longer requires Giant. Modify bufdone()
and swapgeom_done() to perform swp_pager_async_iodone() without Giant.

Reviewed by:	tegge
2004-02-07 08:54:50 +00:00
alc
cabea24620 - Locking for the per-process resource limits structure has eliminated
the need for Giant in vm_map_growstack().
 - Use the proc * that is passed to vm_map_growstack() rather than
   curthread->td_proc.
2004-02-05 06:33:18 +00:00
jhb
279b2b8278 Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that it is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't use the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
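
A sketch of the new resource-limit API described above; the exact prototype
is assumed from the commit text and the helper is hypothetical.  The proc
lock is held while reading, as the commit requires.

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/proc.h>
    #include <sys/resource.h>
    #include <sys/resourcevar.h>

    /* Read a single limit through the lim_cur() accessor. */
    static rlim_t
    example_memlock_limit(struct proc *p)
    {
        rlim_t lim;

        PROC_LOCK(p);
        lim = lim_cur(p, RLIMIT_MEMLOCK);
        PROC_UNLOCK(p);
        return (lim);
    }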
jhb
ab91a45803 Drop the reference count on the old vmspace after fully switching the
current thread to the new vmspace.

Suggested by:	dillon
2004-02-02 23:23:48 +00:00
phk
62a971f78f Check error return from g_clone_bio(). (netchild@)
Add XXX comment about why this is still not optimal. (phk@)

Submitted by:	netchild@
2004-02-02 13:08:03 +00:00
jeff
60c153f457 - Use a separate startup function for the zeroidle kthread. Use this to
set P_NOLOAD prior to running the thread.
2004-02-02 07:51:03 +00:00
jeff
8b93703f2c - Fix a problem where we did not drain the cache of buckets in the zone
when uma_reclaim() was called.  This was introduced when the zone
   working-set algorithm was removed in favor of using the per cpu caches
   as the working set.
2004-02-01 06:15:17 +00:00
des
40b179743a Mechanical whitespace cleanup. 2004-01-30 16:26:29 +00:00
bde
708713d12b Fixed breakage of scheduling in rev.1.29 of subr_4bsd.c. The
"scheduler" here has very little to do with scheduling.  It is actually
the swapper, and it really must be the last SYSINIT'ed item like its
comment says, since proc0 metamorphoses into swapper by calling
scheduler() last in mi_startup(), and scheduler() never returns.  Rev.1.29
of subr_4bsd.c broke this by adding another SI_ORDER_FIRST item
(kproc_start() for schedcpu_thread()) onto the SI_SUB_RUN_SCHEDULER list.
The sorting of SYSINITs with identical orders (at all levels) is
apparently nondeterministic, so this resulted in scheduler() sometimes
being called second to last and schedcpu_thread() not being called at all.

This quick fix just changes the code to almost match the comment
(SI_ORDER_FIRST -> SI_ORDER_ANY).  "LAST" is misspelled "ANY", and
there is no way to ensure that there is only 1 very last SYSINIT.
A more complete fix would remove the SYSINIT obfuscation.
2004-01-29 12:35:11 +00:00
jeff
c85cdc3d0f - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL.  Assert that one of these is set in mi_switch() and properly
   adjust the rusage statistics.  This is to simplify the large number of
   users of this interface which were previously all required to adjust the
   proper counter prior to calling mi_switch().  This also facilitates more
   switch and locking optimizations.
 - Change all callers of mi_switch() to pass the appropriate parameter and
   remove direct references to the process statistics.
2004-01-25 03:54:52 +00:00
alc
60b2122af9 1. Statically initialize swap_pager_full and swap_pager_almost_full to the
full state.  (When swap is added their state will change appropriately.)
2. Set swap_pager_full and swap_pager_almost_full to the full state when
   the last swap device is removed.
Combined, these changes eliminate nonsense messages from the kernel on swap-
less machines.

Item 2 submitted by:	Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Prodding by:		phk
2004-01-24 21:31:06 +00:00
alc
8c7902507f Increase UMA_BOOT_PAGES because of changes to pv entry initialization in
revision 1.457 of i386/i386/pmap.c.
2004-01-18 05:51:06 +00:00
alc
12eb36a03f Don't acquire Giant in vm_object_deallocate() unless the object is vnode-
backed.
2004-01-18 03:44:14 +00:00
alc
24162d19e2 Remove vm_page_alloc_contig(). It's now unused. 2004-01-14 06:21:38 +00:00
alc
5752042760 Remove long dead code, specifically, code related to munmapfd().
(See also vm/vm_mmap.c revision 1.173.)
2004-01-11 06:59:21 +00:00
alc
9aa6374009 - Unmanage pages allocated by contigmalloc1(). (There is no point in
having PV entries for these pages.)
 - Remove splvm() and splx() calls.
2004-01-10 21:17:53 +00:00
alc
cbae8dbd97 Unmanage pages allocated by kmem_alloc(). (There is no point in having PV
entries for these pages.)
2004-01-10 00:22:33 +00:00
alc
9f7878e05a - Enable recursive acquisition of the mutex synchronizing access to the
free pages queue.  This is presently needed by contigmalloc1().
 - Move a sanity check against attempted double allocation of two pages
   to the same vm object offset from vm_page_alloc() to vm_page_insert().
   This provides better protection because double allocation could occur
   through a direct call to vm_page_insert(), such as that by
   vm_page_rename().
 - Modify contigmalloc1() to hold the mutex synchronizing access to the
   free pages queue while it scans vm_page_array in search of free pages.
 - Correct a potential leak of pages by contigmalloc1() that I introduced
   in revision 1.20: We must convert all cache queue pages to free pages
   before we begin removing free pages from the free queue.  Otherwise,
   if we have to restart the scan because we are unable to acquire the
   vm object lock that is necessary to convert a cache queue page to a
   free page, we leak those free pages already removed from the free queue.
2004-01-08 20:48:26 +00:00
alc
90dc293a41 Don't bother clearing PG_ZERO in contigmalloc1(), kmem_alloc(), or
kmem_malloc().  It serves no purpose.
2004-01-06 20:52:55 +00:00
alc
8c0190db49 Simplify the various pager allocation routines by computing the desired
object size once and assigning that value to a local variable.
2004-01-04 20:55:15 +00:00
alc
010f82e132 Eliminate the acquisition and release of Giant from vnode_pager_alloc().
The vm object and vnode locking should suffice.

Discussed with:	jeff
2004-01-04 03:18:24 +00:00
alc
1379475c1e Reduce the scope of Giant in swap_pager_alloc(). 2004-01-03 20:02:17 +00:00
alc
343d2552f5 Revision 1.74 of vm_meter.c ("Avoid lock-order reversal") makes the release
and subsequent reacquisition of the same vm object lock in
vm_object_collapse() unnecessary.
2004-01-02 19:57:45 +00:00
alc
fca40d687b Avoid lock-order reversal between the vm object list mutex and the vm
object mutex.
2004-01-02 19:38:25 +00:00
alc
2548f7af85 - Increase the scope of the kmem_object's lock in kmem_malloc(). Add a
comment explaining why a further increase is not possible.
2004-01-01 19:48:56 +00:00
alc
6d0a5af017 In vm_page_lookup() check the root of the vm object's splay tree for the
desired page before calling vm_page_splay().
2003-12-31 19:02:01 +00:00
alc
e11aa2c75c Simplify vm_page_grab(): Don't bother with the generation check. If the
vm object hasn't changed, the desired page will be at or near the root
of the vm object's splay tree, making vm_page_lookup() cheap.  (The only
lock required for vm_page_lookup() is already held.)  If, however, the
vm object has changed and retry was requested, eliminating the generation
check also eliminates a pointless acquisition and release of the page
queues lock.
2003-12-31 01:44:45 +00:00
alc
8218d18537 - Modify vm_object_split() to expect a locked vm object on entry and
return with a locked vm object on exit.  Remove GIANT_REQUIRED.
 - Eliminate some unnecessary local variables from vm_object_split().
2003-12-30 22:28:36 +00:00
alc
df452634a0 Remove swap_pager_un_object_list; it is unused. 2003-12-29 04:21:44 +00:00
alc
1ffde0dbd9 Remove GIANT_REQUIRED from kmem_suballoc(). 2003-12-28 00:10:48 +00:00
alc
5eb32a5d39 - Reduce Giant's scope in vm_fault().
- Use vm_object_reference_locked() instead of vm_object_reference()
   in vm_fault().
2003-12-26 23:33:37 +00:00
alc
2ce64279e5 Minor correction to revision 1.258: Use the proc pointer that is passed to
vm_map_growstack() in the RLIMIT_VMEM check rather than curthread.
2003-12-26 21:54:45 +00:00
alc
dbc67551d3 - Create an unmapped guard page to trap access to vm_page_array[-1].
This guard page would have trapped the problems with the MFC of the PAE
   support to RELENG_4 at an earlier point in the sequence of events.

Submitted by:	tegge
2003-12-22 02:04:08 +00:00
alc
eae1da31ea - Significantly reduce the number of preallocated pv entries in
pmap_init().  Such a large preallocation is unnecessary and wastes
   nearly eight megabytes of kernel virtual address space per gigabyte
   of managed physical memory.
 - Increase UMA_BOOT_PAGES by two.  This enables the removal of
   pmap_pv_allocf().  (Note: this function was only used during
   initialization, specifically, after pmap_init() but before
   pmap_init2().  During pmap_init2(), a new allocator is installed.)
2003-12-22 01:01:32 +00:00
alc
a0a304d068 - Correct an error in mincore(2) that has existed since its introduction:
mincore(2) should check that the page is valid, not just allocated.
   Otherwise, it can return a false positive for a page that is not yet
   resident because it is being read from disk.
2003-12-21 06:03:40 +00:00
kan
ff9eb2d32f Remove trailing whitespace. 2003-12-08 02:45:45 +00:00
alc
672d48f582 Addendum to revision 1.174: In the case where vm_pager_allocate() is called
to create a vnode-backed object, the vnode lock must be held by the caller.

Reported by:	truckman
Discussed with:	kan
2003-12-08 00:47:33 +00:00
alc
3b8e185f65 Fix a deadlock between vm_fault() and vm_mmap(): The expected lock ordering
between vm_map and vnode locks is that vm_map locks are acquired first.  In
revision 1.150 mmap(2) was changed to pass a locked vnode into vm_mmap().
This creates a lock-order reversal when vm_mmap() calls one of the vm_map
routines that acquires a vm_map lock.  The solution implemented herein is
to release the vnode lock in mmap() before calling vm_mmap() and reacquire
this lock if necessary in vm_mmap().

Approved by:	re (scottl)
Reviewed by:	jeff, kan, rwatson
2003-12-06 05:45:32 +00:00
jhb
4b61439e79 Fix all users of mp_maxid to use the same semantics, namely:
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by:	re (scottl)
Tested on:	i386, amd64, alpha
2003-12-03 14:57:26 +00:00
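
A small sketch of the iteration idiom that follows from the semantics above
(the helper name is illustrative): CPU IDs run from 0 through mp_maxid
inclusive, and absent IDs are skipped.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/smp.h>

    static void
    example_foreach_cpu(void)
    {
        u_int cpu;

        for (cpu = 0; cpu <= mp_maxid; cpu++) {
            if (CPU_ABSENT(cpu))
                continue;
            /* ... per-CPU work for "cpu" goes here ... */
        }
    }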
jeff
d26b674d39 - Unbreak UP. mp_maxid is not defined on uni-processor machines, although
I believe it and the other MP variables should be.  For now, just define it
   here and wait for jhb to clean it up later.

Approved by:	re (rwatson)
2003-11-30 22:18:14 +00:00
jeff
80dcf38c3a - Replace the local maxcpu with mp_maxid. Previously, if mp_maxid
was equal to MAXCPU, we would overrun the pcpu_mtx array because maxcpu
   was calculated incorrectly.
 - Add some more debugging code so that memory leaks at the time of
   uma_zdestroy() are more easily diagnosed.

Approved by:	re (rwatson)
2003-11-30 08:04:01 +00:00
alc
e054e0d248 - Avoid a lock-order reversal between Giant and a system map mutex that
occurs when kmem_malloc() fails to allocate a sufficient number of vm
   pages.  Specifically, we avoid the lock-order reversal by not grabbing
   Giant around pmap_remove() if the map is the kmem_map.

Approved by:	re (jhb)
Reported by:	Eugene <eugene3@web.de>
2003-11-19 18:48:45 +00:00
tjr
081986cbde In vnode_pager_input_smlfs(), call VOP_STRATEGY instead of VOP_SPECSTRATEGY
on non-VCHR vnodes. This fixes a panic when reading data from files on a
filesystem with a small (less than a page) block size.

PR:		59271
Reviewed by:	alc
2003-11-15 09:54:11 +00:00
alc
48c9756047 - Remove use of Giant from uma_zone_set_obj(). 2003-11-14 17:49:07 +00:00
alc
9b4ee6c4dd - Remove long dead code. 2003-11-14 08:22:38 +00:00
alc
58630d7148 Changes to msync(2)
- Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE
   is specified to msync(2).  This is required by the Open Group Base
   Specifications Issue 6.
 - vm_map_sync() doesn't return KERN_FAILURE.  Thus, msync(2) can't
   possibly return EIO.
 - The second major loop in vm_map_sync() handles sub maps.  Thus,
   failing on sub maps in the first major loop isn't necessary.
2003-11-14 06:55:11 +00:00
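
A userland illustration of the first item above, assuming the process may
wire memory and that /etc/motd exists as an example file; MS_INVALIDATE on an
mlock(2)ed region is now expected to fail with EBUSY.

    #include <sys/mman.h>

    #include <err.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        size_t len = (size_t)getpagesize();
        int fd = open("/etc/motd", O_RDONLY);   /* example file */
        void *p;

        if (fd == -1)
            err(1, "open");
        p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        if (mlock(p, len) != 0)
            err(1, "mlock");
        if (msync(p, len, MS_INVALIDATE) == -1 && errno == EBUSY)
            printf("EBUSY, as required for wired pages\n");
        return (0);
    }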
alc
fa4ea5d2f2 - The Open Group Base Specifications Issue 6 specifies that an munmap(2)
must return EINVAL if size is zero.  Submitted by: tegge
 - In order to avoid a race condition in multithreaded applications, the
   check and removal operations by munmap(2) must be in the same critical
section.  To accommodate this, vm_map_check_protection() is modified to
   require its caller to obtain at least a read lock on the map.
2003-11-10 01:37:40 +00:00
mini
918610ef5e NFC: Update stale comments.
Reviewed by:	alc
2003-11-10 00:44:00 +00:00
alc
b2bc11d840 - Remove Giant from msync(2). Giant is still acquired by the lower layers
if we drop into the pmap or vnode layers.
 - Migrate the handling of zero-length msync(2)s into vm_map_sync() so that
   multithreaded applications can't change the map between implementing the
   zero-length hack in msync(2) and reacquiring the map lock in
   vm_map_sync().

Reviewed by:	tegge
2003-11-09 22:09:04 +00:00
alc
269cf5aa09 - Rename vm_map_clean() to vm_map_sync(). This better reflects the fact
that msync(2) is its only caller.
 - Migrate the parts of the old vm_map_clean() that examined the internals
   of a vm object to a new function vm_object_sync() that is implemented in
   vm_object.c.  At the same, introduce the necessary vm object locking so
   that vm_map_sync() and vm_object_sync() can be called without Giant.

Reviewed by:	tegge
2003-11-09 05:25:35 +00:00
alc
883d4c8c44 - Move the implementation of OBJ_ONEMAPPING from vm_map_delete() to
vm_map_entry_delete() so that all of the vm object manipulation is
   performed in one place.
2003-11-05 05:48:22 +00:00
marcel
f8f0614f00 Update avail_ssize for rstacks after growing them. 2003-11-04 06:48:58 +00:00
des
3120373a25 Whitespace cleanup. 2003-11-03 16:14:45 +00:00
alc
1903f6228a - Increase the scope of the source object lock in vm_map_copy_entry(). 2003-11-03 00:59:54 +00:00
alc
f2e8aed3e6 - Increase the scope of two vm object locks in vm_object_split(). 2003-11-02 22:52:42 +00:00
alc
28e8cd183d - Introduce and use vm_object_reference_locked(). Unlike
vm_object_reference(), this function must not be used to reanimate dead
   vm objects.  This restriction simplifies locking.

Reviewed by:	tegge
2003-11-02 21:30:10 +00:00
alc
75da97558f - Increase the scope of two vm object locks in vm_object_collapse().
- Remove the acquisition and release of Giant from vm_object_coalesce().
2003-11-01 23:06:41 +00:00
alc
716130a6f9 - Modify swap_pager_copy() and its callers such that the source and
destination objects are locked on entry and exit.  Add comments to
   the callers noting that the locks can be released by swap_pager_copy().
 - Remove several instances of GIANT_REQUIRED.
2003-11-01 08:57:26 +00:00
alc
8ded4dfb69 - Additional vm object locking in vm_object_split()
- New vm object locking assertions in vm_page_insert() and
   vm_object_set_writeable_dirty()
2003-11-01 04:54:23 +00:00
alc
ab63139b09 - Revert a part of revision 1.73: Make vm_object_set_flag() an inline
function.  This function is so trivial that inlining reduces the size
   of the kernel.
2003-10-31 20:17:00 +00:00
alc
be546fdee4 - Take advantage of the swap pager locking: Eliminate the use of Giant
from vm_object_madvise().
 - Remove excessive blank lines from vm_object_madvise().
2003-10-31 18:32:03 +00:00
marcel
f48c12c462 Fix two bugs introduced with the rstack functionality and specific to
the rstack functionality:
1. Fix a KASSERT that tests for the address to be above the upward
   growable stack. Typically for rstack, the faulting address can be
   identical to the record end of the upward growable entry, and
   very likely is on ia64. The KASSERT tested for greater than, not
   greater than or equal, so whenever the register stack had to be grown
   the assertion fired.
2. When we grow the upward growable stack entry and adjust the
   underlying object, don't forget to adjust the size of the VM map.
   Not doing so would trigger an assert in vm_map_zdtor().

Pointy hat: marcel (for not testing with INVARIANTS).
2003-10-31 07:29:28 +00:00
alc
2a6ec0ca67 - Synchronize access to the swdevt's sw_flags with sw_dev_mtx.
- Remove several instances of GIANT_REQUIRED.
2003-10-31 05:18:45 +00:00
alc
bbdba26328 - Synchronize access to the swdevt's sw_blist with sw_dev_mtx.
- Remove several instances of GIANT_REQUIRED.
2003-10-30 09:12:43 +00:00
alc
28c9cd809b - Synchronize access to swdevhd using sw_dev_mtx.
- Use swp_sizecheck() rather than assignment to swap_pager_full in
   swaponsomething().
2003-10-30 07:11:06 +00:00
alc
e273855447 - Synchronize updates to nswapdev using sw_dev_mtx. 2003-10-29 07:51:41 +00:00
alc
4307e55d6c - Avoid a race in swaponsomething(): Calculate the new swdevt's first and
end swblk and insert this new swdevt into the list of swap devices
   in the same critical section.
2003-10-29 05:42:28 +00:00
alc
f42a987e4e - Complete the synchronization of accesses to the swblock hash table. 2003-10-27 05:58:15 +00:00
alc
9d9dedf30b - Introduce and use a mutex synchronizing access to the swblock hash table. 2003-10-26 19:55:35 +00:00
alc
dd1d2bd790 - Simplify vm_object_collapse()'s collapse case, reducing the number
of lock acquires and releases performed.
 - Move an assertion from vm_object_collapse() to vm_object_zdtor()
   because it applies to all cases of object destruction.
2003-10-26 06:29:26 +00:00
alc
23a376f078 - Add some of the required vm object locking, including assertions where
the vm object lock is required and already held.
2003-10-25 23:42:17 +00:00
alc
92470025a7 - Align a comment within struct vm_page.
- Annotate the vm_page's valid field as synchronized by the containing
   vm object's lock.
2003-10-25 18:33:04 +00:00
alc
e444c5c4e4 - Call vnode_pager_input_old() with the vm object locked. 2003-10-25 05:21:16 +00:00
alc
a71ff79234 - Push down Giant from vm_pageout() to vm_pageout_scan(), freeing
vm_pageout_page_stats() from Giant.
 - Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the
   vm object to be locked on entry.  (All of the pager routines now expect
   this.)
2003-10-24 06:43:04 +00:00
alc
794553172b - Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from
vm_pageout_scan().  Rationale: I don't like leaving a busy page in the
   cache queue with neither the vm object nor the vm page queues lock held.
 - Assert that the page is active in vm_pageout_page_stats().
2003-10-22 18:41:32 +00:00
alc
8382cb8835 - Assert that every page found in the active queue is an active page. 2003-10-22 03:08:24 +00:00
alc
35743f84b8 - Assert that the containing vm object is locked in
vm_page_set_validclean().  (This function reads and modifies the
   vm page's valid field, which is synchronized by the lock on the
   containing vm object.)
2003-10-21 19:36:51 +00:00
alc
eecac55b7d - Remove some long unused code. 2003-10-20 18:57:01 +00:00
alc
d681abd0f8 - Remove comments referring to functions that no longer exist. 2003-10-20 05:16:27 +00:00
alc
9167e4667d - Hold the vm object's lock around calls to vm_page_set_validclean(). 2003-10-20 04:05:24 +00:00
alc
cc2c1485cf - Synchronize access to a vm page's valid field using the containing
vm object's lock.
 - Reduce the scope of the vm page queues lock in two places.
2003-10-19 00:01:56 +00:00
alc
fb13ea0aa9 - Synchronize access to the page's valid field in
vnode_pager_generic_getpages() using the containing object's lock.
2003-10-18 21:30:29 +00:00
alc
bccf1d15ab - Increase the object lock's scope in vm_contig_launder() so that access
to the object's type field and the call to vm_pageout_flush() are
   synchronized.
 - The above change allows for the elimination of the last parameter
   to vm_pageout_flush().
 - Synchronize access to the page's valid field in vm_pageout_flush()
   using the containing object's lock.
2003-10-18 21:09:21 +00:00
alc
d9e9583e64 Corrections to revision 1.305
 - When VM_MAP_WIRE_HOLESOK is specified, do not assume that the start
   address is the beginning of the map.  Instead, move to the first
   entry after the start address.
 - The implementation of VM_MAP_WIRE_HOLESOK was incomplete.  This
   caused the failure of mlockall(2) in some circumstances.
2003-10-18 18:48:17 +00:00
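
    The mlockall(2) case that used to fail is straightforward to reproduce
    from userland; a hypothetical test, assuming sufficient privilege to
    wire memory:

        #include <sys/mman.h>
        #include <err.h>
        #include <stdio.h>

        int
        main(void)
        {
                /*
                 * Wiring the whole address space necessarily spans the
                 * holes between mappings, so the kernel side must pass
                 * VM_MAP_WIRE_HOLESOK to vm_map_wire() for this to work.
                 */
                if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
                        err(1, "mlockall");
                printf("address space wired\n");
                munlockall();
                return (0);
        }
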
phk
4c2cb3f397 DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)
2003-10-18 14:10:28 +00:00
phk
902030340d Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY().
Remove stale comment about B_PHYS.
2003-10-18 11:11:05 +00:00
alc
29c05d8a9b - Synchronize access to a vm page's valid field using the containing
vm object's lock.
 - Release the vm object and vm page queues locks around vput().
2003-10-17 05:07:17 +00:00
alc
b722f9a630 - vm_fault_copy_entry() should not assume that the source object contains
every page.  If the source entry was read-only, one or more wired pages
   could be in backing objects.
 - vm_fault_copy_entry() should not set the PG_WRITEABLE flag on the page
   unless the destination entry is, in fact, writeable.
2003-10-15 08:00:45 +00:00
alc
352d9382c0 Lock the destination object in vm_fault_copy_entry(). 2003-10-08 07:11:19 +00:00
alc
76f6c3b059 Retire vm_page_copy(). Its reason for being ended when peter@ modified
pmap_copy_page() et al. to accept a vm_page_t rather than a physical
address.  Also, this change will facilitate locking access to the vm page's
valid field.
2003-10-08 05:35:12 +00:00
bms
0ff257ed72 Only the super-user should be able to wire pages via the mlock() family
of system calls at this time.  Remove various #ifdef's to enforce this.
2003-10-06 01:59:04 +00:00
bms
d8d01a1fa7 Move pmap_resident_count() from the MD pmap.h to the MI pmap.h.
Add a definition of pmap_wired_count().
Add a definition of vmspace_wired_count().

Reviewed by:	truckman
Discussed with:	peter
2003-10-06 01:47:12 +00:00
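
    These are thin accessors over the per-pmap statistics; a sketch of what
    the MI definitions plausibly look like (field and helper names assumed,
    not quoted from the commit):

        /* MI <vm/pmap.h>: page counts kept in the pmap statistics. */
        #define	pmap_resident_count(pmap)	((pmap)->pm_stats.resident_count)
        #define	pmap_wired_count(pmap)		((pmap)->pm_stats.wired_count)

        /* Wired-page count for an entire vmspace. */
        #define	vmspace_wired_count(vmspace)	\
	        pmap_wired_count(vmspace_pmap(vmspace))
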
alc
ca1d7e62dd The addition of a locking assertion to vm_page_zero_invalid() has revealed
a long-time bug: vm_pager_get_pages() assumes that m[reqpage] contains a
valid page upon return from pgo_getpages().  In the case of the device
pager this page has been freed and replaced by a fake page.  The fake page
is properly inserted into the vm object but m[reqpage] is left pointing
to a freed page.  For now, update m[reqpage] to point to the fake page.

Submitted by:	tegge
2003-10-05 22:23:44 +00:00
bms
3a1ffe0880 Revert previous commit. Come back vslock(), all is forgiven.
Pointy hat to:	bms
2003-10-05 12:41:08 +00:00
bms
6cbb5c3a4f Retire vslock() and vsunlock() with extreme prejudice.
Discussed with:	pete
2003-10-05 09:47:54 +00:00
alc
57538c344f Assert that the containing vm object's lock is held in
vm_page_set_invalid().
2003-10-05 06:58:07 +00:00
alc
632febaf7e Assert that the containing vm object's lock is held in
vm_page_zero_invalid().
2003-10-04 21:56:27 +00:00
alc
3272fbe303 Synchronize access to a vm page's valid field using the containing
vm object's lock.
2003-10-04 21:35:48 +00:00
alc
7c11eaebac - Extend the scope of the vm object lock to cover calls to
vm_page_is_valid().
 - Assert that the lock on the containing vm object is held in
   vm_page_is_valid().
2003-10-04 19:23:29 +00:00
alc
fa0f25a359 Synchronize access to a vm page's valid field using the containing
vm object's lock.
2003-10-04 19:13:27 +00:00
jeff
25821dd99f - Use the UMA_ZONE_VM flag on the fakepg and object zones to prevent
vm recursion and LORs.  This may be necessary for other zones created in
   the vm but this needs to be verified.
2003-10-04 14:21:53 +00:00
alc
b1691aebe4 Migrate pmap_prefault() into the machine-independent virtual memory layer.
A small helper function pmap_is_prefaultable() is added.  This function
encapsulates the few lines of pmap_prefault() that actually vary from
machine to machine.  Note: pmap_is_prefaultable() and pmap_mincore() have
much in common.  Going forward, it's worth considering their merger.
2003-10-03 22:46:53 +00:00
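
    The MD helper only has to answer one question: is it cheap and safe to
    enter a mapping at this address?  An i386-flavoured sketch (details
    assumed; the real implementations differ per machine):

        /*
         * Prefaulting addr is worthwhile only if its page table page is
         * already resident and no mapping exists there yet; everything
         * else (finding the vm page, pmap_enter_quick()) stays in the
         * machine-independent fault code.
         */
        boolean_t
        pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr)
        {
                pt_entry_t *pte;

                if (*pmap_pde(pmap, addr) == 0)
                        return (FALSE);
                pte = vtopte(addr);
                return (*pte == 0);
        }
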
alc
8b797c473e In vm_page_remove(), assert that the vm object is locked, unless an Alpha.
(The Alpha still requires updates to its pmap.)
2003-09-28 04:50:48 +00:00
marcel
d75cf98307 Part 2 of implementing rstacks: add the ability to create rstacks and
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.

Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. We need to include or cover the faulting
address.

Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.

Tested on: alpha, i386, ia64, sparc64
2003-09-27 22:28:14 +00:00
phk
6a0cf06f6a Provide a bit more help with "memory overwritten after free" style bugs. 2003-09-27 21:33:13 +00:00
peter
8ecb3577d8 Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function.  Export the clip/default values to
sysctl under the compat.ia32 hierarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable.  This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'.  And that causes exec etc to fail since it can no
longer find space to mmap things.
2003-09-25 01:10:26 +00:00
silby
872a8ddb11 Adjust the kmapentzone limit so that it takes into account the size of
maxproc and maxfiles, as procs, pipes, and other structures cause allocations
from kmapentzone.

Submitted by:	tegge
2003-09-23 18:56:54 +00:00
alc
1adbd0035e Change the handling of the kernel and kmem objects in vm_map_delete(): In
order to use "unmanaged" pages in the kmem object, vm_map_delete() must
unconditionally perform pmap_remove().  Otherwise, sparc64 has problems.

Tested by:	jake
2003-09-23 04:28:04 +00:00
alc
601ceb70ef Initialize the page's pindex field even for VM_ALLOC_NOOBJ allocations.
(This field is useful for implementing sanity checks even if the page does
not belong to an object.)
2003-09-22 00:56:13 +00:00
jeff
9982722580 - Fix MD_SMALL_ALLOC on architectures that support it. Define a new alloc
function, startup_alloc(), that is used for single page allocations prior
   to the VM starting up.  If it is used after the VM starts up, it
   replaces the zone's allocf pointer with either page_alloc() or
   uma_small_alloc() where appropriate.

Pointy hat to:	me
Tested by:	phk/amd64, me/x86
2003-09-21 07:39:16 +00:00
peter
bfb0c45b8f Bad Jeffr! No cookie!
Temporarily disable the UMA_MD_SMALL_ALLOC stuff since recent commits
break sparc64, amd64, ia64 and alpha.  It appears only i386 and maybe
powerpc were not broken.
2003-09-20 23:35:33 +00:00
jeff
accdfbd626 - Remove the working-set algorithm. Instead, use the per cpu buckets as the
working set cache.  This has several advantages.  Firstly, we never touch
   the per cpu queues now in the timeout handler.  This removes one more
   reason for having per cpu locks.  Secondly, it reduces the size of the zone
   by 8 bytes, bringing it under 200 bytes for a single proc x86 box.  This
   tidies up other logic as well.
 - The 'destroy' flag no longer needs to be passed to zone_drain() since it
   always frees everything in the zone's slabs.
 - cache_drain() is now only called from zone_dtor() and so it destroys by
   default.  It also does not need the destroy parameter now.
2003-09-19 23:27:46 +00:00
jeff
a234ab2fa7 - Remove the cache colorization code. We can't use it due to all of the
broken consumers of the malloc interface who assume that the allocated
   address will be an even multiple of the size.
 - Remove disabled time delay code on uma_reclaim().  The comment there said
   it all.  It was not an effective strategy and it should not be left in
   #if 0'd for all eternity.
2003-09-19 23:04:44 +00:00
jeff
1abaac476b - There is an endless stream of style(9) errors in this file. Fix a few.
Also catch some spelling errors.
2003-09-19 22:31:45 +00:00
jeff
82c0b53020 - Don't inspect the zone in page_alloc(). It may be NULL.
- Don't cache more items than the zone would like in uma_zalloc_bucket().
2003-09-19 09:22:04 +00:00
jeff
2d4c121a6d - Move the logic for dealing with the uma_boot_pages cache into the
page_alloc() function from the slab_zalloc() function.  This allows us
   to unconditionally call uz_allocf().
 - In page_alloc() clean up the boot_pages logic some.  Previously memory from
   this cache that was not used by the time the system started was left in
   the cache and never used.  Typically this wasn't more than a few pages,
   but now we will use this cache so long as memory is available.
2003-09-19 08:53:33 +00:00
jeff
30f275bb51 - Fix the silly flag situation in UMA. Remove redundant ZFLAG/ZONE flags
by accepting the user supplied flags directly.  Previously this was not
   done so that flags for the same field would not be defined in two
   different files.  Add comments in each header instructing future
   developers on how not to shoot their feet.
 - Fix a test for !OFFPAGE which should have been a test for HASH.  This would
   have caused a panic if we had ever destructed a malloc zone.  This also
   opens up the possibility that other zones could use the vsetobj() method
   rather than a hash.
2003-09-19 08:37:44 +00:00
jeff
620ea1ef61 - Don't abuse M_DEVBUF, define a tag for UMA hashes. 2003-09-19 07:23:50 +00:00
jeff
b6dd0c8bfb - Eliminate a pair of unnecessary variables. 2003-09-19 06:41:06 +00:00
jeff
b8696d32c3 - Initialize a pool of bucket zones so that we waste less space on zones that
don't cache as many items.
 - Introduce the bucket_alloc(), bucket_free() functions to wrap bucket
   allocation.  These functions select the appropriate bucket zone to
   allocate from or free to.
 - Rename ub_ptr to ub_cnt to reflect a change in its use.  ub_cnt now reflects
   the count of free items in the bucket.  This gets rid of many unnatural
   subtractions by 1 throughout the code.
 - Add ub_entries which reflects the number of entries possibly held in a
   bucket.
2003-09-19 06:26:45 +00:00
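
    With ub_cnt holding the number of free items, the per-cpu fast path
    becomes a plain array pop; roughly (assuming ub_bucket[] is the item
    array, as in UMA of this period):

        /* uma_zalloc() fast path, sketched: pop the top cached item. */
        if (bucket != NULL && bucket->ub_cnt > 0) {
                bucket->ub_cnt--;
                item = bucket->ub_bucket[bucket->ub_cnt];
                bucket->ub_bucket[bucket->ub_cnt] = NULL;
                /* ... run the zone's ctor on item and return it ... */
        }
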
alc
757174bbed Merge vm_pageout_free_page_calc() into vm_pageout(), eliminating some
unneeded code.
2003-09-19 05:03:45 +00:00
alc
1644dd5fce Add vm object locking to vnode_pager_lock(). (This triggers the movement
of a VM_OBJECT_LOCK() in vm_fault().)
2003-09-18 02:26:03 +00:00
alc
1283b3d480 Remove GIANT_REQUIRED from vm_object_shadow(). 2003-09-17 07:00:14 +00:00
alc
eb3ecef7c8 When calling vget() on a vnode-backed vm object, acquire the vnode
interlock before releasing the vm object's lock.
2003-09-17 06:55:42 +00:00
alc
38e47abc4a Eliminate the use of Giant from vm_object_reference(). 2003-09-15 05:58:27 +00:00
alc
5b00bd3787 Call vm_page_unmanage() on pages belonging to the kmem_object. This
eliminates the unnecessary overhead of managing "PV" entries for these
pages.
2003-09-14 02:37:59 +00:00
alc
b182e8a6eb There is no need for an atomic increment on the vm object's generation
count in _vm_object_allocate().  (Access to the generation count is
governed by the vm object's lock.)  Note: the introduction of the
atomic increment in revision 1.238 appears to be an accident.  The
purpose of that commit was to fix an Alpha-specific bug in UMA's
debugging code.
2003-09-13 20:07:26 +00:00
alc
76fcb264a0 Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from:	tegge
2003-09-12 07:07:49 +00:00
alc
a81d9ad0b9 Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address.  Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by:	tegge
2003-09-08 02:45:03 +00:00
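
    A sketch of the sort of consumer this enables, in the spirit of
    vmapbuf() or the pipe code (the helper name and loop are hypothetical;
    the prot argument reflects the later change noted above):

        /*
         * Hold every user page backing [uva, uva + len) without Giant.
         * Each page is translated and held atomically, so it cannot be
         * freed or reused while the I/O is in flight.
         */
        static int
        hold_user_pages(pmap_t pmap, vm_offset_t uva, size_t len,
            vm_prot_t prot, vm_page_t *pages)
        {
                vm_offset_t addr;
                vm_page_t m;
                int i;

                i = 0;
                for (addr = trunc_page(uva); addr < round_page(uva + len);
                    addr += PAGE_SIZE) {
                        m = pmap_extract_and_hold(pmap, addr, prot);
                        if (m == NULL) {
                                /* Not resident: undo, let the caller fault and retry. */
                                vm_page_lock_queues();
                                while (i > 0)
                                        vm_page_unhold(pages[--i]);
                                vm_page_unlock_queues();
                                return (-1);
                        }
                        pages[i++] = m;
                }
                return (i);
        }
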
alc
1cb490a309 Revise the locking in mincore(2). 2003-09-07 18:47:54 +00:00
phk
b472eed434 Don't open with exclusive bit, swapon(8) wants to trash our swapdev.
Add XXX comment with a rating of this concept.
2003-09-02 05:53:44 +00:00
eivind
dbb76f12a3 Change clean_map from a global to an auto variable 2003-09-01 16:46:47 +00:00
alc
bb52206dba - Add vm object locking to the part of vm_pageout_scan() that launders
dirty pages.
 - Remove some unused variables.
2003-08-31 00:00:46 +00:00
marcel
8ce42b9bdd Introduce MAP_ENTRY_GROWS_DOWN and MAP_ENTRY_GROWS_UP to allow for
growable (stack) entries that not only grow down, but also grow up.
Have vm_map_growstack() take these flags into account when growing
an entry.

This is the first step in adding support for upward growable stacks.
It is a required feature on ia64 to support the register stack (or
rstack as I like to call it -- it also means reverse stack). We do
not currently create rstacks, so the upward growing is not exercised
and the change should be a functional no-op.

Reviewed by: alc
2003-08-30 21:25:23 +00:00
phk
2c584462d5 Add a close() method to a swapdev.
Add a GEOM based backend.

Remove the device/VOP_SPECSTRATEGY() based backend.
2003-08-30 16:44:26 +00:00
phk
0369559168 Protect the swapdevice tailq with a mutex.
Store the udev_t we will report to userland in the swdevt.
2003-08-30 16:10:28 +00:00
phk
8556106f5a Continue the objectification of the swapdev backends:
Remove the vnode and dev_t fields and replace them with a void *.

Introduce separate strategy functions for devices and regular (NFS)
vnodes.

For devices we don't need the vnode v_numoutput stuff.

Add a generic swaponsomething() function to add a swapdevice and
split the remainder of swaponvp() into swaponvp() and swapondev()
which calls this backend.
2003-08-30 11:33:25 +00:00
phk
d67c3c9151 Make the strategy function a method of the individual swapdev. 2003-08-30 09:42:00 +00:00
phk
1691bbc0f9 Consistently use modern function definitions 2003-08-30 08:32:42 +00:00
marcel
d35fa485d7 In vnode_pager_generic_putpages(), change the printf format specifier
to long and explicitly cast field dirty of struct vm_page to unsigned
long. When PAGE_SIZE is 32K, this field is actually unsigned long.
2003-08-29 00:16:30 +00:00
alc
de18724542 Recent pmap changes permit the use of a more precise locking assertion
in vm_page_lookup().
2003-08-28 23:23:04 +00:00
marcel
78b7eaa56b Assert that u_long is at least 64 bits if PAGE_SIZE is 32K.
Suggested by: phk
2003-08-25 19:58:01 +00:00
alc
5b4e761019 Held pages, just like wired pages, should not be added to the cache queues.
Submitted by:	tegge
2003-08-23 20:29:29 +00:00
alc
5d6f66de90 Hold the page queues lock when performing vm_page_clear_dirty() and
vm_page_set_invalid().
2003-08-23 18:11:53 +00:00
alc
06fbefe190 To implement the sequential access optimization, vm_fault() may need to
reacquire the "first" object's lock while a backing object's lock is held.
Since this is a lock-order reversal, vm_fault() uses trylock to acquire
the first object's lock, skipping the sequential access optimization in
the unlikely event that the trylock fails.
2003-08-23 06:52:32 +00:00
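
    The workaround is the usual trylock-or-skip idiom; a minimal sketch
    assuming the VM_OBJECT_TRYLOCK() macro and the vm_fault state naming
    of the time:

        /*
         * A backing object's lock is held here, so blocking on the first
         * object's lock would reverse the established lock order.  Try it
         * instead; if that fails, simply skip the sequential-access
         * heuristic, which is only an optimization.
         */
        if (VM_OBJECT_TRYLOCK(fs.first_object)) {
                /* ... update the first object's sequential-access hint ... */
                VM_OBJECT_UNLOCK(fs.first_object);
        }
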
marcel
edbda376e1 Also define VM_PAGE_BITS_ALL for 16K and 32K pages. Make the constant
unsigned for all page sizes and unsigned long for 32K pages.
2003-08-23 06:30:47 +00:00
marcel
0dfb8cd786 Add support for 16K and 32K page sizes. The valid and dirty maps
in struct vm_page are defined as u_int for 16K pages and u_long
for 32K pages, with the implied assumption that long will at least
be 64 bits wide on platforms where we support 32K pages.
2003-08-23 06:24:00 +00:00
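
    Each bit in the valid and dirty maps covers one DEV_BSIZE (512-byte)
    chunk, so a 16K page needs 32 bits and a 32K page needs 64; the
    constants presumably end up along these lines (illustrative values,
    not quoted from the header):

        #if PAGE_SIZE == 16384
        /* valid/dirty are u_int: 16384 / 512 = 32 chunks */
        #define	VM_PAGE_BITS_ALL 0xffffffffu
        #elif PAGE_SIZE == 32768
        /* valid/dirty are u_long: 32768 / 512 = 64 chunks */
        #define	VM_PAGE_BITS_ALL 0xfffffffffffffffful
        #endif
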
alc
9e89497b7d Assert that the vm object's lock is held on entry to vm_page_grab(); remove
code from this function that was needed when vm object locking was
incomplete.
2003-08-21 20:59:07 +00:00
alc
f8ecd895b9 Assert that the vm object lock is held in vm_page_alloc(). 2003-08-20 20:24:29 +00:00
bmilekic
f0a28c0844 In sysctl_vm_zone, do not calculate per-cpu cache stats on
UMA_ZFLAG_INTERNAL zones at all.  Apparently, Wilko's alpha
was crashing while entering multi-user because, I think, we
were calculating the garbage cachefree for pcpu caches that
essentially don't exist for at least the 'zones' zone and it so
happened that we were reading from an unmapped location.

Confirmed to fix crash: wilko
Helped debug: wilko, gallatin
2003-08-20 18:22:06 +00:00