Commit Graph

2317 Commits

Author SHA1 Message Date
pjd
a4513e9da8 When KVA is exhausted, try the vm_lowmem event for the last time before
panicing. This helps a lot in ZFS stability.
2007-04-05 20:52:51 +00:00
pjd
b6f1c3fccc Fix a problem for file systems that don't implement VOP_BMAP() operation.
The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(),
which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (eg.
EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before'
and 'after' arguments, so we have some accidental values there. This bascially
was causing this condition to be meet:

	if ((rahead + rbehind) >
	    ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) {
		pagedaemon_wakeup();
		[...]
	}

(we have some random values in rahead and rbehind variables)

I'm not entirely sure this is the right fix, maybe we should just return FALSE
in vnode_pager_haspage() when VOP_BMAP() fails?

alc@ knows about this problem, maybe he will be able to come up with a better
fix if this is not the right one.
2007-04-05 20:49:46 +00:00
alc
ecbefa2cc5 Prevent a race between vm_object_collapse() and vm_object_split() from
causing a crash.

Suppose that we have two objects, obj and backing_obj, where
backing_obj is obj's backing object.  Further, suppose that
backing_obj has a reference count of two.  One being the reference
held by obj and the other by a map entry.  Now, suppose that the map
entry is deallocated and its reference removed by
vm_object_deallocate().  vm_object_deallocate() recognizes that the
only remaining reference is from a shadow object, obj, and calls
vm_object_collapse() on obj.  vm_object_collapse() executes

                if (backing_object->ref_count == 1) {
                        /*
                         * If there is exactly one reference to the backing
                         * object, we can collapse it into the parent.
                         */
                        vm_object_backing_scan(object, OBSC_COLLAPSE_WAIT);

vm_object_backing_scan(OBSC_COLLAPSE_WAIT) executes

        if (op & OBSC_COLLAPSE_WAIT) {
                vm_object_set_flag(backing_object, OBJ_DEAD);
        }

Finally, suppose that either vm_object_backing_scan() or
vm_object_collapse() sleeps releasing its locks.  At this instant,
another thread executes vm_object_split().  It crashes in
vm_object_reference_locked() on the assertion that the object is not
dead.  If, however, assertions are not enabled, it crashes much later,
after the object has been recycled, in vm_object_deallocate() because
the shadow count and shadow list are inconsistent.

Reviewed by: tegge
Reported by: jhb
MFC after: 1 week
2007-03-27 08:55:17 +00:00
alc
989c09dfd5 Two small changes to vm_map_pmap_enter():
1) Eliminate an unnecessary check for fictitious pages.  Specifically,
only device-backed objects contain fictitious pages and the object is
not device-backed.

2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to
prevent possible wrap around with extremely large maps and objects,
respectively.  Observed by: tegge (last summer)
2007-03-25 19:33:40 +00:00
alc
584a970755 vm_page_busy() no longer requires the page queues lock to be held. Reduce
the scope of the page queues lock in vm_fault() accordingly.
2007-03-23 06:11:25 +00:00
alc
efc8daaecb Change the order of lock reacquisition in vm_object_split() in order to
simplify the code slightly.  Add a comment concerning lock ordering.
2007-03-22 07:02:43 +00:00
alc
2210798637 Use PCPU_LAZY_INC() to update page fault statistics. 2007-03-05 18:55:14 +00:00
jhb
54e4ea54b6 Use pause() in vm_object_deallocate() to yield the CPU to the lock holder
rather than a tsleep() on &proc0.  The only wakeup on &proc0 is intended
to awaken the swapper, not random threads blocked in
vm_object_deallocate().
2007-02-27 19:40:26 +00:00
jhb
9081d44243 Use pause() rather than tsleep() on stack variables and function pointers. 2007-02-27 17:23:29 +00:00
alc
573a964db6 Change the way that unmanaged pages are created. Specifically,
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage().  This allows for the elimination of some uses of
the page queues lock.

Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS.  This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().

Remove vm_page_unmanage().  It is no longer used.
2007-02-25 06:14:58 +00:00
alc
e4e74de1c2 Change the page's CLEANCHK flag from being a page queue mutex synchronized
flag to a vm object mutex synchronized flag.
2007-02-22 06:15:52 +00:00
alc
c0ed1b65cf Enable vm_page_free() and vm_page_free_zero() to be called on some pages
without the page queues lock being held, specifically, pages that are not
contained in a vm object and not a member of a page queue.
2007-02-18 05:54:42 +00:00
alc
7913035c29 Remove a stale comment. Add punctuation to a nearby comment. 2007-02-17 19:37:00 +00:00
alc
3e7d1b7ebd Relax the page queue lock assertions in vm_page_remove() and
vm_page_free_toq() to account for recent changes that allow
vm_page_free_toq() to be called on some pages without the page queues lock
being held, specifically, pages that are not contained in a vm object and
not a member of a page queue.  (Examples of such pages include page table
pages, pv entry pages, and uma small alloc pages.)
2007-02-15 05:43:38 +00:00
alc
0250171bf6 Avoid the unnecessary acquisition of the free page queues lock when a page
is actually being added to the hold queue, not the free queue.  At the same
time, avoid unnecessary tests to wake up threads waiting for free memory
and the idle thread that zeroes free pages.  (These tests will be performed
later when the page finally moves from the hold queue to the free queue.)
2007-02-14 07:05:55 +00:00
rwatson
b9f6b60a84 Add uma_set_align() interface, which will be called at most once during
boot by MD code to indicated detected alignment preference.  Rather than
cache alignment being encoded in UMA consumers by defining a global
alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now
a special value (-1) that causes UMA to look at registered alignment.  If
no preferred alignment has been selected by MD code, a default alignment
of (16 - 1) will be used.

Currently, no hardware platforms specify alignment; architecture
maintainers will need to modify MD startup code to specify an alignment
if desired.  This must occur before initialization of UMA so that all UMA
zones pick up the requested alignment.

Reviewed by:	jeff, alc
Submitted by:	attilio
2007-02-11 20:13:52 +00:00
alc
909177e227 Use the free page queue mutex instead of the page queue mutex to
synchronize sleeping and waking of the zero idle thread.
2007-02-11 05:18:40 +00:00
jhb
9c764c7fc3 - Move 'struct swdevt' back into swap_pager.h and expose it to userland.
- Restore support for fetching swap information from crash dumps via
  kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps.

Reviewed by:	arch@, phk
MFC after:	1 week
2007-02-07 17:43:11 +00:00
alc
2eb15b506b Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the
vm page queue free mutex instead of the vm page queue mutex.
2007-02-07 06:37:30 +00:00
alc
4881bd38e2 Change the free page queue lock from a spin mutex to a default (blocking)
mutex.  With the demise of Alpha support, there is no longer a reason for
it to be a spin mutex.
2007-02-05 06:02:55 +00:00
mohans
83064ec323 Fix for problems that occur when all mbuf clusters migrate to the mbuf packet
zone. Cluster allocations fail when this happens. Also processes that may have
blocked on cluster allocations will never be woken up. Thanks to rwatson for
an overview of the issue and pointers to the mbuma paper and his tool to dump
out UMA zones.

Reviewed by: andre@
2007-01-25 01:05:23 +00:00
mohans
9799fcf93c Fix for a bug where only one process (of multiple) blocked on
maxpages on a zone is woken up, with the rest never being woken up as
a result of the ZFLAG_FULL flag being cleared. Wakeup all such blocked
procsses instead. This change introduces a thundering herd, but since
this should be relatively infrequent, optimizing this (by introducing
a count of blocked processes, for example) may be premature.

Reviewd by: ups@
2007-01-24 22:49:11 +00:00
jeff
474b917526 - Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty.  The few asserts and thread state
   setting were moved to the individual schedulers.  sched_add() was
   chosen to displace it for naming consistency reasons.
 - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
   different on all three schedulers where it was only called in one place
   each.
 - Remove the long ifdef'd out remrunqueue code.
 - Remove the now redundant ts_state.  Inspect the thread state directly.
 - Don't set TSF_* flags from kern_switch.c, we were only doing this to
   support a feature in one scheduler.
 - Change sched_choose() to return a thread rather than a td_sched.  Also,
   rely on the schedulers to return the idlethread.  This simplifies the
   logic in choosethread().  Aside from the run queue links kern_switch.c
   mostly does not care about the contents of td_sched.

Discussed with:	julian

 - Move the idle thread loop into the per scheduler area.  ULE wants to
   do something different from the other schedulers.

Suggested by:	jhb

Tested on:	x86/amd64 sched_{4BSD, ULE, CORE}.
2007-01-23 08:46:51 +00:00
delphij
9856d14ea1 Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 15:05:52 +00:00
rwatson
d6f9a991c6 Remove uma_zalloc_arg() hack, which coerced M_WAITOK to M_NOWAIT when
allocations were made using improper flags in interrupt context.
Replace with a simple WITNESS warning call.  This restores the
invariant that M_WAITOK allocations will always succeed or die
horribly trying, which is relied on by many UMA consumers.

MFC after:	3 weeks
Discussed with:	jhb
2007-01-10 21:04:43 +00:00
alc
57699b6f9c Declare the map entry created by kmem_init() for the range from
VM_MIN_KERNEL_ADDRESS to the end of the kernel's bootstrap data as
MAP_NOFAULT.
2007-01-07 07:32:04 +00:00
jhb
db17c1d997 - Add a new function uma_zone_exhausted() to see if a zone is full.
- Add a printf in swp_pager_meta_build() to warn if the swapzone becomes
  exhausted so that there's at least a warning before a box that runs out
  of swapzone space before running out of swap space deadlocks.

MFC after:	1 week
Reviwed by:	alc
2007-01-05 19:09:01 +00:00
alc
143ffef93b Optimize vm_object_split(). Specifically, make the number of iterations
equal to the number of physical pages that are renamed to the new object
rather than the new object's virtual size.
2006-12-17 20:14:43 +00:00
alc
71d9f62bca Simplify the computation of the new object's size in vm_object_split(). 2006-12-16 08:17:07 +00:00
kmacy
d0a44b7b7c Remove the requirement that phys_avail be sorted in ascending order
by explicitly finding the lowest and highest addresses when calculating
the size of the vm_pages array

Reviewed by :alc
2006-12-08 08:44:47 +00:00
julian
396ed947f6 Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
ru
01fad2a3f6 The clean_map has been made local to vm_init.c long ago. 2006-11-20 16:23:34 +00:00
ru
825e064fde Remove a redundant pointer-type variable. 2006-11-20 08:33:55 +00:00
ru
e58221ee6f When counting vm totals, skip unreferenced objects, including
vnodes representing mounted file systems.

Reviewed by:	alc
MFC after:	3 days
2006-11-20 00:16:00 +00:00
alc
ce5848ab99 There is no point in setting PG_REFERENCED on kmem_object pages because
they are "unmanaged", i.e., non-pageable, pages.

Remove a stale comment.
2006-11-13 00:27:02 +00:00
alc
6093953d36 Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller.  (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)
2006-11-12 21:48:34 +00:00
alc
375373276d I misplaced the assertion that was added to vm_page_startup() in the
previous change.  Correct its placement.
2006-11-08 19:11:54 +00:00
alc
763826feff Simplify the construction of the free queues in vm_page_startup(). Add
an assertion to test a hypothesis concerning other redundant computation
in vm_page_startup().
2006-11-08 18:43:47 +00:00
alc
1f317d8152 Ensure that the page's oflags field is initialized by contigmalloc(). 2006-11-08 06:23:29 +00:00
rwatson
10d0d9cf47 Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
jb
f82c799735 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
rwatson
d4be3ff623 Better align output of "show uma" by moving from displaying the basic
counters of allocs/frees/use for each zone to the same statistics
shown by userspace "vmstat -z".

MFC after:	3 days
2006-10-26 12:55:32 +00:00
alc
f395a2d02c The page queues lock is no longer required by vm_page_wakeup(). 2006-10-23 05:27:31 +00:00
alc
5d9c66a3f8 The page queues lock is no longer required by vm_page_busy() or
vm_page_wakeup().  Reduce or eliminate its use accordingly.
2006-10-22 21:18:48 +00:00
rwatson
7beaaf5cd2 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
alc
cbcb760109 Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
alc
7d7a43f1b4 Eliminate unnecessary PG_BUSY tests. They originally served a purpose
that is now handled by vm object locking.
2006-10-21 21:02:04 +00:00
alc
31fbaf5d5c Long ago, revision 1.22 of vm/vm_pager.h introduced a bug. Specifically,
it introduced a check after the call to file system's get pages method
that assumes that the get pages method does not change the array of pages
that is passed to it.  In the case of vnode_pager_generic_getpages(),
this assumption has been incorrect.  The contents of the array of pages
may be shifted by vnode_pager_generic_getpages().  Likely, the problem
has been hidden by vnode_pager_haspage() limiting the set of pages that
are passed to vnode_pager_generic_getpages() such that a shift never
occurs.

The fix implemented herein is to adjust the pointer to the array of pages
rather than shifting the pages within the array.

MFC after: 3 weeks
Fix suggested by: tegge
2006-10-14 23:21:48 +00:00
alc
07a6c3ab4e Change vnode_pager_addr() such that on returning it distinguishes between
an error returned by VOP_BMAP() and a hole in the file.

Change the callers to vnode_pager_addr() such that they return
VM_PAGER_ERROR when VOP_BMAP fails instead of a zero-filled page.

Reviewed by: tegge
MFC after: 3 weeks
2006-10-14 22:09:03 +00:00
kmacy
e728d44745 sun4v requires TSBs (translation storage buffers) to be contiguous and be
size aligned requiring heavy usage of vm_page_alloc_contig

This change makes vm_page_alloc_contig SMP safe

Approved by: scottl (acting as backup for mentor rwatson)
2006-10-12 04:41:39 +00:00