2388 Commits

Author SHA1 Message Date
pjd
f372a74b85 Change unused 'user_wait' argument to 'timo' argument, which will be
used to specify timeout for msleep(9).

Discussed with:	alc
Reviewed by:	alc
2007-11-07 21:56:58 +00:00
kib
9ae733819b Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
kib
8cd6397d8a The intent of the freeing the (zeroed) page in vm_page_cache() for
default object rather than cache it was to have
vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is
no cached page in object at pindex. This allows to avoid explicit
checks for cached pages in vm_object_backing_scan().

For now, we need the same bandaid for the swap object, otherwise both
the vm_page_lookup() and the pager can report that there is no page at
offset, while page is stored in the cache. Also, this fixes another
instance of the KASSERT("object type is incompatible") failure in the
vm_page_cache_transfer().

Reported and tested by:	Peter Holm
Reviewed by:	alc
MFC after:	3 days
2007-11-05 10:25:12 +00:00
maxim
7382e0b507 o Fix panic message: it's swap_pager_putpages() not swap_pager_getpages().
Submitted by:	Mark Tinguely
2007-11-02 20:48:10 +00:00
remko
5d7d5a6a8a Correct a copy and paste'o in phys_pager.c, we are talking about phys here
and not about devices.

PR:		93755
Approved by:	imp (mentor, implicit when re-assigning the ticket to me).
2007-10-30 14:48:13 +00:00
alc
acb713befa Change vm_page_cache_transfer() such that it does not transfer pages
that would have an offset beyond the end of the target object.  Such
pages should remain in the source object.

MFC after:	3 days
Diagnosed and reviewed by:	Kostik Belousov
Reported and tested by:		Peter Holm
2007-10-27 00:09:30 +00:00
rwatson
60570a92bf Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
alc
7fd960900d Correct an error of omission in the reimplementation of the page
cache: vnode_pager_setsize() must handle the case where a file is
truncated to a non-page-size-aligned boundary and there is a cached
page underlying the new end of file.

Reported by:	kris, tegge
Tested by:	kris
MFC after:	3 days
2007-10-22 06:23:46 +00:00
alc
bb82ce71e3 Correct an error in vm_map_sync(), nee vm_map_clean(), that has existed
since revision 1.1.  Specifically, neither traversal of the vm map checks
whether the end of the vm map has been reached.  Consequently, the first
traversal can wrap around and bogusly return an error.

This error has gone unnoticed for so long because no one had ever before
tried msync(2)ing a region above the stack.

Reported by:	peter
MFC after:	1 week
2007-10-22 05:21:05 +00:00
julian
51d643caa6 Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
alc
79cc4a8646 The previous revision, updating vm_object_page_remove() for the new page
cache, did not account for the case where the vm object has nothing but
cached pages.

Reported by:	kris, tegge
Reviewed by:	tegge
MFC after:	3 days
2007-10-18 23:02:18 +00:00
peter
9b1e0dd3a8 Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int. 2007-10-18 22:47:39 +00:00
ru
61b9ccb51d Fix CTL_VM_NAMES. 2007-10-16 11:32:57 +00:00
jhb
a448c36f7e Allow recursion on the 'zones' internal UMA zone.
Submitted by:	thompsa
MFC after:	1 week
Approved by:	re (kensmith)
Discussed with:	jeff
2007-10-11 20:11:27 +00:00
kib
53bbfe99fb Do not dereference NULL pointer.
Reported by:	Peter Holm
Reviewed by:	alc
Approved by:	re (kensmith)
2007-10-08 20:09:53 +00:00
alc
d53c0afe54 In the rare case that vm_page_cache() actually frees the given page,
it must first ensure that the page is no longer mapped.  This is
trivially accomplished by calling pmap_remove_all() a little earlier
in vm_page_cache().  While I'm in the neighborbood, make a related
panic message a little more useful.

Approved by:	re (kensmith)
Reported by:	Peter Holm and Konstantin Belousov
Reviewed by:	Konstantin Belousov
2007-10-08 18:01:38 +00:00
alc
19c4fce2e3 Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is
a consequence of sparc64/sparc64/vm_machdep.c revision 1.76.  It occurs
when uma_small_free() frees a page.  The solution has two parts: (1) Mark
pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED.  (2) Defer the lock
assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested.
This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
flags, i.e., they do not change state between the time that a page is
allocated and freed.

Approved by:	re (kensmith)
PR:		116794
2007-10-07 18:03:03 +00:00
alc
9d3ffe57ce Correct an error of omission in the reimplementation of the page
cache: vm_object_page_remove() should convert any cached pages that
fall with the specified range to free pages.  Otherwise, there could
be a problem if a file is first truncated and then regrown.
Specifically, some old data from prior to the truncation might reappear.

Generalize vm_page_cache_free() to support the conversion of either a
subset or the entirety of an object's cached pages.

Reported by: tegge
Reviewed by: tegge
Approved by: re (kensmith)
2007-09-27 04:21:59 +00:00
alc
b72c80753d Correct an error in the previous revision, specifically,
vm_object_madvise() should request that the reactivated, cached page
not be busied.

Reported by: Rink Springer
Approved by: re (kensmith)
2007-09-25 21:01:10 +00:00
alc
d1bce06c64 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
jeff
e132f56a45 - Redefine p_swtime and td_slptime as p_swtick and td_slptick. This
changes the units from seconds to the value of 'ticks' when swapped
   in/out.  ULE does not have a periodic timer that scans all threads in
   the system and as such maintaining a per-second counter is difficult.
 - Change computations requiring the unit in seconds to subtract ticks
   and divide by hz.  This does make the wraparound condition hz times
   more frequent but this is still in the range of several months to
   years and the adverse effects are minimal.

Approved by:    re
2007-09-21 05:07:07 +00:00
jeff
3fc0f8b973 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
alc
0188378655 Correct an assertion in vm_pageout_flush(). Specifically, if a page's
status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have
already been recycled, i.e., freed and reallocated to a new purpose;
thus, asserting that such pages cannot be written is inappropriate.

Reported by: kris
Submitted by: tegge
Approved by: re (kensmith)
MFC after: 1 week
2007-09-15 18:30:28 +00:00
kib
77766ce03f Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert().
For this, introduce vm_map_fixed() that does that for MAP_FIXED case.

Dropping the lock allowed for parallel thread to occupy the freed space.

Reported by:	Tijl Coosemans <tijl ulyssis org>
Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2007-08-20 12:05:45 +00:00
kib
05d51a15e9 Remove comment that is no longer quite true.
Noted by:	alc
Approved by:	re (kensmith)
2007-08-18 16:41:31 +00:00
kib
ba6ef6ecca Fix the phys_pager in the way similar to the rev. 1.83 of the
sys/vm/device_pager.c:

Protect the creation of the phys pager with non-NULL handle with the
phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now
synchronized with its removal from the list, and phys_pager_mtx is put
before vm object lock in lock order. Dispose the phys_pager_alloc_lock
and tsleep calls, together with acquiring Giant, since phys_pager_mtx
now covers the same block.

Reviewed by:	alc
Approved by:	re (kensmith)
2007-08-18 16:40:33 +00:00
kib
8423133063 Protect the creation of the device pager with the dev_pager_mtx. Lookup
of device pager in the pagers list by handle is now synchronized with
its removal from the list, and dev_pager_mtx is put before vm object
lock in lock order. Dispose the dev_pager_sx lock, since dev_pager_mtx
now covers the same block.

Noted by:	kensmith
Reviewed by:	alc
Approved by:	re (kensmith)
2007-08-07 15:36:25 +00:00
alc
a1f1ba2d56 Consider a scenario in which one processor, call it Pt, is performing
vm_object_terminate() on a device-backed object at the same time that
another processor, call it Pa, is performing dev_pager_alloc() on the
same device.  The problem is that vm_pager_object_lookup() should not be
allowed to return a doomed object, i.e., an object with OBJ_DEAD set,
but it does.  In detail, the unfortunate sequence of events is: Pt in
vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD
on the object.  Pa in dev_pager_alloc() holds dev_pager_sx and calls
vm_pager_object_lookup(), which returns the doomed object.  Next, Pa
calls vm_object_reference(), which requires the doomed object's lock, so
Pa waits for Pt to release the doomed object's lock.  Pt proceeds to the
point in vm_object_terminate() where it releases the doomed object's
lock.  Pa is now able to complete vm_object_reference() because it can
now complete the acquisition of the doomed object's lock.  So, now the
doomed object has a reference count of one!  Pa releases dev_pager_sx
and returns the doomed object from dev_pager_alloc().  Pt now acquires
dev_pager_mtx, removes the doomed object from dev_pager_object_list,
releases dev_pager_mtx, and finally calls uma_zfree with the doomed
object.  However, the doomed object is still in use by Pa.

Repeating my key point, vm_pager_object_lookup() must not return a
doomed object.  Moreover, the test for the object's state, i.e.,
doomed or not, and the increment of the object's reference count
should be carried out atomically.

Reviewed by:	kib
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-08-05 21:04:32 +00:00
kib
04fb5db609 Do not acquire Giant unconditionally around the calls to the cdevsw
d_mmap methods. prep_cdevsw() already installs the shims that
acquire/drop Giant for the methods of a driver that specified the
D_NEEDGIANT flag.

Reviewed by:	alc
Approved by:	re (kensmith)
2007-08-05 05:40:52 +00:00
alc
215153274b Add a counter for the total number of pages cached and support for
reporting the value of this counter in the program "vmstat".

Approved by:	re (rwatson)
2007-07-27 20:01:22 +00:00
pjd
fe74e944d1 When we do open, we should lock the vnode exclusively. This fixes few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with:	kib, ups
Approved by:	re (rwatson)
2007-07-26 16:58:09 +00:00
alc
8765bda351 Two changes to vm_fault_additional_pages():
1. Rewrite the backward scan.  Specifically, reverse the order in which
   pages are allocated so that upon failure it is never necessary to
   free pages that were just allocated.  Moreover, any allocated pages
   can be put to use.  This makes the backward scan behave just like the
   forward scan.

2. Eliminate an explicit, unsynchronized check for low memory before
   calling vm_page_alloc().  It serves no useful purpose.  It is, in
   effect, optimizing the uncommon case at the expense of the common
   case.

Approved by:	re (hrs)
MFC after:	3 weeks
2007-07-20 06:55:11 +00:00
alc
dc39b85c98 Eliminate two unused functions: vm_phys_alloc_pages() and
vm_phys_free_pages().  Rename vm_phys_alloc_pages_locked() to
vm_phys_alloc_pages() and vm_phys_free_pages_locked() to
vm_phys_free_pages().  Add comments regarding the need for the free page
queues lock to be held by callers to these functions.  No functional
changes.

Approved by:	re (hrs)
2007-07-14 21:21:17 +00:00
alc
c29e755cdb Eliminate dead code, specifically, an unused sysctl: "vm.idlezero_maxrun".
Approved by:	re (hrs)
2007-07-14 19:00:44 +00:00
alc
3e2faffa45 Update a comment describing the page queues.
Approved by:	re (hrs)
2007-07-13 04:42:20 +00:00
alc
db9dd359b6 Eliminate dead code.
Approved by:	re (hrs)
2007-07-12 22:23:28 +00:00
alc
3b36c8e7eb Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in
vm_page_cowfault().  Initially, if vm_page_cowfault() sleeps, the given
page is wired, preventing it from being recycled.  However, when
transmission of the page completes, the page is unwired and returned to
the page queues.  At that point, the page is not in any special state
that prevents it from being recycled.  Consequently, vm_page_cowfault()
should verify that the page is still held by the same vm object before
retrying the replacement of the page.  Note: The containing object is,
however, safe from being recycled by virtue of having a non-zero
paging-in-progress count.

While I'm here, add some assertions and comments.

Approved by: re (rwatson)
MFC After: 3 weeks
2007-07-10 18:41:34 +00:00
alc
f58d26b291 Eliminate the special case handling of OBJT_DEVICE objects in
vm_fault_additional_pages() that was introduced in revision 1.47.  Then
as now, it is unnecessary because dev_pager_haspage() returns zero for
both the number of pages to read ahead and read behind, producing the
same exact behavior by vm_fault_additional_pages() as the special case
handling.

Approved by: re (rwatson)
2007-07-08 19:42:52 +00:00
alc
0985df88fc When a cached page is reactivated in vm_fault(), update the counter that
tracks the total number of reactivated pages.  (We have not been
counting reactivations by vm_fault() since revision 1.46.)

Correct a comment in vm_fault_additional_pages().

Approved by:	re (kensmith)
MFC after:	1 week
2007-07-06 21:25:21 +00:00
peter
fd45cf2a6d Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate
Approved by: re (kensmith)
2007-07-04 22:57:21 +00:00
alc
ab67a07868 In the previous revision, when I replaced the unconditional acquisition
of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate
the acquisition of the vnode interlock before releasing the vm object's
lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is
performed.  Unfortunately, this allows the vnode to be recycled between
the release of the vm object's lock and the vget() on the vnode.

In this revision, I prevent the vnode from being recycled by acquiring
another reference to the vm object and underlying vnode before releasing
the vm object's lock.

This change also addresses another preexisting but trivial problem.  By
acquiring another reference to the vm object, I also prevent the vm
object from being recycled.  Previously, the "vnodes skipped" counter
could be wrong because if it examined a recycled vm object.

Reported by:	kib
Reviewed by:	kib
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-07-02 06:56:37 +00:00
alc
c6438ef575 Eliminate the use of Giant from vm_daemon(). Replace the unconditional
use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT().

Approved by:	re (kensmith)
MFC after:	3 weeks
2007-06-26 18:24:05 +00:00
alc
df5f1d1131 Eliminate GIANT_REQUIRED from swap_pager_putpages().
Approved by:	re (mux)
MFC after:	1 week
2007-06-24 18:40:30 +00:00
alc
c7ee2c66ef Eliminate unnecessary checks from vm_pageout_clean(): The page that is
passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because
it came from the inactive queue and PG_UNMANAGED pages are not in any
page queue.  Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS
objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED.
So, if the page that is passed to vm_pageout_clean() is not
PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its
neighbors from the same object cannot themselves be PG_UNMANAGED.

Reviewed by:	tegge
2007-06-18 02:04:38 +00:00
mjacob
a7dcde4629 Don't declare inline a function which isn't. 2007-06-17 04:19:05 +00:00
mjacob
fadc531504 Make sure object is NULL- there is a possible case where you could
fall through to it being used w/o being set. Put a break in the default
case.
2007-06-17 04:17:48 +00:00
mjacob
cd7cf5829b Initialize reqpage to zero. 2007-06-17 04:14:27 +00:00
alc
011d4e557f If attempting to cache a "busy", panic instead of printing a diagnostic
message and returning.
2007-06-16 21:07:51 +00:00
alc
8386bd54e6 Update a comment. 2007-06-16 05:25:53 +00:00
alc
a8415c5a0d Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist.  First and
foremost, this allocator is required to support the implementation of
superpages.  As a side effect, it enables a more robust implementation
of contigmalloc(9).  Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages.  Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space.  The performance benefits vary.  In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring.  The reason is that
superpages have much the same effect.  The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages.  I hope this is temporary.  On i386, this is
a slight pessimization.  However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects.  I speculate
that this is true in general of machines with a direct map.

Approved by:	re
2007-06-16 04:57:06 +00:00