object has fallen off the end of the cached list - this is likely the
last reference to the vnode and it should be reused before non-file
vnodes that are already on the free list (VDIR mostly).
New functions created - vm_object_pip_wakeup and pagedaemon_wakeup - that
are used to reduce the actual number of wakeups.
New function vm_page_protect which is used in conjunction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.
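
As an illustration of how a wrapper such as pagedaemon_wakeup can cut down
on wakeups, here is a minimal sketch. It assumes the wrapper simply skips
the wakeup when one is already pending; the struct and the sketch_* names
are hypothetical stand-ins, not the committed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the daemon's state; not the actual kernel code. */
struct sketch_pagedaemon {
	bool wakeup_pending;	/* a wakeup was posted and not yet consumed */
	int wakeups_issued;	/* number of real wakeups delivered */
};

/*
 * Post a wakeup only if none is pending.
 * Returns true when a real wakeup was issued.
 */
static bool
sketch_pagedaemon_wakeup(struct sketch_pagedaemon *pd)
{
	assert(pd != NULL);
	if (pd->wakeup_pending)
		return false;		/* already requested: nothing to do */
	pd->wakeup_pending = true;
	pd->wakeups_issued++;		/* a kernel would call wakeup() here */
	return true;
}

/* Called when the daemon starts a pass; re-arms the wakeup. */
static void
sketch_pagedaemon_run(struct sketch_pagedaemon *pd)
{
	assert(pd != NULL);
	pd->wakeup_pending = false;
}
```

With this shape, any number of callers asking for the daemon between two
passes cost one wakeup in total.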
that 9 bits aren't lost in the conversion. Changed all callers to expect
this. This allows paging on large (>2GB) filesystems.
Submitted by: John Dyson
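
The arithmetic behind the ">2GB" remark can be sketched as follows. This
assumes that the 9 bits correspond to 512-byte blocks (2^9 bytes) and that
the byte offset was previously held in a signed 32-bit integer; both are
assumptions, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SHIFT 9	/* assumption: 512-byte blocks, 2^9 bytes */

/* Byte offset of a block, computed in 64 bits so nothing is truncated. */
static uint64_t
sketch_block_to_bytes(uint32_t blkno)
{
	return (uint64_t)blkno << SKETCH_BLOCK_SHIFT;
}

/* Block number of a block-aligned byte offset. */
static uint32_t
sketch_bytes_to_block(uint64_t off)
{
	/* the offset must sit on a block boundary */
	assert((off & ((1u << SKETCH_BLOCK_SHIFT) - 1)) == 0);
	return (uint32_t)(off >> SKETCH_BLOCK_SHIFT);
}

/* Would the byte offset of this block still fit in a signed 32-bit int? */
static bool
sketch_fits_in_int32(uint32_t blkno)
{
	return sketch_block_to_bytes(blkno) <= (uint64_t)INT32_MAX;
}
```

Block 4194304 (2^22) already sits at byte 2^31, which a signed 32-bit byte
offset cannot represent, while the block number itself still has 9 bits of
headroom.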
Use request==VM_ALLOC_NORMAL rather than object!=kmem_object in deciding
if the caller is "important" in vm_page_alloc(). Also established a new
low threshold for non-interrupt allocations via cnt.v_interrupt_free_min.
vm_pageout.c:
Various algorithmic cleanup. Some calculations simplified. Initialize
cnt.v_interrupt_free_min to 2 pages.
Submitted by: John Dyson
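
A rough sketch of the threshold idea described above, assuming three
request classes and simple "free count above threshold" tests. Only
VM_ALLOC_NORMAL and cnt.v_interrupt_free_min are named in the message; the
other request names, the v_free_reserved comparison, and the exact tests
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request classes; only the "normal" one is named in the log. */
enum sketch_alloc_req {
	SKETCH_ALLOC_NORMAL,	/* ordinary caller */
	SKETCH_ALLOC_SYSTEM,	/* important, but not at interrupt time */
	SKETCH_ALLOC_INTERRUPT	/* interrupt-time caller */
};

/* Hypothetical subset of the VM counters. */
struct sketch_vmmeter {
	int v_free_count;		/* pages currently free */
	int v_free_reserved;		/* reserve kept back from normal callers */
	int v_interrupt_free_min;	/* floor for non-interrupt callers */
};

/* Decide whether a request of the given class may take a free page. */
static bool
sketch_may_allocate(const struct sketch_vmmeter *cnt, enum sketch_alloc_req req)
{
	assert(cnt != NULL && cnt->v_free_count >= 0);
	switch (req) {
	case SKETCH_ALLOC_NORMAL:
		return cnt->v_free_count > cnt->v_free_reserved;
	case SKETCH_ALLOC_SYSTEM:
		return cnt->v_free_count > cnt->v_interrupt_free_min;
	case SKETCH_ALLOC_INTERRUPT:
		return cnt->v_free_count > 0;
	}
	return false;
}
```

With v_interrupt_free_min set to 2, the last two free pages stay available
to interrupt-time callers only.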
pager(). Almost completely rewrote vm_mmap(); when John gets done with
the bottom half, it will be a complete rewrite. Deprecated most use of
vm_object_setpager(). Removed side effect of setting object persist
in vm_object_enter and moved this into the pager(s). A few other
cosmetic changes.
Slight change to reverse collapsing so that vm_object_deallocate doesn't
have to be called recursively.
Removed half of a previous fix - the renamed page during a collapse doesn't
need to be marked dirty because the pager backing store pointers are copied
- thus preserving the page's data. This assumes that pages without backing
store are always dirty (except perhaps for when they are first zeroed, but
this doesn't matter).
Switch order of two lines of code so that the correct pager is removed
from the hash list. The previous code bogusly passed a NULL pointer to
vm_object_remove(). The call to vm_object_remove() should be unnecessary
if named anonymous objects were being dealt with correctly. They are
currently marked as OBJ_INTERNAL, which really screws up things (such as
this).
2) bump reference counts by 2 instead of 1 so that an object deallocate
doesn't try to recursively collapse the object.
3) mark pages renamed during the collapse as dirty so that their contents
are preserved.
Submitted by: John and me.
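
Why bumping the reference count by 2 avoids the recursive collapse can be
sketched like this. The sketch assumes that a deallocate which drops the
count to exactly one is what triggers a collapse attempt; the struct and
function names are hypothetical and this is not the committed code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM object. */
struct sketch_object {
	int ref_count;
	int collapse_attempts;	/* how often a collapse was triggered */
};

/* Drop one reference; dropping to exactly one triggers a collapse attempt. */
static void
sketch_deallocate(struct sketch_object *obj)
{
	assert(obj != NULL && obj->ref_count > 0);
	obj->ref_count--;
	if (obj->ref_count == 1)
		obj->collapse_attempts++;	/* stands in for a nested collapse */
}

/* Hold obj across some work by taking `bump` temporary references. */
static void
sketch_hold_and_release(struct sketch_object *obj, int bump)
{
	assert(obj != NULL && bump >= 1);
	obj->ref_count += bump;
	/* ... work that needs obj to stay alive ... */
	sketch_deallocate(obj);		/* drops one reference */
	obj->ref_count -= bump - 1;	/* drop the remaining extras directly */
}
```

Starting from a count of one, a bump of 1 lets the deallocate land on one
again and triggers the nested collapse; a bump of 2 makes it land on two,
and the last extra reference is dropped without going through deallocate.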
Fixed long standing bug in freeing swap space during object collapses.
Stopped 'out of space' messages from printing too often.
Modified to use new kmem_malloc() calling convention.
Implemented an additional stat in the swap pager struct to count the
amount of space allocated to that pager. This may be removed at some
point in the future.
Minimized unnecessary wakeups.
vm_fault.c:
Don't try to collect fault stats on 'swapped' processes - there aren't
any upages to store the stats in.
Changed read-ahead policy (again!).
vm_glue.c:
Be sure to gain a reference to the process's map before swapping.
Be sure to lose it when done.
kern_malloc.c:
Added the ability to specify if allocations are at interrupt time or
are 'safe'; this affects what types of pages can be allocated.
vm_map.c:
Fixed a variety of map lock problems; there's still a lurking bug that
will eventually bite.
vm_object.c:
Explicitly initialize the object fields rather than bzeroing the struct.
Eliminated the 'rcollapse' code and folded its functionality into the
"real" collapse routine.
Moved an object_unlock() so that the backing_object is protected in
the qcollapse routine.
Make sure nobody fools with the backing_object when we're destroying it.
Added some diagnostic code which can be called from the debugger that
looks through all the internal objects and makes certain that they
all belong to someone.
vm_page.c:
Fixed a rather serious logic bug that would result in random system
crashes. Changed pagedaemon wakeup policy (again!).
vm_pageout.c:
Removed unnecessary page rotations on the inactive queue.
Changed the number of pages to explicitly free to just free_reserved
level.
Submitted by: John Dyson
There is similar bogusness in the pageout daemon that will be fixed soon.
This fixes a panic pointed out to me by Bruce Evans that occurs when
/dev/mem is used to map managed memory.
Added hook for pmap_prefault() and use symbolic constant for new third
argument to vm_page_alloc() (vm_fault.c, various)
Changed the way that upages and page tables are held. (vm_glue.c)
Fixed architectural flaw in allocating pages at interrupt time that was
introduced with the merged cache changes. (vm_page.c, various)
Adjusted some algorithms to achieve better paging performance and to
accommodate the fix for the architectural flaw mentioned above. (vm_pageout.c)
Fixed pbuf handling problem, changed policy on handling read-behind page.
(vnode_pager.c)
Submitted by: John Dyson
need to be moved into the qcollapse and rcollapse routines, but I don't
have time at the moment to make all the required changes...this will do
for now.
being cleared in some cases for vnode backed objects; we now do this in
vnode_pager_alloc proper to guarantee it. Also be more careful in the
rcollapse code about messing with busy/bmapped pages.
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c:
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c:
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c, vm_map.c:
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
1. The pageout daemon used to block under certain
circumstances, and we needed to add new functionality
that would cause the pageout daemon to block more often.
Now, the pageout daemon mostly just gets rid of pages
and kills processes when the system is out of swap.
The swapping, rss limiting and object cache trimming
have been folded into a new daemon called "vmdaemon".
This new daemon does things that need to be done for
the VM system, but can block. For example, if the
vmdaemon blocks for memory, the pageout daemon
can take care of it. If the pageout daemon had
blocked for memory, it was difficult to handle
the situation correctly (and in some cases, was
impossible).
2. The collapse problem has now been entirely fixed.
It now appears to be impossible to accumulate unnecessary
vm objects. The object collapsing now occurs when ref counts
drop to one (where it is likely to be simpler anyway
because fewer pages would be out on disk). The original
fixes were incomplete in that pathological circumstances
could still be contrived to cause uncontrolled growth
of swap. Also, the old code still, under steady state
conditions, used more swap space than necessary. When
using the new code, users will generally notice a
significant decrease in swap space usage, and theoretically,
the system should be leaving fewer unused pages around
competing for memory.
Submitted by: John Dyson
stages of debugging LFS:
* if we can't bmap, use old VOP code
*/
! if (/* (vp->v_mount && vp->v_mount->mnt_stat.f_type == MOUNT_LFS) || */
! VOP_BMAP(vp, foff, &dp, 0, 0)) {
for (i = 0; i < count; i++) {
if (i != reqpage) {
vnode_pager_freepage(m[i]);
--- 804,810 ----
/*
* if we can't bmap, use old VOP code
*/
! if (VOP_BMAP(vp, foff, &dp, 0, 0)) {
Reviewed by: gibbs
Submitted by: John Dyson
Disable the bogus declaration of pmap_bootstrap(). Since its arg list
is machine-dependent, it must be declared in a machine-dependent header.
vm_page.h:
Change `inline' to `__inline' and old-style function parameter lists for
inlined functions to new-style.
`inline' and old-style function parameter lists should never be used in
system headers, even in very machine-dependent ones, because they cause
warnings from gcc -Wreally-all.
fault was at offset 0 in the object. This resulted in more overhead but
was otherwise benign. Added incore() check in vnode_pager_has_page()
to work around a problem with LFS...other than slightly higher overhead,
this change has no effect on UFS.
Enabled via REL2_1.
Added support for doing object collapses "on the fly". Enabled via REL2_1a.
Improved object collapses so that they can happen in more cases. Improved
sensing of modified pages to fix an apparent race condition and improve
clustered pageout opportunities. Fixed an "oops" with not restarting page
scan after a potential block in vm_pageout_clean() (not doing this can result
in strange behavior in some cases).
Submitted by: John Dyson & David Greenman
that this is intended for use only in floppy situations and is done at
the sacrifice of performance in that case (in other words, this is not the
best solution, but works okay for this exceptional situation).
Submitted by: John Dyson
From now on, >all< swapdevices must be activated with "swapon".
If you haven't got it, add this line to /etc/fstab:
/dev/wd0b none swap sw 0 0
Reason:
We want our GENERIC* kernels to have a large selection of swap-devices, but
on the other hand, we don't want to use a wd0b as swap when we boot off a
floppy. This way, we will never use an unexpected swapdevice. Nothing else
has changed.
scheme of things, so I've changed them to be more appropriate. page in/outs
are now associated with the pager that did them. Nuked v_fault as the
only fault of interest that wouldn't be already counted in v_trap is a VM
fault, and this is counted separately.
2) Implemented most of the remaining counters and corrected the counting of
some that were done wrong. They are all almost correct now...just a few
minor ones left to fix.