Commit Graph

394 Commits

Author SHA1 Message Date
alc
d18bfec38d Lock the vm_object when performing vm_pager_deallocate(). 2003-05-06 02:45:28 +00:00
alc
afa35c49d2 - Lock the vm_object when performing swap_pager_isswapped().
- Assert that the vm_object is locked in swap_pager_isswapped().
2003-04-28 17:13:53 +00:00
alc
373b18b5c3 - Convert vm_object_pip_wait() from using tsleep() to msleep().
- Make vm_object_pip_sleep() static.
 - Lock the vm_object when performing vm_object_pip_wait().
2003-04-26 18:33:18 +00:00
alc
114f28a272 - Lock the vm_object when performing vm_object_pip_add().
- Remove an unnecessary variable.
2003-04-20 07:08:30 +00:00
alc
5990076d78 - Lock the vm_object when performing vm_object_pip_add(). 2003-04-20 03:41:21 +00:00
alc
dc48d3db81 - Lock the vm_object when performing vm_object_pip_subtract().
- Assert that the vm_object lock is held in vm_object_pip_subtract().
2003-04-19 22:11:41 +00:00
alc
ef4e8a19cf - Lock the vm_object when performing vm_object_pip_wakeupn().
- Assert that the vm_object lock is held in vm_object_pip_wakeupn().
 - Add a new macro VM_OBJECT_LOCK_ASSERT().
2003-04-19 21:15:44 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
phk
88bd74d184 Avoid extern decls in .c files by putting them in the vm/swap_pager.h
include file where they belong.
Share the dmmax_mask variable.
2003-01-03 14:30:46 +00:00
phk
daf6948653 Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
2003-01-03 06:32:15 +00:00
alc
a0ec4e5670 Hold the page queues lock when performing vm_page_flag_set(). 2002-12-18 04:02:02 +00:00
dillon
b43fb3e920 This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by:	David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after:	3 weeks
2002-12-15 19:17:57 +00:00
alc
631af658fa Remove vm_page_protect(). Instead, use pmap_page_protect() directly. 2002-11-18 04:05:22 +00:00
cognet
1404d98db6 Remove extra #include<sys/vmmeter.h>. 2002-11-11 13:57:50 +00:00
phk
1dfc2c167f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
jeff
4792c0673b - Lock access to numoutput on the swap devices. 2002-09-25 01:24:17 +00:00
dillon
1703af0c56 Reduce the maximum KVA reserved for swap meta structures from 70 to 32 MB.
Reduce the swap meta calculation by a factor of 2, it's still massive overkill.

X-MFC after: immediately
2002-08-31 21:15:29 +00:00
alc
b9718e8d79 o Lock page queue accesses by vm_page_free(). 2002-07-21 20:38:45 +00:00
alc
d9f77c5b73 o Lock page queue accesses by vm_page_try_to_cache(). (The accesses
in kern/vfs_bio.c are already locked.)
 o Assert that the page queues lock is held in vm_page_try_to_cache().
2002-07-20 20:58:46 +00:00
iedowse
e62ef89fc1 Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

 o In the i386 pmap_protect(), `sindex' and `eindex' represent page
   indices within the 32-bit virtual address space.
 o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
   variable to store the low few bits of a vm_pindex_t that gets used
   as an array index.
 o vm_uiomove() uses `osize' and `idx' for page offsets within a
   map entry.
 o In vm_object_split(), `idx' is a page offset within a map entry.
2002-06-26 20:32:51 +00:00
iedowse
c3868afc98 Use an explicit cast to avoid relying on sign extension to do the
right thing in code such as `vm_pindex_t x = ~SWAP_META_MASK'.

Reviewed by:	dillon
2002-06-26 19:18:14 +00:00
alc
ef43ad02fe o Replace GIANT_REQUIRED in swap_pager_alloc() by the acquisition and
release of Giant.  (Annotate as MPSAFE.)
2002-06-22 08:03:21 +00:00
jhb
db9aa81e23 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
jeff
9ef9bf2eaf Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 04:02:59 +00:00
alfred
1446d09429 Remove __P. 2002-03-19 22:20:14 +00:00
jeff
2923687da3 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
eivind
0799ec54b1 - Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
  while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by:	alc
2002-03-10 21:52:48 +00:00
jhb
b8b3ac8816 Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic
and isn't strictly required.  However, it lowers the number of false
positives found when grep'ing the kernel sources for p_ucred to ensure
proper locking.
2002-02-27 19:18:10 +00:00
phk
b9b775cf13 GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED. 2002-02-22 09:26:35 +00:00
tegge
56f1506892 Don't use an uninitialized field reserved for callers in the bio structure
passed to swap_pager_strategy().  Instead, use a field reserved for drivers
and initialize it before usage.

Reviewed by:	dillon
2001-10-15 23:02:54 +00:00
jhb
4806d88677 Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
dillon
05c33a209b Limit the amount of KVM reserved for the buffer cache and for swap-meta
information.  The default limits only effect machines with > 1GB of ram
and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache.  This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating.  The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.
2001-08-20 00:41:12 +00:00
alfred
1d105403d0 Fixups for the initial allocation by dillon:
1) allocate fewer buckets
  2) when failing to allocate swap zone, keep reducing the zone by
     a third rather than a half in order to reduce the chance of
     allocating way too little.

I also moved around some code for readability.

Suggested by: dillon
Reviewed by: dillon
2001-08-02 07:54:58 +00:00
dillon
cbc4469f38 whitespace / register cleanup 2001-07-04 19:00:13 +00:00
dillon
e028603b7e With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
jhb
ce09fa9fbc - Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex.
- Don't drop the vm mutex while grabbing the pbuf mutex to manipulate
  said variables.
2001-06-22 21:12:19 +00:00
jhb
38c4b5afb4 - Fix the sw_alloc_interlock to actually lock itself when the lock is
acquired.
- Assert Giant is held in the strategy, getpages, and putpages methods and
  the getchainbuf, flushchainbuf, and waitchainbuf functions.
- Always call flushchainbuf() w/o the VM lock.
2001-05-23 22:31:15 +00:00
alfred
587340efa6 aquire Giant when playing with the buffercache and doing IO.
use msleep against the vm mutex while waiting for a page IO to complete.
2001-05-23 10:28:11 +00:00
alfred
5382981252 aquire vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodone
with the mutex held.
2001-05-22 19:01:26 +00:00
alfred
a3f0842419 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
phk
16caeec9b0 Actually biofinish(struct bio *, struct devstat *, int error) is more general
than the bioerror().

Most of this patch is generated by scripts.
2001-05-06 20:00:03 +00:00
alfred
e6ee97803e Protect pager object creation with sx locks.
Protect pager object list manipulation with a mutex.

It doesn't look possible to combine them under a single sx lock because
creation may block and we can't have the object list manipulation block
on anything other than a mutex because of interrupt requests.
2001-04-18 20:24:16 +00:00
alfred
bbee48d66d protect pbufs and associated counts with a mutex 2001-04-13 10:23:32 +00:00
rwatson
9a2d215a5f Introduce per-swap area accounting in the VM system, and export
this information via the vm.nswapdev sysctl (number of swap areas)
and vm.swapdevX nodes (where X is the device), which contain the MIBs
dev, blocks, used, and flags.  These changes are required to allow
top and other userland swap-monitoring utilities to run without
setgid kmem.

Submitted by:	Thomas Moestl <tmoestl@gmx.net>
Reviewed by:	freebsd-audit
2001-02-23 18:46:21 +00:00
tanimura
a8dbeb3f7b - If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by dividing the number of the entries by 2
until the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by:	dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by:	alfred, dillon
2000-12-13 10:01:00 +00:00
dwmalone
dd75d1d73b Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
rwatson
4a8be80602 o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT.
This removes a reason that systat requires setgid kmem.  More to
  come.
2000-11-20 00:39:04 +00:00
dillon
2ace352085 Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
    situations prior to now.

    The new code is based on the concept that I/O must be able to function in
    a low memory situation.  All major modules related to I/O (except
    networking) have been adjusted to allow allocation out of the system
    reserve memory pool.  These modules now detect a low memory situation but
    rather then block they instead continue to operate, then return resources
    to the memory pool instead of cache them or leave them wired.

    Code has been added to stall in a low-memory situation prior to a vnode
    being locked.

    Thus situations where a process blocks in a low-memory condition while
    holding a locked vnode have been reduced to near nothing.  Not only will
    I/O continue to operate, but many prior deadlock conditions simply no
    longer exist.

Implement a number of VFS/BIO fixes

	(found by Ian): in biodone(), bogus-page replacement code, the loop
        was not properly incrementing loop variables prior to a continue
        statement.  We do not believe this code can be hit anyway but we
        aren't taking any chances.  We'll turn the whole section into a
        panic (as it already is in brelse()) after the release is rolled.

	In biodone(), the foff calculation was incorrectly
        clamped to the iosize, causing the wrong foff to be calculated
        for pages in the case of an I/O error or biodone() called without
        initiating I/O.  The problem always caused a panic before.  Now it
        doesn't.  The problem is mainly an issue with NFS.

	Fixed casts for ~PAGE_MASK.  This code worked properly before only
        because the calculations use signed arithmatic.  Better to properly
        extend PAGE_MASK first before inverting it for the 64 bit masking
        op.

	In brelse(), the bogus_page fixup code was improperly throwing
        away the original contents of 'm' when it did the j-loop to
        fix the bogus pages.  The result was that it would potentially
        invalidate parts of the *WRONG* page(!), leading to corruption.

	There may still be cases where a background bitmap write is
        being duplicated, causing potential corruption.  We have identified
        a potentially serious bug related to this but the fix is still TBD.
        So instead this patch contains a KASSERT to detect the problem
  	and panic the machine rather then continue to corrupt the filesystem.
	The problem does not occur very often..  it is very hard to
	reproduce, and it may or may not be the cause of the corruption
	people have reported.

Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
dillon
15a44d16ca This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
dillon
b9b696f1ab The swap bitmap allocator was not calculating the bitmap size properly
in the face of non-stripe-aligned swap areas.  The bug could cause a
      panic during boot.

      Refuse to configure a swap area that is too large (67 GB or so)

      Properly document the power-of-2 requirement for SWB_NPAGES.

      The patch is slightly different then the one Tor enclosed in the P.R.,
      but accomplishes the same thing.

PR: kern/20273
Submitted by: Tor.Egge@fast.no
2000-10-13 16:44:34 +00:00
peter
ee5cd6988f Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that.  In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
it's shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a seperate thing and build it into a machine
dependent part of vm_page_t.  This eliminates having a seperate set of
structions that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system.  (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc).  Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries.  This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist.  This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by:	dillon
2000-05-21 12:50:18 +00:00
phk
36c3965ff9 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
phk
c7feb17572 Convert the vm_pager_strategy() interface to take a struct bio instead of
a struct buf.  Don't try to examine B_ASYNC, it is a layering violation
to do so.  The only current user of this interface is vn(4) which, since
it emulates a disk interface, operates on struct bio already.
2000-05-03 07:47:46 +00:00
phk
54f7afc04f Move and staticize the bufchain functions so they become local to the
only piece of code using them.  This will ease a rewrite of them.
2000-05-01 19:38:51 +00:00
phk
aaaef0b54e Complete the bio/buf divorce for all code below devfs::strategy
Exceptions:
        Vinum untouched.  This means that it cannot be compiled.
        Greg Lehey is on the case.

        CCD not converted yet, casts to struct buf (still safe)

        atapi-cd casts to struct buf to examine B_PHYS
2000-04-15 05:54:02 +00:00
phk
8ee11d587f Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
dillon
5ccef75e02 Add necessary spl protection for swapper. The problem was located by
Alfred while testing his SPLASSERT stuff.   This is not a complete fix,
    more protections are probably needed.
2000-03-27 21:33:32 +00:00
charnier
5c16c2a8f1 Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce
2000-03-27 20:41:17 +00:00
charnier
686df89909 Spelling 2000-03-26 15:20:23 +00:00
phk
1b817afc6d Fix one place which knew that B_WRITE was zero.
Fix a stylistic mistake of mine while here.

Found by:	Stephen Hocking <shocking@prth.pgs.com>
2000-03-22 08:40:13 +00:00
phk
5df766a0f8 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
phk
a246e10f55 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
phk
6b3385b773 Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.
2000-03-16 08:51:55 +00:00
peter
38a9421a1e Fix the swap backed vn case - this was broken by my rev 1.128 to
swap_pager.c and related commits.

Essentially swap_pager.c is backed out to before the changes, but
swapdev_vp is converted into a real vnode with just VOP_STRATEGY().
It no longer abuses specfs vnops and no longer needs a dev_t and
/dev/drum (or /dev/swapdev) for the intermediate layer.

This essentially restores the vnode interface as the interface to the
bottom of the swap pager, and vm_swap.c provides a clean vnode interface.

This will need to be revisited when we swap to files (vnodes) - which
is the other reason for keeping the vnode interface between the swap pager
and the swap devices.

OK'ed by:	dillon
1999-12-28 07:30:55 +00:00
phk
84a3f8a8d2 Isolate the swapdev_vp "not quite" vnode in the only source file which
needs it now that /dev/drum is gone.

Reviewed by: eivind, peter
1999-11-22 15:27:09 +00:00
peter
bae4ed31fd Remove the non-functional "swap device" userland front-end to the
multiplexed underlying swap devices (/dev/drum).  The only thing it did
was to allow root to open /dev/drum, but not do anything with it.
Various utilities used to grovel around in here, but Matt has written
a much nicer (and clean) front-end to this for libkvm, and nothing uses
the old system any more.

The VM system was calling VOP_STRATEGY() on the vp of the first underlying
swap device (not the /dev/drum one, the first real device), and using
the VOP system to indirectly (and only) call swstrategy() to choose
an underlying device and enqueue it on that device.  I have changed it
to avoid diverting through the VOP system and to call the only possible
target directly, saving a little bit of time and some complexity.

In all, nothing much changes, except some scaffolding to support the
roundabout way of calling swstrategy() is gone.

Matt gave me the ok to do this some time ago, and I apologize for taking
so long to get around to it.
1999-11-18 06:55:40 +00:00
phk
8e3c3eafed useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
dillon
7a0052d268 Fix a number of spl bugs related to reserving and freeing swap space.
Swap space can be freed from an interrupt and so swap reservation and
    freeing must occur at splvm.

    Add swap_pager_reserve() code to support a new swap pre-reservation
    capability for the VN device.

    Generally cleanup the swap code by simplifying the swp_pager_meta_build()
    static function and consolidating the SWAPBLK_NONE test from a bit test
    to an absolute compare.  The bit test was left over from a rejected
    swap allocation scheme that was not ultimately committed.  A few other
    minor cleanups were also made.

    Reorganize the swap strategy code, again for VN support, to not
    reallocate swap when writing as this messes up pre-reservation and
    can fragment I/O unnecessarily as VN-baesd disk is messed around with.

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:09:24 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
bde
aff861d544 Use devtoname to print dev_t's instead of casting them to u_long for
misprinting with %lx.

Cast pointers to intptr_t instead of casting them to long.  Cosmetic.
1999-08-23 23:55:03 +00:00
alc
fb63bd8ded Correct an accidental omission of one "vm_page_undirty" replacement
from the previous commit.
1999-08-17 05:56:00 +00:00
alc
075745f2e2 Add the (inline) function vm_page_undirty for clearing the dirty bitmask
of a vm_page.

Use it.

Submitted by:	dillon
1999-08-17 04:02:34 +00:00
alc
ab1033c835 Remove vm_object::last_read. It is used by the old swap pager, but
not by the new one, i.e., vm/swap_pager.c rev 1.108.

Reviewed by:	dillon@backplane.com
1999-07-16 05:11:37 +00:00
peter
5d3d376190 Kirk missed a required BUF_KERNPROC(). Even though this is a non-async
transfer, the b_iodone hook causes biodone() to release it from interrupt
context.
1999-06-27 22:08:38 +00:00
mckusick
5b58f2f951 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
phk
f57a01ebfc remove b_proc from struct buf, it's (now) unused.
Reviewed by:	dillon, bde
1999-05-06 20:00:34 +00:00
julian
0c3f3973d2 Submitted by: Matt Dillon <dillon@freebsd.org>
The old VN device broke in -4.x when the definition of B_PAGING
changed. This patch fixes this plus implements additional capabilities.
The new VN device can be backed by a file ( as per normal ), or it can
be directly backed by swap.

Due to dependencies in VM include files  (on opt_xxx options) the new
vn device cannot be a module yet. This will be fixed in a later commit.
This commit delimitted by tags {PRE,POST}_MATT_VNDEV
1999-03-14 09:20:01 +00:00
dillon
fa06d8f968 Remove conditional sysctl's
Leave swap_async_max sysctl intact, remove swap_cluster_max sysctl.

Reviewed by:	     Alan Cox <alc@cs.rice.edu>
1999-02-21 08:34:15 +00:00
dillon
338a9c530d Reviewed by: Alan Cox <alc@cs.rice.edu>
Fix problem w/ low-swap/low-memory handling as reported by Bruce Evans.
1999-02-21 08:30:49 +00:00
dillon
03d8f2679e Limit number of simultanious asynchronous swap pager I/Os that can
be in progress at any given moment.

    Add two swap tuneables to sysctl:

	vm.swap_async_max: 4
	vm.swap_cluster_max: 16

    Recommended values are a cluster size of 8 or 16 pages.  async_max is
    about right for 1-4 swap devices.  Reduce to 2 if swap is eating too much
    bandwidth, or even 1 if swap is both eating too much bandwidth and sitting
    on a slow network (10BaseT).

    The defaults work well across a broad range of configurations and should
    normally be left alone.
1999-02-18 19:57:33 +00:00
dillon
638895c103 Add hysteresis to the 'swap_pager_getswapspace; failed' console message.
Also widen the hysteresis levels a little ( these really should be
    dynamically configured ).
1999-02-06 07:22:21 +00:00
dillon
ca8ef4ff13 Remove unintended trigraph sequences in comments for -Wall 1999-01-27 18:19:53 +00:00
dillon
a4c067a459 Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

    The inline can thus do sanity checks ( or not ) over all cases.
1999-01-24 06:04:52 +00:00
dillon
9ed8ff237b vm_pager_put_pages() is passed an rcval array to hold per-page return
values.  The 'int' return value for the procedure was never used and
    not well defined in any case when there are mixed errors on pages, so
    it has been removed.  vm_pager_put_pages() and associated vm_pager
    functions now return void.
1999-01-24 02:32:15 +00:00
dillon
e3c11331d3 The default_pager's interaction with the swap_pager has been reorganized,
and the swap_pager has been completely replaced.

    The new swap pager uses the new blist radix-tree based bitmap allocator
    for low level swap allocation and deallocation.   The new allocator
    is effectively O(5) while the old one was O(N), and the new allocator
    allocates all required memory at init time rather then at allocate
    memory on the fly at run time.

    Swap metadata is allocated in clusters and stored in a hash table,
    eliminating linearly allocated structures.

    Many, many features have been rewritten or added.  Swap space is now
    reallocated on the fly providing a poor-mans auto defragmentation of
    swap space.  Swap space that is no longer needed is freed on a timely
    basis so no garbage collection is necessary.

    Swap I/O is marked B_ASYNC and NFS has been fixed to do the right
    thing with it, so NFS-based paging now has around 10x the performance
    as it did before ( previously NFS enforced synchronous I/O for paging ).
1999-01-21 09:33:07 +00:00
dillon
df24433bbe This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
eivind
89e1199534 KNFize, by bde. 1999-01-10 01:58:29 +00:00
eivind
a8dc66f457 Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
1999-01-08 17:31:30 +00:00
dt
065be55870 Don't free swap in swap_pager_getpages(): this code probably cause the
"dying daemons" problem. (I thought this code was introduced in rev.1.80,
but it just relaxed the condition.)

Also, kill related "suggest more swap space" warning (also introduced in
1.80). It was confusing, to say the least...

Requested by:		msmith
Not objected by:	dg
1998-12-29 22:53:51 +00:00
bde
9094f7c05e Fixed a null pointer panic in spc_free(). swap_pager_putpages()
almost always causes this panic for the curproc != pageproc case.
This case apparently doesn't happen in normal operation, but it
happens when vm_page_alloc_contig() is called when there is a memory
hogging application that hasn't already been paged out.

PR:		8632
Reviewed by:	info@opensound.com (Dev Mazumdar), dg
Broken in:	rev.1.89 (1998/02/23)
1998-11-19 06:20:42 +00:00
peter
8ef35acf90 Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
dg
3defb6d13f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
dfr
e2df972eb1 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
dfr
5fdaeb281d Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>
1998-08-24 08:39:39 +00:00
dfr
a1b2079000 Protect all modifications to paging_in_progress with splvm(). 1998-08-13 08:05:13 +00:00
bde
d7aa77e789 Fixed two spl nesting bugs. They caused (at least) the entire pageout
daemon to run at splvm() forever after swap_pager_putpages() is called
from vm_pageout_scan().

Broken in: rev.1.189 (1998/02/23)
1998-07-28 15:30:01 +00:00
bde
f0b863f4b5 Fixed printf format errors. 1998-07-11 07:46:16 +00:00
julian
4363221ba2 VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00
dyson
dfdb369a7d Work around some VM bugs, the worst being an overly aggressive
swap space free calculation.  More complete fixes will be forthcoming,
in a week.
1998-05-04 03:01:44 +00:00
dyson
b5a79794cd Tighten up management of memory and swap space during map allocation,
deallocation cycles.  This should provide a measurable improvement
on swap and memory allocation on loaded systems.  It is unlikely a
complete solution.  Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.
1998-04-29 04:28:22 +00:00
bde
b598f559b2 Support compiling with `gcc -ansi'. 1998-04-15 17:47:40 +00:00
dyson
8ceb6160f4 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
dyson
69e5a1e9f5 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
dyson
014146c040 Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs.  The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by:	Partially from Tor Egge.
1998-02-25 03:56:15 +00:00
dyson
c4e82fbab0 Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time.  The swap pager rundown code
has been clean-up, and unneeded wakeups removed.  Lots of splbio's
are changed to splvm's.  Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overheadla.)
1998-02-23 08:22:48 +00:00
eivind
d7a6ab2803 Staticize. 1998-02-09 06:11:36 +00:00
eivind
4547a09753 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
eivind
c552a9a1c3 Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
dyson
7fde1e8379 This fix should help the panic problems in -current. There
were some errors in "interval" management.  Due to the
clustering mechanism, the code is necessarily complex and
error prone.
1998-02-03 00:50:36 +00:00
dyson
ef6e7f7b8d Fix a performance problem caused by an earlier commit. 1998-02-01 02:00:20 +00:00
dyson
2aacd1ab4f Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00
dyson
197bd655c4 VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneded collpase operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
dyson
b130b30c96 Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap.  Fix a problem with faulting in pages.  Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.
1998-01-17 09:17:02 +00:00
dyson
d97fabbb53 Support running with inadequate swap space. Additionally, the code
will complain with a suggestion of increasing it.
1997-12-24 15:05:25 +00:00
phk
a1bfb618d9 In all such uses of struct buf: 's/b_un.b_addr/b_data/g' 1997-12-02 21:07:20 +00:00
bde
a08aff2d02 Removed unused #includes. 1997-09-01 03:17:34 +00:00
bde
98fcb3f476 Print a device number in hex instead of decimal. 1997-09-01 02:28:32 +00:00
bde
c978fb3652 Fixed type mismatches for functions with args of type vm_prot_t and/or
vm_inherit_t.  These types are smaller than ints, so the prototypes
should have used the promoted type (int) to match the old-style function
definitions.  They use just vm_prot_t and/or vm_inherit_t.  This depends
on gcc features to work.  I fixed the definitions since this is easiest.
The correct fix may be to change the small types to u_int, to optimize
for time instead of space.
1997-08-25 22:15:31 +00:00
peter
94b6d72794 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
jkh
808a36ef65 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
dyson
73dafb0b2c Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.)  Upper level recoded to use
pmap_ts_referenced.
1997-01-11 07:19:02 +00:00
bde
2529e37cab Removed __pure's and __pure2's. __pure is a no-op for recent versions
of gcc by definition, and __pure2 is a no-op in effect (presumably the
compiler can see when an inline function has no side effects).
1996-10-12 20:09:48 +00:00
dyson
62b009f8b1 Addition of page coloring support. Various levels of coloring are afforded.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache.  (Parameters can be generated
to support arbitrary cache sizes also.)
1996-09-08 20:44:49 +00:00
dyson
01ce9d323a Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find.  This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.
1996-07-30 03:08:57 +00:00
dyson
293abd3564 This commit is meant to solve a couple of VM system problems or
performance issues.

	1) The pmap module has had too many inlines, and so the
	   object file is simply bigger than it needs to be.
	   Some common code is also merged into subroutines.
	2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
	   Unfortunately, a few have needed to be added also.
	   The removal caused the need for more vm_page_lookups.
	   I added lookup hints to minimize the need for the
	   page table lookup operations.
	3) Removal of some bogus performance improvements, that
	   mostly made the code more complex (tracking individual
	   page table page updates unnecessarily).  Those improvements
	   actually hurt 386 processors perf (not that people who
	   worry about perf use 386 processors anymore :-)).
	4) Changed pv queue manipulations/structures to be TAILQ's.
	5) The pv queue code has had some performance problems since
	   day one.  Some significant scalability issues are resolved
	   by threading the pv entries from the pmap AND the physical
	   address instead of just the physical address.  This makes
	   certain pmap operations run much faster.  This does
	   not affect most micro-benchmarks, but should help loaded system
	   performance *significantly*.  DG helped and came up with most
	   of the solution for this one.
	6) Most if not all pmap bit operations follow the pattern:
		pmap_test_bit();
		pmap_clear_bit();
	   That made for twice the necessary pv list traversal.   The
	   pmap interface now supports only pmap_tc_bit type operations:
	   pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
	   Additionally, the modified routine now takes a vm_page_t arg
	   instead of a phys address.  This eliminates a PHYS_TO_VM_PAGE
	   operation.
	7) Several rewrites of routines that contain redundant code to
	   use common routines, so that there is a greater likelihood of
	   keeping the cache footprint smaller.
1996-07-27 03:24:10 +00:00
dyson
8a5ac3e758 Mostly superficial code improvements, add a diagnostic. The
code improvements include significant simplification of the reservation
of the swap pager control blocks for reads.  Add a panic for an inconsistent
swap pager control block count.
1996-06-10 04:58:48 +00:00
dyson
509f02d4a3 Initial support for MADV_FREE, support for pages that we don't care
about the contents anymore.  This gives us alot of the advantage of
freeing individual pages through munmap, but with almost none of the
overhead.
1996-05-23 00:45:58 +00:00
dyson
242e10df11 This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

	More usage of the TAILQ macros.  Additional minor fix to queue.h.
	Performance enhancements to the pageout daemon.
		Addition of a wait in the case that the pageout daemon
		has to run immediately.
		Slightly modify the pageout algorithm.
	Significant revamp of the pmap/fork code:
		1) PTE's and UPAGES's are NO LONGER in the process's map.
		2) PTE's and UPAGES's reside in their own objects.
		3) TOTAL elimination of recursive page table pagefaults.
		4) The page directory now resides in the PTE object.
		5) Implemented pmap_copy, thereby speeding up fork time.
		6) Changed the pv entries so that the head is a pointer
		   and not an entire entry.
		7) Significant cleanup of pmap_protect, and pmap_remove.
		8) Removed significant amounts of machine dependent
		   fork code from vm_glue.  Pushed much of that code into
		   the machine dependent pmap module.
		9) Support more completely the reuse of already zeroed
		   pages (Page table pages and page directories) as being
		   already zeroed.
	Performance and code cleanups in vm_map:
		1) Improved and simplified allocation of map entries.
		2) Improved vm_map_copy code.
		3) Corrected some minor problems in the simplify code.
	Implemented splvm (combo of splbio and splimp.)  The VM code now
		seldom uses splhigh.
	Improved the speed of and simplified kmem_malloc.
	Minor mod to vm_fault to avoid using pre-zeroed pages in the case
		of objects with backing objects along with the already
		existant condition of having a vnode.  (If there is a backing
		object, there will likely be a COW...  With a COW, it isn't
		necessary to start with a pre-zeroed page.)
	Minor reorg of source to perhaps improve locality of ref.
1996-05-18 03:38:05 +00:00
phk
5d01dc3d50 Another sweep over the pmap/vm macros, this time with more focus on
the usage.  I'm not satisfied with the naming, but now at least there is
less bogus stuff around.
1996-05-03 21:01:54 +00:00
phk
5a6fb3a7da removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
        ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
        NPDEPG

Major macro cleanup.
1996-05-02 14:21:14 +00:00
dyson
51a9444d94 Fix a problem in the swap pager that caused some of the pages that
were paged in under low swap space conditions to both loose their
backing store and their dirty bits.  This would cause pages to
be demand zeroed under certain conditions in low VM space conditions
and consequential sig-11's or sig-10's.  This situation was made
worse lately when the level for swap space reclaim threshold was
increased.
1996-03-06 04:31:46 +00:00
dyson
1b4bcc8dfd In order to fix some concurrency problems with the swap pager early
on in the FreeBSD development, I had made a global lock around the
rlist code.  This was bogus, and now the lock is maintained on a
per resource list basis.  This now allows the rlist code to be used for
almost any non-interrupt level application.
1996-03-03 21:11:08 +00:00
dyson
ed1fa57da8 1) Eliminate unnecessary bzero of UPAGES.
2) Eliminate unnecessary copying of pages during/after forks.
3) Add user map simplification.
1996-03-02 02:54:24 +00:00
dg
f7e207c60f "out of space" -> "out of swap space". 1996-01-31 13:14:21 +00:00
dyson
8fc8a772af Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
	overhead for merged cache.
Efficiency improvement for vfs_cluster.  It used to do alot of redundant
	calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files.  Additionally,
	fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources.  The pageout code
	will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
	page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
	thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
	that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
	case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
	buffers.
1996-01-19 04:00:31 +00:00
bde
ca36bb34d1 Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinuishable from vm_offset_t.
1995-12-17 07:19:58 +00:00
phk
9cb413a93c Another mega commit to staticize things. 1995-12-14 09:55:16 +00:00
phk
63ec2c0ae9 A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.
1995-12-14 08:32:45 +00:00
dyson
c8faa63068 Some new anti-deadlock code ended up messing up the paging stats. A modified
version of the code is now in place, and gausspage performance is back
up to where it should be.
1995-12-11 15:43:33 +00:00
dyson
601ed1a4c0 Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.
1995-12-11 04:58:34 +00:00
dg
c30f46c534 Untangled the vm.h include file spaghetti. 1995-12-07 12:48:31 +00:00
bde
754df9d25d Completed function declarations and/or added prototypes.
Staticized some functions.

__purified some functions.  Some functions were bogusly declared as
returning `const'.  This hasn't done anything since gcc-2.5.  For
later versions of gcc, the equivalent is __attribute__((const)) at
the end of function declarations.
1995-12-03 12:18:39 +00:00
phk
07252df1ff Remove unused vars & funcs, make things static, protoize a little bit. 1995-11-20 12:20:02 +00:00
bde
fd57258459 Fixed recent staticizations. Some protypes for static functions were
left in headers and not staticized.
1995-11-16 09:51:22 +00:00
phk
010c461fbe staticize. 1995-11-14 20:53:20 +00:00
dg
c85bf90168 Move page fixups (pmap_clear_modify, etc) that happen after paging input
completes out of vm_fault and into the pagers. This get rid of some
redundancy and improves the architecture.

Reviewed by:	John Dyson <dyson>
1995-11-02 06:42:47 +00:00
dg
56e87c0fef Check that the swap block is valid before including it in a cluster.
Submitted by:	John Dyson
1995-09-24 04:40:19 +00:00
dyson
4f7c5687d4 Make sure that the prezero flag is cleared when needed. 1995-09-11 00:47:17 +00:00
dyson
bd69f4b400 Fixed a sign reversal problem -- might have cause some Sig-11s that
people have been seeing.
1995-09-06 07:08:45 +00:00
dyson
3f4ad92abf Allow the fault code to use additional clustering info from both
bmap and the swap pager.  Improved fault clustering performance.
1995-09-04 04:44:26 +00:00
dg
1c7ffd3f4c 1) Merged swpager structure into vm_object.
2) Changed swap_pager internal interfaces to cope w/#1.
3) Eliminated object->copy as we no longer have copy objects.
4) Minor stylistic changes.
1995-07-16 13:28:37 +00:00
dg
c8b0a7332c NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
   haspage, and sync operations are supported. The haspage interface now
   provides information about clusterability. All pager routines now take
   struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
   confusion caused by pagers being both a data structure ("allocate a
   pager") and a collection of routines. The idea of a pager structure has
   escentially been eliminated. Objects now have types, and this type is
   used to index the appropriate pager. In most cases, items in the pager
   structure were duplicated in the object data structure and thus were
   unnecessary. In the few cases that remained, a un_pager structure union
   was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
   be removed. For instance, vm_object_enter(), vm_object_lookup(),
   vm_object_remove(), and the associated object hash list were some of the
   things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
   SMP locking primitives used in the VM system aren't likely the mechanism
   that we'll be adopting. Even if it were, the locking that was in the code
   was very inadequate and would have to be mostly re-done anyway. The
   locking in a uni-processor kernel was a no-op but went a long way toward
   making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
   thread support have been fixed to reflect the reality that we are really
   dealing with processes, not threads. The VM system didn't have complete
   thread support, so the comments and mis-named routines were just wrong.
   We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
   pager_alloc routines. Most of the pager_allocs have been rewritten and
   are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
   now tries harder to output an even number of pages before and after
   the requested page. This is sort of the reverse of the ideal pagein
   algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
   have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
    The fact that the vm_object data structure escentially had this
    backwards really confused things. The use of "shadow" and "backing
    object" throughout the code is now internally consistent and correct
    in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
    0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
    of objects to the "swap" type. The previous checks throughout the code
    for swp->pg_data != NULL were really ugly. This change also provides
    the rudiments for future backing of "anonymous" memory by something
    other than the swap pager (via the vnode pager, for example), and it
    allows the decision about which of these pagers to use to be made
    dynamically (although will need some additional decision code to do
    this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
    object" code has been removed. MAP_COPY was undocumented and non-
    standard. It was furthermore broken in several ways which caused its
    behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
    continue to work correctly, but via the slightly different semantics
    of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
    threads design can be worked around in other ways. Both #12 and #13
    were done to simplify the code and improve readability and maintain-
    ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
   this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
   information provided by the new haspage pager interface. This will
   substantially reduce the overhead by eliminating a large number of
   VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
   improved to provide both a "behind" and "ahead" indication of
   contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
   It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
   via a much more general mechanism that could also be used for disk
   striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
   fact that it makes calls into the swap pager and knows too much about
   how the swap pager operates really bothers me. It also doesn't allow
   for collapsing of non-swap pager objects ("unnamed" objects backed by
   other pagers).
1995-07-13 08:48:48 +00:00
rgrimes
c86f0c7a71 Remove trailing whitespace. 1995-05-30 08:16:23 +00:00
dg
56d21b4218 Accessing pages beyond the end of a mapped file results in internal
inconsistencies in the VM system that eventually lead to a panic. These
changes fix the behavior to conform to the behavior in SunOS, which is
to deny faults to pages beyond the EOF (returning SIGBUS). Internally,
this is implemented by requiring faults to be within the object size
boundaries. These changes exposed another bug, namely that passing in
an offset to mmap when trying to map an unnamed anonymous region also
results in internal inconsistencies. In this case, the offset is forced
to zero.

Reviewed by:	John Dyson and others
1995-05-18 02:59:26 +00:00
dg
b649d7b9c7 Changed swap partition handling/allocation so that it doesn't
require specific partitions be mentioned in the kernel config
file ("swap on foo" is now obsolete).

From Poul-Henning:

The visible effect is this:

As default, unless
        options "NSWAPDEV=23"
is in your config, you will have four swap-devices.
You can swapon(2) any block device you feel like, it doesn't have
to be in the kernel config.

There is a performance/resource win available by getting the NSWAPDEV right
(but only if you have just one swap-device ??), but using that as default
would be too restrictive.

The invisible effect is that:

Swap-handling disappears from the $arch part of the kernel.
It gets a lot simpler (-145 lines) and cleaner.

Reviewed by:	John Dyson, David Greenman
Submitted by:	Poul-Henning Kamp, with minor changes by me.
1995-05-14 03:00:10 +00:00
dg
a2a89cc5d7 Changed "handle" from type caddr_t to void *; "handle" is several different
types of pointers, and "char *" is a bad choice for the type.
1995-05-10 18:56:09 +00:00
dyson
02299bb821 Another error in the correction for trimming swap allocation for
small objects.  (This code needs to be revisited.)
1995-05-07 06:36:59 +00:00
dyson
f90d96a223 Fixed a calculation that would once-in-a-while cause the swap_pager
to emit spurious page outside of object type messages.  It is not
a fatal condition anyway, so the message will be omitted for
release.  Also, the code that "clips" the allocation size, associated
with the above problem, was fixed.
1995-05-07 03:48:54 +00:00
dg
c2e217461c New flag: B_PAGING. Added as part of the vn driver hack. 1995-04-19 10:32:11 +00:00
dg
7a85187013 Removed obsolete/unused variable declarations.
Removed some extern declarations and included the correct include files.
1995-04-16 13:58:42 +00:00
dg
a616136616 Moved some zero-initialized variables into .bss. Made code intended to be
called only from DDB #ifdef DDB. Removed some completely unused globals.
1995-04-16 12:56:22 +00:00
dg
08726c6d9b Added a check for wrong object size; print a warning, but deal with it
correctly. The warning will tell us that there is a bug somewhere else
in sizing the object correctly.

Submitted by:	John Dyson
1995-03-22 05:12:18 +00:00
dg
9cd78521d8 Removed redundant newlines that were in some panic strings. 1995-03-19 14:29:26 +00:00
dg
ff36772fd1 Clear OBJ_INTERNAL flag for device pager objects and named anonymous
objects.
1995-03-11 22:25:02 +00:00
dg
eb198debb8 Various changes from John and myself that do the following:
New functions create - vm_object_pip_wakeup and pagedaemon_wakeup that
are used to reduce the actual number of wakeups.
New function vm_page_protect which is used in conjuction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.
1995-03-01 23:30:04 +00:00
dg
0da562d454 Fixed severely broken printf (arguments out of order, no newline). 1995-02-25 17:02:48 +00:00
dg
b523cbbaad Only do object paging_in_progress wakeups if someone is waiting on this
condition.

Submitted by:	John Dyson
1995-02-22 09:15:35 +00:00
dg
23db83ee1f Deprecated remaining use of vm_deallocate. Deprecated vm_allocate_with_
pager(). Almost completely rewrote vm_mmap(); when John gets done with
the bottom half, it will be a complete rewrite. Deprecated most use of
vm_object_setpager(). Removed side effect of setting object persist
in vm_object_enter and moved this into the pager(s). A few other
cosmetic changes.
1995-02-21 01:22:48 +00:00
dg
7fc0fd1f47 swap_pager.c:
Fixed long standing bug in freeing swap space during object collapses.
Fixed 'out of space' messages from printing out too often.
Modified to use new kmem_malloc() calling convention.
Implemented an additional stat in the swap pager struct to count the
amount of space allocated to that pager. This may be removed at some
point in the future.
Minimized unnecessary wakeups.

vm_fault.c:
Don't try to collect fault stats on 'swapped' processes - there aren't
any upages to store the stats in.
Changed read-ahead policy (again!).

vm_glue.c:
Be sure to gain a reference to the process's map before swapping.
Be sure to lose it when done.

kern_malloc.c:
Added the ability to specify if allocations are at interrupt time or
are 'safe'; this affects what types of pages can be allocated.

vm_map.c:
Fixed a variety of map lock problems; there's still a lurking bug that
will eventually bite.

vm_object.c:
Explicitly initialize the object fields rather than bzeroing the struct.
Eliminated the 'rcollapse' code and folded it's functionality into the
"real" collapse routine.
Moved an object_unlock() so that the backing_object is protected in
the qcollapse routine.
Make sure nobody fools with the backing_object when we're destroying it.
Added some diagnostic code which can be called from the debugger that
looks through all the internal objects and makes certain that they
all belong to someone.

vm_page.c:
Fixed a rather serious logic bug that would result in random system
crashes. Changed pagedaemon wakeup policy (again!).

vm_pageout.c:
Removed unnecessary page rotations on the inactive queue.
Changed the number of pages to explicitly free to just free_reserved
level.

Submitted by:	John Dyson
1995-02-02 09:09:15 +00:00
dg
a9e08ab1e3 Added ability to detect sequential faults and DTRT. (swap_pager.c)
Added hook for pmap_prefault() and use symbolic constant for new third
argument to vm_page_alloc() (vm_fault.c, various)
Changed the way that upages and page tables are held. (vm_glue.c)
Fixed architectural flaw in allocating pages at interrupt time that was
introduced with the merged cache changes. (vm_page.c, various)
Adjusted some algorithms to acheive better paging performance and to
accomodate the fix for the architectural flaw mentioned above. (vm_pageout.c)
Fixed pbuf handling problem, changed policy on handling read-behind page.
(vnode_pager.c)

Submitted by:	John Dyson
1995-01-24 10:14:09 +00:00
dg
6491ec70c9 Fixed some formatting weirdness that I overlooked in the previous commit. 1995-01-10 07:32:52 +00:00
dg
1707d41102 These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme.  The scheme is almost fully compatible with the old filesystem
interface.  Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache.  Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code.  Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now.  Somehow in 2.0, some "enhancements"
broke the code.  This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code.  No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency.  Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache.  Add "bypass" support for sneaking in on
busy buffers.

Submitted by:	John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
dg
e98518e769 Initialize b_vnbuf.le_next before returning a new buffer in getpbuf and
trypbuf. Move a couple of splbio's to be slightly less conservative.
1994-12-23 04:56:51 +00:00
dg
d4b8853a79 Fixed a benign off by one error. 1994-12-22 05:18:12 +00:00
dg
05d48b0c7e Don't ever clear B_BUSY on a pbuf (or any other flag for that matter).
This appears to be the cause of some buffer confusion that leads to
a panic during heavy paging.

Submitted by:	 John Dyson
1994-12-19 00:02:56 +00:00
dg
cf1d2fea84 Fixed bugs in accounting of swap space that resulted in the pager thinking
it was out of space when it really wasn't.

Submitted by:	John Dyson
1994-11-13 15:36:48 +00:00
dg
196a487259 Fixed return status from pagers. Ahem...the previous method would manufacture
data when it couldn't get it legitimately. :-(

Submitted by:	John Dyson
1994-11-06 09:55:31 +00:00
dg
2837d58a98 Improved I/O error reporting. 1994-10-25 07:06:20 +00:00
dg
e8b2d4b14c Various changes to allow operation without any swapspace configured. Note
that this is intended for use only in floppy situations and is done at
the sacrifice of performance in that case (in ther words, this is not the
best solution, but works okay for this exceptional situation).

Submitted by:	John Dyson
1994-10-22 02:18:03 +00:00
dg
eb282b107c 1) Some of the counters in the vmmeter struct don't fit well into the Mach VM
scheme of things, so I've changed them to be more appropriate. page in/ous
   are now associated with the pager that did them. Nuked v_fault as the
   only fault of interest that wouldn't be already counted in v_trap is a VM
   fault, and this is counted seperately.
2) Implemented most of the remaining counters and corrected the counting of
   some that were done wrong. They are all almost correct now...just a few
   minor ones left to fix.
1994-10-15 13:33:09 +00:00
dg
28cf896d68 Got rid of redundant declaration warnings. 1994-10-14 12:26:18 +00:00
dg
449e14aa40 Fixed bug where page modifications would be lost when swap space was
almost depleted.

Reviewed by:	John Dyson
1994-10-14 01:58:52 +00:00
dg
8ec51aef1b Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.
1994-10-09 07:35:18 +00:00
phk
09c3293a0f Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc.
Reviewed by:	davidg
1994-10-09 01:52:19 +00:00
dg
b769fbff39 Disabled swap anti-fragmentation code. It reduces swap paging performance
by 20% in my tests, and it appears to be the cause of a swap leak.

Submitted by:	John Dyson
1994-09-25 04:02:10 +00:00
dg
9bf406e501 Patches from John Dyson to improve swap code efficiency.
Religiously add back pmap_clear_modify() in vnode_pager_input until we figure
out why system performance isn't what we expect.

Submitted by:	John Dyson (swap_pager) & David Greenman (vnode_pager)
1994-08-29 06:23:19 +00:00
wollman
f9fc827448 Fix up some sloppy coding practices:
- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
  header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.
1994-08-18 22:36:09 +00:00
dg
81dc370a9d Provide support for upcoming merged VM/buffer cache, and fixed a few bugs
that haven't appeared to manifest themselves (yet).

Submitted by:	John Dyson
1994-08-07 13:10:43 +00:00
dg
8b20309268 Incorporated post 1.1.5 work from John Dyson. This includes performance
improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/
pmap_kremove. These routine allow fast mapping of pages for those
architectures that have "normal" MMUs. Also included is a fix to the
pageout daemon to properly check a queue end condition.

Submitted by:	John Dyson
1994-08-06 09:15:42 +00:00
dg
8d205697aa Added $Id$ 1994-08-02 07:55:43 +00:00
dg
0e87163cbf Removed all code related to the pagescan daemon, and changed 'act_count'
adjustments to compensate for a world without the pagescan daemon.
1994-08-01 11:25:45 +00:00
rgrimes
2469c867a1 The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by:	Rodney W. Grimes
Submitted by:	John Dyson and David Greenman
1994-05-25 09:21:21 +00:00
rgrimes
8fb65ce818 BSD 4.4 Lite Kernel Sources 1994-05-24 10:09:53 +00:00