Commit Graph

969 Commits

Author SHA1 Message Date
Alan Cox
c207703465 Set cnt.v_page_size to PAGE_SIZE rather than DEFAULT_PAGE_SIZE so that
"vmstat -s" reports the correct value on the Alpha.

Submitted by:	Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
1999-06-20 04:55:29 +00:00
Alan Cox
6ea5bd80fe Remove some unused function and variable declarations. 1999-06-19 18:42:53 +00:00
Alan Cox
6389da78d5 vm_map_growstack uses vmspace::vm_ssize as though it contained
the stack size in bytes when in fact it is the stack size in pages.
1999-06-17 21:29:38 +00:00
Alan Cox
29b45e9e99 vm_map_insert sometimes extends an existing vm_map entry, rather than
creating a new entry.  vm_map_stack and vm_map_growstack can panic when
a new entry isn't created.  Fixed vm_map_stack and vm_map_growstack.

Also, when extending the stack, always set the protection to VM_PROT_ALL.
1999-06-17 05:49:00 +00:00
Alan Cox
94f7e29a2a Move vm_map_stack and vm_map_growstack after the definition
of the vm_map_clip_end macro.  (The next commit will modify
vm_map_stack and vm_map_growstack to use vm_map_clip_end.)
1999-06-17 00:39:26 +00:00
Alan Cox
1fc43fd11d Remove some unused declarations and duplicate initialization. 1999-06-17 00:27:39 +00:00
Alan Cox
1c85e3df24 vm_map_protect:
The wrong vm_map_entry is used to determine if writes must not be
	allowed due to COW.
1999-06-12 23:10:38 +00:00
Dmitrij Tejblum
a839bdc8af Add a function kmem_alloc_nofault() - same as kmem_alloc_pageable(), but
create a nofault entry. It will be used to allocate kmem for upages.

(I am not too happy with all this, but it's better than nothing).
1999-06-08 17:03:28 +00:00
Alan Cox
4738fa0970 vm_mmap:
Insure that device mappings get MAP_PREFAULT(_PARTIAL) set,
	so that 4M page mappings are used when possible.

Reviewed by:	Luoqi Chen <luoqi@watermarkgroup.com>
1999-06-05 18:21:53 +00:00
Poul-Henning Kamp
ae19718ee2 Shorten a detour around dev_t to get a udev_t created. 1999-06-01 17:11:27 +00:00
Poul-Henning Kamp
2447bec829 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
Poul-Henning Kamp
4e2f199e0c This commit should be a extensive NO-OP:
Reformat and initialize correctly all "struct cdevsw".

        Initialize the d_maj and d_bmaj fields.

        The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format.  Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.
1999-05-30 16:53:49 +00:00
Alan Cox
c7997d57f1 Addendum to 1.155. Verify the existence of the object before checking
its reference count.
1999-05-30 01:12:19 +00:00
Alan Cox
9a2f6362a7 Avoid the creation of unnecessary shadow objects. 1999-05-28 03:39:44 +00:00
Alan Cox
4e045f937b vm_map_insert:
General cleanup.  Eliminate coalescing checks that are duplicated
	by vm_object_coalesce.
1999-05-18 05:38:48 +00:00
Alan Cox
e972780a11 Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert,
eliminating the need for the pmap_object_init_pt calls in imgact_* and
mmap.

Reviewed by:	David Greenman <dg@root.com>
1999-05-17 00:53:56 +00:00
Alan Cox
ea41812fe5 Remove prototypes for functions that don't exist anymore (vm_map.h).
Remove a useless argument from vm_map_madvise's interface (vm_map.c,
	vm_map.h, and vm_mmap.c).

Remove a redundant test in vm_uiomove (vm_map.c).

Make two changes to vm_object_coalesce:

1. Determine whether the new range of pages actually overlaps
the existing object's range of pages before calling vm_object_page_remove.
(Prior to this change almost 90% of the calls to vm_object_page_remove
were to remove pages that were beyond the end of the object.)

2. Free any swap space allocated to removed pages.
1999-05-16 05:07:34 +00:00
Dmitrij Tejblum
54746b676c Fix confusion of size of transfer with size of the pager.
PR:		11658
Broken in:	1.89 (1998/03/07)
1999-05-15 23:42:39 +00:00
Alan Cox
e5f13bdd09 Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option.
It never makes sense to specify MAP_COPY_NEEDED without also specifying
MAP_COPY_ON_WRITE, and vice versa.  Thus, MAP_COPY_ON_WRITE suffices.

Reviewed by:	David Greenman <dg@root.com>
1999-05-14 23:09:34 +00:00
Bruce Evans
ebb4a31711 Casting handles from void * to uintptr_t on the way to dev_t became
especially bogus when dev_t became a pointer.
1999-05-13 12:55:37 +00:00
Luoqi Chen
7a73ea0414 Device pager's handle is dev_t not udev_t. 1999-05-13 04:02:07 +00:00
Poul-Henning Kamp
c32e6392b5 Fix a udev_t/dev_t mismatch which prevent paging from working. 1999-05-12 11:05:23 +00:00
Poul-Henning Kamp
bfbb9ce670 Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
        major()         umajor()
        minor()         uminor()
        makedev()       umakedev()
        dev2udev()      udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits.  This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference.  If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
1999-05-11 19:55:07 +00:00
Poul-Henning Kamp
b19d4b12c0 No point in swapdev being a static global when used only locally. 1999-05-09 17:28:00 +00:00
Poul-Henning Kamp
4be2eb8c49 I got tired of seeing all the cdevsw[major(foo)] all over the place.
Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.
1999-05-08 06:40:31 +00:00
Poul-Henning Kamp
46eede0058 Continue where Julian left off in July 1998:
Virtualize bdevsw[] from cdevsw.  bdevsw() is now an (inline)
        function.

        Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
        to the order of the cmaj/bmaj arguments!)

        Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
        (ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
1999-05-07 10:11:40 +00:00
Poul-Henning Kamp
c48d17750f Introduce two functions: physread() and physwrite() and use these directly
in *devsw[] rather than the 46 local copies of the same functions.

(grog will do the same for vinum when he has time)
1999-05-07 07:03:47 +00:00
Peter Wemm
4d38e6b5ec Add brackets to silence egcs and help clarity. 1999-05-06 22:06:45 +00:00
Poul-Henning Kamp
b0eeea2042 remove b_proc from struct buf, it's (now) unused.
Reviewed by:	dillon, bde
1999-05-06 20:00:34 +00:00
Luoqi Chen
d28ab90f02 Don't ignore mmap() address hint below the text section. 1999-05-06 00:46:19 +00:00
Bill Fumerola
3d177f465a Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
Alan Cox
4221e284a3 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
Dmitrij Tejblum
604359cf9b s/static foo_devsw_installed = 0;/static int foo_devsw_installed;/.
(Edited automatically)
1999-04-28 10:54:24 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Dmitrij Tejblum
11a9f83f80 Make pmap_collect() an official pmap interface. 1999-04-23 20:29:58 +00:00
Peter Wemm
db42d90829 unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.
1999-04-19 14:14:14 +00:00
Peter Wemm
b8df55a044 Move the declaration of faultin() from the vm headers to proc.h, since
it is now referenced from a macro there (PHOLD()).
1999-04-13 19:17:15 +00:00
Eivind Eklund
0776e10c71 Staticize 1999-04-11 02:16:27 +00:00
Dmitrij Tejblum
897a45eff9 Convert usage of vm_page_bits() to the new convention ("Inputs are required
to range within a page").
1999-04-10 20:52:11 +00:00
Eivind Eklund
c523e8b21d Lock vnode correctly for VOP_OPEN.
Discussed with:	alc, dillon
1999-04-10 17:54:43 +00:00
Peter Wemm
c8da68e917 Don't forcibly kill processes that are locked in-core via PHOLD - it was
just checking P_NOSWAP before.
1999-04-06 03:14:56 +00:00
Peter Wemm
637cae1dd4 Only use p->p_lock (manage by PHOLD()/PRELE()) - P_NOSWAP/P_PHYSIO is no
longer set.
1999-04-06 03:11:34 +00:00
Julian Elischer
8d17e69460 Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
    vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
    ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>
1999-04-05 19:38:30 +00:00
Alan Cox
876318eca0 Two changes to vm_map_delete:
1. Don't bother checking object->ref_count == 1 in order to set
OBJ_ONEMAPPING.  It's a waste of time.  If object->ref_count == 1,
vm_map_entry_delete will "run-down" the object and its pages.

2. If object->ref_count == 1, ignore OBJ_ONEMAPPING.  Wait for
vm_map_entry_delete to "run-down" the object and its pages.
Otherwise, we're calling two different procedures to delete
the object's pages.

Note: "vmstat -s" will once again show a non-zero value
for "pages freed by exiting processes".
1999-04-04 07:11:02 +00:00
Alan Cox
ad5fca3b4a Mainly, eliminate the comments about share maps. (We don't have share maps
any more.)  Also, eliminate an incorrect comment that says that we don't
coalesce vm_map_entry's.  (We do.)
1999-03-27 23:46:04 +00:00
Eivind Eklund
4491ea9111 Correct a comment. 1999-03-27 02:39:01 +00:00
Alan Cox
99c81ca94d Two changes:
Remove more (redundant) map timestamp increments from properly
synchronized routines.  (Changed: vm_map_entry_link, vm_map_entry_unlink,
and vm_map_pageable.)

Micro-optimize vm_map_entry_link and vm_map_entry_unlink, eliminating
unnecessary dereferences.  At the same time, converted them from macros
to inline functions.
1999-03-21 23:37:00 +00:00
Alan Cox
61fc5ee627 Construct the free queue(s) in descending order (by physical
address) so that the first 16MB of physical memory is allocated
last rather than first.  On large-memory machines, this avoids
the exhaustion of low physical memory before isa_dmainit has run.
1999-03-19 05:21:03 +00:00
Alan Cox
c7003c6991 Correct a problem in kmem_malloc: A kmem_malloc allowing "wait" may
block (VM_WAIT) holding the map lock.  This is bad.  For example, a subsequent
kmem_malloc by an interrupt handler on the same map may find the lock held
and panic in the lockmgr.
1999-03-16 07:39:07 +00:00
Alan Cox
44428f621d Two changes:
In general, vm_map_simplify_entry should be performed INSIDE
the loop that traverses the map, not outside.  (Changed:
vm_map_inherit, vm_map_pageable.)

vm_fault_unwire doesn't acquire the map lock (or block holding
it).  Thus, vm_map_set/clear_recursive shouldn't be called.
(Changed: vm_map_user_pageable, vm_map_pageable.)
1999-03-15 06:24:52 +00:00
Julian Elischer
811c2e1a76 Fix breakage in last commit
Submitted by: Brian Feldman <green@unixhelp.org>
1999-03-15 05:09:48 +00:00
Julian Elischer
0237469f43 A bit of a hack, but allows the vn device to be a module again.
Submitted by: Matt Dillon <dillon@freebsd.org>
1999-03-14 20:40:15 +00:00
Julian Elischer
a5296b05b4 Submitted by: Matt Dillon <dillon@freebsd.org>
The old VN device broke in -4.x when the definition of B_PAGING
changed. This patch fixes this plus implements additional capabilities.
The new VN device can be backed by a file ( as per normal ), or it can
be directly backed by swap.

Due to dependencies in VM include files  (on opt_xxx options) the new
vn device cannot be a module yet. This will be fixed in a later commit.
This commit delimitted by tags {PRE,POST}_MATT_VNDEV
1999-03-14 09:20:01 +00:00
Alan Cox
a1a54e9fc1 Correct two optimization errors in vm_object_page_remove:
1. The size of vm_object::memq is vm_object::resident_page_count,
not vm_object::size.

2. The "size > 4" test sometimes results in the traversal of a ~1000 page
memq in order to locate ~10 pages.
1999-03-14 06:36:00 +00:00
Alan Cox
b73d0eb905 Remove vm_page_frees from kmem_malloc that are performed
by vm_map_delete/vm_object_page_remove anyway.
1999-03-12 08:05:49 +00:00
Julian Elischer
51df594922 Stop the mfs from trying to swap out crucial bits of the mfs
as this can lead to deadlock.
Submitted by: Mat dillon <dillon@freebsd.org>
1999-03-12 00:44:03 +00:00
Alan Cox
00d4f4a5f4 Remove (redundant) map timestamp increments from some properly
synchronized routines.
1999-03-09 08:00:17 +00:00
Alan Cox
da3a3026b9 Remove an unused variable from vmspace_fork. 1999-03-08 03:53:07 +00:00
Alan Cox
9de3dd734e Change vm_map_growstack to acquire and hold a read lock (instead of a write
lock) until it actually needs to modify the vm_map.

Note: it is legal to modify vm_map::hint without holding a write lock.

Submitted by:	"Richard Seaman, Jr." <dick@tar.com> with minor changes
		by myself.
1999-03-07 21:25:42 +00:00
Alan Cox
f59e8eb9b1 Upgrading a map's lock to exclusive status should increment
the map's timestamp.  In general, whenever an exclusive lock is
acquired the timestamp should be incremented.
1999-03-06 07:11:33 +00:00
Alan Cox
dd2622a8cd To avoid a conflict for the vm_map's lock with vm_fault, release
the read lock around the subyte operations in mincore.  After the lock is
reacquired, use the map's timestamp to determine if we need to restart
the scan.
1999-03-02 22:55:02 +00:00
Alan Cox
e5f251d2d3 Remove the last of the share map code: struct vm_map::is_main_map.
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-03-02 05:43:18 +00:00
Alan Cox
eff50fcd4c mincore doesn't modify the vm_map. Therefore, it doesn't require
an exclusive lock.  A read lock will suffice.
1999-03-01 20:42:16 +00:00
Alan Cox
0e3cdf2cf8 Reviewed by: "John S. Dyson" <dyson@iquest.net>
Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
To prevent a deadlock, if we are extremely low on memory, force synchronous
operation by the VOP_PUTPAGES in vnode_pager_putpages.
1999-02-27 23:39:28 +00:00
Alan Cox
14286e5e8f Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Corrected the computation of cnt.v_ozfod in vm_fault: vm_fault
was counting the number of unoptimized rather than optimized
zero-fill faults.
1999-02-25 06:00:52 +00:00
Matthew Dillon
82e5072fcd Comment swstrategy() routine. 1999-02-25 05:37:18 +00:00
Matthew Dillon
d1bf5d56b6 Remove unnecessary page protects on map_split and collapse operations.
Fix bug where an object's OBJ_WRITEABLE/OBJ_MIGHTBEDIRTY flags do
    not get set under certain circumstances ( page rename case ).

Reviewed by:	Alan Cox <alc@cs.rice.edu>, John Dyson
1999-02-24 21:26:26 +00:00
Matthew Dillon
c4812f564a Removed ENOMEM error on swap_pager_full condition which ignored the
availability of physical memory.  As per original bug report by
    Bruce.

Reviewed by:	Alan Cox <alc@cs.rice.edu>
1999-02-22 08:42:16 +00:00
Matthew Dillon
ad3cce2041 Remove conditional sysctl's
Leave swap_async_max sysctl intact, remove swap_cluster_max sysctl.

Reviewed by:	     Alan Cox <alc@cs.rice.edu>
1999-02-21 08:34:15 +00:00
Matthew Dillon
20d3034f39 Reviewed by: Alan Cox <alc@cs.rice.edu>
Fix problem w/ low-swap/low-memory handling as reported by Bruce Evans.
1999-02-21 08:30:49 +00:00
Luoqi Chen
fe2144fd5a Eliminate a possible numerical overflow. 1999-02-19 19:14:48 +00:00
Luoqi Chen
b1028ad122 Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		Matthew Dillion	<dillon@apollo.backplane.com>
1999-02-19 14:25:37 +00:00
Matthew Dillon
9b09b6c73f Submitted by: Alan Cox <alc@cs.rice.edu>
Remove remaining share map garbage from vm_map_lookup() and clean out
    old #if 0 stuff.
1999-02-19 03:11:37 +00:00
Matthew Dillon
327f4e8394 Limit number of simultanious asynchronous swap pager I/Os that can
be in progress at any given moment.

    Add two swap tuneables to sysctl:

	vm.swap_async_max: 4
	vm.swap_cluster_max: 16

    Recommended values are a cluster size of 8 or 16 pages.  async_max is
    about right for 1-4 swap devices.  Reduce to 2 if swap is eating too much
    bandwidth, or even 1 if swap is both eating too much bandwidth and sitting
    on a slow network (10BaseT).

    The defaults work well across a broad range of configurations and should
    normally be left alone.
1999-02-18 19:57:33 +00:00
Matthew Dillon
b33fb764f1 Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
Unlock vnode before messing with map to avoid deadlock between map and
    vnode ( e.g. with exec_map and underlying program binary vnode ).  Solves
    a deadlock that most often occurs during a large -j# buildworld reported
    by three people.
1999-02-17 09:08:29 +00:00
Matthew Dillon
efcae3d355 Minor reorganization of vm_page_alloc(). No functional changes have
been made but the code has been reorganized and documented to make
    it more readable, reduce the size of the code, and optimize the branch
    path caching capabilities that most modern processors have.
1999-02-15 06:52:14 +00:00
Matthew Dillon
1ce137be82 Fix a bug in the new madvise() code that would possibly (improperly)
free swap space out from under a busy page.  This is not legal because
    the swap may be reallocated and I/O issued while I/O is still in
    progress on the same swap page from the madvise()'d object.  This bug
    could only occur under extreme paging conditions but might not cause
    an error until much later.  As a side-benefit, madvise() is now even
    smaller.
1999-02-15 02:03:40 +00:00
Matthew Dillon
41c67e12bd Minor optimization to madvise() MADV_FREE to make page as freeable as
possible without actually unmapping it from the process.

    As of now, I declare madvise() on OBJT_DEFAULT/OBJT_SWAP objects to be
    'working and complete'.
1999-02-12 20:42:19 +00:00
Matthew Dillon
2aaeadf8d9 Fix non-fatal bug in vm_map_insert() which improperly cleared
OBJ_ONEMAPPING in the case where an object is extended by an
    additional vm_map_entry must be allocated.

    In vm_object_madvise(), remove calll to vm_page_cache() in MADV_FREE
    case in order to avoid a page fault on page reuse.  However, we still
    mark the page as clean and destroy any swap backing store.

Submitted by:	Alan Cox <alc@cs.rice.edu>
1999-02-12 09:51:43 +00:00
Matthew Dillon
b4f8f16e56 Addendum to vm_map coalesce optimization. Also, this was backed-out
because there was a concensus on current in regards to leaving bss r+w+x
    instead of r+w.  This is in order to maintain reasonable compatibility
    with existing JIT compilers (e.g. kaffe) and possibly other programs.
1999-02-09 01:39:29 +00:00
Matthew Dillon
2ad1a3f729 Revamp vm_object_[q]collapse(). Despite the complexity of this patch,
no major operational changes were made.  The three core object->memq loops
    were moved into a single inline procedure and various operational
    characteristics of the collapse function were documented.
1999-02-08 19:00:15 +00:00
Matthew Dillon
d031cff181 General cleanup. Remove #if 0's and remove useless register qualifiers. 1999-02-08 05:15:54 +00:00
Matthew Dillon
faa273d5c2 Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with
PQ_FREE.  There is little operational difference other then the kernel
    being a few kilobytes smaller and the code being more readable.

    * vm_page_select_free() has been *greatly* simplified.
    * The PQ_ZERO page queue and supporting structures have been removed
    * vm_page_zero_idle() revamped (see below)

    PG_ZERO setting and clearing has been migrated from vm_page_alloc()
    to vm_page_free[_zero]() and will eventually be guarenteed to remain
    tracked throughout a page's life ( if it isn't already ).

    When a page is freed, PG_ZERO pages are appended to the appropriate
    tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended.
    When locating a new free page, PG_ZERO selection operates from within
    vm_page_list_find() ( get page from end of queue instead of beginning
    of queue ) and then only occurs in the nominal critical path case.  If
    the nominal case misses, both normal and zero-page allocation devolves
    into the same _vm_page_list_find() select code without any specific
    zero-page optimizations.

    Additionally, vm_page_zero_idle() has been revamped.  Hysteresis has been
    added and zero-page tracking adjusted to conform with the other changes.
    Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free
    pages.  We may wish to increase both parameters as time permits.  The
    hysteresis is designed to avoid silly zeroing in borderline allocation/free
    situations.
1999-02-08 00:37:36 +00:00
Matthew Dillon
5313b05fe0 Backed out vm_map coalesce optimization - it resulted in 22% more page
faults for reasons unknown ( under investigation ).
    /usr/bin/time -l make in /usr/src/bin went from 67000 faults to 90000
    faults.
1999-02-08 00:27:56 +00:00
Matthew Dillon
9fdfe602fc Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to
attempt to optimize forks but were essentially given-up on due to
    problems and replaced with an explicit dup of the vm_map_entry structure.
    Prior to the removal, they were entirely unused.
1999-02-07 21:48:23 +00:00
Matthew Dillon
a0e7b3e5ce Remove L1 cache coloring optimization ( leave L2 cache coloring opt ).
Rewrite vm_page_list_find() and vm_page_select_free() - make inline out
    of nominal case.
1999-02-07 20:45:15 +00:00
Matthew Dillon
9b09fe24a4 When shadowing objects, adjust the page coloring of the shadowing object
such that pages in the combined/shadowed object are consistantly
    colored.

Submitted by:	"John S. Dyson" <dyson@iquest.net>
1999-02-07 08:44:53 +00:00
Matthew Dillon
2b0d37a4f8 Add hysteresis to the 'swap_pager_getswapspace; failed' console message.
Also widen the hysteresis levels a little ( these really should be
    dynamically configured ).
1999-02-06 07:22:21 +00:00
Matthew Dillon
5b02fcc3d0 The elf loader sets the permissions on bss to VM_PROT_READ|VM_PROT_WRITE
rather then VM_PROT_ALL.  obreak, on the otherhand, uses VM_PROT_ALL.
    This prevents vm_map_insert() from being able to coalesce the heap and
    creates an extra map entry.  Since current architectures ignore
    VM_PROT_EXECUTE anyway, and since not having VM_PROT_EXECUTE on data/bss
    may provide protection in the future, obreak now uses read+write rather
    then all (r+w+x).

    This is an optimization, not a bug fix.

Submitted by:	Alan Cox <alc@cs.rice.edu>
1999-02-05 07:49:29 +00:00
Matthew Dillon
588059bea0 Fix bug in a KASSERT I introduced in vm_page_qcollapse() rev 1.139.
Since paging is in progress, page scan in vm_page_qcollapse() must be
    protected at atleast splbio() to prevent pages from being ripped out from
    under the scan.
1999-02-04 17:47:52 +00:00
Matthew Dillon
4112823fc7 Submitted by: Alan Cox
The vm_map_insert()/vm_object_coalesce() optimization has been extended
    to include OBJT_SWAP objects as well as OBJT_DEFAULT objects.  This is
    possible because it costs nothing to extend an OBJT_SWAP object with
    the new swapper.  We can't do this with the old swapper.  The old swapper
    used a linear array that would have had to have been reallocated, costing
    time as well as a potential low-memory deadlock.
1999-02-03 01:57:17 +00:00
Matthew Dillon
b406c0f55c This patch eliminates a pointless test from appearing twice
in vm_map_simplify_entry.  Basically, once you've verified that
    the objects in the adjacent vm_map_entry's are the same, either
    NULL or the same vm_object, there's no point in checking that the
    objects have the same behavior.

Obtained from:  Alan Cox <alc@cs.rice.edu>
1999-02-01 08:49:30 +00:00
Julian Elischer
287457c2e7 Submitted by: Alan Cox <alc@cs.rice.edu>
Checked by: "Richard Seaman, Jr." <dick@tar.com>
Fix the following problem:
As the code stands now, growing any stack, and not just the process's
main stack, modifies vm->vm_ssize.  This is inconsistent with the code
earlier in the same procedure.
1999-01-31 14:09:25 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
Matthew Dillon
5e24f1a2f6 Remove unintended trigraph sequences in comments for -Wall 1999-01-27 18:19:53 +00:00
Julian Elischer
2907af2a96 Mostly remove the VM_STACK OPTION.
This changes the definitions of a few items so that structures are the
same whether or not the option itself is enabled. This allows
people to enable and disable the option without recompilng the world.

As the author says:

|I ran into a problem pulling out the VM_STACK option.  I was aware of this
|when I first did the work, but then forgot about it.  The VM_STACK stuff
|has some code changes in the i386 branch.  There need to be corresponding
|changes in the alpha branch before it can come out completely.

what is done:
|
|1) Pull the VM_STACK option out of the header files it appears in.  This
|really shouldn't affect anything that executes with or without the rest
|of the VM_STACK patches.  The vm_map_entry will then always have one
|extra element (avail_ssize).  It just won't be used if the VM_STACK
|option is not turned on.
|
|I've also pulled the option out of vm_map.c.  This shouldn't harm anything,
|since the routines that are enabled as a result are not called unless
|the VM_STACK option is enabled elsewhere.
|
|2) Add what appears to be appropriate code the the alpha branch, still
|protected behind the VM_STACK switch.  I don't have an alpha machine,
|so we would need to get some testers with alpha machines to try it out.
|
|Once there is some testing, we can consider making the change permanent
|for both i386 and alpha.
|
[..]
|
|Once the alpha code is adequately tested, we can pull VM_STACK out
|everywhere.
|

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-26 02:49:52 +00:00
Julian Elischer
88c5ea4574 Enable Linux threads support by default.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-26 02:38:12 +00:00
Matthew Dillon
2f586e1b2c Undo last commit - not a bug, just duplicate code. PG_MAPPED and
PG_WRITEABLE are already cleared by vm_page_protect().
1999-01-24 07:06:52 +00:00
Matthew Dillon
7dbf82dc13 Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

    The inline can thus do sanity checks ( or not ) over all cases.
1999-01-24 06:04:52 +00:00
Matthew Dillon
68af6d169b vm_map_split() used to dirty the page manually after calling
vm_page_rename(), but never pulled the page off PQ_CACHE if it was on
    PQ_CACHE.  Dirty pages in PQ_CACHE are not allowed and a KASSERT was
    added in -4.x to test for this... and got hit.

    In -4.x, vm_page_rename() automatically dirties the page.  This commit
    also has it deal with the PQ_CACHE case, deactivating the page in that
    case.
1999-01-24 06:00:31 +00:00
Matthew Dillon
161a8f6519 Add vm_page_dirty() inline with PQ_CACHE sanity check 1999-01-24 05:57:50 +00:00
Matthew Dillon
e4542174b0 vm_pager_put_pages() is passed an rcval array to hold per-page return
values.  The 'int' return value for the procedure was never used and
    not well defined in any case when there are mixed errors on pages, so
    it has been removed.  vm_pager_put_pages() and associated vm_pager
    functions now return void.
1999-01-24 02:32:15 +00:00
Matthew Dillon
e1a4feafd0 Clear PG_MAPPED as well as PG_WRITEABLE when a page is moved to the
cache.
1999-01-24 02:29:26 +00:00
Matthew Dillon
d044d7bfb6 Added warning printf ( needs INVARIANTS ) when busy cache page is found
while trying to free memory.
1999-01-24 01:33:22 +00:00
Matthew Dillon
aaba53da90 It is possible for a page in the cache to be busy. vm_pageout.c was not
checking for this condition while it tried to free cache pages.  Fixed.
1999-01-24 01:06:31 +00:00
Matthew Dillon
a7039a1d42 Add invariants to vm_page_busy() and vm_page_wakeup() to check for
PG_BUSY stupidity.
1999-01-24 01:05:15 +00:00
Matthew Dillon
c9fa34cf07 Clear PG_WRITEABLE in vm_page_cache(). This may or may not be a bug,
but the bit should definitely be cleared.
1999-01-24 01:04:04 +00:00
Matthew Dillon
8e3ad7c918 Depreciate vm_object_pmap_copy() - nobody uses it. Everyone uses
vm_object_pmap_copt_1() now, apparently.
1999-01-24 01:01:38 +00:00
Matthew Dillon
bc6d84a6a3 Get rid of unused old_m in vm_fault. Add INVARIANTS to test whether
page is still busy after all the hell vm_fault goes through.. it is
    supposed to be, and printf() if it isn't.  don't panic, though.
1999-01-24 00:55:04 +00:00
Matthew Dillon
7615edaa49 Reenable John Dyson's low-memory VM_WAIT code for page reactivations out
of PQ_CACHE.  Add comments explaining what it accomplishes and its
    limitations.
1999-01-23 06:00:27 +00:00
Matthew Dillon
04d986a5fc Mainly changes to support the new swapper. The big adjustment is that
swap blocks are now in PAGE_SIZE'd increments instead of DEV_BSIZE'd
    increments.  We still convert to DEV_BSIZE'd increments for the
    backing store I/O, but everything else is in PAGE_SIZE increments.
1999-01-21 10:17:12 +00:00
Matthew Dillon
aa91dfc69b Move many of the vm_pager_*() functions from vm_pager.c to inlines in
vm_pager.h
1999-01-21 10:15:47 +00:00
Matthew Dillon
f85f2fa903 Move many of the vm_pager_*() functions from vm_pager.c to inlines in
vm_pager.h

     Added argument to getpbuf() and relpbuf() to allow each subsystem to
     specify a different hard limit on the number of simultanious physical
     bufferes that said subsystem may allocate.  Without this feature, one
     subsystem ( e.g. the vfs clustering code ) could hog *ALL* the pbufs,
     causing a deadlock in the pager in a low memory situation.

     Same for trypbuf().
1999-01-21 10:15:24 +00:00
Matthew Dillon
a489f7614b Reorganized some of the low memory testing code to make it more useful.
Removed call to vm_object_collapse(), which can block.  This was being
    called without the pageout code holding any sort of reference on the
    vm_object or vm_page_t structures being manipulated.  Since this code
    can block, it was possible for other kernel code to shred the state
    the pageout code was assuming remained intact.

    Fixed potential blocking condition in vm_pageout_page_free() ( which
    could cause a deadlock in a low-memory situation ).

    Currently there is a hack in-place to deal with clean filesystem meta-data
    polluting the inactive page queue.  John doesn't like the hack, and neither
    do I.

    Revamped and commented a portion of the pageout loop.

    Added protection against potential memory deadlocks with OBJT_VNODE
    when using VOP_ISLOCKED().  The problem is that vp->v_data can be NULL
    which causes VOP_ISLOCKED() to return a less informed answer.

    remove vm_pager_sync() -- none of the pagers use it any more ( the old
    swapper used to.  The new one does not ).
1999-01-21 10:12:54 +00:00
Matthew Dillon
41dbefba28 The TAILQ hashq has been turned into a singly-linked=list link,
reducing the size of vm_page_t.

    SWAPBLK_NONE and SWAPBLK_MASK are defined here.  These actually are
    more generalized then their names imply, but their placement is somewhat
    of a legacy issue from a prior test version of this code that put
    the swapblk in the vm_page_t structure.  That test code was eventually
    thrown away.  The legacy remains.

    Added vm_page_flash() inline.  Similar to vm_page_wakeup() except that
    it does not clear PG_BUSY ( one assumes that PG_BUSY is already clear ).
    Used by a number of routines to wakeup waiters.

    Collapsed some of the code in inline calls to make other inline calls.
    GCC will optimize this well and it reduces duplication.

    vm_page_free() and vm_page_free_zero() inlines added to convert to
    the proper vm_page_free_toq() call.

    vm_page_sleep_busy() inline added, replacing vm_page_sleep() ( which has
    been removed ).  This implements a much more optimizable page-waiting
    function.
1999-01-21 10:06:24 +00:00
Matthew Dillon
060282de8a The hash table used to be a table of doubly-link list headers ( two
pointers per entry ).  The table has been changed to a singly linked
    list of vm_page_t pointers.  The table has been doubled in size, but
    the entries only take half the space so a net-zero change in memory use.

    The hash function has been changed, hopefully for the better.  The
    combination of the larger hash table size of changed function should
    keep the chain length down to a reasonable number (0-3, average 1).

    vm_object->page_hint has been removed.  This 'optimization' was not
    only never needed, but costs as much as a hash chain link to implement.
    While having page_hint in vm_object might result in better locality
    of reference, the cost is not worth the space in vm_object or the
    extra instructions in my view.

    vm_page_alloc*() functions have been inlined and call a generalized
    non-inlined vm_page_alloc_toq() which combines the standard alloc
    and zero-page alloc functions together, reducing code size and the L1
    cache footprint.  Some reordering has been done... not much.  The
    delinking code should be faster ( because unlinking a doubly-linked list
    requires four memory ops and unlinking a singly linked list only requires
    two ), and we get a hash consistancy check for free.

    vm_page_rename() now automatically sets the page's dirty bits.

    vm_page_alloc() does not try to manually inline freeing a cache page.
    Instead, it now properly calls vm_page_free(m) ... vm_page_free() is
    really too complex to manually inline.

    vm_await(), supporting asleep(), has been added.
1999-01-21 10:01:49 +00:00
Matthew Dillon
48f1335479 The vm_object structure is now somewhat smaller due to the removal
of most of the swap-pager-specific fields, the removal of the id,
    and the removal of paging_offset.

    A new inline, vm_object_pip_wakeupn() has been added to subtract an
    arbitrary number n from the paging_in_progress count and then wakeup
    waiters as necessary.  n may be 0, resulting in a 'flash'.
1999-01-21 09:51:21 +00:00
Matthew Dillon
7bc9e80ecc object->id was badly implemented. It has simply been removed.
object->paging_offset has been removed - it was used to optimize a
    single OBJT_SWAP collapse case yet introduced massive confusion throughout
    vm_object.c.  The optimization was inconsequential except for the
    claim that it didn't have to allocate any memory.  The optimization
    has been removed.

    madvise() has been fixed.  The old madvise() could be made to operate
    on shared objects which is a big no-no.  The new one is much more careful
    in what it modifies.  MADV_FREE was totally broken and has now been fixed.

    vm_page_rename() now automatically dirties a page, so explicit dirtying
    of the page prior to calling vm_page_rename() has been removed.
1999-01-21 09:46:55 +00:00
Matthew Dillon
e4ba1db60a Objects associated with raw devices are no longer counted in the VM stats
total because they may contain absurd numbers ( like the size of all
    of physical memory if you mmap() /dev/mem ).
1999-01-21 09:41:52 +00:00
Matthew Dillon
81522c62fa General cleanup related to the new pager. We no longer have to worry
about conversions of objects to OBJT_SWAP, it is done automatically
    now.

    Replaced manually inserted code with inline calls for busy waiting on
    pages, which also incidently fixes a potential PG_BUSY race due to
    the code not running at splvm().

    vm_objects no longer have a paging_offset field ( see vm/vm_object.c )
1999-01-21 09:40:48 +00:00
Matthew Dillon
1d58b2bc7d Potential bug fix, do not just clear PG_BUSY... call vm_page_wakeup()
instead to properly handle any waiters.

    Added comments, added support for M_ASLEEP.  Generally treat M_ flags
    as flags instead of constants to compare against.
1999-01-21 09:38:20 +00:00
Matthew Dillon
6de6079300 Removed low-memory blockages at fork. This is the wrong place to put
this sort of test.  We need to fix the low-memory handling in general.
1999-01-21 09:36:23 +00:00
Matthew Dillon
4c23ae0916 Mainly cleanup. Removed some inappropriate low-memory handling code
and added lots of comments.  Add tie-in to vm_pager ( and thus the
    new swapper ) to deallocate backing swap for dirtied pages on the fly.
1999-01-21 09:35:38 +00:00
Matthew Dillon
9f6fed9017 The default_pager's interaction with the swap_pager has been reorganized,
and the swap_pager has been completely replaced.

    The new swap pager uses the new blist radix-tree based bitmap allocator
    for low level swap allocation and deallocation.   The new allocator
    is effectively O(5) while the old one was O(N), and the new allocator
    allocates all required memory at init time rather then at allocate
    memory on the fly at run time.

    Swap metadata is allocated in clusters and stored in a hash table,
    eliminating linearly allocated structures.

    Many, many features have been rewritten or added.  Swap space is now
    reallocated on the fly providing a poor-mans auto defragmentation of
    swap space.  Swap space that is no longer needed is freed on a timely
    basis so no garbage collection is necessary.

    Swap I/O is marked B_ASYNC and NFS has been fixed to do the right
    thing with it, so NFS-based paging now has around 10x the performance
    as it did before ( previously NFS enforced synchronous I/O for paging ).
1999-01-21 09:33:07 +00:00
Matthew Dillon
1c7c3c6a86 This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
Eivind Eklund
219cbf59f2 KNFize, by bde. 1999-01-10 01:58:29 +00:00
Eivind Eklund
5526d2d920 Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
1999-01-08 17:31:30 +00:00
Julian Elischer
dc9c271aa1 Changes to the LINUX_THREADS support to only allocate extra memory for
shared signal handling when there is shared signal handling being
used.

This removes the main objection to making the shared signal handling
a standard ability in rfork() and friends and 'unconditionalising'
this code. (i.e. the allocation of an extra 328 bytes per process).

Signal handling information remains in the U area until such a time as
it's reference count would be incremented to > 1. At that point a new
struct is malloc'd and maintained in KVM so that it can be shared between
the processes (threads) using it.

A function to check the reference count and move the struct back to the U
area when it drops back to 1 is also supplied. Signal information is
therefore now swapable for all processes that are not sharing that
information with other processes. THis should addres the concerns raised
by Garrett and others.

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-07 21:23:50 +00:00
Julian Elischer
2267af789e Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>
1999-01-06 23:05:42 +00:00
Bruce Evans
289bdf33d3 Ifdefed conditionally used simplock variables. 1999-01-02 11:34:57 +00:00
Dmitrij Tejblum
7a91724556 Don't free swap in swap_pager_getpages(): this code probably cause the
"dying daemons" problem. (I thought this code was introduced in rev.1.80,
but it just relaxed the condition.)

Also, kill related "suggest more swap space" warning (also introduced in
1.80). It was confusing, to say the least...

Requested by:		msmith
Not objected by:	dg
1998-12-29 22:53:51 +00:00
Matthew Dillon
9858fcda2e Update comments to routines in vm_page.c, most especially whether a
routine can block or not as part of a general effort to carefully
    document blocking/non-blocking calls in the kernel.
1998-12-23 01:52:47 +00:00
Julian Elischer
39fb8e6b3e Fix two bogons created by 'patch(1)' in my last commit. 1998-12-19 08:23:31 +00:00
Julian Elischer
6626c6045c Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by:	 "Richard Seaman, Jr." <lists@tar.com>
Obtained from:	linux :-)

Code to allow Linux Threads to run under FreeBSD.

By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.
1998-12-19 02:55:34 +00:00
Dmitrij Tejblum
fc56545639 Don't disable mmap with large file offset. 1998-12-09 20:22:21 +00:00
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Archie Cobbs
2127f26023 Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Mike Spengler <mks@networkcs.com>
1998-12-04 22:54:57 +00:00
Robert V. Baron
af1f63c7eb In vnode_pager_input_old, set auio.uio_procp = curproc
vs auio.uio_procp = (struct proc *) 0
1998-12-04 18:39:44 +00:00
David Greenman
c699f45e35 Add missing splvm protection around unqueue call. Without this, the page
queues would eventually get corrupted.
1998-11-25 07:40:49 +00:00
Bruce Evans
04258de351 Fixed a null pointer panic in spc_free(). swap_pager_putpages()
almost always causes this panic for the curproc != pageproc case.
This case apparently doesn't happen in normal operation, but it
happens when vm_page_alloc_contig() is called when there is a memory
hogging application that hasn't already been paged out.

PR:		8632
Reviewed by:	info@opensound.com (Dev Mazumdar), dg
Broken in:	rev.1.89 (1998/02/23)
1998-11-19 06:20:42 +00:00
David Greenman
4f6e1f8bfc Closed a small race condition between wiring/unwiring pages that involved
the page's wire_count.
1998-11-11 15:07:57 +00:00
Peter Wemm
1c5bb3eaa1 add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE() 1998-11-10 09:16:29 +00:00
Doug Rabson
7095ee912b * Fix a couple of places in the device pager where an address was
truncated to 32 bits.
* Change the calling convention of the device mmap entry point to
  pass a vm_offset_t instead of an int for the offset allowing
  devices with a larger memory map than (1<<32) to be supported
  on the alpha (/dev/mem is one such).

These changes are required to allow the X server to mmap the various
I/O regions used for device port and memory access on the alpha.
1998-11-08 12:39:07 +00:00
David Greenman
dd0b2081f4 Implemented zero-copy TCP/IP extensions via sendfile(2) - send a
file to a stream socket. sendfile(2) is similar to implementations in
HP-UX, Linux, and other systems, but the API is more extensive and
addresses many of the complaints that the Apache Group and others have
had with those other implementations. Thanks to Marc Slemko of the
Apache Group for helping me work out the best API for this.
Anyway, this has the "net" result of speeding up sends of files over
TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU
cycles) when compared to a traditional read/write loop.
1998-11-05 14:28:26 +00:00
Peter Wemm
b0359e2c11 Add John Dyson's SYSCTL descriptions, and an export of more stats to
a sysctl hierarchy (vm.stats.*).  SYSCTL descriptions are only present
in source, they do not get compiled into the binaries taking up memory.
1998-10-31 17:21:31 +00:00
Peter Wemm
40c8cfe552 Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
David Greenman
c8d14c765f Fixed wrong comments in and about vm_page_deactivate(). 1998-10-28 13:41:43 +00:00
David Greenman
730075613a Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.
1998-10-28 13:37:02 +00:00
David Greenman
e4b7635de2 Added needed splvm() protection around object page traversal in
vm_object_terminate().
1998-10-27 13:22:51 +00:00
Bruce Evans
9cd93b3aec Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse.  We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.

Removed a redundant `major(dev) < nblkdev' test instead of updating it.

Don't follow a garbage bdevsw pointer for attempts to swap on empty
regular files.  This case currently can't happen.  Swapping on regular
files is ifdefed out in swapon() and isn't attempted for empty files
in nfs_mountroot().
1998-10-25 19:24:04 +00:00
Poul-Henning Kamp
f5ef029e92 Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.
1998-10-25 17:44:59 +00:00
David Greenman
9fcfb650d1 Oops, revert part of last fix. vm_pager_dealloc() can't be called until
after the pages are removed from the object...so fix the problem by
not printing the diagnostic for wired fictitious pages (which is normal).
1998-10-23 05:43:13 +00:00
David Greenman
356863eb01 Fixed two bugs in recent commit: in vm_object_terminate, vm_pager_dealloc
needs to be called prior to freeing remaining pages in the object so that
the device pager has an opportunity to grab its "fake" pages. Also, in
the case of wired pages, the page must be made busy prior to calling
vm_page_remove. This is a difference from 2.2.x that I overlooked when
I brought these changes forward.
1998-10-23 05:25:49 +00:00
David Greenman
0b10ba9822 Make the VM system handle the case where a terminating object contains
legitimately wired pages. Currently we print a diagnostic when this
happens, but this will be removed soon when it will be common for this
to occur with zero-copy TCP/IP buffers.
1998-10-22 02:16:53 +00:00
David Greenman
24166bb84b Convert fake page allocs to use the zone allocator, thus eliminating the
private pool management code in here.
1998-10-22 01:45:29 +00:00
David Greenman
bb7db2c011 Set m->object to NULL in dev_pager_getfake(). 1998-10-21 23:06:50 +00:00
David Greenman
300ee8246e Nuked PG_TABLED flag. Replaced with m->object != NULL. 1998-10-21 14:46:42 +00:00
David Greenman
12d534d2f9 Add a diagnostic printf for freeing a wired page. This will eventually
be turned into a panic, but I want to make sure that all cases of freeing
pages with wire_count==1 (which is/was allowed) have first been fixed.
1998-10-21 11:43:04 +00:00
David Greenman
6cde7a165f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
John Polstra
9b35a0d694 Fix a panic on SMP systems, caused by sleeping while holding a
simple-lock.

The reviewer raises the following caveat: "I believe these changes
open a non-critical race condition when adding memory to the pool
for the zone. I think what will happen is that you could have two
threads that are simultaneously adding additional memory when the
pool runs out. This appears to not be a problem, however, since
the re-aquisition of the lock will protect the list pointers."
The submitter agrees that the race is non-critical, and points out
that it already existed for the non-SMP case.  He suggests that
perhaps a sleep lock (using the lock manager) should be used to
close that race.  This might be worth revisiting after 3.0 is
released.

Reviewed by:	dg (David Greenman)
Submitted by:	tegge (Tor Egge)
1998-10-09 00:24:49 +00:00
John Polstra
a0fce82724 Fix a bug in which a page index was used where a byte offset was
expected.  This bug caused builds of Modula-3 to fail in mysterious
ways on SMP kernels.  More precisely, such builds failed on systems
with kern.fast_vfork equal to 0, the default and only supported
value for SMP kernels.

PR:		kern/7468
Submitted by:	tegge (Tor Egge)
1998-10-01 20:46:41 +00:00
Andrzej Bialecki
faa5f8d8da Make #define NO_SWAPPING a normal kernel config option.
Reviewed by:	jkh
1998-09-29 17:33:59 +00:00
Robert V. Baron
6e3a3f387c John Dyson approved of this solution; make vnode_pager_input_old set m->valid 1998-09-28 23:58:10 +00:00
David Greenman
ce65e68c03 Be more selctive about when we clear p->valid.
Submitted by:	John Dyson <toor@dyson.iquest.net>
1998-09-28 02:40:11 +00:00
Bruce Evans
4af2cddffe Removed unused file. 1998-09-20 06:28:10 +00:00
Bruce Evans
500b04a257 Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM.  Declare it in a header file.  Don't forget to use
it in non-Lite2 code.  Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.
1998-09-05 15:17:34 +00:00
Doug Rabson
e69763a315 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
Garrett Wollman
85e7f5492b Separate wakeup conditions for page I/O count (pg_busy) and lock (PG_BUSY).
This is not sa completely solution to the deadlock, but the additional wakeups
have helped in my observation.

Suggested by: John Dyson
1998-09-01 17:12:19 +00:00
Luoqi Chen
c576d12115 Fix a rounding problem that causes vnode pager to fail to remove the last
partially filled page during a truncation.

PR:		kern/7422
1998-08-25 13:47:37 +00:00
Doug Rabson
069e9bc1b4 Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>
1998-08-24 08:39:39 +00:00
Stephen McKay
9e80236560 Correct/clarify some comments. 1998-08-22 15:24:09 +00:00
Doug Rabson
196e9a52ec Protect all modifications to paging_in_progress with splvm(). 1998-08-13 08:05:13 +00:00
Doug Rabson
d474eaaa5f Protect all modifications to paging_in_progress with splvm(). The i386
managed to avoid corruption of this variable by luck (the compiler used a
memory read-modify-write instruction which wasn't interruptable) but other
architectures cannot.

With this change, I am now able to 'make buildworld' on the alpha (sfx: the
crowd goes wild...)
1998-08-06 08:33:19 +00:00
Bruce Evans
ccbbd9271b Fixed two spl nesting bugs. They caused (at least) the entire pageout
daemon to run at splvm() forever after swap_pager_putpages() is called
from vm_pageout_scan().

Broken in: rev.1.189 (1998/02/23)
1998-07-28 15:30:01 +00:00
Doug Rabson
56e7ede1c4 Notify pmap when a page is freed on the alpha to allow it to clean up
its emulated modified/referenced bits.
1998-07-26 18:15:20 +00:00
David Greenman
f3679e351a Improved pager input failure message. 1998-07-22 09:38:04 +00:00
Poul-Henning Kamp
db7ac2451b There is a comment in vm_param.h which doesn't belong to the
code still left in there.  The macros it describes disapeared some-
time since 4.4BSD lite.

PR:		7246
Reviewed by:	phk
Submitted by:	Stefan Eggers <seggers@semyam.dinoco.de>
1998-07-22 06:21:55 +00:00
Bruce Evans
15c7382561 Cast pointers to [u]intptr_t instead of to [unsigned] long. 1998-07-15 04:17:55 +00:00
Bruce Evans
a23d65bfc8 Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively.  Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
1998-07-15 02:32:35 +00:00
Bruce Evans
eb95adeff5 Print pointers using %p instead of attempting to print them by
casting them to long, etc.  Fixed some nearby printf bogons (sign
errors not warned about by gcc, and style bugs, but not truncation
of vm_ooffset_t's).
1998-07-14 12:26:15 +00:00
Bruce Evans
101eeb7f9f Print pointers using %p instead of attempting to print them by
casting them to long, etc.  Fixed some nearby printf bogons (sign
errors not warned about by gcc, and style bugs, but not truncation
of vm_ooffset_t's).

Use slightly less bogus casts for passing pointers to ddb command
functions.
1998-07-14 12:14:58 +00:00
Bruce Evans
92c4c4eb52 Fixed printf format errors. 1998-07-11 12:07:52 +00:00
Bruce Evans
fc62ef1fb5 Fixed printf format errors. 1998-07-11 11:30:46 +00:00
Bruce Evans
ac1e407b32 Fixed printf format errors. 1998-07-11 07:46:16 +00:00
Alexander Langer
c5b75d8223 Removed no longer valid comment about swb_block being int instead of
daddr_t.

PR:		7238
Submitted by:	Stefan Eggers <seggers@semyam.dinoco.de>
1998-07-10 21:50:17 +00:00
Alexander Langer
427e99a0b8 Removed unnecessary test from if/else construct.
PR:		7233
Submitted by:	Stefan Eggers <seggers@semyam.dinoco.de>
1998-07-10 17:58:35 +00:00
Doug Rabson
711458e3e9 Don't truncate the return value of mmap to sizeof(int). 1998-07-05 11:56:52 +00:00
Julian Elischer
f7ea2f55d1 There is no such thing any more as "struct bdevsw".
There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw an
cdevsw entries.  The bdevsw[] table stiff exists and is a second pointer
to the cdevsw entry of the device. it's major is in d_bmaj rather than
d_maj. some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).

rawread()/rawwrite() went away as part of this though it's not strictly
the same  patch, just that it involves all the same lines in the drivers.

cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.

Reviewed by: Eivind Eklund and Mike Smith
	Changes suggested by eivind.
1998-07-04 22:30:26 +00:00
Julian Elischer
fd5d1124e2 VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00
John-Mark Gurney
20f718132d document some VM paging options for cache sizes:
PQ_NOOPT	no coloring
PQ_LARGECACHE	used for 512k/16k cache
PQ_HUGECACHE	used for 1024k/16k cache
1998-06-30 08:01:30 +00:00
Poul-Henning Kamp
b62591052c Remove bdevsw_add(), change the only two users to use bdevsw_add_generic().
Extend cdevsw to be superset of bdevsw.
Remove non-functional bdev lkm support.
Teach wcd what the open() args mean.
1998-06-25 11:28:07 +00:00
Bruce Evans
be160d60ab Removed unused includes. 1998-06-21 18:02:50 +00:00
Bruce Evans
e5b19842ef Removed unused includes. 1998-06-21 14:53:44 +00:00
Doug Rabson
ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
David Greenman
5994d8937d Changed the log() of "Out of mbuf clusters - increase maxusers" to a
printf() of "Out of mbuf clusters - adjust NMBCLUSTERS or increase
maxusers" so that the message is more informative and so that it will
appear in the kernel message buffer.
1998-06-05 21:48:45 +00:00
John Dyson
976f208be3 Cleanup and remove some dead code from the initialization. 1998-06-02 05:50:08 +00:00
John Dyson
e8f367853b Correct sleep priority. 1998-06-02 05:39:13 +00:00
John Dyson
b9cefc08e2 Support a 16K first level cache for 512K 2nd level. Also, add support
for 1MB 2nd level cache.
1998-05-24 04:25:27 +00:00
John Dyson
cf2819ccb8 Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)
1998-05-21 07:47:58 +00:00
Peter Wemm
4183b6b622 Make the previous commit compile.. 1998-05-19 07:13:21 +00:00
Guido van Rooij
05feb99ff1 Plug hole reported on Bugtraq: do not allow mmap with WRITE privs for
append-only and immutable files.

Obtained from: OpenBSD (partly)
1998-05-18 18:26:27 +00:00
John Dyson
bd6be9150d An important fix for proper inheritance of backing objects for
object splits.  Another excellent detective job by Tor.
Submitted by:	Tor Egge <Tor.Egge@idi.ntnu.no>
1998-05-16 23:03:20 +00:00
John Dyson
96fb8cf258 Fix the shm panic. I mistakenly used the shadow_count to keep the object
from being split, and instead added an OBJ_NOSPLIT.
1998-05-04 17:12:53 +00:00
John Dyson
cbd8ec0902 Work around some VM bugs, the worst being an overly aggressive
swap space free calculation.  More complete fixes will be forthcoming,
in a week.
1998-05-04 03:01:44 +00:00
John Dyson
86524867d1 Another minor cleanup of the split code. Make sure that pages are
busied during the entire time, so that the waits for pages being
unbusy don't make the objects inconsistant.
1998-05-02 06:36:16 +00:00
Peter Wemm
3c33646725 Seatbelts for vm_page_bits() in case a file offset is passed in rather than
the page offset.  If a large file offset was passed in, a large negative
array index could be generated which could cause page faults etc at worst
and file corruption at the least.  (Pages are allocated within file
space on page alignment boundaries, so a file offset being passed in here
is harmless to DTRT.  The case where this was happening has already been
fixed though, this is in case it happens again).

Reviewed by: dyson
1998-05-02 03:02:13 +00:00
John Dyson
e493d28abc Fix minor bug with new over used swap fix. 1998-05-01 02:25:29 +00:00
John Dyson
dda6b17151 Add a needed prototype, and fix a panic problem with the new
memory code.
1998-04-29 06:59:08 +00:00
John Dyson
c0877f103f Tighten up management of memory and swap space during map allocation,
deallocation cycles.  This should provide a measurable improvement
on swap and memory allocation on loaded systems.  It is unlikely a
complete solution.  Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.
1998-04-29 04:28:22 +00:00
John Dyson
2dbea5d2e3 Fix a pseudo-swap leak problem. This mitigates "leaks" due to
freeing partial objects, not freeing entire objects didn't
free any of it.  Simple fix to the map code.
Reviewed by:	dg
1998-04-28 05:54:47 +00:00
John Dyson
adc78b8c71 Correct copyright. 1998-04-25 04:50:03 +00:00
Bruce Evans
c1087c1324 Support compiling with `gcc -ansi'. 1998-04-15 17:47:40 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
John Dyson
bef608bd7e Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
Guido van Rooij
c8bdd56b3a Fix for mmap of char devices bug as described in OpenBSD advisory of
1998/02/20
Reviewed by:	John Dyson
Submitted by:	"Cy Schubert" <cschuber@uumail.gov.bc.ca>
1998-03-12 19:36:18 +00:00
Mike Smith
86ffbd76d0 Complement diagnostic messages about missing per-FS VOP page operations,
but don't make their absence fatal.
Submitted by:	terry
1998-03-09 08:58:53 +00:00
John Dyson
be01eafd5f Quell unneeded pageout daemon activity. 1998-03-08 18:19:17 +00:00
John Dyson
6215e86272 Remove a very ill advised vm_page_protect. This was being called
for a non-managed page.  That is a big no-no.
1998-03-08 18:05:59 +00:00
John Dyson
e163e201ef Some cruft left over from my megacommit. A page rotation optimization
was a good idea, but can cause instability.  That optimization is
now removed.
1998-03-08 06:27:30 +00:00
John Dyson
edd97f3a37 Several minor fixes:
1) When freeing pages, it is a good idea to protect them off.
	   (This is probably gratuitious, but good form.)
	2) Allow collapsing pages in the backing object that are
	   PQ_CACHE.  This will improve memory utilization.
	3) Correct the collapse code so that pages that were on the
	   cache queue are moved to the inactive queue.  This is
	   done when pages are marked dirty (so that those pages
	   will be properly paged out instead of freed), so that
	   cached pages will not be paradoxically marked dirty.
1998-03-08 06:25:59 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
John Dyson
4866e0856c Make vm_fault much cleaner by removing the evil macro inlines, and
put alot of it's context into a data structure.  This allows
significant shortening of its codepath, and will significantly
decrease it's cache footprint.

Also, add some stats to vmmeter.  Note that you'll have to
rebuild/recompile vmstat, systat, etc... Otherwise, you'll
get "very interesting" paging stats.
1998-03-07 20:45:47 +00:00
Peter Dufault
917e476dad Reviewed by: msmith, bde long ago
POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:27:00 +00:00
John Dyson
ffc82b0a70 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
Mike Smith
ce75f2c365 In the author's words:
These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown
for local media FS's.

See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation
details for generic *_{get|put}pages for local media FS's.  Support
is trivial to add for any FS that formerly relied on the default
behaviour of the vnode_pager in in EOPNOTSUPP cases (just copy the
ffs_getpages() code for the FS in question's *_{get|put}pages).

Obviously, it would be better if each local media FS implemented a
more optimal method, instead of calling an exported interface from
the /sys/vm/vnode_pager.c, but this is a necessary first step in
getting the FS's to a point where they can be supplied with better
implementations on a case-by-case basis.

Obviously, the cd9660_putpages() can be rather trivial (since it
is a read-only FS type 8-)).

A slight (temporary) modification is made to print a diagnostic message
in the case where the underlying filesystem attempts to engage in the
previous behaviour.  Failure is likely to be ungraceful.

Submitted by:	terry@freebsd.org (Terry Lambert)
1998-02-26 06:39:59 +00:00
John Dyson
660957521c Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs.  The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by:	Partially from Tor Egge.
1998-02-25 03:56:15 +00:00
John Dyson
a15403de7b Correct some severe VM tuning problems for small systems (<=16MB), and
improve tuning on larger systems.  (A couple of the VM tuning params for
small systems were so badly chosen that the system could hang under load.)

The broken tuning was originaly my fault.
1998-02-24 10:16:23 +00:00
John Dyson
e47ed70b0f Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time.  The swap pager rundown code
has been clean-up, and unneeded wakeups removed.  Lots of splbio's
are changed to splvm's.  Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overheadla.)
1998-02-23 08:22:48 +00:00
John Dyson
d9bed5bee1 Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden
in a way identically as before.)  I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE.  Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.

Add some accouting for vm_zone memory allocations, and provide properly
for vm_zone allocations out of the kmem_map.  Also move the vm_zone
allocation stats to the VM OID tree from the KERN OID tree.
1998-02-23 07:42:43 +00:00
Bruce Evans
39e4376ba7 Removed unused #includes. 1998-02-20 13:11:54 +00:00
Mike Smith
eed5086e1a Move the 'sw' device off block major #1, which is now occupied by 'wfd'. 1998-02-19 12:15:06 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
John Dyson
157ac55f97 Fix an argument to vn_lock. It appears that alot of the vn_lock usage
is a bit undisciplined, and should be checked carefully.
1998-02-08 14:55:13 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
John Dyson
95461b450d 1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean-up pmap_incore.
8)	Cluster initial kern_exec pagin.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects, and make sure that the vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
Bruce Evans
e7a5897899 Added #include of <sys/queue.h> so that this file is more "self"-sufficent. 1998-02-03 22:19:35 +00:00
John Dyson
e736cd05cb This fix should help the panic problems in -current. There
were some errors in "interval" management.  Due to the
clustering mechanism, the code is necessarily complex and
error prone.
1998-02-03 00:50:36 +00:00
Bruce Evans
8bcc577e92 Forward declare more structs that are used in prototypes here - don't
depend on <sys/types.h> forward declaring common ones.
1998-02-01 20:08:39 +00:00
John Dyson
1f13bdaa97 Fix a performance problem caused by an earlier commit. 1998-02-01 02:00:20 +00:00
John Dyson
c15541e7a7 contigalloc doesn't place the allocated page(s) into an object, and
now this breaks vm_page_wire (due to wired page accounting per object.)

This should fix a problem as described by Donald Maddox.
1998-01-31 20:30:18 +00:00
John Dyson
eaf13dd73a Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00
Eivind Eklund
f71bb3a160 Turn NSWAPDEV into a new-style option. 1998-01-25 04:13:25 +00:00
Eivind Eklund
7b778b5e61 Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.
This introduce an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.
1998-01-24 02:54:56 +00:00
John Dyson
50ce7ff499 Add better support for larger I/O clusters, including larger physical
I/O.  The support is not mature yet, and some of the underlying implementation
needs help.  However, support does exist for IDE devices now.
1998-01-24 02:01:46 +00:00
John Dyson
2d8acc0f4a VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneded collpase operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
John Dyson
480ba2f552 Allow gdb to work again. 1998-01-21 12:18:00 +00:00
John Dyson
4722175765 Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap.  Fix a problem with faulting in pages.  Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.
1998-01-17 09:17:02 +00:00
John Dyson
925a3a419a Fix some vnode management problems, and better mgmt of vnode free list.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.

PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.
1998-01-12 01:46:33 +00:00