Commit Graph

311 Commits

Author SHA1 Message Date
wollman
16604ab260 Implement POSIX.1b shared memory objects. In this implementation,
shared memory objects are regular files; the shm_open(3) routine
uses fcntl(2) to set a flag on the descriptor which tells mmap(2)
to automatically apply MAP_NOSYNC.

Not objected to by: bde, dillon, dufault, jasone
2000-04-22 15:22:31 +00:00
charnier
5c16c2a8f1 Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce
2000-03-27 20:41:17 +00:00
charnier
686df89909 Spelling 2000-03-26 15:20:23 +00:00
ps
c3800346ab Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2).
This
This feature allows you to specify if mmap'd data is included in
an application's corefile.

Change the type of eflags in struct vm_map_entry from u_char to
vm_eflags_t (an unsigned int).

Reviewed by:	dillon,jdp,alfred
Approved by:	jkh
2000-02-28 04:10:35 +00:00
dillon
7a2987cf94 Fix null-pointer dereference crash when the system is intentionally
run out of KVM through a mmap()/fork() bomb that allocates hundreds
    of thousands of vm_map_entry structures.

    Add panic to make null-pointer dereference crash a little more verbose.

    Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number
    of mmap()'d spaces (discrete vm_map_entry's in the process).  The value
    defaults to around 9000 for a 128MB machine.  The test is scaled for the
    number of processes sharing a vmspace (aka linux threads).  Setting
    the value to 0 disables the feature.

PR: kern/16573
Approved by: jkh
2000-02-16 21:11:33 +00:00
guido
ce046933de Use MAP_NOSYNC for vnodes without any links in their filesystem.
This is necessary for vmware: it does not use an anonymous mmap for
the memory of the virtual system. In stead it creates a temp file an
unlinks it. For a 50 MB file, this results in a ot of syncing
every 30 seconds.

Reviewed by:	Matthew Dillon <dillon@backplane.com>
2000-01-03 19:13:53 +00:00
dillon
b66fb2c648 Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

    This feature prevents the update daemon from gratuitously flushing
    dirty pages associated with a mapped file-backed region of memory.  The
    system pager will still page the memory as necessary and the VM system
    will still be fully coherent with the filesystem.  Modifications made
    by other means to the same area of memory, for example by write(), are
    unaffected.  The feature works on a page-granularity basis.

    MAP_NOSYNC allows one to use mmap() to share memory between processes
    without incuring any significant filesystem overhead, putting it in
    the same performance category as SysV Shared memory and anonymous memory.

Reviewed by: julian, alc, dg
1999-12-12 03:19:33 +00:00
phk
8e3c3eafed useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
dillon
37bee3bb3f cleanup madvise code, add a few more sanity checks.
Reviewed by:	Alan Cox <alc@cs.rice.edu>,  dg@root.com
1999-09-21 05:00:48 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
phk
e938d317d5 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
alc
3d024af49f vm_mmap:
Insure that device mappings get MAP_PREFAULT(_PARTIAL) set,
	so that 4M page mappings are used when possible.

Reviewed by:	Luoqi Chen <luoqi@watermarkgroup.com>
1999-06-05 18:21:53 +00:00
alc
936a553033 Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert,
eliminating the need for the pmap_object_init_pt calls in imgact_* and
mmap.

Reviewed by:	David Greenman <dg@root.com>
1999-05-17 00:53:56 +00:00
alc
83a9863079 Remove prototypes for functions that don't exist anymore (vm_map.h).
Remove a useless argument from vm_map_madvise's interface (vm_map.c,
	vm_map.h, and vm_mmap.c).

Remove a redundant test in vm_uiomove (vm_map.c).

Make two changes to vm_object_coalesce:

1. Determine whether the new range of pages actually overlaps
the existing object's range of pages before calling vm_object_page_remove.
(Prior to this change almost 90% of the calls to vm_object_page_remove
were to remove pages that were beyond the end of the object.)

2. Free any swap space allocated to removed pages.
1999-05-16 05:07:34 +00:00
alc
e79d7db00b Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option.
It never makes sense to specify MAP_COPY_NEEDED without also specifying
MAP_COPY_ON_WRITE, and vice versa.  Thus, MAP_COPY_ON_WRITE suffices.

Reviewed by:	David Greenman <dg@root.com>
1999-05-14 23:09:34 +00:00
peter
d247469429 Add brackets to silence egcs and help clarity. 1999-05-06 22:06:45 +00:00
luoqi
148df5d8ab Don't ignore mmap() address hint below the text section. 1999-05-06 00:46:19 +00:00
phk
16e3fbd2c1 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
peter
a74bdeb7d1 unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.
1999-04-19 14:14:14 +00:00
alc
9e3d479b9d To avoid a conflict for the vm_map's lock with vm_fault, release
the read lock around the subyte operations in mincore.  After the lock is
reacquired, use the map's timestamp to determine if we need to restart
the scan.
1999-03-02 22:55:02 +00:00
alc
4d728cf4b3 mincore doesn't modify the vm_map. Therefore, it doesn't require
an exclusive lock.  A read lock will suffice.
1999-03-01 20:42:16 +00:00
luoqi
082d37c1ac Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		Matthew Dillion	<dillon@apollo.backplane.com>
1999-02-19 14:25:37 +00:00
dillon
98732ec693 Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to
attempt to optimize forks but were essentially given-up on due to
    problems and replaced with an explicit dup of the vm_map_entry structure.
    Prior to the removal, they were entirely unused.
1999-02-07 21:48:23 +00:00
julian
4b7738dba1 Mostly remove the VM_STACK OPTION.
This changes the definitions of a few items so that structures are the
same whether or not the option itself is enabled. This allows
people to enable and disable the option without recompilng the world.

As the author says:

|I ran into a problem pulling out the VM_STACK option.  I was aware of this
|when I first did the work, but then forgot about it.  The VM_STACK stuff
|has some code changes in the i386 branch.  There need to be corresponding
|changes in the alpha branch before it can come out completely.

what is done:
|
|1) Pull the VM_STACK option out of the header files it appears in.  This
|really shouldn't affect anything that executes with or without the rest
|of the VM_STACK patches.  The vm_map_entry will then always have one
|extra element (avail_ssize).  It just won't be used if the VM_STACK
|option is not turned on.
|
|I've also pulled the option out of vm_map.c.  This shouldn't harm anything,
|since the routines that are enabled as a result are not called unless
|the VM_STACK option is enabled elsewhere.
|
|2) Add what appears to be appropriate code the the alpha branch, still
|protected behind the VM_STACK switch.  I don't have an alpha machine,
|so we would need to get some testers with alpha machines to try it out.
|
|Once there is some testing, we can consider making the change permanent
|for both i386 and alpha.
|
[..]
|
|Once the alpha code is adequately tested, we can pull VM_STACK out
|everywhere.
|

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-26 02:49:52 +00:00
dillon
df24433bbe This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
julian
4666ac5027 Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>
1999-01-06 23:05:42 +00:00
dt
b35cc94e30 Don't disable mmap with large file offset. 1998-12-09 20:22:21 +00:00
dg
3defb6d13f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
dfr
e2df972eb1 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
dfr
5fdaeb281d Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>
1998-08-24 08:39:39 +00:00
bde
863d5c8b68 Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively.  Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
1998-07-15 02:32:35 +00:00
dfr
918a149d3c Don't truncate the return value of mmap to sizeof(int). 1998-07-05 11:56:52 +00:00
bde
9e868cbb1a Removed unused includes. 1998-06-21 18:02:50 +00:00
dfr
1d5f38ac22 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
peter
34e429e59e Make the previous commit compile.. 1998-05-19 07:13:21 +00:00
guido
6f968c2fa2 Plug hole reported on Bugtraq: do not allow mmap with WRITE privs for
append-only and immutable files.

Obtained from: OpenBSD (partly)
1998-05-18 18:26:27 +00:00
guido
7549a1c33e Fix for mmap of char devices bug as described in OpenBSD advisory of
1998/02/20
Reviewed by:	John Dyson
Submitted by:	"Cy Schubert" <cschuber@uumail.gov.bc.ca>
1998-03-12 19:36:18 +00:00
dyson
8ceb6160f4 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
eivind
4547a09753 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
eivind
c552a9a1c3 Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
alex
e87ef9d530 caddr_t --> void * 1997-12-31 02:35:29 +00:00
eivind
01dd6091ed Make COMPAT_43 and COMPAT_SUNOS new-style options. 1997-12-16 17:40:42 +00:00
phk
4c8218a5c7 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
1997-11-06 19:29:57 +00:00
bde
a08aff2d02 Removed unused #includes. 1997-09-01 03:17:34 +00:00
peter
29e2c84e7a Allow non-page aligned file offset mmap's, providing that the system is
allowed to choose the address, or that the MAP_FIXED address has the same
remainder when modulo PAGE_SIZE as the file offset.  Apparently this is
posix1003.1b specified behavior.  SVR4 and the other *BSD's allow it too.
It costs us nothing to support and means we don't get EINVAL on some mmap
code that works perfectly elsewhere.

Obtained from: NetBSD
1997-08-30 18:50:06 +00:00
bde
c978fb3652 Fixed type mismatches for functions with args of type vm_prot_t and/or
vm_inherit_t.  These types are smaller than ints, so the prototypes
should have used the promoted type (int) to match the old-style function
definitions.  They use just vm_prot_t and/or vm_inherit_t.  This depends
on gcc features to work.  I fixed the definitions since this is easiest.
The correct fix may be to change the small types to u_int, to optimize
for time instead of space.
1997-08-25 22:15:31 +00:00
dyson
b39089e3e9 Add support for 4MB pages. This includes the .text, .data, .data parts
of the kernel, and also most of the dynamic parts of the kernel.  Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)

The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.
1997-07-17 04:34:03 +00:00
dyson
db14cfe28c Correct the return code for the mlock system call. Also add the stubs
for mlockall and munlockall.
1997-06-15 23:35:32 +00:00
bde
0d3591bdbd Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place.  Most things don't depend on file.h stuff at all.
1997-03-23 03:37:54 +00:00
peter
94b6d72794 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
dyson
10f666af84 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
dyson
52f682b582 Change the map entry flags from bitfields to bitmasks. Allows
for some code simplification.
1997-01-16 04:16:22 +00:00
jkh
808a36ef65 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
dyson
73dafb0b2c Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.)  Upper level recoded to use
pmap_ts_referenced.
1997-01-11 07:19:02 +00:00
dyson
c232302d3f Let the VM system know that on certain arch's that VM_PROT_READ
also implies VM_PROT_EXEC.  We support it that way for now,
since the break system call by default gives VM_PROT_ALL.  Now
we have a better chance of coalesing map entries when mixing
mmap/break type operations.  This was contributing to excessive
numbers of map entries on the modula-3 runtime system.  The
problem is still not "solved", but the situation makes more
sense.

Eventually, when we work on architectures where VM_PROT_READ
is orthogonal to VM_PROT_EXEC, we will have to visit this
issue carefully (esp. regarding security issues.)
1996-12-30 05:31:21 +00:00
dyson
527a08777f The code unnecessarily created an object with no handle up-front, which
has the negative effect of disabling some map optimizations.  This
patch defers the creation of the object until it needs to be at fault time.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1996-12-28 22:40:44 +00:00
joerg
63b6a05776 Make DFLDSIZ and MAXDSIZ fully-supported options.
"Don't forget to do a ``make depend''" :-)
1996-12-22 23:17:09 +00:00
dyson
765e5fd282 Implement closer-to POSIX mlock semantics. The major difference is
that we do allow mlock to span unallocated regions (of course, not
mlocking them.)  We also allow mlocking of RO regions (which the old
code couldn't.)  The restriction there is that once a RO region is
wired (mlocked), it cannot be debugged (or EVER written to.)

Under normal usage, the new mlock code will be a significant improvement
over our old stuff.
1996-12-14 17:54:17 +00:00
dyson
70bcdaf44a Change mmap to use OBJT_DEFAULT instead of OBJT_SWAP by default
for anonymous objects.  The system will automatically change the
type to SWAP if needed (for size or pageout reasons.)
1996-10-29 22:07:11 +00:00
dyson
30a1549ad1 Remove a bogus optimization in the mmap code. It is superfluous,
and at best is the same speed as the unoptimized code.  At worst, it
slows down trivial programs.
1996-10-24 02:56:23 +00:00
phk
622bd3da23 Remove a stale comment. 1996-10-13 07:16:50 +00:00
dg
f44d436f84 Fixed bug with reversed trunc/round_page() in madvise...start must be
trunced, end must be rounded.
1996-09-19 10:12:41 +00:00
dyson
01ce9d323a Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find.  This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.
1996-07-30 03:08:57 +00:00
dg
3b8df2418d Slight performance tweak for previous commit. 1996-07-28 02:54:09 +00:00
dyson
566ed4fc29 Allow sequentially created mmap'ed anonymous regions to coalesce. There
is little or no reason to create a swap pager for small mmap's.  The
vm_map_insert code will automatically create a swap pager if the object
becomes too large.  This fix, per a request from phk.
1996-07-27 17:21:41 +00:00
dyson
3a44c87901 Remove experimental header file. My test-build must have picked it
up in an unexpected place.
Submitted by:	jkh
1996-07-27 04:06:11 +00:00
dyson
293abd3564 This commit is meant to solve a couple of VM system problems or
performance issues.

	1) The pmap module has had too many inlines, and so the
	   object file is simply bigger than it needs to be.
	   Some common code is also merged into subroutines.
	2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
	   Unfortunately, a few have needed to be added also.
	   The removal caused the need for more vm_page_lookups.
	   I added lookup hints to minimize the need for the
	   page table lookup operations.
	3) Removal of some bogus performance improvements, that
	   mostly made the code more complex (tracking individual
	   page table page updates unnecessarily).  Those improvements
	   actually hurt 386 processors perf (not that people who
	   worry about perf use 386 processors anymore :-)).
	4) Changed pv queue manipulations/structures to be TAILQ's.
	5) The pv queue code has had some performance problems since
	   day one.  Some significant scalability issues are resolved
	   by threading the pv entries from the pmap AND the physical
	   address instead of just the physical address.  This makes
	   certain pmap operations run much faster.  This does
	   not affect most micro-benchmarks, but should help loaded system
	   performance *significantly*.  DG helped and came up with most
	   of the solution for this one.
	6) Most if not all pmap bit operations follow the pattern:
		pmap_test_bit();
		pmap_clear_bit();
	   That made for twice the necessary pv list traversal.   The
	   pmap interface now supports only pmap_tc_bit type operations:
	   pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
	   Additionally, the modified routine now takes a vm_page_t arg
	   instead of a phys address.  This eliminates a PHYS_TO_VM_PAGE
	   operation.
	7) Several rewrites of routines that contain redundant code to
	   use common routines, so that there is a greater likelihood of
	   keeping the cache footprint smaller.
1996-07-27 03:24:10 +00:00
dyson
65214cd0c8 This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also.  There is still
a hang problem using X in small memory machines.
1996-05-31 00:38:04 +00:00
dyson
d3600176f4 Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.
1996-05-19 07:36:50 +00:00
dyson
242e10df11 This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

	More usage of the TAILQ macros.  Additional minor fix to queue.h.
	Performance enhancements to the pageout daemon.
		Addition of a wait in the case that the pageout daemon
		has to run immediately.
		Slightly modify the pageout algorithm.
	Significant revamp of the pmap/fork code:
		1) PTE's and UPAGES's are NO LONGER in the process's map.
		2) PTE's and UPAGES's reside in their own objects.
		3) TOTAL elimination of recursive page table pagefaults.
		4) The page directory now resides in the PTE object.
		5) Implemented pmap_copy, thereby speeding up fork time.
		6) Changed the pv entries so that the head is a pointer
		   and not an entire entry.
		7) Significant cleanup of pmap_protect, and pmap_remove.
		8) Removed significant amounts of machine dependent
		   fork code from vm_glue.  Pushed much of that code into
		   the machine dependent pmap module.
		9) Support more completely the reuse of already zeroed
		   pages (Page table pages and page directories) as being
		   already zeroed.
	Performance and code cleanups in vm_map:
		1) Improved and simplified allocation of map entries.
		2) Improved vm_map_copy code.
		3) Corrected some minor problems in the simplify code.
	Implemented splvm (combo of splbio and splimp.)  The VM code now
		seldom uses splhigh.
	Improved the speed of and simplified kmem_malloc.
	Minor mod to vm_fault to avoid using pre-zeroed pages in the case
		of objects with backing objects along with the already
		existant condition of having a vnode.  (If there is a backing
		object, there will likely be a COW...  With a COW, it isn't
		necessary to start with a pre-zeroed page.)
	Minor reorg of source to perhaps improve locality of ref.
1996-05-18 03:38:05 +00:00
phk
5d01dc3d50 Another sweep over the pmap/vm macros, this time with more focus on
the usage.  I'm not satisfied with the naming, but now at least there is
less bogus stuff around.
1996-05-03 21:01:54 +00:00
dg
47ab1071a2 Force device mappings to always be shared. It doesn't make sense for them
to ever be COW and we need the mappings to be shared for backward
compatibilty.

Reviewed by:	dyson
1996-03-16 15:00:05 +00:00
dyson
df05bab7f8 Allow mmap'ed devices to work correctly across forks. The sanest
solution appeared to be to allow the child to maintain the same mapping as
the parent.
1996-03-12 02:27:20 +00:00
peter
7a5076f346 Oops.. I nearly forgot the actual core of the length/rounding/etc fixes
that Bruce asked for.

These still are not quite perfect, and in particular, it can get
upset on extreme boundary cases (addr = 0xfff, len = 0xffffffff,
which would end up mapping a single page rather than failing), but
this is better code that I committed before.

(note, the VM system does not (apparently) support single mmap segment
sizes above 0x80000000 anyway)
1996-03-02 17:14:09 +00:00
dyson
ed1fa57da8 1) Eliminate unnecessary bzero of UPAGES.
2) Eliminate unnecessary copying of pages during/after forks.
3) Add user map simplification.
1996-03-02 02:54:24 +00:00
peter
5239b23b5d kern_descrip.c: add fdshare()/fdcopy()
kern_fork.c: add the tiny bit of code for rfork operation.
kern/sysv_*: shmfork() takes one less arg, it was never used.
sys/shm.h: drop "isvfork" arg from shmfork() prototype
sys/param.h: declare rfork args.. (this is where OpenBSD put it..)
sys/filedesc.h: protos for fdshare/fdcopy.
vm/vm_mmap.c: add minherit code, add rounding to mmap() type args where
it makes sense.
vm/*: drop unused isvfork arg.

Note: this rfork() implementation copies the address space mappings,
it does not connect the mappings together.  ie: once the two processes
have split, the pages may be shared, but the address space is not. If one
does a mmap() etc, it does not appear in the other.  This makes it not
useful for pthreads, but it is useful in it's own right for having
light-weight threads in a static shared address space.

Obtained from: Original by Ron Minnich, extended by OpenBSD
1996-02-23 18:49:25 +00:00
dyson
8fc8a772af Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
	overhead for merged cache.
Efficiency improvement for vfs_cluster.  It used to do alot of redundant
	calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files.  Additionally,
	fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources.  The pageout code
	will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
	page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
	thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
	that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
	case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
	buffers.
1996-01-19 04:00:31 +00:00
bde
ca36bb34d1 Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinuishable from vm_offset_t.
1995-12-17 07:19:58 +00:00
dyson
37543c2e24 There was a bug that the size for an msync'ed region was not rounded
up.  The effect of this was that msync with a size would generally sync
1 page less than it should.  This problem was brought to my attention
by Darrel Herbst <dherbst@gradin.cis.upenn.edu> and Ron Minnich
<rminnich@sarnoff.com>.
1995-12-13 12:28:39 +00:00
dyson
601ed1a4c0 Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.
1995-12-11 04:58:34 +00:00
dg
c30f46c534 Untangled the vm.h include file spaghetti. 1995-12-07 12:48:31 +00:00
bde
754df9d25d Completed function declarations and/or added prototypes.
Staticized some functions.

__purified some functions.  Some functions were bogusly declared as
returning `const'.  This hasn't done anything since gcc-2.5.  For
later versions of gcc, the equivalent is __attribute__((const)) at
the end of function declarations.
1995-12-03 12:18:39 +00:00
bde
aa9a60640e Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs.  It's
convenient to have this visible but they are hard to maintain.  Some
are already different from the central declarations.  4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.
1995-11-12 06:43:28 +00:00
dyson
eca6c1d907 First phase of removing the PG_COPYONWRITE flag, and an architectural
cleanup of mapping files.
1995-10-23 03:49:43 +00:00
dyson
4519f95061 Implement mincore system call. 1995-10-21 17:42:28 +00:00
dg
c8b0a7332c NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
   haspage, and sync operations are supported. The haspage interface now
   provides information about clusterability. All pager routines now take
   struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
   confusion caused by pagers being both a data structure ("allocate a
   pager") and a collection of routines. The idea of a pager structure has
   escentially been eliminated. Objects now have types, and this type is
   used to index the appropriate pager. In most cases, items in the pager
   structure were duplicated in the object data structure and thus were
   unnecessary. In the few cases that remained, a un_pager structure union
   was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
   be removed. For instance, vm_object_enter(), vm_object_lookup(),
   vm_object_remove(), and the associated object hash list were some of the
   things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
   SMP locking primitives used in the VM system aren't likely the mechanism
   that we'll be adopting. Even if it were, the locking that was in the code
   was very inadequate and would have to be mostly re-done anyway. The
   locking in a uni-processor kernel was a no-op but went a long way toward
   making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
   thread support have been fixed to reflect the reality that we are really
   dealing with processes, not threads. The VM system didn't have complete
   thread support, so the comments and mis-named routines were just wrong.
   We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
   pager_alloc routines. Most of the pager_allocs have been rewritten and
   are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
   now tries harder to output an even number of pages before and after
   the requested page. This is sort of the reverse of the ideal pagein
   algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
   have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
    The fact that the vm_object data structure escentially had this
    backwards really confused things. The use of "shadow" and "backing
    object" throughout the code is now internally consistent and correct
    in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
    0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
    of objects to the "swap" type. The previous checks throughout the code
    for swp->pg_data != NULL were really ugly. This change also provides
    the rudiments for future backing of "anonymous" memory by something
    other than the swap pager (via the vnode pager, for example), and it
    allows the decision about which of these pagers to use to be made
    dynamically (although will need some additional decision code to do
    this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
    object" code has been removed. MAP_COPY was undocumented and non-
    standard. It was furthermore broken in several ways which caused its
    behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
    continue to work correctly, but via the slightly different semantics
    of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
    threads design can be worked around in other ways. Both #12 and #13
    were done to simplify the code and improve readability and maintain-
    ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
   this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
   information provided by the new haspage pager interface. This will
   substantially reduce the overhead by eliminating a large number of
   VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
   improved to provide both a "behind" and "ahead" indication of
   contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
   It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
   via a much more general mechanism that could also be used for disk
   striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
   fact that it makes calls into the swap pager and knows too much about
   how the swap pager operates really bothers me. It also doesn't allow
   for collapsing of non-swap pager objects ("unnamed" objects backed by
   other pagers).
1995-07-13 08:48:48 +00:00
dg
24d64ff84a Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places
that call vnode_pager_alloc() so that a failure return can be dealt with.
This fixes a panic seen on NFS clients when a file being opened is deleted
on the server before the open completes.
1995-07-09 06:58:03 +00:00
rgrimes
c86f0c7a71 Remove trailing whitespace. 1995-05-30 08:16:23 +00:00
dg
56d21b4218 Accessing pages beyond the end of a mapped file results in internal
inconsistencies in the VM system that eventually lead to a panic. These
changes fix the behavior to conform to the behavior in SunOS, which is
to deny faults to pages beyond the EOF (returning SIGBUS). Internally,
this is implemented by requiring faults to be within the object size
boundaries. These changes exposed another bug, namely that passing in
an offset to mmap when trying to map an unnamed anonymous region also
results in internal inconsistencies. In this case, the offset is forced
to zero.

Reviewed by:	John Dyson and others
1995-05-18 02:59:26 +00:00
dg
a616136616 Moved some zero-initialized variables into .bss. Made code intended to be
called only from DDB #ifdef DDB. Removed some completely unused globals.
1995-04-16 12:56:22 +00:00
dg
1648fcf733 Fix logic bug I just introduced with the flags to msync(). 1995-03-25 17:44:03 +00:00
dg
28ae667091 Disallow both MS_ASYNC and MS_INVALIDATE flags being set at the same time
in msync().
1995-03-25 17:36:00 +00:00
dg
05c4d98f93 Added "flags" argument to msync, and implemented MS_ASYNC and MS_INVALIDATE.
The MS_ASYNC flag doesn't current work, and MS_INVALIDATE will only toss out
the pages in the address space (not all pages in the shadow chain).
1995-03-25 16:55:46 +00:00
dg
0c24a80ae8 Fixed bug in vm_mmap() where the object that is created in some cases
was the wrong size. This is the likely cause of panics reported by
Lars Fredriksen and Paul Richards related to a -1 blkno when paging
via the swap_pager.

Submitted by:	John Dyson
1995-03-22 05:08:41 +00:00
dg
0d274572e3 Disallow non page-aligned file offsets in vm_mmap(). We don't support this
in either the high or low level parts of the VM system. Just return EINVAL
in this case, just like SunOS does.
1995-03-21 10:15:52 +00:00
dg
d0d2e323fa Fixed bug in the size == 0 case of msync() caused by a bogus return value
check..
1995-03-21 02:54:04 +00:00
dg
9cd78521d8 Removed redundant newlines that were in some panic strings. 1995-03-19 14:29:26 +00:00
bde
289f11acb4 Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'.  Fix all the bugs found.  There were no serious
ones.
1995-03-16 18:17:34 +00:00
dg
266265f981 Fixed obsolete comment. 1995-03-12 08:11:34 +00:00
dg
ffdff08c04 Fixed object reference count problem that occurred in the MAP_PRIVATE
case after we rewrote vm_mmap(). Added some comments to make it easier
to follow the reference counts.
1995-03-07 17:27:49 +00:00
dg
a40bad2c2d Rewrote MAP_PRIVATE case of vm_mmap() - all of the COW portion of this
routine was highly convoluted.

Submitted by:	John Dyson
1995-02-22 08:40:54 +00:00
dg
23db83ee1f Deprecated remaining use of vm_deallocate. Deprecated vm_allocate_with_
pager(). Almost completely rewrote vm_mmap(); when John gets done with
the bottom half, it will be a complete rewrite. Deprecated most use of
vm_object_setpager(). Removed side effect of setting object persist
in vm_object_enter and moved this into the pager(s). A few other
cosmetic changes.
1995-02-21 01:22:48 +00:00
dg
a33bb795d3 Don't bother calling pmap_create() when creating the temporary map.
The whole COW section of vm_mmap() should be rewritten; the current
implementation is highly convoluted.
1995-02-15 09:22:17 +00:00
dg
1707d41102 These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme.  The scheme is almost fully compatible with the old filesystem
interface.  Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache.  Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code.  Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now.  Somehow in 2.0, some "enhancements"
broke the code.  This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code.  No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency.  Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache.  Add "bypass" support for sneaking in on
busy buffers.

Submitted by:	John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
phk
09c3293a0f Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc.
Reviewed by:	davidg
1994-10-09 01:52:19 +00:00
dg
ff6fc2bc2a Whoops, accidently left out some pieces of the munmapfd patch. 1994-09-02 15:06:51 +00:00
dg
f35b3c52c8 Enabled page table preloading of cached objects.
Submitted by:	John Dyson
1994-08-06 09:00:50 +00:00
dg
0711a9cff6 Integrated VM system improvements/fixes from FreeBSD-1.1.5. 1994-08-04 03:06:48 +00:00
dg
8d205697aa Added $Id$ 1994-08-02 07:55:43 +00:00
rgrimes
2469c867a1 The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by:	Rodney W. Grimes
Submitted by:	John Dyson and David Greenman
1994-05-25 09:21:21 +00:00
rgrimes
8fb65ce818 BSD 4.4 Lite Kernel Sources 1994-05-24 10:09:53 +00:00