721 Commits

Author SHA1 Message Date
David Greenman
e4b7635de2 Added needed splvm() protection around object page traversal in
vm_object_terminate().
1998-10-27 13:22:51 +00:00
Bruce Evans
9cd93b3aec Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse.  We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.

Removed a redundant `major(dev) < nblkdev' test instead of updating it.

Don't follow a garbage bdevsw pointer for attempts to swap on empty
regular files.  This case currently can't happen.  Swapping on regular
files is ifdefed out in swapon() and isn't attempted for empty files
in nfs_mountroot().
1998-10-25 19:24:04 +00:00
Poul-Henning Kamp
f5ef029e92 Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.
1998-10-25 17:44:59 +00:00
David Greenman
9fcfb650d1 Oops, revert part of last fix. vm_pager_dealloc() can't be called until
after the pages are removed from the object...so fix the problem by
not printing the diagnostic for wired fictitious pages (which is normal).
1998-10-23 05:43:13 +00:00
David Greenman
356863eb01 Fixed two bugs in recent commit: in vm_object_terminate, vm_pager_dealloc
needs to be called prior to freeing remaining pages in the object so that
the device pager has an opportunity to grab its "fake" pages. Also, in
the case of wired pages, the page must be made busy prior to calling
vm_page_remove. This is a difference from 2.2.x that I overlooked when
I brought these changes forward.
1998-10-23 05:25:49 +00:00
David Greenman
0b10ba9822 Make the VM system handle the case where a terminating object contains
legitimately wired pages. Currently we print a diagnostic when this
happens, but this will be removed soon when it will be common for this
to occur with zero-copy TCP/IP buffers.
1998-10-22 02:16:53 +00:00
David Greenman
24166bb84b Convert fake page allocs to use the zone allocator, thus eliminating the
private pool management code in here.
1998-10-22 01:45:29 +00:00
David Greenman
bb7db2c011 Set m->object to NULL in dev_pager_getfake(). 1998-10-21 23:06:50 +00:00
David Greenman
300ee8246e Nuked PG_TABLED flag. Replaced with m->object != NULL. 1998-10-21 14:46:42 +00:00
David Greenman
12d534d2f9 Add a diagnostic printf for freeing a wired page. This will eventually
be turned into a panic, but I want to make sure that all cases of freeing
pages with wire_count==1 (which is/was allowed) have first been fixed.
1998-10-21 11:43:04 +00:00
David Greenman
6cde7a165f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
John Polstra
9b35a0d694 Fix a panic on SMP systems, caused by sleeping while holding a
simple-lock.

The reviewer raises the following caveat: "I believe these changes
open a non-critical race condition when adding memory to the pool
for the zone. I think what will happen is that you could have two
threads that are simultaneously adding additional memory when the
pool runs out. This appears to not be a problem, however, since
the re-acquisition of the lock will protect the list pointers."
The submitter agrees that the race is non-critical, and points out
that it already existed for the non-SMP case.  He suggests that
perhaps a sleep lock (using the lock manager) should be used to
close that race.  This might be worth revisiting after 3.0 is
released.

Reviewed by:	dg (David Greenman)
Submitted by:	tegge (Tor Egge)
1998-10-09 00:24:49 +00:00
John Polstra
a0fce82724 Fix a bug in which a page index was used where a byte offset was
expected.  This bug caused builds of Modula-3 to fail in mysterious
ways on SMP kernels.  More precisely, such builds failed on systems
with kern.fast_vfork equal to 0, the default and only supported
value for SMP kernels.

PR:		kern/7468
Submitted by:	tegge (Tor Egge)
1998-10-01 20:46:41 +00:00
Andrzej Bialecki
faa5f8d8da Make #define NO_SWAPPING a normal kernel config option.
Reviewed by:	jkh
1998-09-29 17:33:59 +00:00
Robert V. Baron
6e3a3f387c John Dyson approved of this solution; make vnode_pager_input_old set m->valid 1998-09-28 23:58:10 +00:00
David Greenman
ce65e68c03 Be more selective about when we clear p->valid.
Submitted by:	John Dyson <toor@dyson.iquest.net>
1998-09-28 02:40:11 +00:00
Bruce Evans
4af2cddffe Removed unused file. 1998-09-20 06:28:10 +00:00
Bruce Evans
500b04a257 Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM.  Declare it in a header file.  Don't forget to use
it in non-Lite2 code.  Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.
1998-09-05 15:17:34 +00:00
Doug Rabson
e69763a315 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
Garrett Wollman
85e7f5492b Separate wakeup conditions for page I/O count (pg_busy) and lock (PG_BUSY).
This is not a complete solution to the deadlock, but the additional wakeups
have helped in my observation.

Suggested by: John Dyson
1998-09-01 17:12:19 +00:00
Luoqi Chen
c576d12115 Fix a rounding problem that causes vnode pager to fail to remove the last
partially filled page during a truncation.

PR:		kern/7422
1998-08-25 13:47:37 +00:00
Doug Rabson
069e9bc1b4 Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>
1998-08-24 08:39:39 +00:00
Stephen McKay
9e80236560 Correct/clarify some comments. 1998-08-22 15:24:09 +00:00
Doug Rabson
196e9a52ec Protect all modifications to paging_in_progress with splvm(). 1998-08-13 08:05:13 +00:00
Doug Rabson
d474eaaa5f Protect all modifications to paging_in_progress with splvm(). The i386
managed to avoid corruption of this variable by luck (the compiler used a
memory read-modify-write instruction which wasn't interruptable) but other
architectures cannot.

With this change, I am now able to 'make buildworld' on the alpha (sfx: the
crowd goes wild...)
1998-08-06 08:33:19 +00:00
Bruce Evans
ccbbd9271b Fixed two spl nesting bugs. They caused (at least) the entire pageout
daemon to run at splvm() forever after swap_pager_putpages() is called
from vm_pageout_scan().

Broken in: rev.1.189 (1998/02/23)
1998-07-28 15:30:01 +00:00
Doug Rabson
56e7ede1c4 Notify pmap when a page is freed on the alpha to allow it to clean up
its emulated modified/referenced bits.
1998-07-26 18:15:20 +00:00
David Greenman
f3679e351a Improved pager input failure message. 1998-07-22 09:38:04 +00:00
Poul-Henning Kamp
db7ac2451b There is a comment in vm_param.h which doesn't belong to the
code still left in there.  The macros it describes disappeared
sometime since 4.4BSD lite.

PR:		7246
Reviewed by:	phk
Submitted by:	Stefan Eggers <seggers@semyam.dinoco.de>
1998-07-22 06:21:55 +00:00
Bruce Evans
15c7382561 Cast pointers to [u]intptr_t instead of to [unsigned] long. 1998-07-15 04:17:55 +00:00
Bruce Evans
a23d65bfc8 Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively.  Most of the longs should probably have been
u_longs, but this change is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
1998-07-15 02:32:35 +00:00
Bruce Evans
eb95adeff5 Print pointers using %p instead of attempting to print them by
casting them to long, etc.  Fixed some nearby printf bogons (sign
errors not warned about by gcc, and style bugs, but not truncation
of vm_ooffset_t's).
1998-07-14 12:26:15 +00:00
Bruce Evans
101eeb7f9f Print pointers using %p instead of attempting to print them by
casting them to long, etc.  Fixed some nearby printf bogons (sign
errors not warned about by gcc, and style bugs, but not truncation
of vm_ooffset_t's).

Use slightly less bogus casts for passing pointers to ddb command
functions.
1998-07-14 12:14:58 +00:00
Bruce Evans
92c4c4eb52 Fixed printf format errors. 1998-07-11 12:07:52 +00:00
Bruce Evans
fc62ef1fb5 Fixed printf format errors. 1998-07-11 11:30:46 +00:00
Bruce Evans
ac1e407b32 Fixed printf format errors. 1998-07-11 07:46:16 +00:00
Alexander Langer
c5b75d8223 Removed no longer valid comment about swb_block being int instead of
daddr_t.

PR:		7238
Submitted by:	Stefan Eggers <seggers@semyam.dinoco.de>
1998-07-10 21:50:17 +00:00
Alexander Langer
427e99a0b8 Removed unnecessary test from if/else construct.
PR:		7233
Submitted by:	Stefan Eggers <seggers@semyam.dinoco.de>
1998-07-10 17:58:35 +00:00
Doug Rabson
711458e3e9 Don't truncate the return value of mmap to sizeof(int). 1998-07-05 11:56:52 +00:00
Julian Elischer
f7ea2f55d1 There is no such thing any more as "struct bdevsw".
There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw and
cdevsw entries.  The bdevsw[] table still exists and is a second pointer
to the cdevsw entry of the device. its major is in d_bmaj rather than
d_maj. some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).

rawread()/rawwrite() went away as part of this though it's not strictly
the same  patch, just that it involves all the same lines in the drivers.

cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.

Reviewed by: Eivind Eklund and Mike Smith
	Changes suggested by eivind.
1998-07-04 22:30:26 +00:00
Julian Elischer
fd5d1124e2 VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00
John-Mark Gurney
20f718132d document some VM paging options for cache sizes:
PQ_NOOPT	no coloring
PQ_LARGECACHE	used for 512k/16k cache
PQ_HUGECACHE	used for 1024k/16k cache
1998-06-30 08:01:30 +00:00
Poul-Henning Kamp
b62591052c Remove bdevsw_add(), change the only two users to use bdevsw_add_generic().
Extend cdevsw to be superset of bdevsw.
Remove non-functional bdev lkm support.
Teach wcd what the open() args mean.
1998-06-25 11:28:07 +00:00
Bruce Evans
be160d60ab Removed unused includes. 1998-06-21 18:02:50 +00:00
Bruce Evans
e5b19842ef Removed unused includes. 1998-06-21 14:53:44 +00:00
Doug Rabson
ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
David Greenman
5994d8937d Changed the log() of "Out of mbuf clusters - increase maxusers" to a
printf() of "Out of mbuf clusters - adjust NMBCLUSTERS or increase
maxusers" so that the message is more informative and so that it will
appear in the kernel message buffer.
1998-06-05 21:48:45 +00:00
John Dyson
976f208be3 Cleanup and remove some dead code from the initialization. 1998-06-02 05:50:08 +00:00
John Dyson
e8f367853b Correct sleep priority. 1998-06-02 05:39:13 +00:00
John Dyson
b9cefc08e2 Support a 16K first level cache for 512K 2nd level. Also, add support
for 1MB 2nd level cache.
1998-05-24 04:25:27 +00:00
John Dyson
cf2819ccb8 Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)
1998-05-21 07:47:58 +00:00
Peter Wemm
4183b6b622 Make the previous commit compile.. 1998-05-19 07:13:21 +00:00
Guido van Rooij
05feb99ff1 Plug hole reported on Bugtraq: do not allow mmap with WRITE privs for
append-only and immutable files.

Obtained from: OpenBSD (partly)
1998-05-18 18:26:27 +00:00
John Dyson
bd6be9150d An important fix for proper inheritance of backing objects for
object splits.  Another excellent detective job by Tor.
Submitted by:	Tor Egge <Tor.Egge@idi.ntnu.no>
1998-05-16 23:03:20 +00:00
John Dyson
96fb8cf258 Fix the shm panic. I mistakenly used the shadow_count to keep the object
from being split, and instead added an OBJ_NOSPLIT.
1998-05-04 17:12:53 +00:00
John Dyson
cbd8ec0902 Work around some VM bugs, the worst being an overly aggressive
swap space free calculation.  More complete fixes will be forthcoming,
in a week.
1998-05-04 03:01:44 +00:00
John Dyson
86524867d1 Another minor cleanup of the split code. Make sure that pages are
busied during the entire time, so that the waits for pages being
unbusy don't make the objects inconsistent.
1998-05-02 06:36:16 +00:00
Peter Wemm
3c33646725 Seatbelts for vm_page_bits() in case a file offset is passed in rather than
the page offset.  If a large file offset was passed in, a large negative
array index could be generated which could cause page faults etc at worst
and file corruption at the least.  (Pages are allocated within file
space on page alignment boundaries, so a file offset being passed in here
is harmless to DTRT.  The case where this was happening has already been
fixed though, this is in case it happens again).

Reviewed by: dyson
1998-05-02 03:02:13 +00:00
John Dyson
e493d28abc Fix minor bug with new over used swap fix. 1998-05-01 02:25:29 +00:00
John Dyson
dda6b17151 Add a needed prototype, and fix a panic problem with the new
memory code.
1998-04-29 06:59:08 +00:00
John Dyson
c0877f103f Tighten up management of memory and swap space during map allocation,
deallocation cycles.  This should provide a measurable improvement
on swap and memory allocation on loaded systems.  It is unlikely a
complete solution.  Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.
1998-04-29 04:28:22 +00:00
John Dyson
2dbea5d2e3 Fix a pseudo-swap leak problem. This mitigates "leaks" due to
freeing partial objects, not freeing entire objects didn't
free any of it.  Simple fix to the map code.
Reviewed by:	dg
1998-04-28 05:54:47 +00:00
John Dyson
adc78b8c71 Correct copyright. 1998-04-25 04:50:03 +00:00
Bruce Evans
c1087c1324 Support compiling with `gcc -ansi'. 1998-04-15 17:47:40 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't an atomic variable, so splfoo() protection was needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now use the new variable time_second instead.

gettime() changed to getmicrotime().

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeared time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passed &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
John Dyson
bef608bd7e Some VM improvements, including elimination of a lot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliability
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
Guido van Rooij
c8bdd56b3a Fix for mmap of char devices bug as described in OpenBSD advisory of
1998/02/20
Reviewed by:	John Dyson
Submitted by:	"Cy Schubert" <cschuber@uumail.gov.bc.ca>
1998-03-12 19:36:18 +00:00
Mike Smith
86ffbd76d0 Complement diagnostic messages about missing per-FS VOP page operations,
but don't make their absence fatal.
Submitted by:	terry
1998-03-09 08:58:53 +00:00
John Dyson
be01eafd5f Quell unneeded pageout daemon activity. 1998-03-08 18:19:17 +00:00
John Dyson
6215e86272 Remove a very ill advised vm_page_protect. This was being called
for a non-managed page.  That is a big no-no.
1998-03-08 18:05:59 +00:00
John Dyson
e163e201ef Some cruft left over from my megacommit. A page rotation optimization
was a good idea, but can cause instability.  That optimization is
now removed.
1998-03-08 06:27:30 +00:00
John Dyson
edd97f3a37 Several minor fixes:
1) When freeing pages, it is a good idea to protect them off.
	   (This is probably gratuitous, but good form.)
	2) Allow collapsing pages in the backing object that are
	   PQ_CACHE.  This will improve memory utilization.
	3) Correct the collapse code so that pages that were on the
	   cache queue are moved to the inactive queue.  This is
	   done when pages are marked dirty (so that those pages
	   will be properly paged out instead of freed), so that
	   cached pages will not be paradoxically marked dirty.
1998-03-08 06:25:59 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifested themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed separately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitous buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups associated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistency problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify its status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and its descendants to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
John Dyson
4866e0856c Make vm_fault much cleaner by removing the evil macro inlines, and
put a lot of its context into a data structure.  This allows
significant shortening of its codepath, and will significantly
decrease its cache footprint.

Also, add some stats to vmmeter.  Note that you'll have to
rebuild/recompile vmstat, systat, etc... Otherwise, you'll
get "very interesting" paging stats.
1998-03-07 20:45:47 +00:00
Peter Dufault
917e476dad Reviewed by: msmith, bde long ago
POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:27:00 +00:00
John Dyson
ffc82b0a70 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappiness."
1998-03-01 04:18:54 +00:00
Mike Smith
ce75f2c365 In the author's words:
These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown
for local media FS's.

See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation
details for generic *_{get|put}pages for local media FS's.  Support
is trivial to add for any FS that formerly relied on the default
behaviour of the vnode_pager in EOPNOTSUPP cases (just copy the
ffs_getpages() code for the FS in question's *_{get|put}pages).

Obviously, it would be better if each local media FS implemented a
more optimal method, instead of calling an exported interface from
the /sys/vm/vnode_pager.c, but this is a necessary first step in
getting the FS's to a point where they can be supplied with better
implementations on a case-by-case basis.

Obviously, the cd9660_putpages() can be rather trivial (since it
is a read-only FS type 8-)).

A slight (temporary) modification is made to print a diagnostic message
in the case where the underlying filesystem attempts to engage in the
previous behaviour.  Failure is likely to be ungraceful.

Submitted by:	terry@freebsd.org (Terry Lambert)
1998-02-26 06:39:59 +00:00
John Dyson
660957521c Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs.  The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by:	Partially from Tor Egge.
1998-02-25 03:56:15 +00:00
John Dyson
a15403de7b Correct some severe VM tuning problems for small systems (<=16MB), and
improve tuning on larger systems.  (A couple of the VM tuning params for
small systems were so badly chosen that the system could hang under load.)

The broken tuning was originally my fault.
1998-02-24 10:16:23 +00:00
John Dyson
e47ed70b0f Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time.  The swap pager rundown code
has been cleaned up, and unneeded wakeups removed.  Lots of splbio's
are changed to splvm's.  Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overhead.)
1998-02-23 08:22:48 +00:00
John Dyson
d9bed5bee1 Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden
in a way identically as before.)  I had problems with the system properly
handling the number of vnodes when there is a lot of system memory, and the
default VM_KMEM_SIZE.  Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.

Add some accounting for vm_zone memory allocations, and provide properly
for vm_zone allocations out of the kmem_map.  Also move the vm_zone
allocation stats to the VM OID tree from the KERN OID tree.
1998-02-23 07:42:43 +00:00
Bruce Evans
39e4376ba7 Removed unused #includes. 1998-02-20 13:11:54 +00:00
Mike Smith
eed5086e1a Move the 'sw' device off block major #1, which is now occupied by 'wfd'. 1998-02-19 12:15:06 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
John Dyson
157ac55f97 Fix an argument to vn_lock. It appears that a lot of the vn_lock usage
is a bit undisciplined, and should be checked carefully.
1998-02-08 14:55:13 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
John Dyson
95461b450d 1) Start using a cleaner and more consistent page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean-up pmap_incore.
8)	Cluster initial kern_exec pagein.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects, and make sure that the vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
Bruce Evans
e7a5897899 Added #include of <sys/queue.h> so that this file is more "self"-sufficient. 1998-02-03 22:19:35 +00:00
John Dyson
e736cd05cb This fix should help the panic problems in -current. There
were some errors in "interval" management.  Due to the
clustering mechanism, the code is necessarily complex and
error prone.
1998-02-03 00:50:36 +00:00
Bruce Evans
8bcc577e92 Forward declare more structs that are used in prototypes here - don't
depend on <sys/types.h> forward declaring common ones.
1998-02-01 20:08:39 +00:00
John Dyson
1f13bdaa97 Fix a performance problem caused by an earlier commit. 1998-02-01 02:00:20 +00:00
John Dyson
c15541e7a7 contigalloc doesn't place the allocated page(s) into an object, and
now this breaks vm_page_wire (due to wired page accounting per object.)

This should fix a problem as described by Donald Maddox.
1998-01-31 20:30:18 +00:00
John Dyson
eaf13dd73a Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtle "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistent scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00
Eivind Eklund
f71bb3a160 Turn NSWAPDEV into a new-style option. 1998-01-25 04:13:25 +00:00
Eivind Eklund
7b778b5e61 Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related options new-style.
This introduces an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.
1998-01-24 02:54:56 +00:00
John Dyson
50ce7ff499 Add better support for larger I/O clusters, including larger physical
I/O.  The support is not mature yet, and some of the underlying implementation
needs help.  However, support does exist for IDE devices now.
1998-01-24 02:01:46 +00:00
John Dyson
2d8acc0f4a VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneeded collapse operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
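Item 9 of commit 2d8acc0f4a above mentions generation counts as a way to avoid unneeded restarts when a traversal may block. A minimal sketch of that idea follows, assuming a counter that is bumped whenever the structure changes; all `sk_` names are hypothetical and nothing here is taken from the kernel sources.

```c
/* Toy object whose contents may change while a scanner sleeps. */
struct sk_obj {
	unsigned generation;	/* bumped on every structural change */
	int      npages;
};

/* A blocking step that leaves the object untouched. */
static void
sk_quiet_op(struct sk_obj *obj)
{
	(void)obj;
}

/* A blocking step during which the object changes exactly once. */
static void
sk_racy_op(struct sk_obj *obj)
{
	if (obj->generation == 0)
		obj->generation++;
}

/*
 * Walk the object, calling a possibly blocking step per element.
 * The scan restarts only when the generation count shows that the
 * object really changed; otherwise it carries on.  Returns the
 * number of restarts.  (A step that always mutates the object
 * would loop forever in this sketch.)
 */
static int
sk_scan(struct sk_obj *obj, void (*step)(struct sk_obj *))
{
	int restarts = 0;
	int i;

restart:
	for (i = 0; i < obj->npages; i++) {
		unsigned gen = obj->generation;

		step(obj);
		if (obj->generation != gen) {
			restarts++;
			goto restart;
		}
	}
	return restarts;
}
```

Without the counter, a scanner would have to restart after every blocking step on the assumption that something might have changed.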
John Dyson
480ba2f552 Allow gdb to work again. 1998-01-21 12:18:00 +00:00
John Dyson
4722175765 Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap.  Fix a problem with faulting in pages.  Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but not all subtle bugs are fixed yet.
1998-01-17 09:17:02 +00:00
John Dyson
925a3a419a Fix some vnode management problems, and better mgmt of vnode free list.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.

PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.
1998-01-12 01:46:33 +00:00
John Dyson
bf27292b35 Turn off the VTEXT flag when an object is no longer referenced, so
that an executable that is no longer running can be written to.  Also,
clear the OBJ_OPT flag more often, when appropriate.
1998-01-07 03:12:19 +00:00
John Dyson
95e5e988e0 Make our v_usecount vnode reference count work identically to the
original BSD code.  The association between the vnode and the vm_object
no longer includes reference counts.  The major difference is that
vm_object's are no longer freed gratuitously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also.  The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

Vnodes are now normally placed onto the free queue with an object still
attached.  The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code.  There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon as reasonably possible.
1998-01-06 05:26:17 +00:00
Alexander Langer
651bb81717 caddr_t --> void * 1997-12-31 02:35:29 +00:00
John Dyson
60f8d46448 Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem
with the object cache removal.
1997-12-29 01:03:55 +00:00
John Dyson
2be70f79f6 Lots of improvements, including restructuring the caching and management
of vnodes and objects.  There are some metadata performance improvements
that come along with this.  There are also a few prototypes added when
the need is noticed.  Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.
1997-12-29 00:25:11 +00:00
John Dyson
6d1756a948 The ioopt code is still buggy, but wasn't fully disabled. 1997-12-25 20:55:15 +00:00
John Dyson
b44e4b7a2b Support running with inadequate swap space. Additionally, the code
will complain with a suggestion of increasing it.
1997-12-24 15:05:25 +00:00
John Dyson
998d8cd662 Improve my copyright. 1997-12-22 11:48:13 +00:00
John Dyson
c2e11a039d Change bogus usage of btoc to atop. The incorrect usage of btoc was
pointed out by bde.
1997-12-19 15:31:13 +00:00
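The difference behind commit c2e11a039d above can be sketched as follows, assuming the traditional BSD meanings: atop ("address to page") truncates, while btoc ("bytes to clicks") rounds up. The `sk_` helpers and the 4 KB page size are assumptions for illustration, not the kernel's own macros.

```c
#include <stddef.h>

/* Assumed page geometry for the sketch (4 KB pages). */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  ((size_t)1 << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (SK_PAGE_SIZE - 1)

/* Truncating conversion: which page does this byte offset fall in? */
static size_t
sk_atop(size_t x)
{
	return x >> SK_PAGE_SHIFT;
}

/* Rounding-up conversion: how many pages are needed to hold x bytes? */
static size_t
sk_btoc(size_t x)
{
	return (x + SK_PAGE_MASK) >> SK_PAGE_SHIFT;
}
```

The two agree on page-aligned values and differ by one everywhere else, so using the rounding form to compute a page index is off by one for any unaligned offset.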
John Dyson
1efb74fbcc Some performance improvements, and code cleanups (including changing our
expensive OFF_TO_IDX to btoc whenever possible.)
1997-12-19 09:03:37 +00:00
Eivind Eklund
5591b823d1 Make COMPAT_43 and COMPAT_SUNOS new-style options. 1997-12-16 17:40:42 +00:00
John Dyson
bd28588799 Fix a recursive kernel_map lock problem in vm_zone allocator.
PR:	5298
1997-12-15 05:16:09 +00:00
John Dyson
b0d8408e21 Slight improvement to the vm_zone stats output. Also, some other superficial
cleanups.
1997-12-14 05:17:44 +00:00
John Dyson
8256655132 After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations.  The number
of UP TLB flushes is decreased much more than significantly with these
changes.  Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.
1997-12-14 02:11:23 +00:00
John Dyson
3a2dc656bc Fix the prototype for swapout_procs();
Submitted by:	dima@best.net
1997-12-11 02:10:55 +00:00
John Dyson
ceb0cf87e8 Support an optional, sysctl enabled feature of idle process swapout. This
is apparently useful for large shell systems, or systems with long running
idle processes.  To enable the feature:

	sysctl -w vm.swap_idle_enabled=1

Please note that some of the other vm sysctl variables have been renamed
to be more accurate.
Submitted by:	Much of it from Matt Dillon <dillon@best.net>
1997-12-06 02:23:36 +00:00
Bruce Evans
1cd52ec333 Don't include <sys/lock.h> in headers when only `struct simplelock' is
required.  Fixed everything that depended on the pollution.
1997-12-05 19:55:52 +00:00
John Dyson
70111b9016 Add new (very useful) tunable for pageout daemon. The flag changes
the maximum pageout rate:

sysctl -w vm.vm_maxlaunder=n

 1 < n < inf.

If paging heavily on large systems, it is likely that a performance
improvement can be achieved by increasing the parameter.  On a large
system, the parm is 32, but numbers as large as 128 can make a big
difference.  If paging is expensive, you might try decreasing the
number to 1-8.
1997-12-05 05:41:06 +00:00
John Dyson
12ac6a1dbb Support applications that need to resist or deny use of swap space.
sysctl -w vm.defer_swap_pageouts=1
	Causes the system to resist the use of swap space.  In low memory
	conditions, performance will decrease.
sysctl -w vm.disable_swap_pageouts=1
	Causes the system to mostly disable the use of swap space.  In
	low memory conditions, the system will likely start killing
	processes.
1997-12-04 19:00:56 +00:00
Poul-Henning Kamp
ab3f746966 In all such uses of struct buf: 's/b_un.b_addr/b_data/g' 1997-12-02 21:07:20 +00:00
Bruce Evans
b672aa4ba6 Removed all traces of P_IDLEPROC. It was tested but never set. 1997-11-24 15:15:33 +00:00
Bruce Evans
5270ecea67 Don't #define max() to get a version that works with vm_ooffset's.
Just use qmax().

This should be fixed more generally using overloaded functions.
1997-11-24 15:03:13 +00:00
Bruce Evans
fe0dd4acd3 Removed unused #include of <sys/malloc.h>. This file now uses only
zalloc().  Many more cases like this are probably obscured by not
including <vm/zone.h> explicitly (it is spammed into <sys/malloc.h>).
1997-11-18 11:02:19 +00:00
Tor Egge
b44959ce49 Simplify map entries during user page wire and user page unwire operations in
vm_map_user_pageable().

Check return value of vm_map_lock_upgrade() during a user page wire operation.
1997-11-14 23:42:10 +00:00
Poul-Henning Kamp
0abc78a697 Rename some local variables to avoid shadowing other local variables.
Found by: -Wshadow
1997-11-07 09:21:01 +00:00
Poul-Henning Kamp
4a11ca4e29 Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by:	-Wunused
1997-11-07 08:53:44 +00:00
Poul-Henning Kamp
cb226aaa62 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warnings, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
to be recompiled.
1997-11-06 19:29:57 +00:00
John Dyson
0aa8918597 Fix the "missing page" problem. Also, improve the performance of page
allocation in common cases.
1997-11-06 08:35:50 +00:00
Bruce Evans
55b211e3af Removed unused #includes. 1997-10-28 15:59:26 +00:00
John Dyson
5985940e79 Support garbage collecting the pmap pv entries. The management doesn't
happen until the system would have nearly failed anyway, so no significant
overhead is added.  This helps large systems with lots of processes.
1997-10-25 02:41:56 +00:00
John Dyson
0a80f406b3 Decrease the initial allocation for the zone allocations. 1997-10-24 23:41:04 +00:00
Poul-Henning Kamp
a1c995b626 Last major round (Unless Bruce thinks of something :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
Poul-Henning Kamp
55166637cd Distribute and staticize a lot of the malloc M_* types.
Substantial input from:	bde
1997-10-11 18:31:40 +00:00
Peter Wemm
3820ec1d4d Attempt to fix the previous fix to the contigmalloc1 prototype.
struct malloc_type isn't defined in all cases (eg: from ddb), and the line
wrapping was very badly mangled.
1997-10-11 10:39:19 +00:00
Poul-Henning Kamp
f0d45e6aae Fix contigmalloc() and contigmalloc1() arguments. 1997-10-10 18:18:47 +00:00
John Dyson
7e00649986 Improve management of pages moving from the inactive to active queue. Additionally,
add some much needed comments.
1997-10-06 02:48:16 +00:00
John Dyson
e7b0208f61 Relax the vnode locking for read only operations. 1997-10-06 02:38:30 +00:00
Peter Wemm
af866d9a23 Fix some style(9) and formatting problems. tabsize 4 formatting doesn't
look too great with 'more' etc.

Approved by: dyson (with a minor grumble :-)
1997-09-21 11:41:12 +00:00
John Dyson
99448ed11d Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the usage
of malloc by half.  The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs.  Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
1997-09-21 04:24:27 +00:00
Peter Wemm
35b8b2ddab Update select -> poll in drivers. 1997-09-14 03:19:42 +00:00
Peter Wemm
f8ddc1e209 Print correct function name in panics 1997-09-13 15:04:52 +00:00
Jonathan Lemon
987b847efc Do not consider VM_PROT_OVERRIDE_WRITE to be part of the protection
entry when handling a fault.  This is set by procfs whenever it wants
to write to a page, as a means of overriding `r-x COW' entries, but
causes failures in the `rwx' case.

Submitted by:	 bde
1997-09-12 15:58:47 +00:00
Bruce Evans
41fadeeb28 Removed yet more vestiges of config-time swap configuration and/or
cleaned up nearby cruft.
1997-09-07 16:21:11 +00:00
Bruce Evans
79624e2147 Removed unused #includes. 1997-09-01 03:17:34 +00:00
Bruce Evans
4de628dec4 Some staticized variables were still declared to be extern. 1997-09-01 02:55:50 +00:00
Bruce Evans
dfeca1b8ae Print a device number in hex instead of decimal. 1997-09-01 02:28:32 +00:00
Poul-Henning Kamp
a051452ae2 Change the 0xdeadb hack to a flag called VDOOMED.
Introduce VFREE which indicates that vnode is on freelist.
Rename vholdrele() to vdrop().
Create vfree() and vbusy() to add/delete vnode from freelist.
Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0)
  vnodes off the freelist.
Generalize vhold()/v_holdcnt to mean "do not recycle".
Fix reassignbuf()'s lack of use of vhold().
Use vhold() instead of checking v_cache_src list.
Remove vtouch(), the vnodes are always vget'ed soon enough
  after for it to have any measurable effect.
Add sysctl debug.freevnodes to keep track of things.
Move cache_purge() up in getnewvnodes to avoid race.
Decrement v_usecount after VOP_INACTIVE(), put a vhold() on
  it during VOP_INACTIVE()
Unmacroize vhold()/vdrop()
Print out VDOOMED and VFREE flags (XXX: should use %b)

Reviewed by:		dyson
1997-08-31 07:32:39 +00:00
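The freelist invariant stated in commit a051452ae2 above, that vnodes with (v_holdcnt != 0 || v_usecount != 0) stay off the freelist, can be sketched with a toy model. The `sk_` structure and functions are hypothetical stand-ins, and the real vhold()/vdrop()/vfree()/vbusy() do considerably more (list manipulation, locking).

```c
#include <stdbool.h>

/* Toy vnode: only the counters and a stand-in for the VFREE flag. */
struct sk_vnode {
	int  holdcnt;		/* "do not recycle" references */
	int  usecount;		/* active users */
	bool on_freelist;	/* models VFREE */
};

/* Take the vnode off the freelist. */
static void
sk_vbusy(struct sk_vnode *vp)
{
	vp->on_freelist = false;
}

/* Put the vnode on the freelist. */
static void
sk_vfree(struct sk_vnode *vp)
{
	vp->on_freelist = true;
}

/* Add a hold; a held vnode must not sit on the freelist. */
static void
sk_vhold(struct sk_vnode *vp)
{
	vp->holdcnt++;
	if (vp->on_freelist)
		sk_vbusy(vp);
}

/* Drop a hold; free the vnode only when nothing references it. */
static void
sk_vdrop(struct sk_vnode *vp)
{
	vp->holdcnt--;
	if (vp->holdcnt == 0 && vp->usecount == 0)
		sk_vfree(vp);
}
```

In this model a vnode that still has a nonzero use count stays off the freelist even after its last hold is dropped.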
Peter Wemm
54f42e4ba0 Allow non-page aligned file offset mmap's, providing that the system is
allowed to choose the address, or that the MAP_FIXED address has the same
remainder when modulo PAGE_SIZE as the file offset.  Apparently this is
POSIX 1003.1b specified behavior.  SVR4 and the other *BSD's allow it too.
It costs us nothing to support and means we don't get EINVAL on some mmap
code that works perfectly elsewhere.

Obtained from: NetBSD
1997-08-30 18:50:06 +00:00
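The MAP_FIXED condition described in commit 54f42e4ba0 above amounts to a remainder check. A minimal sketch, assuming a 4 KB page size; the helper name and the constant are hypothetical, not part of the mmap interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch; real code would query the system. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Returns true when a fixed mapping address and a file offset have
 * the same remainder modulo the page size, which is the condition
 * the commit message gives for accepting an unaligned offset with
 * MAP_FIXED.
 */
static bool
fixed_mapping_offset_ok(uintptr_t addr, uint64_t off)
{
	return (addr % SKETCH_PAGE_SIZE) == (off % SKETCH_PAGE_SIZE);
}
```

When the system chooses the address itself, the message indicates no such constraint applies to the caller, since the kernel can pick an address with a matching remainder.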