Commit Graph

118 Commits

Author SHA1 Message Date
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifested themselves more severely on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed separately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitous buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups associated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistency problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify its status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and its descendants to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
John Dyson
660957521c Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs.  The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by:	Partially from Tor Egge.
1998-02-25 03:56:15 +00:00
John Dyson
e47ed70b0f Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time.  The swap pager rundown code
has been cleaned up, and unneeded wakeups removed.  Lots of splbio's
are changed to splvm's.  Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overhead.)
1998-02-23 08:22:48 +00:00
Bruce Evans
39e4376ba7 Removed unused #includes. 1998-02-20 13:11:54 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
John Dyson
157ac55f97 Fix an argument to vn_lock. It appears that a lot of the vn_lock usage
is a bit undisciplined, and should be checked carefully.
1998-02-08 14:55:13 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
John Dyson
95461b450d 1) Start using a cleaner and more consistent page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean up pmap_incore.
8)	Cluster initial kern_exec pagein.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects; make sure that vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
John Dyson
eaf13dd73a Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtle "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistent scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00
John Dyson
2d8acc0f4a VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneeded collapse operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
John Dyson
480ba2f552 Allow gdb to work again. 1998-01-21 12:18:00 +00:00
John Dyson
4722175765 Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap.  Fix a problem with faulting in pages.  Clean up
some loose ends in swap pager memory management.

The system should be much more stable, but not all subtle bugs are fixed yet.
1998-01-17 09:17:02 +00:00
John Dyson
925a3a419a Fix some vnode management problems, and better mgmt of vnode free list.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.

PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.
1998-01-12 01:46:33 +00:00
John Dyson
95e5e988e0 Make our v_usecount vnode reference count work identically to the
original BSD code.  The association between the vnode and the vm_object
no longer includes reference counts.  The major difference is that
vm_object's are no longer freed gratuitously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also.  The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

Vnodes are now normally placed onto the free queue with an object still
attached.  The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code.  There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon as reasonably possible.
1998-01-06 05:26:17 +00:00
John Dyson
60f8d46448 Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem
with the object cache removal.
1997-12-29 01:03:55 +00:00
John Dyson
2be70f79f6 Lots of improvements, including restructuring the caching and management
of vnodes and objects.  There are some metadata performance improvements
that come along with this.  There are also a few prototypes added when
the need is noticed.  Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.
1997-12-29 00:25:11 +00:00
John Dyson
6d1756a948 The ioopt code is still buggy, but wasn't fully disabled. 1997-12-25 20:55:15 +00:00
John Dyson
c2e11a039d Change bogus usage of btoc to atop. The incorrect usage of btoc was
pointed out by bde.
1997-12-19 15:31:13 +00:00
John Dyson
1efb74fbcc Some performance improvements, and code cleanups (including changing our
expensive OFF_TO_IDX to btoc whenever possible.)
1997-12-19 09:03:37 +00:00
Bruce Evans
5270ecea67 Don't #define max() to get a version that works with vm_ooffset's.
Just use qmax().

This should be fixed more generally using overloaded functions.
1997-11-24 15:03:13 +00:00
Tor Egge
b44959ce49 Simplify map entries during user page wire and user page unwire operations in
vm_map_user_pageable().

Check return value of vm_map_lock_upgrade() during a user page wire operation.
1997-11-14 23:42:10 +00:00
Poul-Henning Kamp
4a11ca4e29 Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by:	-Wunused
1997-11-07 08:53:44 +00:00
Bruce Evans
55b211e3af Removed unused #includes. 1997-10-28 15:59:26 +00:00
John Dyson
0a80f406b3 Decrease the initial allocation for the zone allocations. 1997-10-24 23:41:04 +00:00
Poul-Henning Kamp
a1c995b626 Last major round (Unless Bruce thinks of something :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
Poul-Henning Kamp
55166637cd Distribute and staticize a lot of the malloc M_* types.
Substantial input from:	bde
1997-10-11 18:31:40 +00:00
John Dyson
99448ed11d Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the usage
of malloc by half.  The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs.  Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
1997-09-21 04:24:27 +00:00
Jonathan Lemon
987b847efc Do not consider VM_PROT_OVERRIDE_WRITE to be part of the protection
entry when handling a fault.  This is set by procfs whenever it wants
to write to a page, as a means of overriding `r-x COW' entries, but
causes failures in the `rwx' case.

Submitted by:	 bde
1997-09-12 15:58:47 +00:00
Bruce Evans
79624e2147 Removed unused #includes. 1997-09-01 03:17:34 +00:00
Bruce Evans
b9dcd593ff Fixed type mismatches for functions with args of type vm_prot_t and/or
vm_inherit_t.  These types are smaller than ints, so the prototypes
should have used the promoted type (int) to match the old-style function
definitions.  They use just vm_prot_t and/or vm_inherit_t.  This depends
on gcc features to work.  I fixed the definitions since this is easiest.
The correct fix may be to change the small types to u_int, to optimize
for time instead of space.
1997-08-25 22:15:31 +00:00
Steve Passe
7cbfd031b6 Added includes of smp.h for SMP.
This eliminates a bazillion warnings about implicit s_lock & friends.
1997-08-18 03:29:21 +00:00
John Dyson
03e9c6c101 Fix kern_lock so that it will work. Additionally, clean up some of the
VM system's usage of the kernel lock (lockmgr) code.  This is a first
pass implementation, and is expected to evolve as needed.  The API
for the lock manager code has not changed, but the underlying implementation
has changed significantly.  This change should not materially affect
our current SMP or UP code without non-standard parameters being used.
1997-08-18 02:06:35 +00:00
John Dyson
507b10b48c Add exposure of some vm_zone allocation stats by sysctl. Also, change
the initialization parameters of some zones in VM map.  This contains
only optimizations and not bugfixes.
1997-08-06 04:58:05 +00:00
John Dyson
ba9be04c72 Fixed the commit botch that was causing crashes soon after system
startup.  Due to the error, the initialization of the zone for
pv_entries was missing.  The system should be usable again.
1997-08-05 23:03:24 +00:00
John Dyson
0d65e566b9 Another attempt at cleaning up the new memory allocator. 1997-08-05 22:24:31 +00:00
John Dyson
b79933ebfa Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use
the new zone code in pmap.c so that we can get rid of the ugly ad-hoc
allocations in pmap.c.
1997-08-05 22:07:27 +00:00
John Dyson
f2adc8bb27 Modify pmap to use our new memory allocator. Also, change the vm_map_entry
allocations to be interrupt safe.
1997-08-05 01:32:52 +00:00
John Dyson
3075778b63 Get rid of the ad-hoc memory allocator for vm_map_entries, in favor of
a simple, clean zone type allocator.  This new allocator will also be
used for machine dependent pmap PV entries.
1997-08-05 00:02:08 +00:00
John Dyson
11cccda1de Fix a very subtle problem that causes an unnecessary number of objects backing
a single logical object.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1997-07-27 04:44:12 +00:00
Tor Egge
208d433777 Don't try upgrading an existing exclusive lock in vm_map_user_pageable.
This should close PR kern/3180.
Also remove a bogus unconditional call to vm_map_unlock_read in
vm_map_lookup.
1997-06-23 21:51:03 +00:00
John Dyson
dbc806e731 Fix a reference problem with maps. Only appears to manifest itself when
sharing address spaces.
1997-06-15 23:33:52 +00:00
John Dyson
5856e12e69 Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
	from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
	possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
	thread from the rest of the group.

Fix the case where a thread does an exec.  It is almost nonsense for a thread
	to modify the other threads' address space by an exec, so we
	now automatically divorce the address space before modifying it.
1997-04-13 01:48:35 +00:00
Peter Wemm
a2a1c95c10 The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space.  Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss.  This is a lot
simpler than switching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplify the pmap code to manage process contexts.  We no longer have to
double map the UPAGES; this simplifies and should measurably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.
1997-04-07 07:16:06 +00:00
John Dyson
7d78abc9d9 Make vm_map_protect be more complete about map simplification. This
is useful when a process changes its page range protections very
much.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1997-04-06 03:04:31 +00:00
John Dyson
a04c970a7a Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation.  I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)
1997-04-06 02:29:45 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
John Dyson
5069bf5747 Another fix to inheriting shared segments. Do the copy on write
thing if needed.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1997-01-31 04:10:41 +00:00
John Dyson
fed9a9032e Fix two problems where a NULL object is dereferenced. One problem
was in the VM_INHERIT_SHARE case of vmspace_fork, and also in vm_map_madvise.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1997-01-22 01:34:48 +00:00