OBJ_ONEMAPPING in the case where an object is extended by an
additional vm_map_entry must be allocated.
In vm_object_madvise(), remove calll to vm_page_cache() in MADV_FREE
case in order to avoid a page fault on page reuse. However, we still
mark the page as clean and destroy any swap backing store.
Submitted by: Alan Cox <alc@cs.rice.edu>
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.
The vm_map_insert()/vm_object_coalesce() optimization has been extended
to include OBJT_SWAP objects as well as OBJT_DEFAULT objects. This is
possible because it costs nothing to extend an OBJT_SWAP object with
the new swapper. We can't do this with the old swapper. The old swapper
used a linear array that would have had to have been reallocated, costing
time as well as a potential low-memory deadlock.
in vm_map_simplify_entry. Basically, once you've verified that
the objects in the adjacent vm_map_entry's are the same, either
NULL or the same vm_object, there's no point in checking that the
objects have the same behavior.
Obtained from: Alan Cox <alc@cs.rice.edu>
Checked by: "Richard Seaman, Jr." <dick@tar.com>
Fix the following problem:
As the code stands now, growing any stack, and not just the process's
main stack, modifies vm->vm_ssize. This is inconsistent with the code
earlier in the same procedure.
This changes the definitions of a few items so that structures are the
same whether or not the option itself is enabled. This allows
people to enable and disable the option without recompilng the world.
As the author says:
|I ran into a problem pulling out the VM_STACK option. I was aware of this
|when I first did the work, but then forgot about it. The VM_STACK stuff
|has some code changes in the i386 branch. There need to be corresponding
|changes in the alpha branch before it can come out completely.
what is done:
|
|1) Pull the VM_STACK option out of the header files it appears in. This
|really shouldn't affect anything that executes with or without the rest
|of the VM_STACK patches. The vm_map_entry will then always have one
|extra element (avail_ssize). It just won't be used if the VM_STACK
|option is not turned on.
|
|I've also pulled the option out of vm_map.c. This shouldn't harm anything,
|since the routines that are enabled as a result are not called unless
|the VM_STACK option is enabled elsewhere.
|
|2) Add what appears to be appropriate code the the alpha branch, still
|protected behind the VM_STACK switch. I don't have an alpha machine,
|so we would need to get some testers with alpha machines to try it out.
|
|Once there is some testing, we can consider making the change permanent
|for both i386 and alpha.
|
[..]
|
|Once the alpha code is adequately tested, we can pull VM_STACK out
|everywhere.
|
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
about conversions of objects to OBJT_SWAP, it is done automatically
now.
Replaced manually inserted code with inline calls for busy waiting on
pages, which also incidently fixes a potential PG_BUSY race due to
the code not running at splvm().
vm_objects no longer have a paging_offset field ( see vm/vm_object.c )
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.
The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.
Submitted by: Richard Seaman <dick@tar.com>
1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.
expected. This bug caused builds of Modula-3 to fail in mysterious
ways on SMP kernels. More precisely, such builds failed on systems
with kern.fast_vfork equal to 0, the default and only supported
value for SMP kernels.
PR: kern/7468
Submitted by: tegge (Tor Egge)
Add some overflow checks to read/write (from bde).
Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.
Reviewed by: Bruce Evans <bde@zeta.org.au>
managed to avoid corruption of this variable by luck (the compiler used a
memory read-modify-write instruction which wasn't interruptable) but other
architectures cannot.
With this change, I am now able to 'make buildworld' on the alpha (sfx: the
crowd goes wild...)
casting them to long, etc. Fixed some nearby printf bogons (sign
errors not warned about by gcc, and style bugs, but not truncation
of vm_ooffset_t's).
Use slightly less bogus casts for passing pointers to ddb command
functions.
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)
deallocation cycles. This should provide a measurable improvement
on swap and memory allocation on loaded systems. It is unlikely a
complete solution. Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.
have declined due to code-rot over time. The swap pager rundown code
has been clean-up, and unneeded wakeups removed. Lots of splbio's
are changed to splvm's. Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overheadla.)
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagin.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.
MUST be PG_BUSY. It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed. The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use. There were some minor problems
with the collapse code, and this plugs those subtile "holes."
Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages. I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise. For now, we are stuck
with the current morass.
1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.
The system should be much more stable, but all subtile bugs aren't fixed yet.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.
PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.
original BSD code. The association between the vnode and the vm_object
no longer includes reference counts. The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.
When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also. The two "objects" are now
more intimately related, and so the interactions are now much less
complex.
When vnodes are now normally placed onto the free queue with an object still
attached. The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code. There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.
of vnodes and objects. There are some metadata performance improvements
that come along with this. There are also a few prototypes added when
the need is noticed. Changes include:
1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.
Be gentle, and please give me feedback asap.