VM systems usage of the kernel lock (lockmgr) code. This is a first
pass implementation, and is expected to evolve as needed. The API
for the lock manager code has not changed, but the underlying implementation
has changed significantly. This change should not materially affect
our current SMP or UP code without non-standard parameters being used.
the system is out of memory. The daemon does a minimal amount of work that
increases as the system becomes more likely to run out of memory and page in/out.
The default tuning is fairly low in background CPU usage, and sysctl variables
have been added to enable flexable operation. This is an experimental feature
that will likely be changed and improved over time.
of the kernel, and also most of the dynamic parts of the kernel. Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)
The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.
flag wasn't being respected during vref(), et. al. Note that this
isn't the eventual fix for the locking problem. Fine grained SMP
in the VM and VFS code will require (lots) more work.
and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully
can be tidied up later by a slight redesign.
PR: kern/2573, kern/2754, kern/3046 (possibly)
Reviewed by: dyson
the pageout daemon wasn't always being waken up appropriately when the
(cache + free) queues were depleted.
Submitted by: David S. Miller <davem@jenolan.rutgers.edu>
There are various options documented in i386/conf/LINT, there is more to
come over the next few days.
The kernel should run pretty much "as before" without the options to
activate SMP mode.
There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.
This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)
Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.
Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.
Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.
Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads address space by an exec, so we
now automatically divorce the address space before modifying it.
space. (!)
Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.
Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.
The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).
Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().
The following parts came from John Dyson:
Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.
Move the upages object (upobj) from the vmspace to the proc structure.
Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.
The typo was detected once apon a time with the -Wunused compile option.
The result was that a block of code for implementing
madvise(.. MADV_SEQUENTIAL..) behavior was "dead" and unused, probably
negating the effect of activating the option.
Reviewed by: dyson
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.
The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.
The code in procfs_mem is now less bogus (but maybe still a little
so.)
centric rather than VM-centric to fix a problem with errors not being
detectable when the header is read.
Killed exech_map as a result of these changes.
There appears to be no performance difference with this change.
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
the page to be unbusy, and it caused some algorithmic problems
as a result. There were some other problems with it also, so
this is a general cleanup of the code.
Submitted by: Douglas Crosher <dtc@scrooge.ee.swin.oz.au> and myself.
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.
Reviewed by: wollman
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
anymore with the "full" collapse fix that we added about 1yr ago!!! The
code has been removed by optioning it out for now, so we can put it back
in ASAP if any problems are found.
and objects. Previously, "fancy" memory management techniques
such as that used by the M3 RTS would have the tendancy of chopping
up processes allocated memory into lots of little objects. Alan
has come up with some improvements to migtigate the sitution to
the point where even the M3 RTS only has one object for bss and
it's managed memory (when running CVSUP.) (There are still cases where the
situation isn't improved when the system pages -- but this is much much
better for the vast majority of cases.) The system will now be able
to much more effectively merge map entries.
Submitted by: Alan Cox <alc@cs.rice.edu>
also implies VM_PROT_EXEC. We support it that way for now,
since the break system call by default gives VM_PROT_ALL. Now
we have a better chance of coalesing map entries when mixing
mmap/break type operations. This was contributing to excessive
numbers of map entries on the modula-3 runtime system. The
problem is still not "solved", but the situation makes more
sense.
Eventually, when we work on architectures where VM_PROT_READ
is orthogonal to VM_PROT_EXEC, we will have to visit this
issue carefully (esp. regarding security issues.)
maps. Additionally, eliminate the map->hint distortion
associated with useracc. That may/may-not be the "right"
thing to do -- but time will tell.
Submitted by: Partially by Alan Cox <alc@cs.rice.edu>
vm_map_simplify and vm_map_simplify_entry. Make vm_map_simplify_entry
handle wired maps so that we can get rid of vm_map_simplify. Modify
the callers of vm_map_simplify to properly use vm_map_simplify_entry.
Submitted by: Alan Cox <alc@cs.rice.edu>
has the negative effect of disabling some map optimizations. This
patch defers the creation of the object until it needs to be at fault time.
Submitted by: Alan Cox <alc@cs.rice.edu>
that we do allow mlock to span unallocated regions (of course, not
mlocking them.) We also allow mlocking of RO regions (which the old
code couldn't.) The restriction there is that once a RO region is
wired (mlocked), it cannot be debugged (or EVER written to.)
Under normal usage, the new mlock code will be a significant improvement
over our old stuff.
that map entries are coalesced when appropriate. Also, conditionalize
some code that is currently not used in vm_map_insert. This mod
has been added to eliminate unnecessary map entries in buffer map.
Additionally, there were some cases where map coalescing could be done
when it shouldn't. That problem has been resolved.
scheme. Additionally, add the capability for checking for unexpected
kernel page faults. The maximum amount of kva space for buffers hasn't
been decreased from where it is, but it will now be possible to do so.
This scheme manages the kva space similar to the buffers themselves. If
there isn't enough kva space because of usage or fragementation, buffers
will be reclaimed until a buffer allocation is successful. This scheme
should be very resistant to fragmentation problems until/if the LFS code
is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS
is not likely to use such a scheme.
Now there should be NO problem allocating buffers up to MAXPHYS.
problem of allocating contiguous buffer memory in general, but
make it much more likely to work at boot-up time. The best
chance for an LKM-type load of a sound driver is immediately
after the mount of the root filesystem.
This appears to work for a 64K allocation on an 8MB system.
it breaks in the DEVFS_ROOT case. replicate a bit too much of bdevvp()
in here to circumvent the problem. The real problem is the magic that
lives in bdevsw[1].
the one place that depended on it. wakeup() is now prototyped in
<sys/systm.h> so that it is normally visible.
Added nested include of <sys/queue.h> in <vm/vm_object.h>. The queue
macros are a more fundamental prerequisite for <vm/vm_object.h> than
the wakeup prototype and previously happened to be included by
namespace pollution from <sys/proc.h> or elsewhere.
64K. The change has essentially neutral effect on those machines with
little or no cache, and has a positive effect on "normal" machines
with 256K or more cache.
`show vmopag', `show page' and `show pageq'. Moved all vm ddb stuff
to the ends of the vm source files.
Changed printf() to db_printf(), `indent' to db_indent, and iprintf()
to db_iprintf() in ddb commands. Moved db_indent and db_iprintf()
from vm to ddb.
vm_page.c:
Don't use __pure. Staticized.
db_output.c:
Reduced page width from 80 to 79 to inhibit double spacing for long
lines (there are still some problems if words are printed across
column 79).
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)
The interface into the "VMIO" system has changed to be more consistant
and robust. Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.
Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls. The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.
Reviewed by: davidg
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.
is little or no reason to create a swap pager for small mmap's. The
vm_map_insert code will automatically create a swap pager if the object
becomes too large. This fix, per a request from phk.
performance issues.
1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.