1) A new mechanism has been added to prevent pages from being paged
out called "vm_page_hold". Similar to vm_page_wire, but
much lower overhead.
2) Scheduling algorithm has been changed to improve interactive
performance.
3) Paging algorithm improved.
4) Some vnode and swap pager bugs fixed.
Eliminates vm_fault overhead on process startup and
mmap referenced data for in-memory pages.
(process startup time using in-memory segments *much* faster)
2) Even more efficient pmap code. Code partially cleaned up.
More comments yet to follow.
(generally more efficient pte management)
3) Pageout clustering ( in addition to the FreeBSD V1.1 pagein
clustering.)
(much faster paging performance on non-write behind disk
subsystems, slightly faster performance on other systems.)
4) Slightly changed vm_pageout code for more efficiency and
better statistics. Also, resist swapout a little more.
(less likely to pageout a recently used page)
5) Slight improvement to the page table page trap efficiency.
(generally faster system VM fault performance)
6) Defer creation of unnamed anonymous regions pager until needed.
(speeds up shared memory bss creation)
7) Remove possible deadlock from swap_pager initialization.
8) Enhanced procfs to provide "vminfo" about vm objects and user
pmaps.
9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net &
socket performance and to prepare for things to come.
John Dyson
dyson@implode.root.com
David Greenman
davidg@root.com
the device close routine. This works because the device close calls
the line discipline close (which only flushes the output buffers) and
the ttyclose() routine, which does little of nothing except screw with
the session and process group fields (which is what was causing all
the problems).
* Removed pmap_is_wired
* added extra cli/sti protection in idle (swtch.s)
* slight code improvement in trap.c
* added lots of comments
* improved paging and other algorithms in VM system
set improves performance and fixes the following problems (description
from John Dyson):
1. Growing swap space problem in both static usage and
in situations with lots of fork/execs in heavy paging
situations.
2. Sparse swap space allocation (internal fragmentation.)
3. General swap_pager slowness.
Additionally, the new swap_pager also provides hooks for multi-page
I/O that is currently being developed (in early testing phases.)
Problem #1 is a result of a problem where objects cannot be collapsed
once a pager has been allocated for them. This problem has been solved
by relaxing the restriction by allowing the pages contained in a shadow
object's pager be copied to the parent object's pager. The copy is
afforded by manipulating pointers to the disk blocks on the swap space.
Since an improved swap_pager has already been developed with the data
structures to support the copy operation, this new swap_pager has been
introduced. Also, shadow object bypass in the collapse code has been
enhanced to support checking for pages on disk. The vm_pageout daemon
has also been modified to defer creation of an object's pager when the
object's shadow is paging. This allows efficient immediate collapsing
of a shadow into a parent object under many circumstances without the
creation of an intermediate pager.
Problem #2 is solved by the allocation data structures and algorithms
in the new swap_pager. Additionally, a newer version of this new swap_pager
is being tested that permits multiple page I/O and mitigation of the
fragmentation problems associated with allocation of large contiguous blocks
of swap space.
Problem #3 is addressed by better algorithms and a fix of a couple of bugs
in the swap_pager. Additionally, this new pager has a growth path allowing
multi-page inputs from disk. Approximately 50% performance improvement can
be expected under certain circumstances when using this pager in the standard
single page mode.
(Actually, I've seen more like twice the speed in my tests. -DLG)
a binary link-kit. Make all non-optional options (pagers, procfs) standard,
and update LINT to reflect new symtab requirements.
NB: -Wtraditional will henceforth be forgotten. This editing pass was
primarily intended to detect any constructions where the old code might
have been relying on traditional C semantics or syntax. These were all
fixed, and the result of fixing some of them means that -Wall is now a
realistic possibility within a few weeks.
John Dyson to make it reliably work under FreeBSD.
2) Added and enabled PROCFS in the GENERICxx and LINT kernels.
3) New execve() from me. Still work to be done here, but this version
works well and is needed before other changes can be made. For
a description of the design behind this, see freebsd-arch or
ask me.
4) Rewrote stack fault code; made user stack VM grow as needed rather
than all up front; improves performance a little and reduces
process memory requirements.
5) Incorporated fix from Gene Stark to fault/wire a user page table
page to fix a problem in copyout. This is a temporary fix and
is not appropriate for pageable page tables. For a description
of the problem, see Gene's post to the freebsd-hackers mailing
list.
6) Tighten up vm_page struct to reduce memory requirements for it. ifdef
pager page lock code as it's not being used currently.
7) Introduced new element to vmspace struct - vm_minsaddr; initial
(minimum) stack address. Compliment to vm_maxsaddr.
8) Added a panic if the allocation for process u-pages fails.
9) Improve performance and accuracy of kernel profiling by putting in
a little inline assembly instead of spl().
10) Made serial console with sio driver work. Still has problems with
serial input, but is almost useable.
11) Added -Bstatic to SYSTEM_LD in Makefile.i386 so that kernels will
build properly with the new ld.
The following patch adds the addr argument to signal handlers.
The kernel with the patch is no more and no less in compliance or in
violation of POSIX and ANSI C than the kernel before the patch.
The added functionality this addr argument provides is quite useful. It
enables an entire class of algorithms which use mprotect to trace memory
references. Beside garbage collectors, I have heard of this technique being
applied to debuggers and profilers. The only benchmarking I've performed is
using akcl to compile maxima: without the kernel patch, it takes 7 hours to
compile maxima, while with stratified garbage collection, it only takes 50
minutes.
Basically, I can't think of a reason not to add the addr argument and there
is a compelling need for it.
If you find the patch acceptable, please let me know so I can send my
FreeBSD akcl config files to wfs for inclusion in the core akcl release.
The old 386BSD config files there won't work on either NetBSD or FreeBSD.
Subject: Page fault in PTE area fails in copyout
Index: sys/i386/i386/trap.c FreeBSD-1.0.2
Description:
Reading files of several megabytes into Emacs, or many small
files all at once, would fail with "IO error - bad address".
Repeat-By:
The bug can be exercised by a test program that malloc()'s
a 5MB chunk of memory, and then, without accessing the memory
first, filling it with data from a file using read().
(I read 64k chunks from /dev/wd0d into successive 64k regions
of the 5MB chunk.) The read() will fail with EFAULT at the first
virtual address boundary that is a multiple of 0x400000.
Fix:
The problem was code in sys/i386/i386/trap.c that tries to
figure out what kind of trap occurred and to handle it appropriately.
It was interpreting any page fault with virtual address
>= vm->vm_maxsaddr as being a user stack segment fault.
In fact, addresses >= USRSTACK are in the user structure/PTE area,
and if they are handled as stack faults, the proper PTE will
not be paged in when it is supposed to be. This situation comes
up in copyout() and copyoutstr(), if PTE's are accessed for the
first time ever. The page fault on accessing the nonexistent PTE
is mishandled as a stack fault, and then the fault that occurs on
the subsequent access to the page itself causes copyout to fail
with EFAULT.
when the machine panics.
i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE +0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a seperate file 'support.s'
15) split swtch and related routines into a seperate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a seperate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.
i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr
i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.
i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait for
ever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.
kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c
i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lot's of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.
i386/i386/Makefile.i386:
1) add support in to build support.s, exception.s and swtch.s
...and various changes to various header files to make all of the
above happen.
the stack area and not memory above VM_MAXUSER_ADDRESS.
That way, copyout and friends now work for pages whose page table entries
have not yet been allocated/been paged out.
Remove NKMEMCLUSTERS, it is no longer define or used.
locores.s:
Fix comment on PTDpde and APTDpde to be pde instead of pte
Add new equation for calculating location of Sysmap
Remove Bill's old #ifdef garbage for counting up memory,
that stuff will never be made to work and was just cluttering
up the file.
Add code that places the PTD, page table pages, and kernel
stack below the 640k ISA hole if there is room for it, otherwise
put this stuff all at 1MB. This fixes the 28K bogusity in
the boot blocks, that can now go away!
Fix the caclulation of where first is to be dependent on
NKPDE so that we can skip over the above mentioned areas.
The 28K thing is now 44K in size due to the increase in
kernel virtual memory space, but since we no longer have
to worry about that this is no big deal.
Use if NNPX > 0 instead of ifdef NPX for floating point code.
machdep.c
Change the calculation of for the buffer cache to be
20% of all memory above 2MB and add back the upper limit
of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL
of the kernel memory space on large memory machines, note
that this will not even come into effect unless you have
more than 32MB. The current buffer cache limit is 6.7MB
due to this caclulation.
It seems that we where erroniously allocating bufpages pages
for buffer_map. buffer_map is UNUSED in this implementation
of the buffer cache, but since the map is referenced in
several if statements a quick fix was to simply allocate
1 vm page (but no real memory) to it.
pmap.h
Remove rcsid, don't want them in the kernel files!
Removed some cruft inside an #ifdef DEBUGx that caused
compiler errors if you where compiling this for debug.
Use the #defines for PD_SHIFT and PG_SHIFT in place of
constants.
trap.c:
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.
Remove a now completly invalid check for a maximum virtual
address, the virtual address now ends at 0xFFFFFFFF so
there is no more MAX!! (Thanks David, I completly missed
that one!)
vm_machdep.c
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.
Replace several 0xFE00000 constants with KERNBASE
FP and we try to call the emulator when it is not compiled in.
Removed the #if defined(i486) || defined(i387) that use to call the
panic if we did not have a math emulator.
Removed an extranious include of i386/i386/math_emu.h from math_emulate.c.
profiling, and various protection checks that cause security holes
and system crashes.
* Changed min/max/bcmp/ffs/strlen to be static inline functions
- included from cpufunc.h in via systm.h. This change
improves performance in many parts of the kernel - up to 5% in the
networking layer alone. Note that this requires systm.h to be included
in any file that uses these functions otherwise it won't be able to
find them during the load.
* Fixed incorrect call to splx() in if_is.c
* Fixed bogus variable assignment to splx() in if_ed.c