performance issues.
1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.
sysv_ipc.c: add stub functions that either simply return (for the hooks
in kern_fork/kern_exit) or log() a messgae and call enosys() (for the
syscalls). sysv_ipc.c will become "standard" in conf/files and has
#ifs for all the permutations.
the !COMPAT_43 case - use a common function even when there is no
`old' function. The diffs for this are large because of code motion
to restore the function order to what it was before the pseudo-argument
changes.
Include <sys/sysproto.h> to get correct args structs and prototypes.
The diffs for this are large because the declarations of the args structs
were moved to become comments in the function headers. The comments may
actually match the automatically generated declarations right now.
Add prototypes.
Prototypes are located in <sys/sysproto.h>.
Add appropriate #include <sys/sysproto.h> to files that needed
protos from systm.h.
Add structure definitions to appropriate files that relied on sys/systm.h,
right before system call definition, as in the rest of the kernel source.
In kern_prot.c, instead of using the dummy structure "args", create
individual dummy structures named <syscall>_args. This makes
life easier for prototype generation.
Instead of using a fake "compat" argument, pass a real compat int to function
if COMPAT_43 is defined. Functions involved: wait4, accept, recvfrom,
getsockname.
With the compat psuedo-argument, this introduces an argument structure
that can have two possible sizes depending on compat options.
This makes life difficult for lkm modules like ibcs2, which would
have to guess what size used in kernel when compiled. Also,
the prototype generator for these structures cannot generate proper sizes.
Now there is only one fixed structure and makes everybody happy.
I recommend these changes be introduced to 2.1 so that ibcs2, linux
lkm's generated for 2.2 can still run on a 2.1 kernel.
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.
charge scheduling CPU of child process to the parent and have child
inherit scheduling CPU from parent on fork. Makes a **big** difference
in the feel of the system to interactive users.
Submitted by: John Dyson