much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
Fix single-stepping of emulated FPU instructions.
Don't panic if an FPU instruction is attempted but there is no FPU
and no FPU emulator is configured.
short, it gets filled uop to its length. This matches the getdomainname
and gethostname manual pages.
(getbootfile also uses this function and I think it should have the same
behaviour)
This also fixes a bug with keyinit where the seed was not saved in
/etc/skeykeys. So S/Key should be fully functional again.
Reviewed by:
Submitted by:
Obtained from:
Improve hzto():
Round up instead of down and then add 1 tick. This fixes sleep(1)
sometimes sleeping for < 1 second and usleep(10000) sometimes sleeping
for as little as 1 usec + syscall time.
Don't do all the calculations at splhigh().
Don't depend on `tick' being a multiple of 1000.
Don't lose accuracy for `sec' between 0x7fffffff / 1000 - 1000 and
0x7fffffff / hz.
Don't assume that longs are 32 bits or that ints have the same size as
longs.
of returning EINVAL since something may depend on them being broken.
Allowing negative limits caused bugs almost everywhere. The recent
fixes for MAXSSIZ checked the limits too late to stop anyone defeating
limits set by root...
used as an address value. Then all comparisons should be done unsigned
and not signed. Fix it with a typecast of u_quad_t.
Error can be demonstrated with the current bash in port, do a
ulimit -s unlimited and the machine hangs. bash delivers through
an internal error a large negative value for the stacksize, the
comparison saw this smaller than MAXSSIZ and then tried to expand
the stack to this size.
operation of each clist. Limit the growth of each clist. Clists
can only grow larger than the reserved minimum if there are free
cblocks in a shared pool. The size of this pool is now fixed
(this could be improved). The reserved and maximum sizes are more
carefully allocated for slip and ppp, depending on the mtu. A maximum
MTU of 16384 is now enforced for ppp.
1. The pageout daemon used to block under certain
circumstances, and we needed to add new functionality
that would cause the pageout daemon to block more often.
Now, the pageout daemon mostly just gets rid of pages
and kills processes when the system is out of swap.
The swapping, rss limiting and object cache trimming
have been folded into a new daemon called "vmdaemon".
This new daemon does things that need to be done for
the VM system, but can block. For example, if the
vmdaemon blocks for memory, the pageout daemon
can take care of it. If the pageout daemon had
blocked for memory, it was difficult to handle
the situation correctly (and in some cases, was
impossible).
2. The collapse problem has now been entirely fixed.
It now appears to be impossible to accumulate unnecessary
vm objects. The object collapsing now occurs when ref counts
drop to one (where it is more likely to be more simple anyway
because less pages would be out on disk.) The original
fixes were incomplete in that pathological circumstances
could still be contrived to cause uncontrolled growth
of swap. Also, the old code still, under steady state
conditions, used more swap space than necessary. When
using the new code, users will generally notice a
significant decrease in swap space usage, and theoretically,
the system should be leaving fewer unused pages around
competing for memory.
Submitted by: John Dyson
Find enclosed a short bugfix to get the union filesystem up and running
in FreeBSD-current. We don't think we've got all the problems yet but
these fixes sort out the major ones (which mostly concert bad locking
of vnodes), no doubt we'll post others as necessary. Known problems
include the inability of the umount command (not the system call) to unmount
unions in certain circumstances (this is due the way "realpath" works),
and the failure of direntries to always get all available files in
unioned subdirectories. We are, as they say, working on it.
Submitted by: tim@cs.city.ac.uk (Tim Wilkinson)
a clist return with an error. There are some clist starvation/deadlock
bugs elsewhere and killing clist hogs didn't help because the breaks
only exited from the inner loops.
Cosmetic.
Return from trap() if trap_fatal() returns. trap_fatal() isn't
fatal if you have ddb. Returning from trap() is usually the right
thing to do and much better than falling through.
and all SCSI devices (except that it's not done quite the way I want). New
information added includes:
- A text description of the device
- A ``state''---unknown, unconfigured, idle, or busy
- A generic parent device (with support in the m.i. code)
- An interrupt mask type field (which will hopefully go away) so that
. ``doconfig'' can be written
This requires a new version of the `lsdev' program as well (next commit).
A word of wisdom, don't do this:
| cd /usr/bin
| for i in *
| do
| cp $i /tmp/a
| gzip -9 < /tmp/a > $i
| done
It will compress files with multiple links several times. do it this way:
| cd /usr/bin
| for i in *
| do
| gunzip -f < $i > /tmp/a
| gzip -9 < /tmp/a > $i
| done
For it to be useful, you must stick your disklabel on the partition which
starts where the MBR says FreeBSD lives. If you don't do that, you might
get a bad day.
Oh, that probably also means that putting swap there is a bad idea...
- excise some unused code (#if 0'd out - don't want to nuke it yet)
- fix problems with "make depend" - some macros were screwing it up
- get rid of some static local variables
There still seems to be a small reentrancy problem somewhere.
pmap.c: tons of unused vars zapped, various other warnings silenced.
trap.c: unused vars zapped.
vm_machdep.c: A wrong argument, which by chance did the right thing, was
corrected.
1) cut this up into /sys/sys/inflate.h, sys/kern/inflate.c
sys/kern/ingact_gzip.c
2) make a lot more things static
3) make a lot of globals const
4) make some args const
5) first stage of making globals into a struct (not used yet)
The vm_allocate() call which was introduced between revisions 1.4 and
1.5 of imagact_gzip.c broke things. I have backed that out for the time
being. (Davidg: help please)
WARNING: if you have gzip enabled in your kernel, you must now run
config again, as another source file has been added. Otherwise your
kernel compile will fall over.
This is all still WIP. More commits to come.
Suggestions from: phk.
more weird kinds of a.out than anyone can argue for. This code failed to
load the first 28K of the text-segment, in the case where the first page
of the a.out contains only the a.out-header, and the text is still at 0x0.
Thanks Steven !
the uncompression buffer. Now malloc(M_GZIP) is used for all the Huffman-
tree stuff only. Numbers so far indicate < 15Kb Malloc use + 32 Kb for
the abovementioned buffer while uncompressing.
cleaned up much of the cruft in this thing.
No printf's in the case where things go well.
Gzip-headers can contain filenames and comments (as long as they're
shorter than the page-size.)
I don't think we leak memory, in the "exec/aout" code. I'm not quite sure
about the inflate code yet, but I don't think memory is lost.
Q: Can I add a class M_GZIP to <sys/malloc.h> and bump M_LAST one up
without any thing else needing tweaking ?
Poul-Henning
WARNING: THIS MATERIAL MIGHT GO AWAY!
This material needs the core-groups approval to stay here for the 2.0 release.
If the core-group does not concent to this commit, it will be backed out.
***
It is a non-gpl'ed "unzip" which will allow execution of a.out files which
have been sent through "gzip -9". The idea being saved disk-space.
Just now this code has quality rating: "working prototype".
To compress a file to be used with this, do it exactly this way:
gzip -9 -v < /bin/FOO > /tmp/FOO
remember to chmod /tmp/FOO as needed.
DON'T compress all of you binaries right away ! There are several things
which you should consider first:
1. Using compressed binaries, you use >MUCH< more VM, and thus swap-space.
2. It is slow.
3. It might crash your machine.
Apart from that, I welcome comments...
NB: There is also a change to sys/conf/files, but cvs core-dumped on me,
so it didn't get into the logs or emailed, but the commit seems to have
happended OK.
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.
- Make a number of filesystems work again when they are statically compiled
(blush)
- FIFOs are no longer optional; ``options FIFO'' removed from distributed
config files.
- set args->lkm_offset correctly so that VFS modules can be unloaded
- initialize _fs_vfsops.vfc_refcount correctly so that VFS modules can
be unloaded
- include kernel.h in a few placves to get the correct definition of DATA_SET
This code is mostly taken from the 1.1 port (which was in turn taken from
Dave Mills's kern.tar.Z example). A few significant differences:
1) ntp_gettime() is now a MIB variable rather than a system call. A few
fiddles are done in libc to make it behave the same.
2) mono_time does not participate in the PLL adjustments.
3) A new interface has been defined (in <machine/clock.h>) for doing
possibly machine-dependent things around the time of the clock update.
This is used in Pentium kernels to disable interrupts, set `time', and
reset the CPU cycle counter as quickly as possible to avoid jitter in
microtime(). Measurements show an apparent resolution of a bit more than
8.14usec, which is reasonable given system-call overhead.
otherwise the machine will overflow the stack in a recursive fault loop
(causing the machine to spontaneously reboot because of the stack fault
that ultimately happens).
Submitted by: Inspired by Bruce Evans, but this change is different
than what he suggested.
the Mach/i386 version of the BSD/vax(?) <machine/psl.h>. The Mach
version has slightly better names for many macros but is now out of
date and little used. It was originally used even less (for spelling
PSL_T as EFL_TF in <machine/db_machdep.h>).
Added "sys/rtprio.h" with the used defines.
Added rtprio(2) - the kernel interface. init_sysent.c,
kern_resource.c
syscalls.master
Added 32 new runqueues (rtqs), with initialization. kern_proc.c
kern_synch.c
Realtime processes do not change nice/priority kern_synch.c
Added a column "rt" to ddb's ps (#ifdef RTPRIO_DEBUG) kern_synch.c
Realtime priorities are enherited through fork(). kern_fork.c
Init (and children) NOT run as realtime process. init_main.c
Submitted by: Henrik Vestergaard Draboel
with BOUNCE_BUFFERS. This is more intuitive, and is better for future
multiplatform support. Added BOUNCE_BUFFERS option to the GENERIC and
LINT kernel config files.
com.h/lpa.h. Removed all vestiges of com/lpa out of conf.c and also
fixed up the end of cdevsw/bdevsw to have "no" routines instead of
a NULL pointer (suggested by someone a few weeks back).
in your kernel config now).
2) Added ps ddb function from 1.1.5. Cleaned it up a bit and moved into its
own file.
3) Added \r handing in db_printf.
4) Added missing memory usage stats to statclock().
5) Added dummy function to pseudo_set so it will be emitted if there
are no other pseudo declarations.
machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.
cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.
vfs_conf.c:
Removed unused rootfs global.
vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.
ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.
ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().
vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.
use of timeout_t -> timeout_func_t in aha1542 and aha1742 drivers.
2) fix a bug in the portalfs that was uncovered by better prototyping -
specifically, the time must be converted from timeval to timespec
before storing in va_atime.
3) fixed/added some miscellaneous prototypes
- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.
done. This patch was extended to also include a suggested change by
Kirk McKusick which allows the control tty to be reasigned to a different
tty without losing a vnode.
``changes'' are actually not changes at all, but CVS sometimes has trouble
telling the difference.
This also includes support for second-directory compiles. This is not
quite complete yet, as `config' doesn't yet do the right thing. You can
still make it work trivially, however, by doing the following:
rm /sys/compile
mkdir /usr/obj/sys/compile
ln -s M-. /sys/compile
cd /sys/i386/conf
config MYKERNEL
cd ../../compile/MYKERNEL
ln -s /sys @
rm machine
ln -s @/i386/include machine
make depend
make
are running under. Here's how to bootstrap (order is important):
1) Re-compile gcc (just the driver is all you need).
2) Re-compile libc.
3) Re-compile your kernel. Reboot.
4) cd /usr/src/include; make install
You can now detect the compilation environment with the following code:
#if !defined(__FreeBSD__)
#define __FreeBSD_version 199401
#elif __FreeBSD__ == 1
#define __FreeBSD_version 199405
#else
#include <osreldate.h>
#endif
You can determine the run-time environment by calling the new C library
function getosreldate(), or by examining the MIB variable kern.osreldate.
For the time being, the release date is defined as 199409, which we have
already established as our target.
use it in NFS. This is required both for diskless support and for POSIX
compliance. Note: the support in NFS is only for the local node.
Submitted by: based on work originally done by Yuval Yurom
improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/
pmap_kremove. These routine allow fast mapping of pages for those
architectures that have "normal" MMUs. Also included is a fix to the
pageout daemon to properly check a queue end condition.
Submitted by: John Dyson
charge scheduling CPU of child process to the parent and have child
inherit scheduling CPU from parent on fork. Makes a **big** difference
in the feel of the system to interactive users.
Submitted by: John Dyson