1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.
The system should be much more stable, but all subtile bugs aren't fixed yet.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.
PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.
original BSD code. The association between the vnode and the vm_object
no longer includes reference counts. The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.
When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also. The two "objects" are now
more intimately related, and so the interactions are now much less
complex.
When vnodes are now normally placed onto the free queue with an object still
attached. The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code. There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.
it in struct proc instead.
This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
Rename vn_default_error to vop_defaultop all over the place.
Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite.
Use vop_null instead of nullop.
Move vop_nopoll from vfs_subr.c to vfs_default.c
Move vop_sharedlock from vfs_subr.c to vfs_default.c
Move vop_nolock from vfs_subr.c to vfs_default.c
Move vop_nounlock from vfs_subr.c to vfs_default.c
Move vop_noislocked from vfs_subr.c to vfs_default.c
Use vop_ebadf instead of *_ebadf.
Add vop_defaultop for getpages on master vnode in MFS.
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
Add support for "interrupt driven configuration hooks".
A component of the kernel can register a hook, most likely
during auto-configuration, and receive a callback once
interrupt services are available. This callback will occur before
the root and dump devices are configured, so the configuration
task can affect the selection of those two devices or complete
any tasks that need to be performed prior to launching init.
System boot is posponed so long as a hook is registered. The
hook owner is responsible for removing the hook once their task
is complete or the system boot can continue.
kern_acct.c kern_clock.c kern_exit.c kern_synch.c kern_time.c:
Change the interface and implementation for the kernel callout
service. The new implemntaion is based on the work of
Adam M. Costello and George Varghese, published in a technical
report entitled "Redesigning the BSD Callout and Timer Facilities".
The interface used in FreeBSD is a little different than the one
outlined in the paper. The new function prototypes are:
struct callout_handle timeout(void (*func)(void *),
void *arg, int ticks);
void untimeout(void (*func)(void *), void *arg,
struct callout_handle handle);
If a client wishes to remove a timeout, it must store the
callout_handle returned by timeout and pass it to untimeout.
The new implementation gives 0(1) insert and removal of callouts
making this interface scale well even for applications that
keep 100s of callouts outstanding.
See the updated timeout.9 man page for more details.
local filesystem metadata at the first brelse call when the
block device vnode has v_tag set to VT_NFS.
Reviewed by: phk
Submitted by: Tor Egge <tegge@idi.ntnu.no>
as chargeable CPU usage. This should mitigate the problem of processes
doing disk I/O hogging the CPU. Various users have reported the
problem, and test code shows that the problem should now be gone.
cause a problem of spiraling death due to buffer resource limitations.
The vfs_bio code in general had little ability to handle buffer resource
management, and now it does. Also, there are a lot more knobs for tuning the
vfs_bio code now. The knobs came free because of the need that there
always be some immediately available buffers (non-delayed or locked) for
use. Note that the buffer cache code is much less likely to get bogged
down with lots of delayed writes, even more so than before.
It is possible for multiple process to sleep concurrently waiting
for a buffer. When the buffer shortage is a shortage of space but
not a shortage of buffer headers, the processes took turns creating
empty buffers and waking each other to advertise the brelse() of
the empties; progress was never made because tsleep() always found
another high-priority process to run and everything was done at
splbio(), so vfs_update never had a chance to flush delayed writes,
not to mention that i/o never had a chance to complete.
The problem seems to be rare in practice, but it can easily be
reproduced by misusing block devices, at least for sufficently slow
devices on machines with a sufficiently small buffer cache. E.g.,
`tar cvf /dev/fd0 /kernel' on an 8MB system with no disk in fd0
causes the problem quickly; the same command with a disk in fd0
causes the problem not quite as quickly; and people have reported
problems newfs'ing file systems on block devices.
Block devices only cause this problem indirectly. They are pessimized
for time and space, and the space pessimization causes the shortage
(it manifests as internal fragmentation in buffer_map).
This should be fixed in 2.2.
and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully
can be tidied up later by a slight redesign.
PR: kern/2573, kern/2754, kern/3046 (possibly)
Reviewed by: dyson
cache queue more often. The pageout daemon had to be waken up
more often than necessary since pages were not put on the
cache queue, when they should have been.
Submitted by: David Greenman <dg@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
decrease the size of buffer_map to approx 2/3 of what it used to be
(buffer_map can be smaller now.) The original commit of these changes
increased the size of buffer_map to the point where the system would
not boot on large systems -- now large systems with large caches will
have even less problems than before.
scheme. Additionally, add the capability for checking for unexpected
kernel page faults. The maximum amount of kva space for buffers hasn't
been decreased from where it is, but it will now be possible to do so.
This scheme manages the kva space similar to the buffers themselves. If
there isn't enough kva space because of usage or fragementation, buffers
will be reclaimed until a buffer allocation is successful. This scheme
should be very resistant to fragmentation problems until/if the LFS code
is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS
is not likely to use such a scheme.
Now there should be NO problem allocating buffers up to MAXPHYS.
larger than the vfs layer can provide. We now automatically support
32K clusters if MSDOSFS is installed, and panic if a filesystem tries
to allocate a buffer larger than MAXBSIZE.
This commit is a result of some "prodding" by BDE.
substantially increasing buffer space. Specifically, we double
the number of buffers, but allocate only half the amount of memory
per buffer. Note that VDIR files aren't cached unless instantiated
in a buffer. This will significantly improve caching.
The heuristic for managment of memory backing the buffer cache was
nice, but didn't work due to some architectural problems. Simplify
and improve the algorithm.
Major: When blocking occurs in allocbuf() for VMIO files,
excess wire counts could accumulate.
Major: Pages are incorrectly accumulated into the physical
buffer for clustered reads. This happens when bogus
page is needed.
Minor: When reclaiming buffers, the async flag on the buffer
needs to be zero, or the reclaim is not optimal.
Minor: The age flag should be cleared, if a buffer is wanted.
incorrect, and correct the support for B_ORDERED. The spl window
fix was from Peter Wemm, and his questions led me to find the problem with
the interrupt time page manipulation.
B_ASYNC flag broke things pretty bad (freeing buffer already on
queue or other wierd buffer queue errors.) The broken code is
left in commented out, but this makes the problem go away for
now.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)
Bowrite guarantees that buffers queued after a call to bowrite will
be written after the specified buffer (on a particular device).
Bowrite does this either by taking advantage of hardware ordering support
(e.g. tagged queueing on SCSI devices) or resorting to a synchronous write.
The interface into the "VMIO" system has changed to be more consistant
and robust. Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.
Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls. The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.
Reviewed by: davidg