1) Files weren't properly synced on filesystems other than UFS. In some
cases, this lead to lost data. Most likely would be noticed on NFS.
The fix is to make the VM page sync/object_clean general rather than
in each filesystem.
2) Mixing regular and mmaped file I/O on NFS was very broken. It caused
chunks of files to end up as zeroes rather than the intended contents.
The fix was to fix several race conditions and to kludge up the
"b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention
to page modifications that occurred via the mmapping.
Reviewed by: David Greenman
Submitted by: John Dyson
These changes solve the problem in a general way by moving the
initialization out of the individual fs_mountroot's and into swaponvp().
Submitted by: Poul-Henning Kamp
changes. The check for nswap was bogus, but the code was so convoluted
that it was difficult to tell. It's better now. :-)
Reviewed by: David Greenman (extensively), and John Dyson
Submitted by: Poul-Henning Kamp, w/tweaks by me.
inconsistencies in the VM system that eventually lead to a panic. These
changes fix the behavior to conform to the behavior in SunOS, which is
to deny faults to pages beyond the EOF (returning SIGBUS). Internally,
this is implemented by requiring faults to be within the object size
boundaries. These changes exposed another bug, namely that passing in
an offset to mmap when trying to map an unnamed anonymous region also
results in internal inconsistencies. In this case, the offset is forced
to zero.
Reviewed by: John Dyson and others
require specific partitions be mentioned in the kernel config
file ("swap on foo" is now obsolete).
From Poul-Henning:
The visible effect is this:
As default, unless
options "NSWAPDEV=23"
is in your config, you will have four swap-devices.
You can swapon(2) any block device you feel like, it doesn't have
to be in the kernel config.
There is a performance/resource win available by getting the NSWAPDEV right
(but only if you have just one swap-device ??), but using that as default
would be too restrictive.
The invisible effect is that:
Swap-handling disappears from the $arch part of the kernel.
It gets a lot simpler (-145 lines) and cleaner.
Reviewed by: John Dyson, David Greenman
Submitted by: Poul-Henning Kamp, with minor changes by me.
with davidg about it, I hereby kill two undocumented misfeatures:
The code to skip a miniroot in the swapdev is not particular useful, and
if we need it we need it to be done properly, ie size the fs and skip all
of it not some hardcoded size, and subtract what we skip from the length
in the first place.
The SEQSWAP dies too. It's not the way to do it, it doesn't work, and
nobody have expressed any great desire for it to work. The way to
implement it correctly would be a second argument to swapon(2) to give
a priority/policy information. Low priority swapdevs can be made so
by adding them at a far offset (0x80000000 kind of thing), with almost no
modification to the strategy routine (in particular a offset per swapdev).
But until the need is obvious, it will not be done.
to emit spurious page outside of object type messages. It is not
a fatal condition anyway, so the message will be omitted for
release. Also, the code that "clips" the allocation size, associated
with the above problem, was fixed.
space for the hash list buckets and is a little faster. The features
of tailq aren't needed. Increased the size of the object hash table
to improve performance. In the future, this will be changed so that
the table is sized dynamically.
Headers should always use `__inline' for inline functions to avoid
syntax errors when modules that don't even use the offending functions
are compiled with `gcc -ansi'.
pages that are in FS buffers. This fixes the (believed to already have been
fixed) problem with msync() not doing it's job...in other words, the
stuff that Andrew has continuously been complaining about.
Submitted by: John Dyson, w/minor changes by me.
Fixed remaining known bugs in the buffer IO and VM system.
vfs_bio.c:
Fixed some race conditions and locking bugs. Improved performance
by removing some (now) unnecessary code and fixing some broken
logic.
Fixed process accounting of # of FS outputs.
Properly handle NFS interrupts (B_EINTR).
(various)
Replaced calls to clrbuf() with calls to an optimized routine
called vfs_bio_clrbuf().
(various FS sync)
Sync out modified vnode_pager backed pages.
ffs_vnops.c:
Do two passes: Sync out file data first, then indirect blocks.
vm_fault.c:
Fixed deadly embrace caused by acquiring locks in the wrong order.
vnode_pager.c:
Changed to use buffer I/O system for writing out modified pages. This
should fix the problem with the modification date previous not getting
updated. Also dramatically simplifies the code. Note that this is
going to change in the future and be implemented via VOP_PUTPAGES().
vm_object.c:
Fixed a pile of bugs related to cleaning (vnode) objects. The performance
of vm_object_page_clean() is terrible when dealing with huge objects,
but this will change when we implement a binary tree to keep the object
pages sorted.
vm_pageout.c:
Fixed broken clustering of pageouts. Fixed race conditions and other
lockup style bugs in the scanning of pages. Improved performance.
to accurately track this. It isn't an indicator of resource consumption
anyway.
Removed cnt.v_kernel_pages: We don't implement this and doing so accurately
would be very difficult (and ambiguous - since process pages are often
double mapped in the kernel and the process address spaces).
VTEXT not always getting cleared when it is supposed to. Added check to
make sure that vm_object_remove() isn't called with a NULL pager or for
a pager for an OBJ_INTERNAL object (neither of which will be on the hash
list). Clear OBJ_CANPERSIST if we decide to terminate it because of no
resident pages.
was the wrong size. This is the likely cause of panics reported by
Lars Fredriksen and Paul Richards related to a -1 blkno when paging
via the swap_pager.
Submitted by: John Dyson