syncs the entire underlying file rather then just the requested range,
resulting in huge inefficiencies when the VM system is articulated in
a certain way. The VOP_FSYNC was also found to massively reduce NFS
performance in certain cases.
Change MADV_DONTNEED and MADV_FREE to call vm_page_dontneed() instead
of vm_page_deactivate(). Using vm_page_deactivate() causes all
inactive and cache pages to be recycled before the dontneed/free page
is recycled, effectively flushing our entire VM inactive & cache
queues continuously even if only a few pages are being actively MADV
free'd and reused (such as occurs with a sequential scan of a
memory-mapped file).
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
vm_map.c:
Don't set OBJ_ONEMAPPING on arbitrary vm objects. Only default
and swap type vm objects should have it set. vm_object_deallocate
already handles these cases.
vm_object.c:
If OBJ_ONEMAPPING isn't already clear in vm_object_shadow,
we are in trouble. Instead of clearing it, make it
an assertion that it is already clear.
Remove a useless argument from vm_map_madvise's interface (vm_map.c,
vm_map.h, and vm_mmap.c).
Remove a redundant test in vm_uiomove (vm_map.c).
Make two changes to vm_object_coalesce:
1. Determine whether the new range of pages actually overlaps
the existing object's range of pages before calling vm_object_page_remove.
(Prior to this change almost 90% of the calls to vm_object_page_remove
were to remove pages that were beyond the end of the object.)
2. Free any swap space allocated to removed pages.
1. The size of vm_object::memq is vm_object::resident_page_count,
not vm_object::size.
2. The "size > 4" test sometimes results in the traversal of a ~1000 page
memq in order to locate ~10 pages.
Fix bug where an object's OBJ_WRITEABLE/OBJ_MIGHTBEDIRTY flags do
not get set under certain circumstances ( page rename case ).
Reviewed by: Alan Cox <alc@cs.rice.edu>, John Dyson
free swap space out from under a busy page. This is not legal because
the swap may be reallocated and I/O issued while I/O is still in
progress on the same swap page from the madvise()'d object. This bug
could only occur under extreme paging conditions but might not cause
an error until much later. As a side-benefit, madvise() is now even
smaller.
possible without actually unmapping it from the process.
As of now, I declare madvise() on OBJT_DEFAULT/OBJT_SWAP objects to be
'working and complete'.
OBJ_ONEMAPPING in the case where an object is extended by an
additional vm_map_entry must be allocated.
In vm_object_madvise(), remove calll to vm_page_cache() in MADV_FREE
case in order to avoid a page fault on page reuse. However, we still
mark the page as clean and destroy any swap backing store.
Submitted by: Alan Cox <alc@cs.rice.edu>
no major operational changes were made. The three core object->memq loops
were moved into a single inline procedure and various operational
characteristics of the collapse function were documented.
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.
Since paging is in progress, page scan in vm_page_qcollapse() must be
protected at atleast splbio() to prevent pages from being ripped out from
under the scan.
The vm_map_insert()/vm_object_coalesce() optimization has been extended
to include OBJT_SWAP objects as well as OBJT_DEFAULT objects. This is
possible because it costs nothing to extend an OBJT_SWAP object with
the new swapper. We can't do this with the old swapper. The old swapper
used a linear array that would have had to have been reallocated, costing
time as well as a potential low-memory deadlock.
object->paging_offset has been removed - it was used to optimize a
single OBJT_SWAP collapse case yet introduced massive confusion throughout
vm_object.c. The optimization was inconsequential except for the
claim that it didn't have to allocate any memory. The optimization
has been removed.
madvise() has been fixed. The old madvise() could be made to operate
on shared objects which is a big no-no. The new one is much more careful
in what it modifies. MADV_FREE was totally broken and has now been fixed.
vm_page_rename() now automatically dirties a page, so explicit dirtying
of the page prior to calling vm_page_rename() has been removed.
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
file to a stream socket. sendfile(2) is similar to implementations in
HP-UX, Linux, and other systems, but the API is more extensive and
addresses many of the complaints that the Apache Group and others have
had with those other implementations. Thanks to Marc Slemko of the
Apache Group for helping me work out the best API for this.
Anyway, this has the "net" result of speeding up sends of files over
TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU
cycles) when compared to a traditional read/write loop.
needs to be called prior to freeing remaining pages in the object so that
the device pager has an opportunity to grab its "fake" pages. Also, in
the case of wired pages, the page must be made busy prior to calling
vm_page_remove. This is a difference from 2.2.x that I overlooked when
I brought these changes forward.
legitimately wired pages. Currently we print a diagnostic when this
happens, but this will be removed soon when it will be common for this
to occur with zero-copy TCP/IP buffers.
Add some overflow checks to read/write (from bde).
Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.
Reviewed by: Bruce Evans <bde@zeta.org.au>
managed to avoid corruption of this variable by luck (the compiler used a
memory read-modify-write instruction which wasn't interruptable) but other
architectures cannot.
With this change, I am now able to 'make buildworld' on the alpha (sfx: the
crowd goes wild...)
casting them to long, etc. Fixed some nearby printf bogons (sign
errors not warned about by gcc, and style bugs, but not truncation
of vm_ooffset_t's).
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)
deallocation cycles. This should provide a measurable improvement
on swap and memory allocation on loaded systems. It is unlikely a
complete solution. Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.
problems. Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!
pmap.c:
1) Create an object for kernel page table allocations. This
fixes a bogus allocation method previously used for such, by
grabbing pages from the kernel object, using bogus pindexes.
(This was a code cleanup, and perhaps a minor system stability
issue.)
pmap.c:
2) Pre-set the modify and accessed bits when prudent. This will
decrease bus traffic under certain circumstances.
vfs_bio.c, vfs_cluster.c:
3) Rather than calculating the beginning virtual byte offset
multiple times, stick the offset into the buffer header, so
that the calculated offset can be reused. (Long long multiplies
are often expensive, and this is a probably unmeasurable performance
improvement, and code cleanup.)
vfs_bio.c:
4) Handle write recursion more intelligently (but not perfectly) so
that it is less likely to cause a system panic, and is also
much more robust.
vfs_bio.c:
5) getblk incorrectly wrote out blocks that are incorrectly sized.
The problem is fixed, and writes blocks out ONLY when B_DELWRI
is true.
vfs_bio.c:
6) Check that already constituted buffers have fully valid pages. If
not, then make sure that the B_CACHE bit is not set. (This was
a major source of Sig-11 type problems.)
vfs_bio.c:
7) Fix a potential system deadlock due to an incorrectly specified
sleep priority while waiting for a buffer write operation. The
change that I made opens the system up to serious problems, and
we need to examine the issue of process sleep priorities.
vfs_cluster.c, vfs_bio.c:
8) Make clustered reads work more correctly (and more completely)
when buffers are already constituted, but not fully valid.
(This was another system reliability issue.)
vfs_subr.c, ffs_inode.c:
9) Create a vtruncbuf function, which is used by filesystems that
can truncate files. The vinvalbuf forced a file sync type operation,
while vtruncbuf only invalidates the buffers past the new end of file,
and also invalidates the appropriate pages. (This was a system reliabiliy
and performance issue.)
10) Modify FFS to use vtruncbuf.
vm_object.c:
11) Make the object rundown mechanism for OBJT_VNODE type objects work
more correctly. Included in that fix, create pager entries for
the OBJT_DEAD pager type, so that paging requests that might slip
in during race conditions are properly handled. (This was a system
reliability issue.)
vm_page.c:
12) Make some of the page validation routines be a little less picky
about arguments passed to them. Also, support page invalidation
change the object generation count so that we handle generation
counts a little more robustly.
vm_pageout.c:
13) Further reduce pageout daemon activity when the system doesn't
need help from it. There should be no additional performance
decrease even when the pageout daemon is running. (This was
a significant performance issue.)
vnode_pager.c:
14) Teach the vnode pager to handle race conditions during vnode
deallocations.