track of the number of dirty buffers held by a vnode. When a
bdwrite is done on a buffer, check the existing number of dirty
buffers associated with its vnode. If the number rises above
vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one
of the other (hopefully older) dirty buffers associated with
the vnode is written (using bawrite). In the event that this
approach fails to curb the growth in the vnode's number of
dirty buffers (due to soft updates rollback dependencies),
the more drastic approach of doing a VOP_FSYNC on the vnode
is used. This code primarily affects very large and actively
written files such as snapshots. This change should eliminate
hanging when taking snapshots or doing background fsck on
very large filesystems.
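
A minimal sketch of the heuristic described above, assuming a per-vnode
counter named v_dirtybufcnt; the function and field names here are
illustrative and the committed code differs in detail (notably in its
locking):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/queue.h>
    #include <sys/buf.h>
    #include <sys/vnode.h>

    static void
    bdwrite_throttle(struct buf *bp, struct thread *td)
    {
        struct vnode *vp = bp->b_vp;
        struct buf *nbp;

        if (vp == NULL || vp->v_dirtybufcnt <= dirtybufthresh)
            return;

        /* Start an async write on one other (hopefully older)
         * dirty buffer belonging to the same vnode. */
        TAILQ_FOREACH(nbp, &vp->v_dirtyblkhd, b_vnbufs) {
            if (nbp == bp || (nbp->b_flags & B_DELWRI) == 0)
                continue;
            if (BUF_LOCK(nbp, LK_EXCLUSIVE | LK_NOWAIT) == 0) {
                bremfree(nbp);
                bawrite(nbp);   /* queues the write, releases nbp */
                return;
            }
        }

        /* Soft updates rollback dependencies may redirty buffers
         * faster than single bawrite()s retire them; fall back to
         * syncing the whole vnode. */
        (void)VOP_FSYNC(vp, td->td_ucred, MNT_NOWAIT, td);
    }
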
Hopefully, one day it will be possible to cache filesystem
metadata in the VM cache as is done with file data. As it
stands, only the buffer cache can be used, which limits total
metadata storage to about 20MB no matter how much memory is
available on the system. This rather small pool of memory gets
badly thrashed, causing a lot of extra I/O. For example, taking a
snapshot of a 1TB filesystem minimally requires about 35,000
write operations, but because of the cache thrashing (we only
have about 350 buffers at our disposal) it ends up doing about
237,540 I/Os, thus taking twenty-five minutes instead of the four
it would need if it could run entirely in the cache.
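
As a quick consistency check on those figures (my arithmetic, not part
of the original report):

    237,540 I/Os / 35,000 I/Os  ~ 6.8
    25 min / 4 min              ~ 6.3

i.e. the thrashing inflates the I/O count and the elapsed time by
roughly the same factor.
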
Reported by: Attila Nagy <bra@fsn.hu>
Sponsored by: DARPA & NAI Labs.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
these fields instead.
- Hold the vnode interlock while locking bufs on the clean/dirty queues.
This reduces some cases from one BUF_LOCK with LK_NOWAIT followed by
another BUF_LOCK with LK_TIMEFAIL to a single lock acquisition.
Reviewed by: arch, mckusick
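
Roughly, the resulting pattern for the locking change above looks like
the sketch below; VI_MTX() and the exact flag combination are recalled
from memory, so treat this as an approximation rather than the
committed code:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/queue.h>
    #include <sys/buf.h>
    #include <sys/vnode.h>

    static struct buf *
    lock_first_dirty(struct vnode *vp)
    {
        struct buf *bp;

        VI_LOCK(vp);
        if ((bp = TAILQ_FIRST(&vp->v_dirtyblkhd)) == NULL) {
            VI_UNLOCK(vp);
            return (NULL);
        }
        /* LK_INTERLOCK hands the vnode interlock to lockmgr, which
         * releases it whether or not the buf lock is obtained, so
         * one BUF_LOCK call replaces the old LK_NOWAIT-then-
         * LK_TIMEFAIL pair. */
        if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT | LK_INTERLOCK,
            VI_MTX(vp)) != 0)
            return (NULL);
        return (bp);
    }
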
- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't encode the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled in by the mmap handler when the mapping is successful. The
mmap handler must now return 0 on success; any other value
is treated as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some drivers
that bogusly returned errno constants, which the device pager
would have treated as addresses.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.
I'm still not sure whether we need a __FreeBSD_version bump for this,
since we didn't guarantee API/ABI stability until 5.1-RELEASE.
Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386
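
To make the new contract concrete, here is a hedged sketch of a device
mmap handler under the new interface; mydev_base and MYDEV_SIZE are
hypothetical device parameters, not from the commit:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/conf.h>

    #define MYDEV_SIZE  0x100000     /* hypothetical region size */
    static vm_offset_t mydev_base;   /* hypothetical physical base */

    static int
    mydev_mmap(dev_t dev, vm_offset_t offset, vm_offset_t *paddr,
        int nprot)
    {
        if (offset >= MYDEV_SIZE)
            return (EINVAL);           /* a real errno now, not -1 */
        *paddr = mydev_base + offset;  /* physical address, no atop() */
        return (0);
    }
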
time and there's no indication that it will improve anytime soon.
By removing support for SimOS it is possible to build LINT on
Alpha, which is considered more important at the moment.
Not objected to on: alpha@
- Changed VM_MAXUSER_ADDRESS to be defined in terms of PTDPTDI. In order for
assumptions about the recursive page table map to work, it must be the base
of the recursive map. Any pte offset that's not NPTEPG will break these
assumptions.
Sponsored by: DARPA, Network Associates Laboratories
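
On i386 this amounts to something like the following (quoted from
memory of the i386 headers; worth double-checking against the tree):

    /* The recursive map lives in PDE slot PTDPTDI, so the page tables
     * appear starting at VADDR(PTDPTDI, 0); user space must end
     * exactly there.  Defining the limit via PTDPTDI keeps the two
     * in sync. */
    #define VADDR(pdi, pti) ((vm_offset_t)(((pdi) << PDRSHIFT) | \
                            ((pti) << PAGE_SHIFT)))
    #define VM_MAXUSER_ADDRESS      VADDR(PTDPTDI, 0)
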
from the source directory. (This mostly affects RELENG_4's
``make release'' release.5 target, where the "rtermcap" build tool
for release/sysinstall ends up in the source directory and later
steps of release.5 wipe it out.)
Spotted by: jhay
NSWAPDEV limit.
- Don't warn about devices that are not in use in 'swapoff -a'.
- Re-add behavior mistakenly removed in revision 1.44:
If using 'swapon -a', do not warn if the device is already in use.
PR: 46633
Submitted by: Andy Farkas <andyf@speednet.com.au> (in part)
Reviewed by: mike (mentor)
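
The combined policy boils down to suppressing the "expected" errno when
operating on all devices; a sketch, where the helper name and the exact
errno values (EBUSY for an already-enabled device, EINVAL for one not
in use) are assumptions:

    #include <err.h>
    #include <errno.h>
    #include <unistd.h>   /* swapon(2)/swapoff(2) prototypes */

    static void
    swap_on_off(const char *name, int doall, int enabling)
    {
        if ((enabling ? swapon(name) : swapoff(name)) == 0)
            return;
        /* Under -a, an already-enabled (swapon) or not-in-use
         * (swapoff) device is expected; stay silent instead of
         * warning. */
        if (doall && (errno == EBUSY || errno == EINVAL))
            return;
        warn("%s", name);
    }
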
The initial stack_block is statically allocated and will be aligned
according to the alignment requirements of pointers, which does not
necessarily match the alignment enforced by ALIGN. To solve this, a
more involved change is required: remove the static initial stack
and deal with an initial condition of not having a stack at all. This
change is therefore more risky than the previous ones, but unavoidable
(other than not using the platform default alignment).
Discussed with: tjr
Approved and reviewed by: tjr
Tested on: alpha, i386, ia64 and sparc64
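
A sketch of the shape of the fix, with hypothetical names (the real
code differs): start with no stack at all and allocate the first block
lazily, explicitly aligning the usable area with ALIGN:

    #include <sys/param.h>   /* ALIGN, ALIGNBYTES */
    #include <stdint.h>
    #include <stdlib.h>

    struct stack_block {
        struct stack_block *prev;
        char *base;                /* ALIGN()ed start of usable space */
    };

    static struct stack_block *stack = NULL;  /* initially: no stack */

    static struct stack_block *
    stack_grow(size_t size)
    {
        struct stack_block *sb;

        /* Overallocate so the payload can be bumped to an ALIGN
         * boundary; a static block would only get the alignment of
         * its members. */
        sb = malloc(sizeof(*sb) + size + ALIGNBYTES);
        if (sb == NULL)
            return (NULL);
        sb->base = (char *)ALIGN((uintptr_t)(sb + 1));
        sb->prev = stack;
        stack = sb;
        return (sb);
    }
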
flag that can be marked on each symmetric op
o eliminate hw.ubsec.maxbatch and hw.ubsec.maxaggr since they are not
needed anymore
o change ubsec_feed to return void instead of int since zero is always
returned and no one ever looked at the return value