where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.
Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.
Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)
- Create a new function bdone() which sets B_DONE and calls wakup(bp). This
is suitable for use as b_iodone for buf consumers who are not going
through the buf cache.
- Create a new function bwait() which waits for the buf to be done at a set
priority and with a specific wmesg.
- Replace several cases where the above functionality was implemented
without locking with the new functions.
requiring locked bufs in vfs_bio_awrite(). Previously the buf could
have been written out by fsync before we acquired the buf lock if it
weren't for giant. The cluster_wbuild() handles this race properly but
the single write at the end of vfs_bio_awrite() would not.
- Modify flushbufqueues() so there is only one copy of the loop. Pass a
parameter in that says whether or not we should sync bufs with deps.
- Call flushbufqueues() a second time and then break if we couldn't find
any bufs without deps.
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
flag to the initial BUF_LOCK(). This will eventually be used in cases
were we want to use a buffer only if it is not currently in use.
- Convert all consumers of the getblk() api to use this extra parameter.
Reviwed by: arch
Not objected to by: mckusick
delta 1.371) we must ensure that we do not get ourselves into a
recursive trap endlessly trying to clean up after ourselves.
Reported by: Attila Nagy <bra@fsn.hu>
Sponsored by: DARPA & NAI Labs.
track of the number of dirty buffers held by a vnode. When a
bdwrite is done on a buffer, check the existing number of dirty
buffers associated with its vnode. If the number rises above
vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one
of the other (hopefully older) dirty buffers associated with
the vnode is written (using bawrite). In the event that this
approach fails to curb the growth in it the vnode's number of
dirty buffers (due to soft updates rollback dependencies),
the more drastic approach of doing a VOP_FSYNC on the vnode
is used. This code primarily affects very large and actively
written files such as snapshots. This change should eliminate
hanging when taking snapshots or doing background fsck on
very large filesystems.
Hopefully, one day it will be possible to cache filesystem
metadata in the VM cache as is done with file data. As it
stands, only the buffer cache can be used which limits total
metadata storage to about 20Mb no matter how much memory is
available on the system. This rather small memory gets badly
thrashed causing a lot of extra I/O. For example, taking a
snapshot of a 1Tb filesystem minimally requires about 35,000
write operations, but because of the cache thrashing (we only
have about 350 buffers at our disposal) ends up doing about
237,540 I/O's thus taking twenty-five minutes instead of four
if it could run entirely in the cache.
Reported by: Attila Nagy <bra@fsn.hu>
Sponsored by: DARPA & NAI Labs.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
these fields instead.
- Hold the vnode interlock while locking bufs on the clean/dirty queues.
This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
BUF_LOCK with a LK_TIMEFAIL to a single lock.
Reviewed by: arch, mckusick
queue lock already held.
- In getblk() and flushbufqueues() use bremfreel() while we still have the
buf queue lock held to keep the lists consistent.
- Add LK_NOWAIT to two cases where we're essentially asserting that the bufs
are not locked while acquiring the locks. This will make sure that we get
the appropriate panic() and not another one for sleeping with a lock held.
that is protected by the vnode lock.
- Move B_SCANNED into b_vflags and call it BV_SCANNED.
- Create a vop_stdfsync() modeled after spec's sync.
- Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
fs specific processing. This gives all of these filesystems proper
behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
- Annotate the locking in buf.h
buf lists, synchronization variables, and atomic ops for the counters.
This change does not remove giant from any code although some pushdown
may be possible.
- In vfs_bio_awrite() don't access buf fields without the buf lock.
vm_pageout_deficit:
1. Update vm_pageout_deficit before VM_WAIT. There is no sense in
delaying the update; the sooner the pageout daemon receives this
information the better. Reviewed by: tegge
2. Update vm_pageout_deficit according to the number of pages still
needed to complete the allocation, not the original size of the
allocation. Submitted by: tegge
(These errors have existed since the introduction of vm_pageout_deficit
in revision 1.144.)
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().
This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).
to sort out disk-io from file-io in the vm/buffer/filesystem space.
The intent is to sort VOP_STRATEGY calls into those which operate
on "real" vnodes and those which operate on VCHR vnodes. For
the latter kind, the call will be changed to VOP_SPECSTRATEGY,
possibly conditionally for those places where dual-use happens.
Add a default VOP_SPECSTRATEGY method which will call the normal
VOP_STRATEGY. First time it is called it will print debugging
information. This will only happen if a normal vnode is passed
to VOP_SPECSTRATEGY by mistake.
Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY
does on a VCHR vnode today.
Add a new VOP_STRATEGY method in specfs to catch instances where
the conversion to VOP_SPECSTRATEGY has not yet happened. Handle
the request just like we always did, but first time called print
debugging information.
Apart up to two instances of console messages per boot, this amounts
to a glorified no-op commit.
If you get any of the messages on your console I would very much
like a copy of them mailed to phk@freebsd.org