Commit Graph

608 Commits

Author SHA1 Message Date
Alan Cox
ab83ac429d Reduce the scope of the page queues lock in vfs_busy_pages() now that
vm_page_sleep_if_busy() no longer requires the caller to hold the page
queues lock.
2006-08-08 06:00:49 +00:00
Alan Cox
af51d7bf57 Eliminate OBJ_WRITEABLE. It hasn't been used in a long time. 2006-07-21 06:40:29 +00:00
Jeff Roberson
4b24e4210e - Properly check against B_DELWRI and B_NEEDSGIANT. This check was
incorrectly written and caused some !NEEDSGIANT buffers to be put in
   the NEEDSGIANT queue.

Sponsored by:	Isilon Systems, Inc.
2006-04-04 06:44:21 +00:00
Jeff Roberson
084d64ac21 - Add the B_NEEDSGIANT flag which is only set if the vnode that owns a buf
requires Giant.  It is set in bgetvp and cleared in brelvp.
 - Create QUEUE_DIRTY_GIANT for dirty buffers that require giant.
 - In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and
   only if we think there are buffers in that queue.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 02:56:30 +00:00
Pawel Jakub Dawidek
96c0381f5c Destroy "bip" bio in error case.
Found by:	Coverity Prevent analysis tool
Coverity ID:	795
MFC after:	3 days
2006-03-22 00:42:41 +00:00
Tor Egge
c78226329a For low memory situations, non-VMIO buffers didnt't release pages back to
the system when brelse() was called with B_RELBUF set on the buffer.  This
could be a problem when the system was low on memory, had many buffers on
QUEUE_EMPTYKVA and started to traverse directories.  For each getnewbuf(),
pages were allocated from the system, driving the free reserve downwards.
For each brelse(), the system put the buffer on QUEUE_CLEAN, with B_INVAL
set.

This commit changes the semantics of B_RELBUF to also free pages from
non-VMIO buffers.

Reviewed by:	alc
2006-02-02 21:37:39 +00:00
Alan Cox
bb53e2bf27 Remove an unnecessary call to pmap_remove_all(). The given page is not
mapped because its contents are invalid.

Reviewed by: tegge
2006-01-23 00:00:45 +00:00
Tor Egge
dffaf91aa3 Set flag in needsbuffer while still holding bqlock to avoid lost wakeup. 2006-01-16 22:09:47 +00:00
Alexander Leidinger
ef39c05baa MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
Craig Rodrigues
6951bea6c8 Changes imported from XFS for FreeBSD project:
- add fields to struct buf (needed by XFS)
    - 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3
    - b_pin_count, count of pinned buffer

- add new B_MANAGED flag
- add breada() function to initiate asynchronous I/O on read-ahead blocks.
- add bufdone_finish(), bpin(), bunpin_wait() functions

Patches provided by:	kan
Reviewed by:		phk
Silence on:		arch@
2005-12-07 03:39:08 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Tor Egge
8272da3106 Release clean buffer with wrong size and no dependencies also for non-VMIO
case.
2005-10-09 22:41:25 +00:00
Don Lewis
bd3c2d867d Un-staticize waitrunningbufspace() and call it before returning from
ffs_copyonwrite() if any async writes were launched.

Restore the threads previous TDP_NORUNNINGBUF state before returning
from ffs_copyonwrite().
2005-09-30 18:07:41 +00:00
Don Lewis
6c8b634f1d Un-staticize runningbufwakeup() and staticize updateproc.
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.

Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than than
checking the proc pointer vs. the known proc pointers for these two
threads.  A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.

Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk.  This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk.  The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.

Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.

Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.

See <http://www.holm.cc/stress/log/cons143.html>
2005-09-30 01:30:01 +00:00
Peter Edwards
d41c4674c2 Close a race in biodone(), whereby the bio_done field of the passed
bio may have been freed and reassigned by the wakeup before being
tested after releasing the bdonelock.

There's a non-zero chance this is the cause of a few of the crashes
knocking around with biodone() sitting in the stack backtrace.

Reviewed By: phk@
2005-09-29 10:37:20 +00:00
Jeff Roberson
9e2aaec1e3 - Use lockmgr_printinfo rather than rolling our own. This introduces a
slight problem by using printf instead of db_printf however
   'show lockedvnods' does the same so I believe it is ok for now.
2005-08-03 05:02:08 +00:00
Alan Cox
ec9c9e7363 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00
Jeff Roberson
7a06fe49dc - Add and enhance asserts related to the wrong bufobj panic.
Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-14 20:32:27 +00:00
Jeff Roberson
748c92fbad - Split one KASSERT in bremfree() into two to aid in debugging.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:45:05 +00:00
Brian Feldman
cc3149b1ea Fix a serious deadlock with the NFS client. Given a large enough
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries.  However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.

To fix this, hibufspace is exported and scaled to a reasonable
fraction.  This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously.  If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system.  It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.

The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.

General nod by:	gad
MFC after:	2 weeks
More testing:	wes
PR:		kern/79208
2005-06-10 23:50:41 +00:00
Jeff Roberson
a3d239bc29 - My sub-par public school education has been exposed. s/sentinal/sentinel/
Noticed by:	Emil Mikulic
2005-06-09 04:40:20 +00:00
Jeff Roberson
9e879a5ee0 - Under heavy IO load the buf daemon can run for many hundereds of
milliseconds due to what is essentially n^2 algorithmic complexity.  This
   change makes the algorithm N*2 instead.  This heavy processing manifested
   itself as skipping in audio and video playback due to the long scheduling
   latencies and contention on giant by pcm.
 - flushbufqueues() is now responsible for flushing multiple buffers
   rather than one at a time.  This allows us to save our progress in the
   list by using a sentinal.  We must do the numdirtywakeup() and
   waitrunningbufspace() here now rather than in buf_daemon().
 - Also add a uio_yield() after we have processed the list once for bufs
   without deps and again for bufs with deps.  This is to release Giant
   and allow any other giant locked code to proceed.

Tested by:	Many users on current@
Revealed by:	schedgraph traces sent by Emil Mikulic & Anthony Ginepro
2005-06-08 20:26:05 +00:00
Jeff Roberson
1f22a07afd - Add bufobj_wrefl() to add a write ref to a bufobj that is already locked. 2005-05-30 07:01:18 +00:00
Jeff Roberson
4a723b3604 - Remove long dead splbio() calls and comments relating to the old
synchronization mechanism.
2005-04-30 12:18:50 +00:00
Jeff Roberson
ba4f7c7023 - Don't acquire Giant before calling b_biodone, individual consumers are
now required to do so themselves.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:44:22 +00:00
Jeff Roberson
0d12524bbf - Add two KASSERTs to prevent us from recycling a buf that is still on a
bufobj list.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:53:20 +00:00
Jeff Roberson
6c759f3558 - Add information about the buf lock to db_show_buffer.
- Add a 'show lockedbufs' command that is similar to show lockedvnods.

Sponsored by:	Isilon Systems, Inc.
2005-03-25 00:20:37 +00:00
Jeff Roberson
ec346d1040 - Lock access to the buffer_map with the vm_map lock. In 4.x this was
done with splbio, in 5.x this was done with Giant.

Discussed with:		alc
Reported by:		julian, pho
2005-03-08 09:34:54 +00:00
Poul-Henning Kamp
1ba212823f Make various vnode related functions static 2005-02-10 12:28:58 +00:00
Jeff Roberson
5c18d18b1d - Add more information to the getnewbuf() recycling KTR.
Sponsored by:	Isilon Systems, Inc.
2005-02-10 02:22:56 +00:00
Jeff Roberson
b56dc9a785 - Remove an invalid KASSERT added in recent background write reshuffling.
Sponsored by:	Isilon Systems, Inc.
2005-02-08 23:25:08 +00:00
Poul-Henning Kamp
dd19a799b8 Background writes are entirely an FFS/Softupdates thing.
Give FFS vnodes a specific bufwrite method which contains all the
background write stuff and then calls into the default bufwrite()
for the rest of the job.

Remove all the background write related stuff from the normal bufwrite.

This drags the softdep_move_dependencies() back into FFS.

Long term, it is worth looking at simply copying the data into
allocated memory and issuing the bio directly and not create the
"shadow buf" in the first place (just like copy-on-write is done
in snapshots for instance).  I don't think we really gain anything
but complexity from doing this with a buf.
2005-02-08 20:29:10 +00:00
Jeff Roberson
8364446643 - Don't release BKGRDINPROG until after we've bufdone'd the copy.
Sponsored by:	Isilon Systems, Inc.
2005-02-05 01:26:14 +00:00
Jeff Roberson
bd8d684fd7 - Don't drop the wref on the bufobj until after bufdone() has completed.
Without this, threads waiting in bufobj_wwait() may wakeup prior to
   bufdone() completing.

Sponsored by:	Isilon Systems, Inc.
2005-01-28 17:48:58 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Poul-Henning Kamp
35764be39e Kill the VV_OBJBUF and test the v_object for NULL instead. 2005-01-24 13:13:57 +00:00
Jeff Roberson
71ddd673b1 - Add CTR calls to trace the lifecycle of a buffer.
- Remove some KASSERTs which are invalid if the appropriate lock is
   not held.
 - Slightly restructure bremfree() so that it is more sane.
 - Change the flush code in bdwrite() to avoid acquiring a mutex
   whenever possible.
 - Change the flush code in bdwrite() to avoid holding the bufobj mutex
   while calling buf_countdeps().  This introduces a lock-order
   relationship with the softdep lock that can not otherwise be resolved.
 - Don't set B_DONE until bufdone() is complete, otherwise another
   processor may believe the buf is done before it is.
 - Only acquire Giant if the caller has set b_iodone.  Don't grab giant
   around normal bufdone() calls.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:47:04 +00:00
Poul-Henning Kamp
6ef8480a88 Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.
2005-01-11 10:43:08 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Jeff Roberson
b646893f0f - Eliminate the acquisition and release of the bqlock in bremfree() by
setting the B_REMFREE flag in the buf.  This is done to prevent lock order
   reversals with code that must call bremfree() with a local lock held.
   This also reduces overhead by removing two lock operations per buf for
   fsync() and similar.
 - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock
   has been acquired so that we may remove ourself from the free-list.
 - Provide a bremfreef() function to immediately remove a buf from a
   free-list for use only by NFS.  This is done because the nfsclient code
   overloads the b_freelist queue for its own async. io queue.
 - Simplify the numfreebuffers accounting by removing a switch statement
   that executed the same code in every possible case.
 - getnewbuf() can encounter locked bufs on free-lists once Giant is removed.
   Remove a panic associated with this condition and delay asserts that
   inspect the buf until after it is locked.

Reviewed by:	phk
Sponsored by:	Isilon Systems, Inc.
2004-11-18 08:44:09 +00:00
Poul-Henning Kamp
6e67e2a710 Retire b_magic now, we have the bufobj containing the same hint. 2004-11-04 09:48:18 +00:00
Poul-Henning Kamp
9f7a3028d5 Change buf->b_object to buf->b_bufobj->bo_object
some whitespace fixes.
2004-11-04 09:06:54 +00:00
Poul-Henning Kamp
9bc4d9a495 whitespace 2004-11-04 08:25:52 +00:00
Poul-Henning Kamp
c569065139 Remove buf->b_dev field. 2004-11-04 07:59:57 +00:00
Alan Cox
d19ef81437 The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy().  Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set.  Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition.  Typically,
this is when it is undergoing I/O.
2004-11-03 20:17:31 +00:00
Poul-Henning Kamp
0cbda9dfd5 Remove the last call in the system to VOP_SPECSTRATEGY(): We can no
longer come through the VNODE layer to the disks since all the filesystems
now go via geom_vfs to GEOM.
2004-10-29 10:52:31 +00:00
Poul-Henning Kamp
6afb3b1c37 Give dev_strategy() an explict cdev argument in preparation for removing
buf->b-dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text.  There is no legal difference between John Dysons two-clause
abbreviated BSD license and the canonical text.
2004-10-29 07:16:37 +00:00
Poul-Henning Kamp
c5995e45eb Lock bp->b_bufobj->b_object instead of bp->b_object 2004-10-28 08:38:46 +00:00
Poul-Henning Kamp
6e77a04170 The island council met and voted buf_prewrite() home.
Give ffs it's own bufobj->bo_ops vector and create a private strategy
routine, (currently misnamed for forwards compatibility), which is
just a copy of the generic bufstrategy routine except we call
softdep_disk_prewrite() directly instead of through the buf_prewrite()
indirection.

Teach UFS about the need for softdep_disk_prewrite() and call the
function directly in FFS.

Remove buf_prewrite() from the default bufstrategy() and from the
global bio_ops method vector.
2004-10-26 10:44:10 +00:00
Poul-Henning Kamp
5d9d81e7ea Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
Alan Cox
cd9c0da805 Hold the lock on the containing vm object when calling
vm_page_sleep_if_busy().
2004-10-26 06:58:26 +00:00
Poul-Henning Kamp
ee1d0eb330 Remove vnode->v_bsize. This was a dead-end. 2004-10-25 07:50:59 +00:00
Alan Cox
a50b705403 Use VM_ALLOC_NOBUSY to eliminate vm_page_wakeup() calls and the acquisition
and release of the global page queues lock required to make the call.

Remove GIANT_REQUIRED from vm_hold_free_pages().  All of its VM operations
are properly synchronized.
2004-10-25 06:34:14 +00:00
Poul-Henning Kamp
4dcd0ac4cf Collapse vnode->v_object and buf->b_object into bufobj->bo_object. 2004-10-25 06:02:57 +00:00
Poul-Henning Kamp
b792bebeea Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
Poul-Henning Kamp
494eb176e7 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
Poul-Henning Kamp
a76d8f4ec9 Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
Poul-Henning Kamp
6230ce6aa9 use dev_re[fl]thread() rather than home rolled versions. 2004-09-24 05:55:03 +00:00
Poul-Henning Kamp
1a52a73d68 Eliminate DEV_STRATEGY() macro: call dev_strategy() directly.
Make dev_strategy() handle errors and departing devices properly.
2004-09-23 14:45:04 +00:00
Poul-Henning Kamp
a0e78d2eb0 Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.

Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.

Replace spechash_mtx use with dev_lock()/dev_unlock().
2004-09-23 07:17:41 +00:00
Poul-Henning Kamp
08dbd671ff Remove unused B_WRITEINPROG flag 2004-09-15 21:49:22 +00:00
Poul-Henning Kamp
4095f485c8 undent some functions a bit. 2004-09-15 21:08:58 +00:00
Poul-Henning Kamp
ab19cad78e stylistic polishing. 2004-09-15 20:54:23 +00:00
Poul-Henning Kamp
883d3c0c07 Remove the buffercache/vnode side of BIO_DELETE processing in
preparation for integration of p4::phk_bufwork.  In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly.  Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
2004-09-13 06:50:42 +00:00
Poul-Henning Kamp
cf95b5c381 Eliminate unused second argument to reassignbuf() and simplify it
accordingly.
2004-07-25 21:24:23 +00:00
Poul-Henning Kamp
a3d57cfbfd Neuter this warning for now, I think I know the remaining issues. 2004-07-25 08:09:21 +00:00
Alan Cox
d8582da660 Remove GIANT_REQUIRED from vmapbuf(). 2004-07-18 04:57:49 +00:00
Peter Edwards
0f01586867 Fix bug introduced in rev 1.434:
When avoiding the zeroing of "bogus_page" when it appears in a buf,
be sure to advance the pointers into the data for successive pages.

The bug caused file corruption when read(2)ing from a "hole" in a
file where a previous page of the read block had already been faulted
in: fsx tripped up on this pretty quickly. The particular access
pattern is probably pretty unusual, so other applications probably
wouldn't have had problems, but you'd never know.

Reviewed By: alc@
2004-07-06 23:40:40 +00:00
Poul-Henning Kamp
7f6599fec6 Make the last commit handle non-phk root devices better. 2004-07-04 19:42:25 +00:00
Stefan Farfeleder
5908d366fb Consistently use __inline instead of __inline__ as the former is an empty macro
in <sys/cdefs.h> for compilers without support for inline.
2004-07-04 16:11:03 +00:00
Poul-Henning Kamp
1cbb1e02c4 Blocksize for I/O should be a property of the vnode and not found by groping
around in the vnodes surroundings when we allocate a block.

Assign a blocksize when we create a vnode, and yell a warning (and ignore it)
if we got the wrong size.

Please email all such warnings to me.
2004-07-04 12:49:04 +00:00
Poul-Henning Kamp
cfa5e80af8 Remove stale comment 2004-07-03 19:37:06 +00:00
Poul-Henning Kamp
f3732fd15b Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Alan Cox
ec1100fc6e Avoid pointless zeroing of the bogus page in vfs_bio_clrbuf().
Suggested by:	tegge@	(from October of last year)
2004-05-08 06:46:40 +00:00
Alan Cox
5a32489377 Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation.  This flag's principal use is shortly after
allocation.  For such cases, clearing the flag is pointless.  The only
unusual use of PG_ZERO is in vfs_bio_clrbuf().  However, allocbuf() never
requests a prezeroed page.  So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by:	tegge@
2004-05-06 05:03:23 +00:00
Dag-Erling Smørgrav
30a058027a Replace a manual check of a VMIO candidate with vn_canvmio(). This
silences an annoying warning in getblk() when VMIO'ing on a directory
vnode, which can happen when vfs.vmiodirenable is 1.

Bring the warning message in line with reality at the same time.

Submitted by:	hmp
2004-03-12 12:02:12 +00:00
Poul-Henning Kamp
ceb58ca58f When I was a kid my work table was one cluttered mess an cleaning it up
were a rather overwhelming task.  I soon learned that if you don't know
where you're going to store something, at least try to pile it next to
something slightly related in the hope that a pattern emerges.

Apply the same principle to the ffs/snapshot/softupdates code which have
leaked into specfs:  Add yet a buf-quasi-method and call it from the
only two places I can see it can make a difference and implement the
magic in ffs_softdep.c where it belongs.

It's not pretty, but at least it's one less layer violated.
2004-03-11 18:50:33 +00:00
Poul-Henning Kamp
4d453ef101 Properly vector all bwrite() and BUF_WRITE() calls through the same path
and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
2004-03-11 18:02:36 +00:00
Alan Cox
3eba15c12e Remove GIANT_REQUIRED from vunmapbuf(). 2004-03-07 00:37:18 +00:00
Poul-Henning Kamp
cd690b60de Device megapatch 6/6:
This is what we came here for:  Hang dev_t's from their cdevsw,
refcount cdevsw and dev_t and generally keep track of things a lot
better than we used to:

Hold a cdevsw reference around all entrances into the device driver,
this will be necessary to safely determine when we can unload driver
code.

Hold a dev_t reference while the device is open.

KASSERT that we do not enter the driver on a non-referenced dev_t.

Remove old D_NAG code, anonymous dev_t's are not a problem now.

When destroy_dev() is called on a referenced dev_t, move it to
dead_cdevsw's list.  When the refcount drops, free it.

Check that cdevsw->d_version is correct.  If not, set all methods
to the dead_*() methods to prevent entrance into driver.  Print
warning on console to this effect.  The device driver may still
explode if it is also incompatible with newbus, but in that case
we probably didn't get this far in the first place.
2004-02-21 21:57:26 +00:00
Alan Cox
c5aebf380c swp_pager_async_iodone() no longer requires Giant. Modify bufdone()
and swapgeom_done() to perform swp_pager_async_iodone() without Giant.

Reviewed by:	tegge
2004-02-07 08:54:50 +00:00
Alan Cox
96a7b42213 Remove a variable that has been initialized but otherwise unused since
revision 1.315.
2003-12-20 19:46:21 +00:00
Poul-Henning Kamp
00cbe31bd8 Send B_PHYS out to pasture, it no longer serves any function. 2003-11-15 09:28:09 +00:00
Alan Cox
28c9416429 - Remove the remaining now unnecessary checks for the buf's b_object being
NULL.  See revision 1.421 for more detail.
 - Remove GIANT_REQUIRED from vfs_unbusy_pages().  Discussed with: jeff
2003-11-15 08:45:36 +00:00
Poul-Henning Kamp
1415a09d42 Replace B_PHYS conditional assignment to bio_offset with KASSERT check
to see that the originating code already did it right.
2003-11-12 10:27:06 +00:00
Kirk McKusick
fde81c7d8e Update the statfs structure with 64-bit fields to allow
accurate reporting of multi-terabyte filesystem sizes.

You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Tim Robbins <tjr@freebsd.org>
Reviewed by:	Julian Elischer <julian@elischer.org>
Reviewed by:	the hoards of <arch@freebsd.org>
Sponsored by:   DARPA & NAI Labs.
2003-11-12 08:01:40 +00:00
Alan Cox
e35e0182c3 - Revision 1.469 of vfs_subr.c resulted in the buf's b_object field being
consistency initialized.  Consequently, a number of conditionals that
   checked the validity of b_object before passing it to VM_OBJECT_LOCK()
   and VM_OBJECT_UNLOCK() are no longer needed.
2003-11-11 04:45:37 +00:00
Kirk McKusick
15a93fcc31 Allow the bufdaemon and update daemon processes to skip the
waitrunningbufspace() calls so that they are always able to
proceed and clean up buffer space.

Submitted by:	Brian Fundakowski Feldman <green@freebsd.org>
2003-11-04 06:30:00 +00:00
John Baldwin
787f162df6 Move the P_COWINPROGRESS flag from being a per-process p_flag to being a
per-thread td_pflag which doesn't require any locks to read or write as it
is only read or written by curthread on itself.

Glanced at by:	mckusick
2003-10-23 21:14:08 +00:00
Poul-Henning Kamp
68b00bf648 Remove KASSERTS on B_PHYS for vmapbuf() and vunmapbuf(), B_PHYS is going
away.
2003-10-21 06:53:10 +00:00
Alan Cox
48ae2dddac - Add vm object locking to vfs_clean_pages() and vfs_bio_set_validclean().
This is to synchronize access to the vm page's valid field by
   vm_page_set_validclean().
2003-10-19 20:39:06 +00:00
Poul-Henning Kamp
2d6a9d0747 Initialize b_iooffset before calling VOP_[SPEC]STRATEGY 2003-10-18 19:49:46 +00:00
Poul-Henning Kamp
0efedd8864 Don't report b_pblkno, it is going away. 2003-10-18 17:59:02 +00:00
Poul-Henning Kamp
583b92e328 Convert some if(bla) panic("foo") to KASSERTS to improve grep-ability. 2003-10-18 09:32:39 +00:00
Poul-Henning Kamp
d986d4580c The size and contents of the DEV_STRATEGY() macro has progressed to
the point where it being a macro is no longer sensible, and it will
only be more so in days to come.

BIO_STRATEGY() is now only used from DEV_STRATEGY() and should not
be used directly anymore.

Put the contents of both in the new function dev_strategy() and
make DEV_STRATEGY() call that function.

In addition, this allows us to make the rather magic bufdonebio()
helper function static.

This alse saves hunderedandsome bytes of code in a typical kernel.
2003-10-18 09:03:15 +00:00
Jeff Roberson
85b9831dfa - Add a mising vn_finished_write()
Pointy hat:     jeff
Found by:       robert
Obtained from:  kirk
2003-10-14 00:38:34 +00:00
Alan Cox
d58e70a08d In vfs_bio_clrbuf(), ignore the state of the object lock if the page is the
"bogus" page.

Found by:	tegge
2003-10-12 18:26:48 +00:00
Alan Cox
08814d66d5 - Synchronize access to a page's valid field in vfs_bio_clrbuf()
by using the lock from its containing object.
 - Remove GIANT_REQUIRED from vm_hold_load_pages().
2003-10-10 07:26:21 +00:00
Jeff Roberson
d1cf0fc7fc - Add a missing vn_start_write() to flushbufqueues(). This could have
caused snapshot related problems.
 - The vp can not be NULL here or we would panic in vfs_bio_awrite().  Stop
   confusing the logic by checking for it in several places.

Submitted by:	kirk and then rototilled by me to remove vp == NULL checks.
2003-10-05 22:16:08 +00:00
Alan Cox
6ec2fca505 Eliminate some unnecessary uses of the vm page queues lock around the
vm page's valid field.  This field is being synchronized using the
containing vm object's lock.
2003-10-04 22:47:20 +00:00
Alan Cox
bf0da100d6 - Extend the scope the vm object lock to cover calls to
vm_page_is_valid().
 - Assert that the lock on the containing vm object is held in
   vm_page_is_valid().
2003-10-04 19:23:29 +00:00
Alan Cox
c76789caa6 - vm_hold_free_pages() should lock the kernel object. (The pages being
freed belong to the kernel object.)
 - Increase the granularity of the vm object locking in vm_hold_load_pages()
   in order to reduce the number of times that we acquire and release the
   same lock.
2003-09-22 04:58:09 +00:00
Alan Cox
35b86dc8de Correct a typo in the previous revision. 2003-09-15 02:56:48 +00:00
Alan Cox
58abfe0051 Convert vmapbuf() from using pmap_extract() to using
pmap_extract_and_hold().  Note, however, that GIANT_REQUIRED should not be
removed until all platforms fully implement the "prot" parameter to
pmap_extract_and_hold().

Reviewed by:	tegge
2003-09-13 04:29:55 +00:00
Jeff Roberson
d919a11d06 - Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to
bail out if the buffer is not already present.
 - The buffer returned by incore() is not locked and should not be sent to
   brelse().  Use getblk() with the new GB_NOCREAT flag to preserve the
   desired semantics.
2003-08-31 08:50:11 +00:00
Jeff Roberson
a7db559087 - If there is no vp assume that BKGRDINPROG is not set and set RELPBUF in
brelse().
2003-08-31 01:07:45 +00:00
Jeff Roberson
b5c61abd82 - In some cases bp->b_vp can be NULL in brelse, don't try to lock the
interlock in that case.

Found by:	alc
2003-08-31 00:06:07 +00:00
Marcel Moolenaar
9e8147f3af In bufdone(), change the format specifier for m->valid and m->dirty to
a long type and explicitly cast m->valid and m->dirty to unsigned long.
When PAGE_SIZE is 32K, these fields are in fact unsigned long.
2003-08-28 19:58:11 +00:00
Alexander Kabaev
772a9659d9 Do not return with vnode interlock held.
Reviewed by:	rwatson
2003-08-28 15:48:15 +00:00
Jeff Roberson
9dbfeb0ae6 - Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field.
- Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode
   interlock.
 - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write
   buffers.  Check for the BKGRDINPROG flag before recycling or throwing
   away a buffer.  We do this instead because it is not safe for us to move
   the original buffer to a new queue from the callback on the background
   write buffer.
 - Remove the B_LOCKED flag and the locked buffer queue.  They are no longer
   used.
 - The vnode interlock is used around checks for BKGRDINPROG where it may
   not be strictly necessary.  If we hold the buf lock the a back-ground
   write will not be started without our knowledge, one may only be
   completed while we're not looking.  Rather than remove the code, Document
   two of the places where this extra locking is done.  A pass should be
   done to verify and minimize the locking later.
2003-08-28 06:55:18 +00:00
Alan Cox
b7ad744dc5 Hold the page queues lock when performing vm_page_clear_dirty() and
vm_page_set_invalid().
2003-08-23 18:11:53 +00:00
Poul-Henning Kamp
4bfd22f25e Grab Giant in bufdonebio() since drivers may not hold it.
This only protects the "struct buf" consumers (ie: DEV_STRATEGY()),
but does not protect BIO_STRATEGY() users.
2003-08-02 09:45:10 +00:00
Alan Cox
105660e8ba Eliminate an abuse of kmem_alloc_pageable() in bufinit()
by using VM_ALLOC_NOOBJ to allocate the bogus page.

Reviewed by:	tegge
2003-08-02 05:05:34 +00:00
Poul-Henning Kamp
568733688b Initialize b_saveaddr when we hand out buffers 2003-06-20 08:26:38 +00:00
Alan Cox
f717a9d063 Lock the vm object when removing a page. 2003-06-11 16:37:33 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Poul-Henning Kamp
17a1391990 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
Alan Cox
01dfc1deae Finish the vm_object locking for this file, including holding the vm_object
lock when accessing the vm_object's flags or calling vm_page_lookup().
2003-04-28 05:40:45 +00:00
Alan Cox
af3e0bb202 - Lock the vm_object when performing vm_page_alloc() in allocbuf(). 2003-04-26 07:42:24 +00:00
Alan Cox
097d4338db Lock the vm_object in vfs_busy_pages(). 2003-04-20 00:17:05 +00:00
Alan Cox
0fa05eae77 - Lock the vm_object when performing vm_object_pip_subtract().
- Assert that the vm_object lock is held in vm_object_pip_subtract().
2003-04-19 22:11:41 +00:00
Alan Cox
0d420ad3e6 - Lock the vm_object when performing vm_object_pip_wakeupn().
- Assert that the vm_object lock is held in vm_object_pip_wakeupn().
 - Add a new macro VM_OBJECT_LOCK_ASSERT().
2003-04-19 21:15:44 +00:00
Alan Cox
de5ef10142 Update locking on the kernel_object to use the new macros. 2003-04-14 00:36:53 +00:00
Alan Cox
0b556837a9 Remove an unnecessary trunc_page() from vmapbuf().
Reviewed by:	tegge
2003-04-06 00:40:54 +00:00
Alan Cox
08468b6ad7 o Check the b_bufsize passed to vmapbuf() returning an error
if it is invalid.
 o Remove a debugging printf() from vmapbuf().

Suggested by:   tegge
2003-04-04 06:14:54 +00:00
Poul-Henning Kamp
d086f85ac4 Preparation commit before I start on the bioqueue lockdown:
Collect all the bits of bioqueue handing in subr_disk.c, vfs_bio.c is big
enough as it is and disksort already lives in subr_disk.c.
2003-03-30 08:51:23 +00:00
Tor Egge
5bbb806004 Add support for reading directly from file to userland buffer when the
O_DIRECT descriptor status flag is set and both offset and length is a
multiple of the physical media sector size.
2003-03-26 23:40:42 +00:00
Jake Burkholder
227f9a1c58 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
Jeff Roberson
749ffa4ecd - Add a lock for protecting against msleep(bp, ...) wakeup(bp) races.
- Create a new function bdone() which sets B_DONE and calls wakup(bp). This
   is suitable for use as b_iodone for buf consumers who are not going
   through the buf cache.
 - Create a new function bwait() which waits for the buf to be done at a set
   priority and with a specific wmesg.
 - Replace several cases where the above functionality was implemented
   without locking with the new functions.
2003-03-13 07:31:45 +00:00
Jeff Roberson
09f11da5a3 - Remove a race between fsync like functions and flushbufqueues() by
requiring locked bufs in vfs_bio_awrite().  Previously the buf could
   have been written out by fsync before we acquired the buf lock if it
   weren't for giant.  The cluster_wbuild() handles this race properly but
   the single write at the end of vfs_bio_awrite() would not.
 - Modify flushbufqueues() so there is only one copy of the loop.  Pass a
   parameter in that says whether or not we should sync bufs with deps.
 - Call flushbufqueues() a second time and then break if we couldn't find
   any bufs without deps.
2003-03-13 07:19:23 +00:00
Jeff Roberson
7261f5f68e - Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
   flag to the initial BUF_LOCK().  This will eventually be used in cases
   were we want to use a buffer only if it is not currently in use.
 - Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by:	arch
Not objected to by:	mckusick
2003-03-04 00:04:44 +00:00
Jeff Roberson
491081fabf - Hold the vnode interlock across calls to bgetvp instead of acquiring it
internally.  This is required to stop multiple bufs from being associated
   with a single lblkno.
2003-03-02 06:05:23 +00:00
Jeff Roberson
bff5362bf2 - gc USE_BUFHASH. The smp locking of the buf cache renders this useless. 2003-03-01 05:55:03 +00:00
Kirk McKusick
7e734c4149 When doing cleanup of excessive buffers in bdwrite (see kern/vfs_bio.c
delta 1.371) we must ensure that we do not get ourselves into a
recursive trap endlessly trying to clean up after ourselves.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 23:59:09 +00:00
Jeff Roberson
2e3981a70c - Add the missing NULL interlock argument to a recently added BUF_LOCK. 2003-02-25 08:23:11 +00:00
Kirk McKusick
3a7053cb60 Prevent large files from monopolizing the system buffers. Keep
track of the number of dirty buffers held by a vnode. When a
bdwrite is done on a buffer, check the existing number of dirty
buffers associated with its vnode. If the number rises above
vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one
of the other (hopefully older) dirty buffers associated with
the vnode is written (using bawrite). In the event that this
approach fails to curb the growth in it the vnode's number of
dirty buffers (due to soft updates rollback dependencies),
the more drastic approach of doing a VOP_FSYNC on the vnode
is used. This code primarily affects very large and actively
written files such as snapshots. This change should eliminate
hanging when taking snapshots or doing background fsck on
very large filesystems.

Hopefully, one day it will be possible to cache filesystem
metadata in the VM cache as is done with file data. As it
stands, only the buffer cache can be used which limits total
metadata storage to about 20Mb no matter how much memory is
available on the system. This rather small memory gets badly
thrashed causing a lot of extra I/O. For example, taking a
snapshot of a 1Tb filesystem minimally requires about 35,000
write operations, but because of the cache thrashing (we only
have about 350 buffers at our disposal) ends up doing about
237,540 I/O's thus taking twenty-five minutes instead of four
if it could run entirely in the cache.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 06:44:42 +00:00
Jeff Roberson
17661e5ac4 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Jeff Roberson
71146186a1 - Introduce a new function bremfreel() that does a bremfree with the buf
queue lock already held.
 - In getblk() and flushbufqueues() use bremfreel() while we still have the
   buf queue lock held to keep the lists consistent.
 - Add LK_NOWAIT to two cases where we're essentially asserting that the bufs
   are not locked while acquiring the locks.  This will make sure that we get
   the appropriate panic() and not another one for sleeping with a lock held.
2003-02-16 10:43:06 +00:00
Jeff Roberson
25c4325446 - Add a comment about a race that will happen without Giant. 2003-02-10 22:47:34 +00:00
Jeff Roberson
c7b716cc2a - Unlock the nblock after the loop in bwillwrite(). 2003-02-10 22:33:59 +00:00
Jeff Roberson
7137d635ac - In getnewbuf() unlock the bq lock prior to sleeping when we're out of
buffers.

Submitted by:	tegge
2003-02-10 06:02:51 +00:00
Jeff Roberson
3306adcfcf - Correct another atomic op.
Spotted by:	alc
2003-02-09 22:39:51 +00:00
Jeff Roberson
69953c8435 - Move some code out from #ifdef INVARIANTS. 2003-02-09 12:11:37 +00:00
Jeff Roberson
767b9a529d - Cleanup unlocked accesses to buf flags by introducing a new b_vflag member
that is protected by the vnode lock.
 - Move B_SCANNED into b_vflags and call it BV_SCANNED.
 - Create a vop_stdfsync() modeled after spec's sync.
 - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
   fs specific processing.  This gives all of these filesystems proper
   behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
 - Annotate the locking in buf.h
2003-02-09 11:28:35 +00:00
Jeff Roberson
15553af710 - spell add 'add' and not 'subtract' in an atomic op.
Spotted by:	alc
Pointy hat to:	jeff
2003-02-09 11:21:40 +00:00
Jeff Roberson
d85be48243 - Lock down the buffer cache's infrastructure code. This includes locks on
buf lists, synchronization variables, and atomic ops for the counters.
   This change does not remove giant from any code although some pushdown
   may be possible.
 - In vfs_bio_awrite() don't access buf fields without the buf lock.
2003-02-09 09:47:31 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00