Commit Graph

336 Commits

Author SHA1 Message Date
Alan Cox
0ab3c7a594 Correct an error of omission in the reimplementation of the page
cache: vnode_pager_setsize() must handle the case where a file is
truncated to a non-page-size-aligned boundary and there is a cached
page underlying the new end of file.

Reported by:	kris, tegge
Tested by:	kris
MFC after:	3 days
2007-10-22 06:23:46 +00:00
Pawel Jakub Dawidek
57fd3d5572 When we do open, we should lock the vnode exclusively. This fixes few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with:	kib, ups
Approved by:	re (rwatson)
2007-07-26 16:58:09 +00:00
Attilio Rao
b4b7081961 Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Alan Cox
7e2393ff51 Long ago, revision 1.22 of vm/vm_pager.h introduced a bug. Specifically,
it introduced a check after the call to file system's get pages method
that assumes that the get pages method does not change the array of pages
that is passed to it.  In the case of vnode_pager_generic_getpages(),
this assumption has been incorrect.  The contents of the array of pages
may be shifted by vnode_pager_generic_getpages().  Likely, the problem
has been hidden by vnode_pager_haspage() limiting the set of pages that
are passed to vnode_pager_generic_getpages() such that a shift never
occurs.

The fix implemented herein is to adjust the pointer to the array of pages
rather than shifting the pages within the array.

MFC after: 3 weeks
Fix suggested by: tegge
2006-10-14 23:21:48 +00:00
Alan Cox
bff763439b Change vnode_pager_addr() such that on returning it distinguishes between
an error returned by VOP_BMAP() and a hole in the file.

Change the callers to vnode_pager_addr() such that they return
VM_PAGER_ERROR when VOP_BMAP fails instead of a zero-filled page.

Reviewed by: tegge
MFC after: 3 weeks
2006-10-14 22:09:03 +00:00
Alan Cox
1de11f1af3 Distinguish between two distinct kinds of errors from VOP_BMAP() in
vnode_pager_generic_getpages(): (1) that VOP_BMAP() is unsupported by the
underlying file system and (2) an error in performing the VOP_BMAP().
Previously, vnode_pager_generic_getpages() assumed that all errors were
of the first type.  If, in fact, the error was of the second type, the
likely outcome was for the process to become permanently blocked on a busy
page.

MFC after: 3 weeks
Reviewed by: tegge
2006-10-10 18:26:18 +00:00
Alan Cox
f4f83da02d Change vnode_pager_generic_getpages() so that it does not panic if the
given file is sparse.  Instead, it zeroes the requested page.

Reviewed by: tegge
PR: kern/98116
MFC after: 3 days
2006-10-08 20:26:16 +00:00
Alan Cox
5786be7cc7 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
Tor Egge
3b582b4e72 Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must
be called without any vnode locks held.  Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
2006-03-02 22:13:28 +00:00
Jeff Roberson
b73f64c484 - Fix silly VI locking that is used to check a single flag. The vnode
lock also protects this flag so it is not necessary.
 - Don't rely on v_mount to detect whether or not we've been recycled, use
   the more appropriate VI_DOOMED instead.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:14:12 +00:00
Yaroslav Tykhiy
731959b118 Use off_t for file size passed to vnode_create_vobject().
The former type, size_t, was causing truncation to 32 bits on i386,
which immediately led to undersizing of VM objects backed by
files >4GB.  In particular, sendfile(2) was broken for such files.

PR:		kern/92243
MFC after:	5 days
2006-02-01 12:43:13 +00:00
Paul Saab
dd498befc4 Rate limit vnode_pager_putpages printfs to once a second. 2005-11-01 23:00:24 +00:00
Alexander Kabaev
857b66d505 Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable.
vm_pager_init() is run before required nswbuf variable has been set
to correct value. This caused system to run with single pbuf available
for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt
variable in the same way.

Reported by:	ade
Obtained from:	alc
MFC after:	2 days
2005-08-13 20:21:33 +00:00
Suleiman Souhlal
4f12e0acb0 Use atomic operations on runningbufspace.
PR:		kern/84318
Submitted by:	ade
MFC after:	3 days
2005-08-08 22:44:10 +00:00
Alan Cox
d2d9c9aca7 Remove a stale comment concerning spl* usage. 2005-05-19 03:53:07 +00:00
Bjoern A. Zeeb
f3aad9a6bb Correct 32 vs 64 bit signedness issues.
Approved by:	pjd (mentor)
MFC after:	2 weeks
2005-05-18 08:57:31 +00:00
Jeff Roberson
ed4fe4f4f5 - Add a new object flag "OBJ_NEEDSGIANT". We set this flag if the
underlying vnode requires Giant.
 - In vm_fault only acquire Giant if the underlying object has NEEDSGIANT
   set.
 - In vm_object_shadow inherit the NEEDSGIANT flag from the backing object.
2005-05-03 11:11:26 +00:00
Jeff Roberson
6e4b282039 - Don't NULL the vnode's v_object pointer until after the object is torn
down.  If we have dirty pages, the putpages routine will need to know
   what the vnode's object is so that it may write out dirty pages.

Pointy hat:	phk
Found by:	obrien
2005-04-03 22:56:58 +00:00
Jeff Roberson
f247a5240d - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
Jeff Roberson
7747c03884 - Don't directly adjust v_usecount, use vref() instead.
Sponsored by:	Isilon Systems, Inc.
2005-03-14 09:03:19 +00:00
Jeff Roberson
1d39df3fe9 - Retire OLOCK and OWANT. All callers hold the vnode lock when creating
a vnode object.  There has been an assert to prove this for some time.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:29:40 +00:00
Jeff Roberson
493d78b3bd - Don't acquire the vnode lock in destroy_vobject, assert that it has
already been acquired by the caller.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:05:05 +00:00
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
Poul-Henning Kamp
7146d6cb3e Move the contents of vop_stddestroyvobject() to the new vnode_pager
function vnode_destroy_vobject().

Make the new function zero the vp->v_object pointer so we can tell
if a call is missing.
2005-01-28 08:56:48 +00:00
Poul-Henning Kamp
d07a6d3f61 Move the body of vop_stdcreatevobject() over to the vnode_pager under
the name Sande^H^H^H^H^Hvnode_create_vobject().

Make the new function take a size argument which removes the need for
a VOP_STAT() or a very pessimistic guess for disks.

Call that new function from vop_stdcreatevobject().

Make vnode_pager_alloc() private now that its only user came home.
2005-01-24 21:21:59 +00:00
Poul-Henning Kamp
35764be39e Kill the VV_OBJBUF and test the v_object for NULL instead. 2005-01-24 13:13:57 +00:00
Jeff Roberson
ae51ff1127 - Remove GIANT_REQUIRED where giant is no longer required.
- Use VFS_LOCK_GIANT() rather than directly acquiring giant in places
   where giant is only held because vfs requires it.

Sponsored By:   Isilon Systems, Inc.
2005-01-24 10:48:29 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Poul-Henning Kamp
475e8cc394 fix comment 2004-12-25 21:30:41 +00:00
Alan Cox
2ad036b657 Almost nine years ago, when support for 1TB files was introduced in
revision 1.55, the address parameter to vnode_pager_addr() was changed
from an unsigned 32-bit quantity to a signed 64-bit quantity.  However,
an out-of-range check on the address was not updated.  Consequently,
memory-mapped I/O on files greater than 2GB could cause a kernel panic.
Since the address is now a signed 64-bit quantity, the problem resolution
is simply to remove a cast.

Reviewed by: bde@ and tegge@
PR: 73010
MFC after: 1 week
2004-12-07 22:05:38 +00:00
Alan Cox
d8fed1d050 Correct a sanity check in vnode_pager_generic_putpages(). The cast used
to implement the sanity check should have been changed when we converted
the implementation of vm_pindex_t from 32 to 64 bits.  (Thus, RELENG_4 is
not affected.)  The consequence of this error would be a legimate write to
an extremely large file being treated as an errant attempt to write meta-
data.

Discussed with: tegge@
2004-12-05 21:48:11 +00:00
Poul-Henning Kamp
9c83534dd8 Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)
2004-11-15 09:18:27 +00:00
Poul-Henning Kamp
676f3ee26c Explicitly call pbrelvp() 2004-11-15 08:06:05 +00:00
Alan Cox
19187819b7 Move a call to wakeup() from vm_object_terminate() to vnode_pager_dealloc()
because this call is only needed to wake threads that slept when they
discovered a dead object connected to a vnode.  To eliminate unnecessary
calls to wakeup() by vnode_pager_dealloc(), introduce a new flag,
OBJ_DISCONNECTWNT.

Reviewed by: tegge@
2004-11-06 05:33:02 +00:00
Poul-Henning Kamp
6229cc508b Also check that the sectormask is bigger than zero.
Wrap this overly long KASSERT and remove newline.
2004-10-26 19:51:57 +00:00
Poul-Henning Kamp
5d9d81e7ea Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
Poul-Henning Kamp
b792bebeea Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
Poul-Henning Kamp
1a31a6c3b2 add KASSERTS 2004-09-07 07:32:40 +00:00
Alan Cox
0cb507cb20 Acquire and release Giant around a call to VOP_BMAP(). (This is a
prerequisite to any further reduction in Giant's use by vm_fault().)
2004-08-19 02:37:12 +00:00
Alan Cox
5a32489377 Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation.  This flag's principal use is shortly after
allocation.  For such cases, clearing the flag is pointless.  The only
unusual use of PG_ZERO is in vfs_bio_clrbuf().  However, allocbuf() never
requests a prezeroed page.  So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by:	tegge@
2004-05-06 05:03:23 +00:00
Alan Cox
87aefa499a Push down Giant into vm_pager_get_pages(). The only get pages methods that
require Giant are in the device and vnode pagers.
2004-04-23 06:10:58 +00:00
Alan Cox
9e0ddbd00a Eliminate vm_pager_map_page() and vm_pager_unmap_page() and their uses.
Use sf_buf_alloc() and sf_buf_free() instead.
2004-04-06 07:12:32 +00:00
Alan Cox
a67048571f Eliminate the acquisition and release of Giant from vnode_pager_alloc().
The vm object and vnode locking should suffice.

Discussed with:	jeff
2004-01-04 03:18:24 +00:00
Tim J. Robbins
167a9effa5 In vnode_pager_input_smlfs(), call VOP_STRATEGY instead of VOP_SPECSTRATEGY
on non-VCHR vnodes. This fixes a panic when reading data from files on a
filesystem with a small (less than a page) block size.

PR:		59271
Reviewed by:	alc
2003-11-15 09:54:11 +00:00
Alan Cox
52051abcf1 - Call vnode_pager_input_old() with the vm object locked. 2003-10-25 05:21:16 +00:00
Alan Cox
2e3b314d3a - Push down Giant from vm_pageout() to vm_pageout_scan(), freeing
vm_pageout_page_stats() from Giant.
 - Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the
   vm object to be locked on entry.  (All of the pager routines now expect
   this.)
2003-10-24 06:43:04 +00:00
Alan Cox
2bf43e4374 - Hold the vm object's lock around calls to vm_page_set_validclean(). 2003-10-20 04:05:24 +00:00
Alan Cox
1b26eb10ff - Synchronize access to a vm page's valid field using the containing
vm object's lock.
 - Reduce the scope of the vm page queues lock in two places.
2003-10-19 00:01:56 +00:00
Alan Cox
8b575f6c28 - Synchronize access to the page's valid field in
vnode_pager_generic_getpages() using the containing object's lock.
2003-10-18 21:30:29 +00:00
Poul-Henning Kamp
2c18019f14 DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)
2003-10-18 14:10:28 +00:00
Poul-Henning Kamp
9fbf91c0dd Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY().
Remove stale comment about B_PHYS.
2003-10-18 11:11:05 +00:00
Alan Cox
417a26a154 Add vm object locking to vnode_pager_lock(). (This triggers the movement
of a VM_OBJECT_LOCK() in vm_fault().)
2003-09-18 02:26:03 +00:00
Marcel Moolenaar
23562e4bc6 In vnode_pager_generic_putpages(), change the printf format specifier
to long and explicitly cast field dirty of struct vm_page to unsigned
long. When PAGE_SIZE is 32K, this field is actually unsigned long.
2003-08-29 00:16:30 +00:00
Alan Cox
b7ad744dc5 Hold the page queues lock when performing vm_page_clear_dirty() and
vm_page_set_invalid().
2003-08-23 18:11:53 +00:00
Poul-Henning Kamp
6a4b58230c Replace a homegrown bdone()/bwait() implementation by the real thing 2003-08-18 19:47:16 +00:00
Poul-Henning Kamp
ec7948490b Use NULL for 3rd argument of VOP_BMAP() rather than custom cast.
Eliminate unused variable.
2003-08-17 18:54:23 +00:00
Poul-Henning Kamp
4e6586002d Use sparse struct initializations for struct pagerops.
This makes grepping for which pagers implement which methods easier.
2003-08-05 06:51:26 +00:00
Alan Cox
f29ba63ec9 Maintain a lock on the vm object of interest throughout vm_fault(),
releasing the lock only if we are about to sleep (e.g., vm_pager_get_pages()
or vm_pager_has_pages()).  If we sleep, we have marked the vm object with
the paging-in-progress flag.
2003-06-22 21:35:41 +00:00
Alan Cox
31953be936 Lock the vm object when freeing a vm page. 2003-06-18 04:27:18 +00:00
Alan Cox
8630c1173e Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.
2003-06-13 03:02:28 +00:00
David E. O'Brien
874651b13c Use __FBSDID(). 2003-06-11 23:50:51 +00:00
Alan Cox
2a8f9ab57f - Finish vm object and page locking in vnode_pager_setsize().
- Make some small style changes to vnode_pager_setsize(); most notably,
   move two comments to a more logical place.
2003-06-10 20:28:41 +00:00
Alan Cox
658ad5fff5 Lock the vm_object when performing vm_pager_deallocate(). 2003-05-06 02:45:28 +00:00
Alan Cox
1ca5895341 - Convert vm_object_pip_wait() from using tsleep() to msleep().
- Make vm_object_pip_sleep() static.
 - Lock the vm_object when performing vm_object_pip_wait().
2003-04-26 18:33:18 +00:00
Alan Cox
49281fbf68 Update locking around vm_object_page_remove() to use the new macros. 2003-04-18 16:39:03 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
Alan Cox
09c80124a3 Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
Discussed on:	arch@
2003-03-06 03:41:02 +00:00
Poul-Henning Kamp
ca94e7c4ca We can get past here on a normal vnode as well, so use VOP_STRATEGY if so. 2003-01-13 21:32:16 +00:00
Poul-Henning Kamp
5266a767e5 Convert VOP_STRATEGY to VOP_SPECSTRATEGY in the generic getpages and
the pager input for small filesystems.
2003-01-05 20:32:03 +00:00
Poul-Henning Kamp
862702306b Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
2003-01-03 06:32:15 +00:00
Matthew Dillon
43b7990e30 Allow the VM object flushing code to cluster. When the filesystem syncer
comes along and flushes a file which has been mmap()'d SHARED/RW, with
dirty pages, it was flushing the underlying VM object asynchronously,
resulting in thousands of 8K writes.  With this change the VM Object flushing
code will cluster dirty pages in 64K blocks.

Note that until the low memory deadlock issue is reviewed, it is not safe
to allow the pageout daemon to use this feature.  Forced pageouts still
use fs block size'd ops for the moment.

MFC after:	3 days
2002-12-28 21:03:42 +00:00
Alan Cox
475e8011ab Perform vm_object_lock() and vm_object_unlock() around
vm_object_page_remove().
2002-12-15 05:41:56 +00:00
Alan Cox
85e0124324 Hold the page queues lock when performing pmap_clear_modify().
Approved by:	re (blanket)
2002-11-27 19:51:48 +00:00
Alan Cox
178949e021 Hold the page queues/flags lock when calling vm_page_set_validclean().
Approved by:	re
2002-11-23 19:10:31 +00:00
Alan Cox
e8a27959f6 Add page queue and flag locking in vnode_pager_setsize().
Approved by:	re
2002-11-23 03:58:35 +00:00
Alan Cox
4fec79bef8 Now that pmap_remove_all() is exported by our pmap implementations
use it directly.
2002-11-16 07:44:25 +00:00
Alan Cox
d154fb4fe6 When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than
indirectly through vm_page_protect().  The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.

Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
2002-11-10 07:12:04 +00:00
Maxime Henrion
bf1001fa0f Better printf() formats. 2002-11-07 23:16:22 +00:00
Poul-Henning Kamp
37c841831f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
Jeff Roberson
63e7e60dba - Add a ASSERT_VOP_LOCKED in vnode_pager_alloc.
- Lock access to v_iflags.
2002-09-25 01:23:43 +00:00
Alan Cox
fff6062ab6 o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since
pmap_zero_page() and pmap_zero_page_area() were modified to accept
   a struct vm_page * instead of a physical address, vm_page_zero_fill()
   and vm_page_zero_fill_area() have served no purpose.
2002-08-25 00:22:31 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Alan Cox
e43c2eab07 o Lock page queue accesses by vm_page_free().
o Apply some style fixes.
2002-07-28 20:13:48 +00:00
Alan Cox
9d52288860 o Lock page queue accesses by vm_page_activate() and vm_page_deactivate(). 2002-07-27 04:30:46 +00:00
Robert Drehmel
47e151dd7a - Use (OFF_TO_IDX(off) - pi) instead of (OFF_TO_IDX(off - IDX_TO_OFF(pi))).
- Reformat a comment.
2002-07-01 14:14:07 +00:00
Alan Cox
990ab7add4 o Replace GIANT_REQUIRED in vnode_pager_alloc() by the acquisition and
release of Giant.  (Annotate as MPSAFE.)
 o Also, in vnode_pager_alloc(), remove an unnecessary re-initialization
   of struct vm_object::flags and move a statement that is duplicated
   in both branches of an if-else.
2002-06-22 07:28:06 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Poul-Henning Kamp
98b0c78978 Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by:	DARPA & NAI Labs.
2002-05-14 11:09:43 +00:00
Alan Cox
c0b6bbb80b o Condition the compilation and use of vm_freeze_copyopts()
on ENABLE_VFS_IOOPT.
2002-05-06 05:45:57 +00:00
Peter Wemm
44e74ba6c3 We do not necessarily need to map/unmap pages to zero parts of them.
On systems where physical memory is also direct mapped (alpha, sparc,
ia64 etc) this is slightly harmful.
2002-04-28 00:15:48 +00:00
Alfred Perlstein
11caded34f Remove __P. 2002-03-19 22:20:14 +00:00
Kirk McKusick
0d2af52141 Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
2002-03-15 18:49:47 +00:00
Eivind Eklund
a128794977 - Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
  while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by:	alc
2002-03-10 21:52:48 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Matthew Dillon
3ebeaf5984 This fixes a large number of bugs in our NFS client side code. A recent
commit by Kirk also fixed a softupdates bug that could easily be triggered
by server side NFS.

	* An edge case with shared R+W mmap()'s and truncate whereby
	  the system would inappropriately clear the dirty bits on
	  still-dirty data.  (applicable to all filesystems)

	  THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
	  see vm/vm_page.c line 1641

	* The straddle case for VM pages and buffer cache buffers when
	  truncating.  (applicable to NFS client side)

	* Possible SMP database corruption due to vm_pager_unmap_page()
	  not clearing the TLB for the other cpu's.  (applicable to NFS
	  client side but could effect all filesystems).  Note: not
	  considered serious since the corruption occurs beyond the file
	  EOF.

	* When flusing a dirty buffer due to B_CACHE getting cleared,
	  we were accidently setting B_CACHE again (that is, bwrite() sets
	  B_CACHE), when we really want it to stay clear after the write
	  is complete.  This resulted in a corrupt buffer.  (applicable
	  to all filesystems but probably only triggered by NFS)

	* We have to call vtruncbuf() when ftruncate()ing to remove
	  any buffer cache buffers.  This is still tentitive, I may
	  be able to remove it due to the second bug fix.  (applicable
	  to NFS client side)

	* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
	  to set n_size before calling nfs_vinvalbuf or the NFS code
	  may recursively vnode_pager_setsize() to the original value
	  before the truncate.  This is what was causing the user mmap
	  bus faults in the nfs tester program.  (applicable to NFS
	  client side)

	* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made
	  by Kirk).

Testing program written by: Avadis Tevanian, Jr.
Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step')
MFC after:	1 week
2001-12-14 01:16:57 +00:00
Matthew Dillon
33c6774151 Adjust vnode_pager_input_smlfs() to not attempt to BMAP blocks beyond the
file EOF.  This works around a bug in the ISOFS (CDRom) BMAP code which
returns bogus values for requests beyond the file EOF rather then returning
an error, resulting in either corrupt data being mmap()'d beyond the file EOF
or resulting in a seg-fault on the last page of a mmap()'d file (mmap()s of
CDRom files).

Reported by: peter / Yahoo
MFC after:	3 days
2001-11-05 18:58:47 +00:00
Matthew Dillon
00a6f47f13 Finally fix the VM bug where a file whos EOF occurs in the middle of a page
would sometimes prevent a dirty page from being cleaned, even when synced,
resulting in the dirty page being re-flushed to disk every 30-60 seconds or
so, forever.  The problem is that when the filesystem flushes a page to
its backing file it typically does not clear dirty bits representing areas
of the page that are beyond the file EOF.  If the file is also mmap()'d and
a fault is taken, vm_fault (properly, is required to) set the vm_page_t->dirty
bits to VM_PAGE_BITS_ALL.  This combination could leave us with an uncleanable,
unfreeable page.

The solution is to have the vnode_pager detect the edge case and manually
clear the dirty bits representing areas beyond the file EOF.  The filesystem
does the rest and the page comes up clean after the write completes.

MFC after:	3 days
2001-10-12 18:17:34 +00:00
John Baldwin
bd78cece5d Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
John Baldwin
bd8e0d5871 Whitespace fixes. 2001-08-04 20:49:29 +00:00
Matthew Dillon
54d9214595 whitespace / register cleanup 2001-07-04 19:00:13 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
b62b9b648b Fix a XXX comment by moving the initialization of the number of pbuf's
for the vnode pager to a new vnode pager init method instead of making it
a hack in getpages().
2001-07-03 07:35:56 +00:00
John Baldwin
342a1480aa Don't hold the VM lock across VOP's and other things that can sleep. 2001-05-29 16:58:25 +00:00
John Baldwin
e6b961ffbd - Assert Giant is held in the vnode pager methods.
- Lock the VM while walking down a vm_object's backing_object list in
  vnode_pager_lock().
2001-05-23 22:51:23 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Alfred Perlstein
d8d5fa8805 vnode_pager_freepage() is really vm_page_free() in disguise,
nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
2001-04-19 06:18:23 +00:00
Matthew Dillon
2b6b0df712 This implements a better launder limiting solution. There was a solution
in 4.2-REL which I ripped out in -stable and -current when implementing the
low-memory handling solution.  However, maxlaunder turns out to be the saving
grace in certain very heavily loaded systems (e.g. newsreader box).  The new
algorithm limits the number of pages laundered in the first pageout daemon
pass.  If that is not sufficient then suceessive will be run without any
limit.

Write I/O is now pipelined using two sysctls, vfs.lorunningspace and
vfs.hirunningspace.  This prevents excessive buffered writes in the
disk queues which cause long (multi-second) delays for reads.  It leads
to more stable (less jerky) and generally faster I/O streaming to disk
by allowing required read ops (e.g. for indirect blocks and such) to occur
without interrupting the write stream, amoung other things.

NOTE: eventually, filesystem write I/O pipelining needs to be done on a
per-device basis.  At the moment it is globalized.
2000-12-26 19:41:38 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Peter Wemm
0385347c1a Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that.  In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
it's shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a seperate thing and build it into a machine
dependent part of vm_page_t.  This eliminates having a seperate set of
structions that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system.  (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc).  Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries.  This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist.  This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by:	dillon
2000-05-21 12:50:18 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
c244d2de43 Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
Philippe Charnier
5929bcfaba Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce
2000-03-27 20:41:17 +00:00
Philippe Charnier
956f31353c Spelling 2000-03-26 15:20:23 +00:00
Poul-Henning Kamp
b99c307a21 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
Poul-Henning Kamp
21144e3bf1 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Matthew Dillon
24579ca1d7 The vnode pager (used when you do file-backed mmaps) must use the
underlying physical sector size when aligning I/O transfer sizes.
    It cannot assume 512 bytes.

    We assume the underlying sector size is a power of 2.  If it isn't,
    mmap() will break badly anyway (in the same way mmap broke with NFS
    when NFS tried to cache piecemeal write ranges in buffers, before
    we enforced read-buffer-before-write-piecemeal for NFS).

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:17:59 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Alan Cox
2c28a10540 Add the (inline) function vm_page_undirty for clearing the dirty bitmask
of a vm_page.

Use it.

Submitted by:	dillon
1999-08-17 04:02:34 +00:00
Peter Wemm
3efc015bae Fix some int/long printf problems for the Alpha 1999-07-01 19:53:43 +00:00
Kirk McKusick
67812eacd7 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
Dmitrij Tejblum
54746b676c Fix confusion of size of transfer with size of the pager.
PR:		11658
Broken in:	1.89 (1998/03/07)
1999-05-15 23:42:39 +00:00
Poul-Henning Kamp
b0eeea2042 remove b_proc from struct buf, it's (now) unused.
Reviewed by:	dillon, bde
1999-05-06 20:00:34 +00:00
Alan Cox
4221e284a3 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
Dmitrij Tejblum
897a45eff9 Convert usage of vm_page_bits() to the new convention ("Inputs are required
to range within a page").
1999-04-10 20:52:11 +00:00
Julian Elischer
8d17e69460 Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
    vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
    ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>
1999-04-05 19:38:30 +00:00
Eivind Eklund
4491ea9111 Correct a comment. 1999-03-27 02:39:01 +00:00
Alan Cox
0e3cdf2cf8 Reviewed by: "John S. Dyson" <dyson@iquest.net>
Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
To prevent a deadlock, if we are extremely low on memory, force synchronous
operation by the VOP_PUTPAGES in vnode_pager_putpages.
1999-02-27 23:39:28 +00:00
Matthew Dillon
e4542174b0 vm_pager_put_pages() is passed an rcval array to hold per-page return
values.  The 'int' return value for the procedure was never used and
    not well defined in any case when there are mixed errors on pages, so
    it has been removed.  vm_pager_put_pages() and associated vm_pager
    functions now return void.
1999-01-24 02:32:15 +00:00
Matthew Dillon
1c7c3c6a86 This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
Robert V. Baron
af1f63c7eb In vnode_pager_input_old, set auio.uio_procp = curproc
vs auio.uio_procp = (struct proc *) 0
1998-12-04 18:39:44 +00:00
David Greenman
6cde7a165f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
Robert V. Baron
6e3a3f387c John Dyson approved of this solution; make vnode_pager_input_old set m->valid 1998-09-28 23:58:10 +00:00
Bruce Evans
500b04a257 Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM.  Declare it in a header file.  Don't forget to use
it in non-Lite2 code.  Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.
1998-09-05 15:17:34 +00:00
Doug Rabson
e69763a315 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
Luoqi Chen
c576d12115 Fix a rounding problem that causes vnode pager to fail to remove the last
partially filled page during a truncation.

PR:		kern/7422
1998-08-25 13:47:37 +00:00
Doug Rabson
069e9bc1b4 Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>
1998-08-24 08:39:39 +00:00
Bruce Evans
fc62ef1fb5 Fixed printf format errors. 1998-07-11 11:30:46 +00:00
Bruce Evans
ac1e407b32 Fixed printf format errors. 1998-07-11 07:46:16 +00:00
Julian Elischer
fd5d1124e2 VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00
John Dyson
bef608bd7e Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
Mike Smith
86ffbd76d0 Complement diagnostic messages about missing per-FS VOP page operations,
but don't make their absence fatal.
Submitted by:	terry
1998-03-09 08:58:53 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
John Dyson
ffc82b0a70 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
Mike Smith
ce75f2c365 In the author's words:
These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown
for local media FS's.

See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation
details for generic *_{get|put}pages for local media FS's.  Support
is trivial to add for any FS that formerly relied on the default
behaviour of the vnode_pager in in EOPNOTSUPP cases (just copy the
ffs_getpages() code for the FS in question's *_{get|put}pages).

Obviously, it would be better if each local media FS implemented a
more optimal method, instead of calling an exported interface from
the /sys/vm/vnode_pager.c, but this is a necessary first step in
getting the FS's to a point where they can be supplied with better
implementations on a case-by-case basis.

Obviously, the cd9660_putpages() can be rather trivial (since it
is a read-only FS type 8-)).

A slight (temporary) modification is made to print a diagnostic message
in the case where the underlying filesystem attempts to engage in the
previous behaviour.  Failure is likely to be ungraceful.

Submitted by:	terry@freebsd.org (Terry Lambert)
1998-02-26 06:39:59 +00:00