Commit Graph

231 Commits

Author SHA1 Message Date
Alan Cox
4221e284a3 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
Mike Smith
f4711b2df4 Simplify the tunefs example, since tunefs uses getfsfile(). Lots of
people complain about working out what device their filesystems are
mounted on.
1999-04-27 21:11:19 +00:00
Kirk McKusick
38e28fd66b Reorganize locking to avoid holding the lock during calls to bdwrite
and brelse (which may sleep in some systems).

Obtained from:	Matthew Dillon <dillon@apollo.backplane.com>
1999-03-02 06:38:07 +00:00
Kirk McKusick
eef33ce9bd When fsync'ing a file on a filesystem using soft updates, we first try
to write all the dirty blocks. If some of those blocks have dependencies,
they will be remarked dirty when the I/O completes. On systems with
really fast I/O systems, it is possible to get in an infinite loop trying
to flush the buffers, because the I/O finishes before we can get all the
dirty buffers off the v_dirtyblkhd list and into the I/O queue. (The
previous algorithm looped over the v_dirtyblkhd list writing out buffers
until the list emptied.) So, now we mark each buffer that we try to
write so that we can distinguish the ones that are being remarked dirty
from those that we have not yet tried to flush. Once we have tried to
push every buffer once, we then push any associated metadata that is
causing the remaining buffers to be redirtied.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-03-02 04:04:31 +00:00
Kirk McKusick
4cbb89d95d Ensure that softdep_sync_metadata can handle bmsafemap and mkdir entries
if they ever arise (which should not happen as softdep_sync_metadata is
currently used).
1999-03-02 00:19:47 +00:00
Kirk McKusick
133ff2619a fix double LIST_REMOVE; other cosmetic changes to match version 9.32.
Obtained from: Jeffrey Hsu <hsu@FreeBSD.ORG>
1999-02-17 20:01:20 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
David Greenman
8ab2fa0073 Gutted softdep_deallocate_dependencies and replaced it with a panic. It
turns out to not be useful to unwind the dependencies and continue in
the face of a fatal error.
Also changed the log() to a printf() in softdep_error() so that it will
be output in the case of a impending panic.
Submitted by:	Kirk McKusick <mckusick@mckusick.com>
1999-01-22 09:07:32 +00:00
Eivind Eklund
5b1b6c5859 Silence warning about unused debug function. (I'll turn this function
into a DDB command in my next staticization sweep).
1999-01-12 11:42:41 +00:00
Eivind Eklund
a862221fa0 Add a warning about the copyright restraints. 1999-01-08 16:03:12 +00:00
Bruce Evans
de5d1ba57c Don't pass unused unused timestamp args to UFS_UPDATE() or waste
time initializing them.  This almost finishes centralizing (in-core)
timestamp updates in ufs_itimes().
1999-01-07 16:14:19 +00:00
Bruce Evans
4591d9bb7e UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value
MNT_WAIT when we mean boolean `true' or check for that value not being
passed.  There was no problem in practice because MNT_WAIT had the
magic value of 1.
1999-01-06 18:18:06 +00:00
Bruce Evans
d64dbc8719 Ifdefed the conditionally used variable `prtrealloc'. Declare it
as volatile so that there is no chance that the code that it controls
is optimised away.
1999-01-06 17:04:33 +00:00
Bruce Evans
5991fd0370 Backed out rev.1.47. It just broke my optimisations for lazy syncing
of timestamps in rev.1.45.  The soft updates bug was elsewhere.

Forgotten by:	luoqi
1999-01-06 16:52:38 +00:00
Eivind Eklund
fb1167777a Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
Eivind Eklund
a777e82019 Remove the last clients of vfs_object_create(..., waslocked=1);
waslocked will go away shortly.

Reviewed by:	dg
1999-01-02 01:32:36 +00:00
Julian Elischer
1f35e8c8da Remove some compiler warnings. 1998-12-10 20:11:47 +00:00
Bruce Evans
672be20b9f Don't use the strange null pointer constant `(ufs_daddr_t)0' in a call
to VOP_BMAP().  Don't use uncast NULLs in the same call.
1998-11-29 03:12:06 +00:00
David Greenman
1c680b45a2 Restored the "reallocblks" code to its former glory. What this does is
basically do a on-the-fly defragmentation of the FFS filesystem, changing
file block allocations to make them contiguous. Thanks to Kirk McKusick
for providing hints on what needed to be done to get this working.
1998-11-13 01:01:44 +00:00
Peter Wemm
2ec07c6614 Change dirty block list handling to use TAILQ macros. 1998-10-31 15:33:32 +00:00
Peter Wemm
40c8cfe552 Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
Jordan K. Hubbard
2dcc2f0693 Clarify a rather ambiguous debugging message. 1998-10-28 10:37:54 +00:00
Bruce Evans
b5ee16407f Oops, the redundant tests for major numbers weren't redundant here.
They checked for the magic major number for the "device" behind mfs
mount points.  Use a more obvious check for this device.

Debugged by:		Andrew Gallatin <gallatin@cs.duke.edu>
1998-10-27 11:47:08 +00:00
Bruce Evans
9c0619dace Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse.  We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.

Removed redundant `major(dev) < nblkdev' tests instead of updating them.
1998-10-25 19:02:48 +00:00
Poul-Henning Kamp
f5ef029e92 Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.
1998-10-25 17:44:59 +00:00
Nate Williams
ed8d80c2de Fix 'noatime' bug that was unrelated to use of noatime.
The problem is caused when a directory block is compacted.  When this
occurs, softdep_change_directoryentry_offset() is called to relocate each
directory entry and adjust its matching diradd structure, if any, to match
the new location of the entry.  The bug is that while
softdep_change_directoryentry_offset() correctly adjusts the offsets of
the diradd structures on the pd_diraddhd[] lists (which are not yet ready
to be committed to disk), it fails to adjust the offsets of the diradd
structures on the pd_pendinghd list (which are ready to be committed to
disk).  This causes the dependency structures to be inconsistent with
the buf contents.  Now, if the compaction has moved a directory entry to
the same offset as one of the diradd structures on the pd_pendinghd list
*and* a syscall is done that tries to remove this directory entry before
this directory block has been written to disk (which would empty
pd_pendinghd), a sanity check in newdirrem() will call panic() when it
notices that the inode number in the entry that it is to be removed doesn't
match the inode number in the diradd structure with that offset of that
entry.

Reviewed by:	Kirk McKusick <mckusick@McKusick.COM>
Submitted by:	Don Lewis <Don.Lewis@tsc.tdk.com>
1998-10-03 19:17:11 +00:00
Bruce Evans
0922cce61c Fixed clean flag handling:
- don't set the clean flag on unmount of an unclean filesystem that was
  (forcibly) mounted rw.
- set the clean flag on rw -> ro update of a mounted initially-clean
  filesystem.
- fixed some style bugs (mostly long lines).

This uses the fs_flags field and FS_UNCLEAN state bit which were
introduced in the softdep changes.  NetBSD uses extra state bits in
fs_clean.

Reviewed by:	luoqui
1998-09-26 04:59:42 +00:00
Luoqi Chen
e266594c25 Eliminate a race in VOP_FSYNC() when softupdates is enabled.
Submitted by:	Kirk McKusick	<mckusick@McKusick.COM>
Two minor changes are also included,
1. Remove gratuitious checks for error return from vn_lock with LK_RETRY set,
   vn_lock should always succeed in these cases.
2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount
   a little more unstable. It also keeps us in sync with other BSDs.
Suggested by:	Bruce Evans	<bde@zeta.org.au>
1998-09-24 15:02:46 +00:00
Luoqi Chen
f9e84c2fee Restore pre-v1.44 behavior: always copy modified in-core inode to disk
buffer. Otherwise some in-core inode changes might be lost, including
important meta data (e.g. size) if softupdates is enabled.
1998-09-15 14:45:28 +00:00
Søren Schmidt
d024c95599 Remove the SLICE code.
This clearly needs alot more thought, and we dont need this to hunt
us down in 3.0-RELEASE.
1998-09-14 19:56:42 +00:00
Bruce Evans
9164000766 Don't dereference an uninitialized pointer in dead code. The dead
code gets executed if it is compiled without optimization.
1998-09-12 14:46:15 +00:00
Bruce Evans
8994ca3ce9 Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
1998-09-07 13:17:06 +00:00
Bruce Evans
ff261f16f6 Put the zombie ffs sysctl node in "notyet" state together with its few
remaining children.  Prepare it for MOUNT_UFS going away.
1998-09-07 11:50:19 +00:00
Poul-Henning Kamp
0375c9f2b8 Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform
device drivers about sectors no longer in use.

Device-drivers receive the call through d_strategy, if they have
D_CANFREE in d_flags.

This allows flash based devices to erase the sectors and avoid
pointlessly carrying them around in compactions.

Reviewed by:	Kirk Mckusick, bde
Sponsored by:	M-Systems (www.m-sys.com)
1998-09-05 14:13:12 +00:00
Bruce Evans
0492d857d1 Removed unused includes. 1998-08-17 19:09:36 +00:00
Julian Elischer
55d80b2df1 Handle the case of moving a directory onto the top of a sibling's
child of the same name.

Submitted by:	Kirk Mckusick with fixes from luoqi Chen
Obtained from:   Whistle test tree.
1998-08-12 20:46:47 +00:00
Bruce Evans
ac1e407b32 Fixed printf format errors. 1998-07-11 07:46:16 +00:00
Julian Elischer
bcbd6c6fdd Don't update superblock if mounted readonly,
also fixes some problems with softupdates on root.
More cleanups are needed here..
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
1998-07-08 23:52:27 +00:00
Julian Elischer
fd5d1124e2 VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00
Bruce Evans
3055187290 Sync timestamp changes for inodes of special files to disk as late
as possible (when the inode is reclaimed).  Temporarily only do
this if option UFS_LAZYMOD configured and softupdates aren't enabled.
UFS_LAZYMOD is intentionally left out of /sys/conf/options.

This is mainly to avoid almost useless disk i/o on battery powered
machines.  It's silly to write to disk (on the next sync or when the
inode becomes inactive) just because someone hit a key or something
wrote to the screen or /dev/null.

PR:		5577
Previous version reviewed by:	phk
1998-07-03 22:17:03 +00:00
Bruce Evans
33cc029eab Centralized in-core inode update. Update the in-core inode directly
in ufs_setattr() so that there is no need to pass timestamps to
UFS_UPDATE() (everything else just needs the current time).  Ignore
the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes()
(was: itimes()) to do the update.  The timestamps are still passed
so that all the callers don't need to be changed yet.
1998-07-03 18:46:52 +00:00
Jordan K. Hubbard
d94ce17be4 Flesh this document out just a little in response to some user
questions and also recommend linking over copying since, at this stage,
a stale copy is a real concern.
1998-06-26 10:35:55 +00:00
Julian Elischer
c619155f0e Slight change to directory cleanup
Makes soft updates a bit cleaner. Eliminates some warnings about
'corrupted directories' from fsck.
1998-06-14 19:31:28 +00:00
Julian Elischer
28ed032673 Note which version of Kirk's sources this corresponds to. 1998-06-12 21:21:26 +00:00
Julian Elischer
aa75cb86b4 Fix the case when renaming to a file that you've just created and deleted,
that had an inode that has not yet been written to disk, when the inode of the
new file is also not yet written to disk, and your old directory entry is not
yet on disk but you need to remove it and the new name exists in memory
but has been deleted but the transaction to write the deleted name to disk
exists and has not yet been cancelled by the request to delete the non
existant name.  I don't know how kirk could have missed such a glaring
problem for so long. :-) Especially since the inconsitency survived on
the disk for a whole 4 second on average before being fixed by other code.
This was not a crashing bug but just led to filesystem inconsitencies
if you crashed.

Submitted by: Kirk McKusick (mckusick@mckusick.com)
1998-06-12 20:48:30 +00:00
Julian Elischer
6d0ba44288 Add B_NOCACHE to several cases where BSD4.4 only required a B_INVAL.
Change worked out by john and kirk in consort.
1998-06-11 17:44:32 +00:00
Julian Elischer
8c221701c3 Fix for "live inode" panic.
Submitted by: Kirk McKusick <mckusick@McKusick.COM>
Reviewed by: yeah right...
1998-06-10 20:45:46 +00:00
Julian Elischer
4af0bb0f9e Remove buggy debugging code. 1998-06-10 20:03:16 +00:00
Julian Elischer
939001af5c Back out John's changes 1.45 -> 1.46
Kirk confirms that the original semantic was what he wanted...
(well, a very slight difference)
May fix "dangling deps" panic with soft updates.
1998-06-10 19:27:56 +00:00
Doug Rabson
8435e0aef5 Use size_t instead of u_int for sizes. 1998-06-04 17:21:39 +00:00