Commit Graph

459 Commits

Author SHA1 Message Date
Jason Evans
1b367556b5 Convert all simplelocks to mutexes and remove the simplelock implementations. 2001-01-24 12:35:55 +00:00
Ian Dowse
f55ff3f3ef The ffs superblock includes a 128-byte region for use by temporary
in-core pointers to summary information. An array in this region
(fs_csp) could overflow on filesystems with a very large number of
cylinder groups (~16000 on i386 with 8k blocks). When this happens,
other fields in the superblock get corrupted, and fsck refuses to
check the filesystem.

Solve this problem by replacing the fs_csp array in 'struct fs'
with a single pointer, and add padding to keep the length of the
128-byte region fixed. Update the kernel and userland utilities
to use just this single pointer.

With this change, the kernel no longer makes use of the superblock
fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c
to indicate that these fields must be calculated for compatibility
with older kernels.

Reviewed by:	mckusick
2001-01-15 18:30:40 +00:00
Kirk McKusick
cb3ab5aaf7 Properly compute the size of the final block of superblock summary information.
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2001-01-12 21:56:55 +00:00
Kirk McKusick
48d617487d Several small but important fixes for snapshots:
1) Be more tolerant of missing snapshot files by only trying to decrement
   their reference count if they are registered as active.

2) Fix for snapshots of filesystems with block sizes larger than 8K
   (from Ollivier Robert <roberto@eurocontrol.fr>).

3) Fix to avoid losing last block in snapshot file when calculating blocks
   that need to be copied (from Don Coleman <coleman@coleman.org>).
2000-12-19 04:41:09 +00:00
Kirk McKusick
6da443cb22 Get rid of spurious check in ffs_truncate for i_size == length
which fails to set the modification time on the file. The same
check a few lines later takes the correct action.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2000-12-19 04:20:13 +00:00
Assar Westerlund
ca85ca6099 add a stub for softdep_slowdown so that it's possible to build the
kernel without SOFTUPDATES
2000-12-17 23:59:56 +00:00
Seigo Tanimura
937c4dfa08 Do not race for the lock of an inode hash.
Reviewed by:	jhb
2000-12-13 10:04:01 +00:00
Kirk McKusick
1d733bbd10 Preventing runaway kernel soft updates memory, take three.
Previously, the syncer process was the only process in the
system that could process the soft updates background work
list. If enough other processes were adding requests to that
list, it would eventually grow without bound. Because some of
the work list requests require vnodes to be locked, it was
not generally safe to let random processes process the work
list while they already held vnodes locked. By adding a flag
to the work list queue processing function to indicate whether
the calling process could safely lock vnodes, it becomes possible
to co-opt other processes into helping out with the work list.
Now when the worklist gets too large, other processes can safely
help out by picking off those work requests that can be handled
without locking a vnode, leaving only the small number of
requests requiring a vnode lock for the syncer process. With
this change, it appears possible to keep even the nastiest
workloads under control.

Submitted by:	Paul Saab <ps@yahoo-inc.com>
2000-12-13 08:30:35 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Poul-Henning Kamp
959b7375ed Staticize some malloc M_ instances. 2000-12-08 20:09:00 +00:00
Kirk McKusick
71868b020d More aggressively rate limit the growth of soft dependency structures
in the face of multiple processes doing massive numbers of filesystem
operations. While this patch will work in nearly all situations, there
are still some perverse workloads that can overwhelm the system.
Detecting and handling these perverse workloads will be the subject
of another patch.

Reviewed by:	Paul Saab <ps@yahoo-inc.com>
Obtained from:	Ethan Solomita <ethan@geocast.com>
2000-11-20 06:22:39 +00:00
Matthew Dillon
936524aa02 Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
    situations prior to now.

    The new code is based on the concept that I/O must be able to function in
    a low memory situation.  All major modules related to I/O (except
    networking) have been adjusted to allow allocation out of the system
    reserve memory pool.  These modules now detect a low memory situation but
    rather then block they instead continue to operate, then return resources
    to the memory pool instead of cache them or leave them wired.

    Code has been added to stall in a low-memory situation prior to a vnode
    being locked.

    Thus situations where a process blocks in a low-memory condition while
    holding a locked vnode have been reduced to near nothing.  Not only will
    I/O continue to operate, but many prior deadlock conditions simply no
    longer exist.

Implement a number of VFS/BIO fixes

	(found by Ian): in biodone(), bogus-page replacement code, the loop
        was not properly incrementing loop variables prior to a continue
        statement.  We do not believe this code can be hit anyway but we
        aren't taking any chances.  We'll turn the whole section into a
        panic (as it already is in brelse()) after the release is rolled.

	In biodone(), the foff calculation was incorrectly
        clamped to the iosize, causing the wrong foff to be calculated
        for pages in the case of an I/O error or biodone() called without
        initiating I/O.  The problem always caused a panic before.  Now it
        doesn't.  The problem is mainly an issue with NFS.

	Fixed casts for ~PAGE_MASK.  This code worked properly before only
        because the calculations use signed arithmatic.  Better to properly
        extend PAGE_MASK first before inverting it for the 64 bit masking
        op.

	In brelse(), the bogus_page fixup code was improperly throwing
        away the original contents of 'm' when it did the j-loop to
        fix the bogus pages.  The result was that it would potentially
        invalidate parts of the *WRONG* page(!), leading to corruption.

	There may still be cases where a background bitmap write is
        being duplicated, causing potential corruption.  We have identified
        a potentially serious bug related to this but the fix is still TBD.
        So instead this patch contains a KASSERT to detect the problem
  	and panic the machine rather then continue to corrupt the filesystem.
	The problem does not occur very often..  it is very hard to
	reproduce, and it may or may not be the cause of the corruption
	people have reported.

Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
Kirk McKusick
bd4bd019fb When deleting a file, the ordering of events imposed by soft updates
is to first write the deleted directory entry to disk, second write
the zero'ed inode to disk, and finally to release the freed blocks
and the inode back to the cylinder-group map. As this ordering
requires two disk writes to occur which are normally spaced about
30 seconds apart (except when memory is under duress), it takes
about a minute from the time that a file is deleted until its inode
and data blocks show up in the cylinder-group map for reallocation.
If a file has had only a brief lifetime (less than 30 seconds from
creation to deletion), neither its inode nor its directory entry
may have been written to disk. If its directory entry has not been
written to disk, then we need not wait for that directory block to
be written as the on-disk directory block does not reference the
inode. Similarly, if the allocated inode has never been written to
disk, we do not have to wait for it to be written back either as
its on-disk representation is still zero'ed out. Thus, in the case
of a short lived file, we can simply release the blocks and inode
to the cylinder-group map immediately. As the inode and its blocks
are released immediately, they are immediately available for other
uses. If they are not released for a minute, then other inodes and
blocks must be allocated for short lived files, cluttering up the
vnode and buffer caches. The previous code was a bit too aggressive
in trying to release the blocks and inode back to the cylinder-group
map resulting in their being made available when in fact the inode
on disk had not yet been zero'ed. This patch takes a more conservative
approach to doing the release which avoids doing the release prematurely.
2000-11-14 09:00:25 +00:00
Adrian Chadd
0b0c10b48d Initial commit of IFS - a inode-namespaced FFS. Here is a short
description:

How it works:
--

Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.)
I didn't see the need in duplicating all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file - 'newfile' . When newfile
is called, the system allocates and returns an inode. Note that newfile
is done in a cloning fashion:

fd = open("newfile", O_CREAT|O_RDWR, 0644);
fstat(fd, &st);

printf("new file is %d\n", (int)st.st_ino);

Once you have created a file, you can open() and unlink() it by its returned
inode number retrieved from the stat call, ie:

fd = open("5", O_RDWR);

The creation permissions depend entirely if you have write access to the
root directory of the filesystem.

To get the list of currently allocated inodes, VOP_READDIR has been added
which returns a directory listing of those currently allocated.

--

What this entails:

* patching conf/files and conf/options to include IFS as a new compile
  option (and since ifs depends upon FFS, include the FFS routines)

* An entry in i386/conf/NOTES indicating IFS exists and where to go for
  an explanation

* Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS
  routines require (ffs_mount() and ffs_reload())

* a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS
  routines. IFS replaces some of the vfsops, and a handful of vnops -
  most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR().
  Any other directory operation is marked as invalid.

What this results in:

* an IFS partition's create permissions are controlled by the perm/ownership of
  the root mount point, just like a normal directory

* Each inode has perm and ownership too

* IFS does *NOT* mean an FFS partition can be opened per inode. This is a
  completely seperate filesystem here

* Softupdates doesn't work with IFS, and really I don't think it needs it.
  Besides, fsck's are FAST. (Try it :-)

* Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC).
  Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against
  this particular inode, and unravelling THAT code isn't trivial. Therefore,
  useful inodes start at 3.

Enjoy, and feedback is definitely appreciated!
2000-10-14 03:02:30 +00:00
Eivind Eklund
7eb9fca557 Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)
2000-10-09 17:31:39 +00:00
Robert Watson
ff435dcb91 o Move initialization of ump from mp to the top of the function so that
it is defined whenm used in ufs_extattr_uepm_destroy(), fixing a panic
  due to a NULL pointer dereference.

Submitted by:	Wesley Morgan <morganw@chemicals.tacorp.com>
2000-10-06 15:31:28 +00:00
Robert Watson
9de54ba513 o Add call to ufs_extattr_uepm_destroy() in ffs_unmount() so as to clean
up lock on extattrs.
o Get for free a comment indicating where auto-starting of extended
  attributes will eventually occur, as it was in my commit tree also.
  No implementation change here, only a comment.
2000-10-04 04:44:51 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Boris Popov
67e871664b Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
Robert Watson
907da7c385 o Permit UFS Extended Attributes to be associated with special devices
and FIFOs.

Obtained from:	TrustedBSD Project
2000-09-21 19:06:02 +00:00
Dag-Erling Smørgrav
8461bdba85 Silence a warning. 2000-09-17 19:41:26 +00:00
Kirk McKusick
52a3bfa2e7 Cannot do MALLOC with M_WAITOK while holding ACQUIRE_LOCK
Obtained from:	Ethan Solomita <ethan@geocast.com>
2000-09-07 23:02:55 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Tor Egge
b5ee7ec63a Initialize *countp to 0 in stub for softdep_flushworklist().
This allows ffs_fsync() to break out of a loop that might otherwise
be infinite on kernels compiled without the SOFTUPDATES option.
The observed symptom was a system hang at the first unmount attempt.
2000-08-09 00:41:54 +00:00
Ollivier Robert
8694d8e912 Fix the lockmgr panic everyone is seeing at shutdown time.
vput assumes curproc is the lock holder, but it's not true in this case.

Thanks a lot Luoqi !

Submitted by:	luoqi
Tested by:	phk
2000-08-01 14:15:07 +00:00
Peter Wemm
6ee6b42ef7 Minor change: fix warning - move a 'struct vnode *vp' declaration inside a
#ifdef DIAGNOSTIC to match its corresponding usage.
2000-07-28 22:27:00 +00:00
Kirk McKusick
3592b7155c Clean up the snapshot code so that it no longer depends on the use of
the SF_IMMUTABLE flag to prevent writing. Instead put in explicit
checking for the SF_SNAPSHOT flag in the appropriate places. With
this change, it is now possible to rename and link to snapshot files.
It is also possible to set or clear any of the owner, group, or
other read bits on the file, though none of the write or execute
bits can be set. There is also an explicit test to prevent the
setting or clearing of the SF_SNAPSHOT flag via chflags() or
fchflags(). Note also that the modify time cannot be changed as
it needs to accurately reflect the time that the snapshot was taken.

Submitted by:	Robert Watson <rwatson@FreeBSD.org>
2000-07-26 23:07:01 +00:00
Kirk McKusick
55ba28c60a Add stub for softdep_flushworklist() so that kernels compiled
without the SOFTUPDATES option will load correctly.

Obtained from:	John Baldwin <jhb@bsdi.com>
2000-07-25 05:28:59 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
Boris Popov
3fbd97427e Prevent possible dereference of NULL pointer.
Submitted by:	Marius Bendiksen <mbendiks@eunet.no>
2000-07-13 02:17:14 +00:00
Kirk McKusick
d303f71fdc Brain fault, forgot to update ffs_snapshot.c with the new calling convention
for vn_start_write.
2000-07-12 00:27:27 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Kirk McKusick
d4c1816924 Clean up warning about undeclared function by declaring softdep_fsync
in mount.h instead of ffs_extern.h. The correct solution is to use
an indirect function pointer so that the kernel does not have to be
built with options FFS, but that will be left for another day.
2000-07-11 19:28:26 +00:00
Kirk McKusick
cc3962a9cd Delete README as it is now obsolete. Relevant information is in
README.softupdates.
2000-07-08 02:32:49 +00:00
Kirk McKusick
876578906d Update to reflect current status. 2000-07-08 02:31:21 +00:00
Kirk McKusick
22e5a6234e Get userland visible flags added for snapshots to give a few days
advance preparation for them to get migrated into place so that
subsequent changes in utilities will not fail to compile for lack
of up-to-date header files in /usr/include.
2000-07-04 04:58:34 +00:00
Poul-Henning Kamp
3275cf7379 Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES,
that is way cleaner than using the softupdates_stub stunt, which
should be killed when convenient.

Discussed with:	mckusick
2000-07-03 13:26:54 +00:00
Andrey A. Chernov
2d90744fd8 Remove obsoleted info about linking from contrib 2000-06-24 13:29:25 +00:00
Kirk McKusick
858c16fab8 Update to new copyright. 2000-06-22 00:29:53 +00:00
Kirk McKusick
6019e6208f When running with quotas enabled on a filesystem using soft updates,
the system would panic when a user's inode quota was exceeded (see
PR 18959 for details). This fixes that problem.

PR:		18959
Submitted by:	Jason Godsey <jason@unixguy.fidalgo.net>
2000-06-18 22:14:28 +00:00
Kirk McKusick
d3abb52714 Some additional performance improvements. When freeing an inode
check to see if it has been committed to disk. If it has never
been written, it can be freed immediately. For short lived files
this change allows the same inode to be reused repeatedly.
Similarly, when upgrading a fragment to a larger size, if it
has never been claimed by an inode on disk, it too can be freed
immediately making it available for reuse often in the next slowly
growing block of the same file.
2000-06-18 22:05:57 +00:00
Poul-Henning Kamp
7c50d77218 Revert part of my bioops change which implemented panic(8). 2000-06-16 14:32:13 +00:00
Poul-Henning Kamp
7523681895 ARGH! I have too many source trees :-(
Fix prototype errors in last commit.
2000-06-16 13:00:33 +00:00
Poul-Henning Kamp
a2e7a027a7 Virtualizes & untangles the bioops operations vector.
Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
2000-06-16 08:48:51 +00:00
Poul-Henning Kamp
6ea6805f8c Remove a comment which should never have made it in. 2000-06-14 21:48:19 +00:00
Robert Watson
b2b0497ab5 o If FFS_EXTATTR is defined, don't print out an error message on unmount
if an FFS partition returns EOPNOTSUPP, as it just means extended
  attributes weren't enabled on that partition.  Prevents spurious
  warning per-partition at shutdown.
2000-06-04 04:50:36 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Robert Watson
f3706a0361 s/ffs_unmonut/ffs_unmount/ in a gratuitous ufs_extattr printf.
Reported by:	knu
2000-05-07 17:21:08 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Poul-Henning Kamp
87150cb06d s/biowait/bufwait/g
Prodded by: several.
2000-04-29 16:25:22 +00:00
Poul-Henning Kamp
eb95c536ad Remove unneeded #include <sys/kernel.h> 2000-04-29 15:36:14 +00:00
Poul-Henning Kamp
3389ae9350 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
Robert Watson
a64ed08955 Introduce extended attribute support for FFS, allowing arbitrary
(name, value) pairs to be associated with inodes.  This support is
used for ACLs, MAC labels, and Capabilities in the TrustedBSD
security extensions, which are currently under development.

In this implementation, attributes are backed to data vnodes in the
style of the quota support in FFS.  Support for FFS extended
attributes may be enabled using the FFS_EXTATTR kernel option
(disabled by default).  Userland utilities and man pages will be
committed in the next batch.  VFS interfaces and man pages have
been in the repo since 4.0-RELEASE and are unchanged.

o ufs/ufs/extattr.h: UFS-specific extattr defines
o ufs/ufs/ufs_extattr.c: bulk of support routines
o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes
o contrib/softupdates/ffs_softdep.c: extattr.h includes
o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR

o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h
(This should not be the case, and will be fixed in a future commit)

Currently attributes are not supported in MFS.  This will be fixed.

Reviewed by:	adrian, bp, freebsd-fs, other unthanked souls
Obtained from:	TrustedBSD Project
2000-04-15 03:34:27 +00:00
Poul-Henning Kamp
c244d2de43 Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
Poul-Henning Kamp
b99c307a21 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
Poul-Henning Kamp
21144e3bf1 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
Kirk McKusick
584508a741 Use 64-bit math to calculate if we have hit our freespace limit.
Necessary for coherent results on filesystems bigger than 0.5Tb.
2000-03-17 03:44:47 +00:00
Kirk McKusick
9f043878d0 Use 64-bit math to decide if optimization needs to be changed.
Necessary for coherent results on filesystems bigger than 0.5Tb.

Submitted by:	Paul Saab <ps@yahoo-inc.com>
2000-03-15 07:08:36 +00:00
Matthew Dillon
f8fa53397f Fix a 'freeing free block' panic in UFS. The problem occurs when the
filesystem fills up.  If the first indirect block exists and FFS is able
    to allocate deeper indirect blocks, but is not able to allocate the
    data block, FFS improperly unwinds the indirect blocks and leaves a
    block pointer hanging to a freed block.  This will cause a panic later
    when the file is removed.  The solution is to properly account for the
    first block-pointer-to-an-indirect-block we had to create in a balloc
    operation and then unwind it if a failure occurs.

Detective work by: Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by: mckusick, Ian Dowse <iedowse@maths.tcd.ie>
Approved by: jkh
2000-02-24 20:43:20 +00:00
Kirk McKusick
4434ff1d38 When writing out bitmap buffers, need to skip over ones that already
have a write in progress. Otherwise one can get in an infinite loop
trying to get them all flushed.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
2000-01-30 20:32:59 +00:00
Kirk McKusick
57a91f6fb0 During fastpath processing for removal of a short-lived inode, the
set of restrictions for cancelling an inode dependency (inodedep)
is somewhat stronger than originally coded. Since this check appears
in two places, we codify it into the function check_inode_unwritten
which we then call from the two sites, one freeing blocks and the
other freeing directory entries.

Submitted by:	Steinar Haug via Matthew Dillon
2000-01-18 01:33:05 +00:00
Kirk McKusick
4c6adb0622 Need to reorganize the flushing of directory entry (pagedep) dependencies
so that they never try to lock an inode corresponding to ".." as this
can lead to deadlock. We observe that any inode with an updated link count
is always pushed into its buffer at the time of the link count change, so
we do not need to do a VOP_UPDATE, but merely find its buffer and write it.
The only time we need to get the inode itself is from the result of a
mkdir whose name will never be ".." and hence locking such an inode will
never request a lock above us in the filesystem tree. Thanks to Brian
Fundakowski Feldman for providing the test program that tickled soft updates
into hanging in "inode" sleep.

Submitted by:	Brian Fundakowski Feldman <green@FreeBSD.org>
2000-01-18 01:30:03 +00:00
Kirk McKusick
105ef72c55 Better bounding on softdep_flushfiles; other minor tweeks to checks. 2000-01-17 06:35:11 +00:00
Kirk McKusick
107d5039ef Must track multiple uncommitted renames until one ultimately gets
committed to disk or is removed.
2000-01-17 06:28:18 +00:00
Matthew Dillon
173cce7c8e Non-operational change, fix compiler warning.
Reviewed by:  mckusick
2000-01-14 04:39:28 +00:00
Kirk McKusick
d7127837a2 Confirming Peter's fix (locking 101: release the lock before you go
to sleep). Locking 101, part 2: do not look at buffer contents after
you have been asleep. There is no telling what wonderous changes may
have occurred.
2000-01-13 20:03:22 +00:00
Peter Wemm
7f473504e6 Free the global softupdates lock prior to tsleep() in getdirtybuf().
This seems to be responsible for a bunch of panics where the process
sleeps and something else finds softupdates "locked" when it shouldn't
be.  This commit is unreviewed, but has been a big help here.
Previously my boxes would panic pretty much on the first fsync() that
wrote something to disk.
2000-01-13 18:48:12 +00:00
Kirk McKusick
1c2ceb2880 Because cylinder group blocks are now written in background,
it is no longer sufficient to get a lock on a buffer to know
that its write has been completed. We have to first get the
lock on the buffer, then check to see if it is doing a
background write. If it is doing background write, we have
to wait for the background write to finish, then check to see
if that fullfilled our dependency, and if not to start another
write. Luckily the explanation is longer than the fix.
2000-01-13 07:20:01 +00:00
Kirk McKusick
94313add1f A panic occurs during an fsync when a dirty block associated with
a vnode has not been written (which would clear certain of its
dependencies). The problems arises because fsync with MNT_NOWAIT
no longer pushes all the dirty blocks associated with a vnode. It
skips those that require rollbacks, since they will just get instantly
dirty again. Such skipped blocks are marked so that they will not be
skipped a second time (otherwise circular dependencies would never
clear). So, we fsync twice to ensure that everything will be written
at least once.
2000-01-13 07:17:39 +00:00
Kirk McKusick
4ed62fbd7f The only known cause of this panic is running out of disk space.
The problem occurs when an indirect block and a data block are
being allocated at the same time. For example when the 13th block
of the file is written, the filesystem needs to allocate the first
indirect block and a data block. If the indirect block allocation
succeeds, but the data block allocation fails, the error code
dellocates the indirect block as it has nothing at which to point.
Unfortunately, it does not deallocate the indirect block's associated
dependencies which then fail when they find the block unexpectedly
gone (ptr == 0 instead of its expected value). The fix is to fsync
the file before doing the block rollback, as the fsync will flush
out all of the dependencies. Once the rollback is done the file
must be fsync'ed again so that the soft updates code does not find
unexpected changes. This approach is much slower than writing the
code to back out the extraneous dependencies, but running out of
disk space is not expected to be a common occurence, so just getting
it right is the main criterion.

PR:		kern/15063
Submitted by:	Assar Westerlund <assar@stacken.kth.se>
2000-01-11 08:27:00 +00:00
Kirk McKusick
10767f840b We cannot proceed to free the blocks of the file until the dependencies
have been cleaned up by deallocte_dependencies(). Once that is done, it
is safe to post the request to free the blocks. A similar change is also
needed for the freefile case.
2000-01-11 06:52:35 +00:00
Poul-Henning Kamp
ba4ad1fcea Give vn_isdisk() a second argument where it can return a suitable errno.
Suggested by:	bde
2000-01-10 12:04:27 +00:00
Kirk McKusick
26e5527c86 Missing FREE_LOCK call before handle_workitem_freeblocks.
Submitted by:	"Kenneth D. Merry" <ken@kdm.org>
2000-01-10 08:39:03 +00:00
Kirk McKusick
cf60e8e4bf Several performance improvements for soft updates have been added:
1) Fastpath deletions. When a file is being deleted, check to see if it
   was so recently created that its inode has not yet been written to
   disk. If so, the delete can proceed to immediately free the inode.
2) Background writes: No file or block allocations can be done while the
   bitmap is being written to disk. To avoid these stalls, the bitmap is
   copied to another buffer which is written thus leaving the original
   available for futher allocations.
3) Link count tracking. Constantly track the difference in i_effnlink and
   i_nlink so that inodes that have had no change other than i_effnlink
   need not be written.
4) Identify buffers with rollback dependencies so that the buffer flushing
   daemon can choose to skip over them.
2000-01-10 00:24:24 +00:00
Kirk McKusick
f0f7d38386 Keep tighter control of removal dependencies by limiting the number
of dirrem structure rather than the collaterally created freeblks
and freefile structures. Limit the rate of buffer dirtying by the
syncer process during periods of intense file removal.
2000-01-09 23:35:38 +00:00
Kirk McKusick
3f5b28bc07 Reorganize softdep_fsync so that it only does the inode-is-flushed
check before the inode is unlocked while grabbing its parent directory.
Once it is unlocked, other operations may slip in that could make
the inode-is-flushed check fail. Allowing other writes to the inode
before returning from fsync does not break the semantics of fsync
since we have flushed everything that was dirty at the time of the
fsync call.
2000-01-09 23:14:57 +00:00
Kirk McKusick
e2dc60835d Get rid of unreferenced function. 2000-01-09 22:42:42 +00:00
Kirk McKusick
83aaf63ab2 Make static non-exported functions from soft updates. 2000-01-09 22:40:09 +00:00
Peter Wemm
c447342094 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistant with the other
BSD's who made this change quite some time ago.  More commits to come.
1999-12-29 05:07:58 +00:00
Bruce Evans
7e58bfacbe Update the unclean flag for mount -u. I forgot to handle this case
when I made the absence of the clean flag sticky in rev.1.88.  This
was a problem main for "mount /".  There is no way to mount "/" for
writing without using mount -u (normally implicitly), so after
"mount -f /" of an unclean filesystem, the absence of the clean flag
was sticky forever.
1999-12-23 15:42:14 +00:00
Eivind Eklund
369dc8ceb8 Change incorrect NULLs to 0s 1999-12-21 11:14:12 +00:00
Robert Watson
91f37dcba1 Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by:	eivind
1999-12-19 06:08:07 +00:00
Kirk McKusick
6a4152243f The function request_cleanup() had a tsleep() with PCATCH. It is
quite dangerous, since the process may hold locks at the point,
and if it is stopped in that tsleep the machine may hang. Because
the sleep is so short, the PCATCH is not required here, so it has
been removed. For the future, the FreeBSD team needs to decide
whether it is still reasonable to stop a process in tsleep, as that
may affect any other code that uses PCATCH while holding kernel locks.

Submitted by:	Dmitrij Tejblum <tejblum@arc.hq.cti.ru>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-12-16 22:02:09 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Eivind Eklund
6bdfe06ad9 Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
  return value: LK_EXCLOTHER, when the lock is held exclusively by another
  process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
  locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with:	grog, mch, peter, phk
Reviewed by:	peter
1999-12-11 16:13:02 +00:00
Bill Fumerola
43cd4e8815 Remove the 'alpha, use at your own risk' death-statement.
Reviewed by:	mckusick (verbally at FreeBSDcon)
1999-12-03 00:40:31 +00:00
Bill Fumerola
cfa5001489 Fix typo, add $FreeBSD$ 1999-12-03 00:34:26 +00:00
Kirk McKusick
9f54c05286 Preferentially allocate the first indirect block in the same
cylinder group as the inode. This makes a 15% difference in
read speed for files in the 96K to 500K size range.
1999-12-01 19:33:12 +00:00
Poul-Henning Kamp
38224dcd59 Convert various pieces of code to use vn_isdisk() rather than checking
for vp->v_type == VBLK.

In ccd: we don't need to call VOP_GETATTR to find the type of a vnode.

Reviewed by:    sos
1999-11-22 10:33:55 +00:00
Eivind Eklund
b2f2b704d0 We do not have ffs_checkexp, so remove the prototype 1999-11-20 16:44:44 +00:00
Poul-Henning Kamp
0429e37ade struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ.  Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly  mp != (void*)&mountlist  comparisons.

Requested by:   phk
Submitted by:   Jake Burkholder jake@checker.org
PR:             14967
1999-11-20 10:00:46 +00:00
Poul-Henning Kamp
698f9cf828 Next step in the device cleanup process.
Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.

Unify spec_open() for bdev and cdev cases.

Remove the disabled bdev specific read/write code.
1999-11-09 14:15:33 +00:00
Bruce Evans
5bd5c8b9e5 Quick fix for breakage of ext2fs link counts as reported by stat(2) by
the soft updates changes: only report the link count to be i_effnlink
in ufs_getattr() for file systems that maintain i_effnlink.

Tested by:	Mike Dracopoulos <mdraco@math.uoa.gr>
1999-11-03 12:05:39 +00:00
Mike Smith
6d14782861 Newline-terminate the complaint message about not being able to find
the root vnode pointer.
1999-11-01 23:57:28 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Poul-Henning Kamp
b89392e703 Remove the D_NOCLUSTER[RW] options which were added because vn had
problems.  Now that Matt has fixed vn, this can go.  The vn driver
should have used d_maxio (now si_iosize_max) anyway.
1999-09-30 07:11:30 +00:00
Poul-Henning Kamp
1b5464ef9d Remove v_maxio from struct vnode.
Replace it with mnt_iosize_max in struct mount.

Nits from:	bde
1999-09-29 20:05:33 +00:00
Alfred Perlstein
c24fda81c9 Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from:	NetBSD
1999-09-11 00:46:08 +00:00
Peter Wemm
280652828b $Id$ -> $FreeBSD$ 1999-08-28 02:16:32 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Poul-Henning Kamp
41d2e3e09e Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness. 1999-08-25 12:24:39 +00:00
Sheldon Hearn
740e3a15f7 Fix bug introduced in rev 1.28, which causes kernel build to break for
the case where DEBUG is defined but not DIAGNOSTIC. ffs_checkblk is
declared conditionally on DIAGNOSTIC, not DEBUG.

PR:	13314
Reviewed by:	bde
1999-08-24 08:39:41 +00:00
Bruce Evans
d918320517 Use devtoname() to print dev_t's instead of casting them to long or u_long
for misprinting in %lx format.
1999-08-23 20:35:21 +00:00
Poul-Henning Kamp
7dc5cd047f The bdevsw() and cdevsw() are now identical, so kill the former. 1999-08-13 10:29:38 +00:00
Poul-Henning Kamp
0ef1c82630 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
Kirk McKusick
4dc0c8f521 Create the macro DOINGASYNC to check whether the MNT_ASYNC flag has
been set for a mount point. Insert missing checks to ensure that all
write operations are done asynchronously when the MNT_ASYNC option
has been requested.

Submitted by:	Craig A Soules <soules+@andrew.cmu.edu>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-07-13 18:20:13 +00:00
Poul-Henning Kamp
68de329e34 Use the fsid from the superblock, unless it looks bogus or has already
been taken by some other filesystem.
1999-07-11 19:16:50 +00:00
Ollivier Robert
7fe29b0aef Add $Id$
Approved by:	kirk
1999-07-07 07:51:04 +00:00
John Polstra
24755bdc25 Update pathnames for new location of soft-updates sources. 1999-07-03 21:34:05 +00:00
Kirk McKusick
48703fedf1 No longer need to set B_ASYNC flag since BUF_KERNPROC now
unconditionally sets the identity of the buffer.
1999-06-29 15:57:40 +00:00
Peter Wemm
a6451da76b Keep the inlines for <sys/buf.h> happy.. 1999-06-27 13:26:23 +00:00
Kirk McKusick
67812eacd7 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
Kirk McKusick
7481264c1e On our final pass through ffs_fsync, do all I/O synchronously so that
we can find out if our flush is failing because of write errors. This
change avoids a "flush failed" panic during unrecoverable disk errors.
1999-06-18 05:49:46 +00:00
Kirk McKusick
f9c8cab591 Add a vnode argument to VOP_BWRITE to get rid of the last vnode
operator special case. Delete special case code from vnode_if.sh,
vnode_if.src, umap_vnops.c, and null_vnops.c.
1999-06-16 23:27:55 +00:00
Kirk McKusick
e4ab40bcb6 Get rid of the global variable rushjob and replace it with a function in
kern/vfs_subr.c named speedup_syncer() which handles the speedup request.
Change the various clients of rushjob to use the new function.
1999-06-15 23:37:29 +00:00
Poul-Henning Kamp
2447bec829 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
Julian Elischer
2e897e94b6 Cosmetic changes to make it compile without errors in gcc -Wall 1999-05-22 04:43:04 +00:00
Kirk McKusick
c2606ec5c6 Add a hook to ffs_fsync to allow soft updates to get first chance at doing
a sync on the block device for the filesystem. That allows it to push the
bitmap blocks before the inode blocks which greatly reduces the number of
inode rollbacks that need to be done.
1999-05-14 01:26:46 +00:00
Peter Wemm
51b5226683 Try and fix a dev_t/major/minor etc nit. 1999-05-12 22:32:07 +00:00
Kirk McKusick
71a0942aca Put back changes that might be causing trouble on Alpha. 1999-05-09 19:39:54 +00:00
Poul-Henning Kamp
4be2eb8c49 I got tired of seeing all the cdevsw[major(foo)] all over the place.
Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.
1999-05-08 06:40:31 +00:00
Poul-Henning Kamp
46eede0058 Continue where Julian left off in July 1998:
Virtualize bdevsw[] from cdevsw.  bdevsw() is now an (inline)
        function.

        Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
        to the order of the cmaj/bmaj arguments!)

        Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
        (ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
1999-05-07 10:11:40 +00:00
Kirk McKusick
36cfb417de Whitespace cleanup. 1999-05-07 05:21:16 +00:00
Kirk McKusick
7957996abd Get rid of random debugging cruft; sync up with latest version. 1999-05-07 05:11:31 +00:00
Kirk McKusick
224a6aa241 Severe slowdowns have been reported when creating or removing many
files at once on a filesystem running soft updates. The root of
the problem is that soft updates limits the amount of memory that
may be allocated to dependency structures so as to avoid hogging
kernel memory. The original algorithm just waited for the disk I/O
to catch up and reduce the number of dependencies. This new code
takes a much more aggressive approach. Basically there are two
resources that routinely hit the limit. Inode dependencies during
periods with a high file creation rate and file and block removal
dependencies during periods with a high file removal rate. I have
attacked these problems from two fronts. When the inode dependency
limits are reached, I pick a random inode dependency, UFS_UPDATE
it together with all the other dirty inodes contained within its
disk block and then write that disk block. This trick usually
clears 5-50 inode dependencies in a single disk I/O. For block and
file removal dependencies, I pick a random directory page that has
at least one remove pending and VOP_FSYNC its directory. That
releases all its removal dependencies to the work queue. To further
hasten things along, I also immediately start the work queue process
rather than waiting for its next one second scheduled run.
1999-05-07 02:26:47 +00:00
Peter Wemm
dfd5dee1b0 Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.
1999-05-06 18:13:11 +00:00
Alan Cox
4221e284a3 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
Mike Smith
f4711b2df4 Simplify the tunefs example, since tunefs uses getfsfile(). Lots of
people complain about working out what device their filesystems are
mounted on.
1999-04-27 21:11:19 +00:00
Kirk McKusick
38e28fd66b Reorganize locking to avoid holding the lock during calls to bdwrite
and brelse (which may sleep in some systems).

Obtained from:	Matthew Dillon <dillon@apollo.backplane.com>
1999-03-02 06:38:07 +00:00
Kirk McKusick
eef33ce9bd When fsync'ing a file on a filesystem using soft updates, we first try
to write all the dirty blocks. If some of those blocks have dependencies,
they will be remarked dirty when the I/O completes. On systems with
really fast I/O systems, it is possible to get in an infinite loop trying
to flush the buffers, because the I/O finishes before we can get all the
dirty buffers off the v_dirtyblkhd list and into the I/O queue. (The
previous algorithm looped over the v_dirtyblkhd list writing out buffers
until the list emptied.) So, now we mark each buffer that we try to
write so that we can distinguish the ones that are being remarked dirty
from those that we have not yet tried to flush. Once we have tried to
push every buffer once, we then push any associated metadata that is
causing the remaining buffers to be redirtied.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-03-02 04:04:31 +00:00
Kirk McKusick
4cbb89d95d Ensure that softdep_sync_metadata can handle bmsafemap and mkdir entries
if they ever arise (which should not happen as softdep_sync_metadata is
currently used).
1999-03-02 00:19:47 +00:00
Kirk McKusick
133ff2619a fix double LIST_REMOVE; other cosmetic changes to match version 9.32.
Obtained from: Jeffrey Hsu <hsu@FreeBSD.ORG>
1999-02-17 20:01:20 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
David Greenman
8ab2fa0073 Gutted softdep_deallocate_dependencies and replaced it with a panic. It
turns out to not be useful to unwind the dependencies and continue in
the face of a fatal error.
Also changed the log() to a printf() in softdep_error() so that it will
be output in the case of a impending panic.
Submitted by:	Kirk McKusick <mckusick@mckusick.com>
1999-01-22 09:07:32 +00:00
Eivind Eklund
5b1b6c5859 Silence warning about unused debug function. (I'll turn this function
into a DDB command in my next staticization sweep).
1999-01-12 11:42:41 +00:00
Eivind Eklund
a862221fa0 Add a warning about the copyright restraints. 1999-01-08 16:03:12 +00:00
Bruce Evans
de5d1ba57c Don't pass unused unused timestamp args to UFS_UPDATE() or waste
time initializing them.  This almost finishes centralizing (in-core)
timestamp updates in ufs_itimes().
1999-01-07 16:14:19 +00:00
Bruce Evans
4591d9bb7e UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value
MNT_WAIT when we mean boolean `true' or check for that value not being
passed.  There was no problem in practice because MNT_WAIT had the
magic value of 1.
1999-01-06 18:18:06 +00:00
Bruce Evans
d64dbc8719 Ifdefed the conditionally used variable `prtrealloc'. Declare it
as volatile so that there is no chance that the code that it controls
is optimised away.
1999-01-06 17:04:33 +00:00
Bruce Evans
5991fd0370 Backed out rev.1.47. It just broke my optimisations for lazy syncing
of timestamps in rev.1.45.  The soft updates bug was elsewhere.

Forgotten by:	luoqi
1999-01-06 16:52:38 +00:00
Eivind Eklund
fb1167777a Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
Eivind Eklund
a777e82019 Remove the last clients of vfs_object_create(..., waslocked=1);
waslocked will go away shortly.

Reviewed by:	dg
1999-01-02 01:32:36 +00:00
Julian Elischer
1f35e8c8da Remove some compiler warnings. 1998-12-10 20:11:47 +00:00
Bruce Evans
672be20b9f Don't use the strange null pointer constant `(ufs_daddr_t)0' in a call
to VOP_BMAP().  Don't use uncast NULLs in the same call.
1998-11-29 03:12:06 +00:00
David Greenman
1c680b45a2 Restored the "reallocblks" code to its former glory. What this does is
basically do a on-the-fly defragmentation of the FFS filesystem, changing
file block allocations to make them contiguous. Thanks to Kirk McKusick
for providing hints on what needed to be done to get this working.
1998-11-13 01:01:44 +00:00
Peter Wemm
2ec07c6614 Change dirty block list handling to use TAILQ macros. 1998-10-31 15:33:32 +00:00
Peter Wemm
40c8cfe552 Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
Jordan K. Hubbard
2dcc2f0693 Clarify a rather ambiguous debugging message. 1998-10-28 10:37:54 +00:00