Commit Graph

407 Commits

Author SHA1 Message Date
Kirk McKusick
cb3ab5aaf7 Properly compute the size of the final block of superblock summary information.
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2001-01-12 21:56:55 +00:00
Kirk McKusick
48d617487d Several small but important fixes for snapshots:
1) Be more tolerant of missing snapshot files by only trying to decrement
   their reference count if they are registered as active.

2) Fix for snapshots of filesystems with block sizes larger than 8K
   (from Ollivier Robert <roberto@eurocontrol.fr>).

3) Fix to avoid losing last block in snapshot file when calculating blocks
   that need to be copied (from Don Coleman <coleman@coleman.org>).
2000-12-19 04:41:09 +00:00
Kirk McKusick
6da443cb22 Get rid of spurious check in ffs_truncate for i_size == length
which fails to set the modification time on the file. The same
check a few lines later takes the correct action.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2000-12-19 04:20:13 +00:00
Assar Westerlund
ca85ca6099 add a stub for softdep_slowdown so that it's possible to build the
kernel without SOFTUPDATES
2000-12-17 23:59:56 +00:00
Seigo Tanimura
937c4dfa08 Do not race for the lock of an inode hash.
Reviewed by:	jhb
2000-12-13 10:04:01 +00:00
Kirk McKusick
1d733bbd10 Preventing runaway kernel soft updates memory, take three.
Previously, the syncer process was the only process in the
system that could process the soft updates background work
list. If enough other processes were adding requests to that
list, it would eventually grow without bound. Because some of
the work list requests require vnodes to be locked, it was
not generally safe to let random processes process the work
list while they already held vnodes locked. By adding a flag
to the work list queue processing function to indicate whether
the calling process could safely lock vnodes, it becomes possible
to co-opt other processes into helping out with the work list.
Now when the worklist gets too large, other processes can safely
help out by picking off those work requests that can be handled
without locking a vnode, leaving only the small number of
requests requiring a vnode lock for the syncer process. With
this change, it appears possible to keep even the nastiest
workloads under control.

Submitted by:	Paul Saab <ps@yahoo-inc.com>
2000-12-13 08:30:35 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Poul-Henning Kamp
959b7375ed Staticize some malloc M_ instances. 2000-12-08 20:09:00 +00:00
Kirk McKusick
71868b020d More aggressively rate limit the growth of soft dependency structures
in the face of multiple processes doing massive numbers of filesystem
operations. While this patch will work in nearly all situations, there
are still some perverse workloads that can overwhelm the system.
Detecting and handling these perverse workloads will be the subject
of another patch.

Reviewed by:	Paul Saab <ps@yahoo-inc.com>
Obtained from:	Ethan Solomita <ethan@geocast.com>
2000-11-20 06:22:39 +00:00
Matthew Dillon
936524aa02 Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
    situations prior to now.

    The new code is based on the concept that I/O must be able to function in
    a low memory situation.  All major modules related to I/O (except
    networking) have been adjusted to allow allocation out of the system
    reserve memory pool.  These modules now detect a low memory situation but
    rather then block they instead continue to operate, then return resources
    to the memory pool instead of cache them or leave them wired.

    Code has been added to stall in a low-memory situation prior to a vnode
    being locked.

    Thus situations where a process blocks in a low-memory condition while
    holding a locked vnode have been reduced to near nothing.  Not only will
    I/O continue to operate, but many prior deadlock conditions simply no
    longer exist.

Implement a number of VFS/BIO fixes

	(found by Ian): in biodone(), bogus-page replacement code, the loop
        was not properly incrementing loop variables prior to a continue
        statement.  We do not believe this code can be hit anyway but we
        aren't taking any chances.  We'll turn the whole section into a
        panic (as it already is in brelse()) after the release is rolled.

	In biodone(), the foff calculation was incorrectly
        clamped to the iosize, causing the wrong foff to be calculated
        for pages in the case of an I/O error or biodone() called without
        initiating I/O.  The problem always caused a panic before.  Now it
        doesn't.  The problem is mainly an issue with NFS.

	Fixed casts for ~PAGE_MASK.  This code worked properly before only
        because the calculations use signed arithmatic.  Better to properly
        extend PAGE_MASK first before inverting it for the 64 bit masking
        op.

	In brelse(), the bogus_page fixup code was improperly throwing
        away the original contents of 'm' when it did the j-loop to
        fix the bogus pages.  The result was that it would potentially
        invalidate parts of the *WRONG* page(!), leading to corruption.

	There may still be cases where a background bitmap write is
        being duplicated, causing potential corruption.  We have identified
        a potentially serious bug related to this but the fix is still TBD.
        So instead this patch contains a KASSERT to detect the problem
  	and panic the machine rather then continue to corrupt the filesystem.
	The problem does not occur very often..  it is very hard to
	reproduce, and it may or may not be the cause of the corruption
	people have reported.

Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
Kirk McKusick
bd4bd019fb When deleting a file, the ordering of events imposed by soft updates
is to first write the deleted directory entry to disk, second write
the zero'ed inode to disk, and finally to release the freed blocks
and the inode back to the cylinder-group map. As this ordering
requires two disk writes to occur which are normally spaced about
30 seconds apart (except when memory is under duress), it takes
about a minute from the time that a file is deleted until its inode
and data blocks show up in the cylinder-group map for reallocation.
If a file has had only a brief lifetime (less than 30 seconds from
creation to deletion), neither its inode nor its directory entry
may have been written to disk. If its directory entry has not been
written to disk, then we need not wait for that directory block to
be written as the on-disk directory block does not reference the
inode. Similarly, if the allocated inode has never been written to
disk, we do not have to wait for it to be written back either as
its on-disk representation is still zero'ed out. Thus, in the case
of a short lived file, we can simply release the blocks and inode
to the cylinder-group map immediately. As the inode and its blocks
are released immediately, they are immediately available for other
uses. If they are not released for a minute, then other inodes and
blocks must be allocated for short lived files, cluttering up the
vnode and buffer caches. The previous code was a bit too aggressive
in trying to release the blocks and inode back to the cylinder-group
map resulting in their being made available when in fact the inode
on disk had not yet been zero'ed. This patch takes a more conservative
approach to doing the release which avoids doing the release prematurely.
2000-11-14 09:00:25 +00:00
Adrian Chadd
0b0c10b48d Initial commit of IFS - a inode-namespaced FFS. Here is a short
description:

How it works:
--

Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.)
I didn't see the need in duplicating all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file - 'newfile' . When newfile
is called, the system allocates and returns an inode. Note that newfile
is done in a cloning fashion:

fd = open("newfile", O_CREAT|O_RDWR, 0644);
fstat(fd, &st);

printf("new file is %d\n", (int)st.st_ino);

Once you have created a file, you can open() and unlink() it by its returned
inode number retrieved from the stat call, ie:

fd = open("5", O_RDWR);

The creation permissions depend entirely if you have write access to the
root directory of the filesystem.

To get the list of currently allocated inodes, VOP_READDIR has been added
which returns a directory listing of those currently allocated.

--

What this entails:

* patching conf/files and conf/options to include IFS as a new compile
  option (and since ifs depends upon FFS, include the FFS routines)

* An entry in i386/conf/NOTES indicating IFS exists and where to go for
  an explanation

* Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS
  routines require (ffs_mount() and ffs_reload())

* a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS
  routines. IFS replaces some of the vfsops, and a handful of vnops -
  most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR().
  Any other directory operation is marked as invalid.

What this results in:

* an IFS partition's create permissions are controlled by the perm/ownership of
  the root mount point, just like a normal directory

* Each inode has perm and ownership too

* IFS does *NOT* mean an FFS partition can be opened per inode. This is a
  completely seperate filesystem here

* Softupdates doesn't work with IFS, and really I don't think it needs it.
  Besides, fsck's are FAST. (Try it :-)

* Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC).
  Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against
  this particular inode, and unravelling THAT code isn't trivial. Therefore,
  useful inodes start at 3.

Enjoy, and feedback is definitely appreciated!
2000-10-14 03:02:30 +00:00
Eivind Eklund
7eb9fca557 Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)
2000-10-09 17:31:39 +00:00
Robert Watson
ff435dcb91 o Move initialization of ump from mp to the top of the function so that
it is defined whenm used in ufs_extattr_uepm_destroy(), fixing a panic
  due to a NULL pointer dereference.

Submitted by:	Wesley Morgan <morganw@chemicals.tacorp.com>
2000-10-06 15:31:28 +00:00
Robert Watson
9de54ba513 o Add call to ufs_extattr_uepm_destroy() in ffs_unmount() so as to clean
up lock on extattrs.
o Get for free a comment indicating where auto-starting of extended
  attributes will eventually occur, as it was in my commit tree also.
  No implementation change here, only a comment.
2000-10-04 04:44:51 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Boris Popov
67e871664b Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
Robert Watson
907da7c385 o Permit UFS Extended Attributes to be associated with special devices
and FIFOs.

Obtained from:	TrustedBSD Project
2000-09-21 19:06:02 +00:00
Dag-Erling Smørgrav
8461bdba85 Silence a warning. 2000-09-17 19:41:26 +00:00
Kirk McKusick
52a3bfa2e7 Cannot do MALLOC with M_WAITOK while holding ACQUIRE_LOCK
Obtained from:	Ethan Solomita <ethan@geocast.com>
2000-09-07 23:02:55 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Tor Egge
b5ee7ec63a Initialize *countp to 0 in stub for softdep_flushworklist().
This allows ffs_fsync() to break out of a loop that might otherwise
be infinite on kernels compiled without the SOFTUPDATES option.
The observed symptom was a system hang at the first unmount attempt.
2000-08-09 00:41:54 +00:00
Ollivier Robert
8694d8e912 Fix the lockmgr panic everyone is seeing at shutdown time.
vput assumes curproc is the lock holder, but it's not true in this case.

Thanks a lot Luoqi !

Submitted by:	luoqi
Tested by:	phk
2000-08-01 14:15:07 +00:00
Peter Wemm
6ee6b42ef7 Minor change: fix warning - move a 'struct vnode *vp' declaration inside a
#ifdef DIAGNOSTIC to match its corresponding usage.
2000-07-28 22:27:00 +00:00
Kirk McKusick
3592b7155c Clean up the snapshot code so that it no longer depends on the use of
the SF_IMMUTABLE flag to prevent writing. Instead put in explicit
checking for the SF_SNAPSHOT flag in the appropriate places. With
this change, it is now possible to rename and link to snapshot files.
It is also possible to set or clear any of the owner, group, or
other read bits on the file, though none of the write or execute
bits can be set. There is also an explicit test to prevent the
setting or clearing of the SF_SNAPSHOT flag via chflags() or
fchflags(). Note also that the modify time cannot be changed as
it needs to accurately reflect the time that the snapshot was taken.

Submitted by:	Robert Watson <rwatson@FreeBSD.org>
2000-07-26 23:07:01 +00:00
Kirk McKusick
55ba28c60a Add stub for softdep_flushworklist() so that kernels compiled
without the SOFTUPDATES option will load correctly.

Obtained from:	John Baldwin <jhb@bsdi.com>
2000-07-25 05:28:59 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
Boris Popov
3fbd97427e Prevent possible dereference of NULL pointer.
Submitted by:	Marius Bendiksen <mbendiks@eunet.no>
2000-07-13 02:17:14 +00:00
Kirk McKusick
d303f71fdc Brain fault, forgot to update ffs_snapshot.c with the new calling convention
for vn_start_write.
2000-07-12 00:27:27 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Kirk McKusick
d4c1816924 Clean up warning about undeclared function by declaring softdep_fsync
in mount.h instead of ffs_extern.h. The correct solution is to use
an indirect function pointer so that the kernel does not have to be
built with options FFS, but that will be left for another day.
2000-07-11 19:28:26 +00:00
Kirk McKusick
cc3962a9cd Delete README as it is now obsolete. Relevant information is in
README.softupdates.
2000-07-08 02:32:49 +00:00
Kirk McKusick
876578906d Update to reflect current status. 2000-07-08 02:31:21 +00:00
Kirk McKusick
22e5a6234e Get userland visible flags added for snapshots to give a few days
advance preparation for them to get migrated into place so that
subsequent changes in utilities will not fail to compile for lack
of up-to-date header files in /usr/include.
2000-07-04 04:58:34 +00:00
Poul-Henning Kamp
3275cf7379 Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES,
that is way cleaner than using the softupdates_stub stunt, which
should be killed when convenient.

Discussed with:	mckusick
2000-07-03 13:26:54 +00:00
Andrey A. Chernov
2d90744fd8 Remove obsoleted info about linking from contrib 2000-06-24 13:29:25 +00:00
Kirk McKusick
858c16fab8 Update to new copyright. 2000-06-22 00:29:53 +00:00
Kirk McKusick
6019e6208f When running with quotas enabled on a filesystem using soft updates,
the system would panic when a user's inode quota was exceeded (see
PR 18959 for details). This fixes that problem.

PR:		18959
Submitted by:	Jason Godsey <jason@unixguy.fidalgo.net>
2000-06-18 22:14:28 +00:00
Kirk McKusick
d3abb52714 Some additional performance improvements. When freeing an inode
check to see if it has been committed to disk. If it has never
been written, it can be freed immediately. For short lived files
this change allows the same inode to be reused repeatedly.
Similarly, when upgrading a fragment to a larger size, if it
has never been claimed by an inode on disk, it too can be freed
immediately making it available for reuse often in the next slowly
growing block of the same file.
2000-06-18 22:05:57 +00:00
Poul-Henning Kamp
7c50d77218 Revert part of my bioops change which implemented panic(8). 2000-06-16 14:32:13 +00:00
Poul-Henning Kamp
7523681895 ARGH! I have too many source trees :-(
Fix prototype errors in last commit.
2000-06-16 13:00:33 +00:00
Poul-Henning Kamp
a2e7a027a7 Virtualizes & untangles the bioops operations vector.
Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
2000-06-16 08:48:51 +00:00
Poul-Henning Kamp
6ea6805f8c Remove a comment which should never have made it in. 2000-06-14 21:48:19 +00:00
Robert Watson
b2b0497ab5 o If FFS_EXTATTR is defined, don't print out an error message on unmount
if an FFS partition returns EOPNOTSUPP, as it just means extended
  attributes weren't enabled on that partition.  Prevents spurious
  warning per-partition at shutdown.
2000-06-04 04:50:36 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Robert Watson
f3706a0361 s/ffs_unmonut/ffs_unmount/ in a gratuitous ufs_extattr printf.
Reported by:	knu
2000-05-07 17:21:08 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Poul-Henning Kamp
87150cb06d s/biowait/bufwait/g
Prodded by: several.
2000-04-29 16:25:22 +00:00
Poul-Henning Kamp
eb95c536ad Remove unneeded #include <sys/kernel.h> 2000-04-29 15:36:14 +00:00
Poul-Henning Kamp
3389ae9350 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
Robert Watson
a64ed08955 Introduce extended attribute support for FFS, allowing arbitrary
(name, value) pairs to be associated with inodes.  This support is
used for ACLs, MAC labels, and Capabilities in the TrustedBSD
security extensions, which are currently under development.

In this implementation, attributes are backed to data vnodes in the
style of the quota support in FFS.  Support for FFS extended
attributes may be enabled using the FFS_EXTATTR kernel option
(disabled by default).  Userland utilities and man pages will be
committed in the next batch.  VFS interfaces and man pages have
been in the repo since 4.0-RELEASE and are unchanged.

o ufs/ufs/extattr.h: UFS-specific extattr defines
o ufs/ufs/ufs_extattr.c: bulk of support routines
o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes
o contrib/softupdates/ffs_softdep.c: extattr.h includes
o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR

o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h
(This should not be the case, and will be fixed in a future commit)

Currently attributes are not supported in MFS.  This will be fixed.

Reviewed by:	adrian, bp, freebsd-fs, other unthanked souls
Obtained from:	TrustedBSD Project
2000-04-15 03:34:27 +00:00
Poul-Henning Kamp
c244d2de43 Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
Poul-Henning Kamp
b99c307a21 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
Poul-Henning Kamp
21144e3bf1 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
Kirk McKusick
584508a741 Use 64-bit math to calculate if we have hit our freespace limit.
Necessary for coherent results on filesystems bigger than 0.5Tb.
2000-03-17 03:44:47 +00:00
Kirk McKusick
9f043878d0 Use 64-bit math to decide if optimization needs to be changed.
Necessary for coherent results on filesystems bigger than 0.5Tb.

Submitted by:	Paul Saab <ps@yahoo-inc.com>
2000-03-15 07:08:36 +00:00
Matthew Dillon
f8fa53397f Fix a 'freeing free block' panic in UFS. The problem occurs when the
filesystem fills up.  If the first indirect block exists and FFS is able
    to allocate deeper indirect blocks, but is not able to allocate the
    data block, FFS improperly unwinds the indirect blocks and leaves a
    block pointer hanging to a freed block.  This will cause a panic later
    when the file is removed.  The solution is to properly account for the
    first block-pointer-to-an-indirect-block we had to create in a balloc
    operation and then unwind it if a failure occurs.

Detective work by: Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by: mckusick, Ian Dowse <iedowse@maths.tcd.ie>
Approved by: jkh
2000-02-24 20:43:20 +00:00
Kirk McKusick
4434ff1d38 When writing out bitmap buffers, need to skip over ones that already
have a write in progress. Otherwise one can get in an infinite loop
trying to get them all flushed.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
2000-01-30 20:32:59 +00:00
Kirk McKusick
57a91f6fb0 During fastpath processing for removal of a short-lived inode, the
set of restrictions for cancelling an inode dependency (inodedep)
is somewhat stronger than originally coded. Since this check appears
in two places, we codify it into the function check_inode_unwritten
which we then call from the two sites, one freeing blocks and the
other freeing directory entries.

Submitted by:	Steinar Haug via Matthew Dillon
2000-01-18 01:33:05 +00:00
Kirk McKusick
4c6adb0622 Need to reorganize the flushing of directory entry (pagedep) dependencies
so that they never try to lock an inode corresponding to ".." as this
can lead to deadlock. We observe that any inode with an updated link count
is always pushed into its buffer at the time of the link count change, so
we do not need to do a VOP_UPDATE, but merely find its buffer and write it.
The only time we need to get the inode itself is from the result of a
mkdir whose name will never be ".." and hence locking such an inode will
never request a lock above us in the filesystem tree. Thanks to Brian
Fundakowski Feldman for providing the test program that tickled soft updates
into hanging in "inode" sleep.

Submitted by:	Brian Fundakowski Feldman <green@FreeBSD.org>
2000-01-18 01:30:03 +00:00
Kirk McKusick
105ef72c55 Better bounding on softdep_flushfiles; other minor tweeks to checks. 2000-01-17 06:35:11 +00:00
Kirk McKusick
107d5039ef Must track multiple uncommitted renames until one ultimately gets
committed to disk or is removed.
2000-01-17 06:28:18 +00:00
Matthew Dillon
173cce7c8e Non-operational change, fix compiler warning.
Reviewed by:  mckusick
2000-01-14 04:39:28 +00:00
Kirk McKusick
d7127837a2 Confirming Peter's fix (locking 101: release the lock before you go
to sleep). Locking 101, part 2: do not look at buffer contents after
you have been asleep. There is no telling what wonderous changes may
have occurred.
2000-01-13 20:03:22 +00:00
Peter Wemm
7f473504e6 Free the global softupdates lock prior to tsleep() in getdirtybuf().
This seems to be responsible for a bunch of panics where the process
sleeps and something else finds softupdates "locked" when it shouldn't
be.  This commit is unreviewed, but has been a big help here.
Previously my boxes would panic pretty much on the first fsync() that
wrote something to disk.
2000-01-13 18:48:12 +00:00
Kirk McKusick
1c2ceb2880 Because cylinder group blocks are now written in background,
it is no longer sufficient to get a lock on a buffer to know
that its write has been completed. We have to first get the
lock on the buffer, then check to see if it is doing a
background write. If it is doing background write, we have
to wait for the background write to finish, then check to see
if that fullfilled our dependency, and if not to start another
write. Luckily the explanation is longer than the fix.
2000-01-13 07:20:01 +00:00
Kirk McKusick
94313add1f A panic occurs during an fsync when a dirty block associated with
a vnode has not been written (which would clear certain of its
dependencies). The problems arises because fsync with MNT_NOWAIT
no longer pushes all the dirty blocks associated with a vnode. It
skips those that require rollbacks, since they will just get instantly
dirty again. Such skipped blocks are marked so that they will not be
skipped a second time (otherwise circular dependencies would never
clear). So, we fsync twice to ensure that everything will be written
at least once.
2000-01-13 07:17:39 +00:00
Kirk McKusick
4ed62fbd7f The only known cause of this panic is running out of disk space.
The problem occurs when an indirect block and a data block are
being allocated at the same time. For example when the 13th block
of the file is written, the filesystem needs to allocate the first
indirect block and a data block. If the indirect block allocation
succeeds, but the data block allocation fails, the error code
dellocates the indirect block as it has nothing at which to point.
Unfortunately, it does not deallocate the indirect block's associated
dependencies which then fail when they find the block unexpectedly
gone (ptr == 0 instead of its expected value). The fix is to fsync
the file before doing the block rollback, as the fsync will flush
out all of the dependencies. Once the rollback is done the file
must be fsync'ed again so that the soft updates code does not find
unexpected changes. This approach is much slower than writing the
code to back out the extraneous dependencies, but running out of
disk space is not expected to be a common occurence, so just getting
it right is the main criterion.

PR:		kern/15063
Submitted by:	Assar Westerlund <assar@stacken.kth.se>
2000-01-11 08:27:00 +00:00
Kirk McKusick
10767f840b We cannot proceed to free the blocks of the file until the dependencies
have been cleaned up by deallocte_dependencies(). Once that is done, it
is safe to post the request to free the blocks. A similar change is also
needed for the freefile case.
2000-01-11 06:52:35 +00:00
Poul-Henning Kamp
ba4ad1fcea Give vn_isdisk() a second argument where it can return a suitable errno.
Suggested by:	bde
2000-01-10 12:04:27 +00:00
Kirk McKusick
26e5527c86 Missing FREE_LOCK call before handle_workitem_freeblocks.
Submitted by:	"Kenneth D. Merry" <ken@kdm.org>
2000-01-10 08:39:03 +00:00
Kirk McKusick
cf60e8e4bf Several performance improvements for soft updates have been added:
1) Fastpath deletions. When a file is being deleted, check to see if it
   was so recently created that its inode has not yet been written to
   disk. If so, the delete can proceed to immediately free the inode.
2) Background writes: No file or block allocations can be done while the
   bitmap is being written to disk. To avoid these stalls, the bitmap is
   copied to another buffer which is written thus leaving the original
   available for futher allocations.
3) Link count tracking. Constantly track the difference in i_effnlink and
   i_nlink so that inodes that have had no change other than i_effnlink
   need not be written.
4) Identify buffers with rollback dependencies so that the buffer flushing
   daemon can choose to skip over them.
2000-01-10 00:24:24 +00:00
Kirk McKusick
f0f7d38386 Keep tighter control of removal dependencies by limiting the number
of dirrem structure rather than the collaterally created freeblks
and freefile structures. Limit the rate of buffer dirtying by the
syncer process during periods of intense file removal.
2000-01-09 23:35:38 +00:00
Kirk McKusick
3f5b28bc07 Reorganize softdep_fsync so that it only does the inode-is-flushed
check before the inode is unlocked while grabbing its parent directory.
Once it is unlocked, other operations may slip in that could make
the inode-is-flushed check fail. Allowing other writes to the inode
before returning from fsync does not break the semantics of fsync
since we have flushed everything that was dirty at the time of the
fsync call.
2000-01-09 23:14:57 +00:00
Kirk McKusick
e2dc60835d Get rid of unreferenced function. 2000-01-09 22:42:42 +00:00
Kirk McKusick
83aaf63ab2 Make static non-exported functions from soft updates. 2000-01-09 22:40:09 +00:00
Peter Wemm
c447342094 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistant with the other
BSD's who made this change quite some time ago.  More commits to come.
1999-12-29 05:07:58 +00:00
Bruce Evans
7e58bfacbe Update the unclean flag for mount -u. I forgot to handle this case
when I made the absence of the clean flag sticky in rev.1.88.  This
was a problem main for "mount /".  There is no way to mount "/" for
writing without using mount -u (normally implicitly), so after
"mount -f /" of an unclean filesystem, the absence of the clean flag
was sticky forever.
1999-12-23 15:42:14 +00:00
Eivind Eklund
369dc8ceb8 Change incorrect NULLs to 0s 1999-12-21 11:14:12 +00:00
Robert Watson
91f37dcba1 Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by:	eivind
1999-12-19 06:08:07 +00:00
Kirk McKusick
6a4152243f The function request_cleanup() had a tsleep() with PCATCH. It is
quite dangerous, since the process may hold locks at the point,
and if it is stopped in that tsleep the machine may hang. Because
the sleep is so short, the PCATCH is not required here, so it has
been removed. For the future, the FreeBSD team needs to decide
whether it is still reasonable to stop a process in tsleep, as that
may affect any other code that uses PCATCH while holding kernel locks.

Submitted by:	Dmitrij Tejblum <tejblum@arc.hq.cti.ru>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-12-16 22:02:09 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Eivind Eklund
6bdfe06ad9 Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
  return value: LK_EXCLOTHER, when the lock is held exclusively by another
  process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
  locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with:	grog, mch, peter, phk
Reviewed by:	peter
1999-12-11 16:13:02 +00:00
Bill Fumerola
43cd4e8815 Remove the 'alpha, use at your own risk' death-statement.
Reviewed by:	mckusick (verbally at FreeBSDcon)
1999-12-03 00:40:31 +00:00
Bill Fumerola
cfa5001489 Fix typo, add $FreeBSD$ 1999-12-03 00:34:26 +00:00
Kirk McKusick
9f54c05286 Preferentially allocate the first indirect block in the same
cylinder group as the inode. This makes a 15% difference in
read speed for files in the 96K to 500K size range.
1999-12-01 19:33:12 +00:00
Poul-Henning Kamp
38224dcd59 Convert various pieces of code to use vn_isdisk() rather than checking
for vp->v_type == VBLK.

In ccd: we don't need to call VOP_GETATTR to find the type of a vnode.

Reviewed by:    sos
1999-11-22 10:33:55 +00:00
Eivind Eklund
b2f2b704d0 We do not have ffs_checkexp, so remove the prototype 1999-11-20 16:44:44 +00:00
Poul-Henning Kamp
0429e37ade struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ.  Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly  mp != (void*)&mountlist  comparisons.

Requested by:   phk
Submitted by:   Jake Burkholder jake@checker.org
PR:             14967
1999-11-20 10:00:46 +00:00
Poul-Henning Kamp
698f9cf828 Next step in the device cleanup process.
Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.

Unify spec_open() for bdev and cdev cases.

Remove the disabled bdev specific read/write code.
1999-11-09 14:15:33 +00:00
Bruce Evans
5bd5c8b9e5 Quick fix for breakage of ext2fs link counts as reported by stat(2) by
the soft updates changes: only report the link count to be i_effnlink
in ufs_getattr() for file systems that maintain i_effnlink.

Tested by:	Mike Dracopoulos <mdraco@math.uoa.gr>
1999-11-03 12:05:39 +00:00
Mike Smith
6d14782861 Newline-terminate the complaint message about not being able to find
the root vnode pointer.
1999-11-01 23:57:28 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Poul-Henning Kamp
b89392e703 Remove the D_NOCLUSTER[RW] options which were added because vn had
problems.  Now that Matt has fixed vn, this can go.  The vn driver
should have used d_maxio (now si_iosize_max) anyway.
1999-09-30 07:11:30 +00:00
Poul-Henning Kamp
1b5464ef9d Remove v_maxio from struct vnode.
Replace it with mnt_iosize_max in struct mount.

Nits from:	bde
1999-09-29 20:05:33 +00:00
Alfred Perlstein
c24fda81c9 Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from:	NetBSD
1999-09-11 00:46:08 +00:00
Peter Wemm
280652828b $Id$ -> $FreeBSD$ 1999-08-28 02:16:32 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00