Commit Graph

680 Commits

Author SHA1 Message Date
Kirk McKusick
71868b020d More aggressively rate limit the growth of soft dependency structures
in the face of multiple processes doing massive numbers of filesystem
operations. While this patch will work in nearly all situations, there
are still some perverse workloads that can overwhelm the system.
Detecting and handling these perverse workloads will be the subject
of another patch.

Reviewed by:	Paul Saab <ps@yahoo-inc.com>
Obtained from:	Ethan Solomita <ethan@geocast.com>
2000-11-20 06:22:39 +00:00
Matthew Dillon
936524aa02 Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
    situations prior to now.

    The new code is based on the concept that I/O must be able to function in
    a low memory situation.  All major modules related to I/O (except
    networking) have been adjusted to allow allocation out of the system
    reserve memory pool.  These modules now detect a low memory situation but
    rather then block they instead continue to operate, then return resources
    to the memory pool instead of cache them or leave them wired.

    Code has been added to stall in a low-memory situation prior to a vnode
    being locked.

    Thus situations where a process blocks in a low-memory condition while
    holding a locked vnode have been reduced to near nothing.  Not only will
    I/O continue to operate, but many prior deadlock conditions simply no
    longer exist.

Implement a number of VFS/BIO fixes

	(found by Ian): in biodone(), bogus-page replacement code, the loop
        was not properly incrementing loop variables prior to a continue
        statement.  We do not believe this code can be hit anyway but we
        aren't taking any chances.  We'll turn the whole section into a
        panic (as it already is in brelse()) after the release is rolled.

	In biodone(), the foff calculation was incorrectly
        clamped to the iosize, causing the wrong foff to be calculated
        for pages in the case of an I/O error or biodone() called without
        initiating I/O.  The problem always caused a panic before.  Now it
        doesn't.  The problem is mainly an issue with NFS.

	Fixed casts for ~PAGE_MASK.  This code worked properly before only
        because the calculations use signed arithmatic.  Better to properly
        extend PAGE_MASK first before inverting it for the 64 bit masking
        op.

	In brelse(), the bogus_page fixup code was improperly throwing
        away the original contents of 'm' when it did the j-loop to
        fix the bogus pages.  The result was that it would potentially
        invalidate parts of the *WRONG* page(!), leading to corruption.

	There may still be cases where a background bitmap write is
        being duplicated, causing potential corruption.  We have identified
        a potentially serious bug related to this but the fix is still TBD.
        So instead this patch contains a KASSERT to detect the problem
  	and panic the machine rather then continue to corrupt the filesystem.
	The problem does not occur very often..  it is very hard to
	reproduce, and it may or may not be the cause of the corruption
	people have reported.

Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
Kirk McKusick
bd4bd019fb When deleting a file, the ordering of events imposed by soft updates
is to first write the deleted directory entry to disk, second write
the zero'ed inode to disk, and finally to release the freed blocks
and the inode back to the cylinder-group map. As this ordering
requires two disk writes to occur which are normally spaced about
30 seconds apart (except when memory is under duress), it takes
about a minute from the time that a file is deleted until its inode
and data blocks show up in the cylinder-group map for reallocation.
If a file has had only a brief lifetime (less than 30 seconds from
creation to deletion), neither its inode nor its directory entry
may have been written to disk. If its directory entry has not been
written to disk, then we need not wait for that directory block to
be written as the on-disk directory block does not reference the
inode. Similarly, if the allocated inode has never been written to
disk, we do not have to wait for it to be written back either as
its on-disk representation is still zero'ed out. Thus, in the case
of a short lived file, we can simply release the blocks and inode
to the cylinder-group map immediately. As the inode and its blocks
are released immediately, they are immediately available for other
uses. If they are not released for a minute, then other inodes and
blocks must be allocated for short lived files, cluttering up the
vnode and buffer caches. The previous code was a bit too aggressive
in trying to release the blocks and inode back to the cylinder-group
map resulting in their being made available when in fact the inode
on disk had not yet been zero'ed. This patch takes a more conservative
approach to doing the release which avoids doing the release prematurely.
2000-11-14 09:00:25 +00:00
Bruce Evans
1c1752872f Fixed breakage of mknod() in rev.1.48 of ext2_vnops.c and rev.1.126 of
ufs_vnops.c:

1) i_ino was confused with i_number, so the inode number passed to
   VFS_VGET() was usually wrong (usually 0U).
2) ip was dereferenced after vgone() freed it, so the inode number
   passed to VFS_VGET() was sometimes not even wrong.

Bug (1) was usually fatal in ext2_mknod(), since ext2fs doesn't have
space for inode 0 on the disk; ino_to_fsba() subtracts 1 from the
inode number, so inode number 0U gives a way out of bounds array
index.  Bug(1) was usually harmless in ufs_mknod(); ino_to_fsba()
doesn't subtract 1, and VFS_VGET() reads suitable garbage (all 0's?)
from the disk for the invalid inode number 0U; ufs_mknod() returns
a wrong vnode, but most callers just vput() it; the correct vnode is
eventually obtained by an implicit VFS_VGET() just like it used to be.

Bug (2) usually doesn't happen.
2000-11-04 08:10:56 +00:00
Eivind Eklund
e3c4036b18 Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.
2000-11-01 17:57:24 +00:00
Poul-Henning Kamp
ef10cd6a64 Add a missing <sys/systm.h> 2000-10-30 20:37:19 +00:00
Poul-Henning Kamp
cf9fa8e725 Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
2000-10-29 16:06:56 +00:00
Poul-Henning Kamp
9f69a4578a Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing
the offending inline function (BUF_KERNPROC) on it being #included
already.

I'm not sure BUF_KERNPROC() is even the right thing to do or in the
right place or implemented the right way (inline vs normal function).

Remove consequently unneeded #includes of <sys/proc.h>
2000-10-29 14:54:55 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
Robert Watson
47460a23a0 o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to perform
"administrative" authorization checks.  In most cases, the VADMIN test
  checks to make sure the credential effective uid is the same as the file
  owner.
o Modify vaccess() to set VADMIN as an available right if the uid is
  appropriate.
o Modify references to uid-based access control operations such that they
  now always invoke VOP_ACCESS() instead of using hard-coded policy checks.
o This allows alternative UFS policies to be implemented by replacing only
  ufs_access() (such as mandatory system policies).
o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the
  vnode: I believe that new invocations of VOP_ACCESS() are always called
  with the lock held.
o Some direct checks of the uid remain, largely associated with the QUOTA
  and SUIDDIR code.

Reviewed by:	eivind
Obtained from:	TrustedBSD Project
2000-10-19 07:53:59 +00:00
Adrian Chadd
0b0c10b48d Initial commit of IFS - a inode-namespaced FFS. Here is a short
description:

How it works:
--

Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.)
I didn't see the need in duplicating all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file - 'newfile' . When newfile
is called, the system allocates and returns an inode. Note that newfile
is done in a cloning fashion:

fd = open("newfile", O_CREAT|O_RDWR, 0644);
fstat(fd, &st);

printf("new file is %d\n", (int)st.st_ino);

Once you have created a file, you can open() and unlink() it by its returned
inode number retrieved from the stat call, ie:

fd = open("5", O_RDWR);

The creation permissions depend entirely if you have write access to the
root directory of the filesystem.

To get the list of currently allocated inodes, VOP_READDIR has been added
which returns a directory listing of those currently allocated.

--

What this entails:

* patching conf/files and conf/options to include IFS as a new compile
  option (and since ifs depends upon FFS, include the FFS routines)

* An entry in i386/conf/NOTES indicating IFS exists and where to go for
  an explanation

* Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS
  routines require (ffs_mount() and ffs_reload())

* a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS
  routines. IFS replaces some of the vfsops, and a handful of vnops -
  most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR().
  Any other directory operation is marked as invalid.

What this results in:

* an IFS partition's create permissions are controlled by the perm/ownership of
  the root mount point, just like a normal directory

* Each inode has perm and ownership too

* IFS does *NOT* mean an FFS partition can be opened per inode. This is a
  completely seperate filesystem here

* Softupdates doesn't work with IFS, and really I don't think it needs it.
  Besides, fsck's are FAST. (Try it :-)

* Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC).
  Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against
  this particular inode, and unravelling THAT code isn't trivial. Therefore,
  useful inodes start at 3.

Enjoy, and feedback is definitely appreciated!
2000-10-14 03:02:30 +00:00
Robert Watson
d62bd6076e o Sanity check was inverted, resulting in a possible spurious panic
during unmount if extended attributes were in use.  Correct by removing
  an unneeded (and undesirable) '!'.
2000-10-09 20:04:39 +00:00
Eivind Eklund
7eb9fca557 Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)
2000-10-09 17:31:39 +00:00
Robert Watson
ff435dcb91 o Move initialization of ump from mp to the top of the function so that
it is defined whenm used in ufs_extattr_uepm_destroy(), fixing a panic
  due to a NULL pointer dereference.

Submitted by:	Wesley Morgan <morganw@chemicals.tacorp.com>
2000-10-06 15:31:28 +00:00
Robert Watson
9de54ba513 o Add call to ufs_extattr_uepm_destroy() in ffs_unmount() so as to clean
up lock on extattrs.
o Get for free a comment indicating where auto-starting of extended
  attributes will eventually occur, as it was in my commit tree also.
  No implementation change here, only a comment.
2000-10-04 04:44:51 +00:00
Robert Watson
d32d56a07d o Correct use of lockdestroy() by adding a new ufs_extattr_uepm_destroy()
call, which should be the last thing down to a per-mount extattr
  management structure, after ufs_extattr_stop() on the file system.
  This currently has the effect only of destroying the per-mount lock
  on extended attributes, and clearing appropriate flags.
o Remove inappropriate invocation in ufs_extattr_vnode_inactive().
2000-10-04 04:41:33 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Boris Popov
67e871664b Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
Robert Watson
907da7c385 o Permit UFS Extended Attributes to be associated with special devices
and FIFOs.

Obtained from:	TrustedBSD Project
2000-09-21 19:06:02 +00:00
Robert Watson
bec1333db4 o Disallow privileged processes in jail() from directly accessing
system namespace extended attributes.
o Document privilege/jail() interaction relating to extended
  attributes.

Obtained from:	TrustedBSD Project
2000-09-18 18:10:13 +00:00
Robert Watson
cf48f6e42c o Allow privileged processes in jail() to override sticky bit behavior
on directories.
o Allow privileged processes in jail() to create inodes with the
  setgid bit set even if they are not a member of the group denoted
  by the file creation gid.  This occurs due to inherited gid's from
  parent directories on file creation, allowing a user to create a
  file with a gid that is not in the creating process's credentials.

Obtained from:	TrustedBSD Project
2000-09-18 18:03:49 +00:00
Robert Watson
f5770bb46a o Add a comment clarifying interaction between jail(), privileged processes,
and UFS file flags.  Here's what the comment says, for reference:

	Privileged processes in jail() are permitted to modify
	arbitrary user flags on files, but are not permitted
	to modify system flags.

  In other words, privilege does allow a process in jail to modify user
  flags for objects that the process does not own, but privilege will
  not permit the setting of system flags on the file.

Obtained from:	TrustedBSD Project
2000-09-18 17:58:15 +00:00
Robert Watson
ea57890740 o Add missing PRISON_ROOT allowing a privileged process in a jail() to not
remove the setuid/setgid bits by virtue of a change to a file with those
  bits set, even if the process doesn't own the file, or isn't a group
  member of the file's gid.

Obtained from:	TrustedBSD Project
2000-09-18 17:53:22 +00:00
Robert Watson
4da6e3d109 o Substitute suser() calls for direct credential checks, which is now
safe as suser() no longer sets ASU.
o Note that in some cases, the PRISON_ROOT flag is used even though no
  process structure is passed, to indicate that if a process structure
  (and hence jail) was available, it would be ok.  In the long run,
  the jail identifier should probably be moved to ucred, as the uidinfo
  information was.
o Some uid 0 checks remain relating to the quota code, which I'll leave
  for another day.

Reviewed by:	phk, eivind
Obtained from:	TrustedBSD Project
2000-09-18 16:13:02 +00:00
Dag-Erling Smørgrav
8461bdba85 Silence a warning. 2000-09-17 19:41:26 +00:00
Boris Popov
3ff1a2f43e Add new flag PDIRUNLOCK to the component.cn_flags which should be set by
filesystem lookup() routine if it unlocks parent directory. This flag should
be carefully tracked by filesystems if they want to work properly with nullfs
and other stacked filesystems.

VFS takes advantage of this flag to perform symantically correct usage
of vrele() instead of vput() if parent directory already unlocked.

If filesystem fails to track this flag then previous codepath in VFS left
unchanged.

Convert UFS code to set PDIRUNLOCK flag if necessary. Other filesystmes will
be changed after some period of testing.

Reviewed in general by:	mckusick, dillon, adrian
Obtained from:	NetBSD
2000-09-17 07:26:42 +00:00
Poul-Henning Kamp
ae2276e657 Remove a pointless casting of a gid_t to a gid_t. 2000-09-16 18:20:27 +00:00
Boris Popov
e37acb62a0 Add VOP_*VOBJECT vops, because MFS requires explicit vop specification.
Noted by:	knu
2000-09-12 16:21:16 +00:00
Robert Watson
5ab404120f o Variety of extended attribute fixes
- In ufs_extattr_enable(), return EEXIST instead of EOPNOTSUPP
	  if the caller tries to configure an attribute name that is
	  already configured
	- Throughout, add IO_NODELOCKED to VOP_{READ,WRITE} calls to
	  indicate lock status of passed vnode.  Apparently not a
	  problem, but worth fixing.
	- For all writes, make use of IO_SYNC consistent.  Really,
	  IO_UNIT and combining of VOP_WRITE's should happen, but I
	  don't have that tested.  At least with this, it's
	  consistent usage.  (pointed out by: bde)
	- In ufs_extattr_get(), fixed nested locking of backing
	  vnode (fine due to recursive lock support, but make it
	  more consistent with other code)
	- In ufs_extattr_get(), clean up return code to set uio_resid
	  more consistently with other pieces of code (worked fine,
	  this is just a cleanup)
	- Fix ufs_extattr_rm(), which was broken--effectively a nop.
	- Minor comment and whitespace fixes.

Obtained from:	TrustedBSD Project
2000-09-12 05:35:47 +00:00
John Baldwin
38a6ecf4de Fix a 64-bitism. Use size_t instead of int for 4th argument to copyinstr.
Approved by:	rwatson
2000-09-11 05:43:02 +00:00
Kirk McKusick
52a3bfa2e7 Cannot do MALLOC with M_WAITOK while holding ACQUIRE_LOCK
Obtained from:	Ethan Solomita <ethan@geocast.com>
2000-09-07 23:02:55 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Robert Watson
bbf0607700 Modify extended attribute protection model to authorize based on
attribute namespace and DAC protection on file:
	- Attribute names beginning with '$' are in the system namespace
	- The attribute name "$" is reserved
	- System namespace attributes may only be read/set by suser()
	  or by kernel (cred == NULL)
	- Other attribute names are in the application namespace
	- The attribute name "" is reserved
	- Application namespace attributes are protected in the manner
	  of the target file permission

o Kernel changes
	- Add ufs_extattr_valid_attrname() to check whether the requested
	  attribute "set" or "enable" is appropriate (i.e., non-reserved)
	- Modify ufs_extattr_credcheck() to accept target file vnode, not
	  to take inode uid
	- Modify ufs_extattr_credcheck() to check namespace, then enforce
	  either kernel/suser for system namespace, or vaccess() for
	  application namespace
o EA backing file format changes
	- Remove permission fields from extended attribute backing file
	  header
	- Bump extended attribute backing file header version to 3
o Update extattrctl.c and extattrctl.8
	- Remove now deprecated -r and -w arguments to initattr, as
	  permissions are now implicit
	- (unrelated) fix error reporting and unlinking during failed
	  initattr to remove duplicate/inaccurate error messages, and to
	  only unlink if the failure wasn't in the backing file open()

Obtained from:	TrustedBSD Project
2000-09-02 20:31:26 +00:00
Robert Watson
012c643d3e o Restructure vaccess() so as to check for DAC permission to modify the
object before falling back on privilege.  Make vaccess() accept an
  additional optional argument, privused, to determine whether
  privilege was required for vaccess() to return 0.  Add commented
  out capability checks for reference.  Rename some variables to make
  it more clear which modes/uids/etc are associated with the object,
  and which with the access mode.
o Update file system use of vaccess() to pass NULL as the optional
  privused argument.  Once additional patches are applied, suser()
  will no longer set ASU, so privused will permit passing of
  privilege information up the stack to the caller.

Reviewed by:	bde, green, phk, -security, others
Obtained from:	TrustedBSD Project
2000-08-29 14:45:49 +00:00
Robert Watson
877dd71fc6 o Correct spelling of ufs_exttatr_find_attr -> ufs_extattr_find_attr
o Add "const" qualifier to attrname argument of various calls to remove
  warnings

Obtained from:	TrustedBSD Project
2000-08-26 22:00:58 +00:00
Poul-Henning Kamp
3f54a085a6 Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c)
Remove old DEVFS support fields from dev_t.

  Make uid, gid & mode members of dev_t and set them in make_dev().

  Use correct uid, gid & mode in make_dev in disk minilayer.

  Add support for registering alias names for a dev_t using the
  new function make_dev_alias().  These will show up as symlinks
  in DEVFS.

  Use makedev() rather than make_dev() for MFSs magic devices to prevent
  DEVFS from noticing this abuse.

  Add a field for DEVFS inode number in dev_t.

  Add new DEVFS in fs/devfs.

  Add devfs cloning to:
        disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
        md(4), tun(4), bpf(4), fd(4)

  If DEVFS add -d flag to /sbin/inits args to make it mount devfs.

  Add commented out DEVFS to GENERIC
2000-08-20 21:34:39 +00:00
Poul-Henning Kamp
e39c53eda5 Centralize the canonical vop_access user/group/other check in vaccess().
Discussed with: bde
2000-08-20 08:36:26 +00:00
Tor Egge
b5ee7ec63a Initialize *countp to 0 in stub for softdep_flushworklist().
This allows ffs_fsync() to break out of a loop that might otherwise
be infinite on kernels compiled without the SOFTUPDATES option.
The observed symptom was a system hang at the first unmount attempt.
2000-08-09 00:41:54 +00:00
Ollivier Robert
8694d8e912 Fix the lockmgr panic everyone is seeing at shutdown time.
vput assumes curproc is the lock holder, but it's not true in this case.

Thanks a lot Luoqi !

Submitted by:	luoqi
Tested by:	phk
2000-08-01 14:15:07 +00:00
Peter Wemm
68e530258a Minor tweak - removed unused variable 'struct mount *mp'; 2000-07-28 22:28:05 +00:00
Peter Wemm
6ee6b42ef7 Minor change: fix warning - move a 'struct vnode *vp' declaration inside a
#ifdef DIAGNOSTIC to match its corresponding usage.
2000-07-28 22:27:00 +00:00
Kirk McKusick
3592b7155c Clean up the snapshot code so that it no longer depends on the use of
the SF_IMMUTABLE flag to prevent writing. Instead put in explicit
checking for the SF_SNAPSHOT flag in the appropriate places. With
this change, it is now possible to rename and link to snapshot files.
It is also possible to set or clear any of the owner, group, or
other read bits on the file, though none of the write or execute
bits can be set. There is also an explicit test to prevent the
setting or clearing of the SF_SNAPSHOT flag via chflags() or
fchflags(). Note also that the modify time cannot be changed as
it needs to accurately reflect the time that the snapshot was taken.

Submitted by:	Robert Watson <rwatson@FreeBSD.org>
2000-07-26 23:07:01 +00:00
Poul-Henning Kamp
a0580699e1 Fix the "mfs_badop[vop_getwritemount] = 45" messages. 2000-07-26 17:53:04 +00:00
Kirk McKusick
55ba28c60a Add stub for softdep_flushworklist() so that kernels compiled
without the SOFTUPDATES option will load correctly.

Obtained from:	John Baldwin <jhb@bsdi.com>
2000-07-25 05:28:59 +00:00
Kirk McKusick
d56bdab31c Eliminate periodic 'mfs_badop[vop_getwritemount] = 45' messages.
Submitted by:	Sheldon Hearn <sheldonh@uunet.co.za>
2000-07-25 05:11:57 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
Robert Watson
bc373480dc o Marius pointed out an unusually inconvenient upper bound on extended
attribute data size.
o Fortunately it turned out to be an unused constant left over from an
  earlier implementation, and is therefore being removed so as not to
  confuse casual observers.

Submitted by:	mbendiks@eunet.no
2000-07-14 03:30:52 +00:00
Boris Popov
3fbd97427e Prevent possible dereference of NULL pointer.
Submitted by:	Marius Bendiksen <mbendiks@eunet.no>
2000-07-13 02:17:14 +00:00
Kirk McKusick
d303f71fdc Brain fault, forgot to update ffs_snapshot.c with the new calling convention
for vn_start_write.
2000-07-12 00:27:27 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00