Commit Graph

126 Commits

Author SHA1 Message Date
Ollivier Robert
8694d8e912 Fix the lockmgr panic everyone is seeing at shutdown time.
vput assumes curproc is the lock holder, but it's not true in this case.

Thanks a lot Luoqi !

Submitted by:	luoqi
Tested by:	phk
2000-08-01 14:15:07 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
Boris Popov
3fbd97427e Prevent possible dereference of NULL pointer.
Submitted by:	Marius Bendiksen <mbendiks@eunet.no>
2000-07-13 02:17:14 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Robert Watson
b2b0497ab5 o If FFS_EXTATTR is defined, don't print out an error message on unmount
if an FFS partition returns EOPNOTSUPP, as it just means extended
  attributes weren't enabled on that partition.  Prevents spurious
  warning per-partition at shutdown.
2000-06-04 04:50:36 +00:00
Robert Watson
f3706a0361 s/ffs_unmonut/ffs_unmount/ in a gratuitous ufs_extattr printf.
Reported by:	knu
2000-05-07 17:21:08 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Robert Watson
a64ed08955 Introduce extended attribute support for FFS, allowing arbitrary
(name, value) pairs to be associated with inodes.  This support is
used for ACLs, MAC labels, and Capabilities in the TrustedBSD
security extensions, which are currently under development.

In this implementation, attributes are backed to data vnodes in the
style of the quota support in FFS.  Support for FFS extended
attributes may be enabled using the FFS_EXTATTR kernel option
(disabled by default).  Userland utilities and man pages will be
committed in the next batch.  VFS interfaces and man pages have
been in the repo since 4.0-RELEASE and are unchanged.

o ufs/ufs/extattr.h: UFS-specific extattr defines
o ufs/ufs/ufs_extattr.c: bulk of support routines
o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes
o contrib/softupdates/ffs_softdep.c: extattr.h includes
o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR

o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h
(This should not be the case, and will be fixed in a future commit)

Currently attributes are not supported in MFS.  This will be fixed.

Reviewed by:	adrian, bp, freebsd-fs, other unthanked souls
Obtained from:	TrustedBSD Project
2000-04-15 03:34:27 +00:00
Poul-Henning Kamp
ba4ad1fcea Give vn_isdisk() a second argument where it can return a suitable errno.
Suggested by:	bde
2000-01-10 12:04:27 +00:00
Kirk McKusick
cf60e8e4bf Several performance improvements for soft updates have been added:
1) Fastpath deletions. When a file is being deleted, check to see if it
   was so recently created that its inode has not yet been written to
   disk. If so, the delete can proceed to immediately free the inode.
2) Background writes: No file or block allocations can be done while the
   bitmap is being written to disk. To avoid these stalls, the bitmap is
   copied to another buffer which is written thus leaving the original
   available for futher allocations.
3) Link count tracking. Constantly track the difference in i_effnlink and
   i_nlink so that inodes that have had no change other than i_effnlink
   need not be written.
4) Identify buffers with rollback dependencies so that the buffer flushing
   daemon can choose to skip over them.
2000-01-10 00:24:24 +00:00
Bruce Evans
7e58bfacbe Update the unclean flag for mount -u. I forgot to handle this case
when I made the absence of the clean flag sticky in rev.1.88.  This
was a problem main for "mount /".  There is no way to mount "/" for
writing without using mount -u (normally implicitly), so after
"mount -f /" of an unclean filesystem, the absence of the clean flag
was sticky forever.
1999-12-23 15:42:14 +00:00
Robert Watson
91f37dcba1 Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by:	eivind
1999-12-19 06:08:07 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Poul-Henning Kamp
38224dcd59 Convert various pieces of code to use vn_isdisk() rather than checking
for vp->v_type == VBLK.

In ccd: we don't need to call VOP_GETATTR to find the type of a vnode.

Reviewed by:    sos
1999-11-22 10:33:55 +00:00
Poul-Henning Kamp
698f9cf828 Next step in the device cleanup process.
Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.

Unify spec_open() for bdev and cdev cases.

Remove the disabled bdev specific read/write code.
1999-11-09 14:15:33 +00:00
Bruce Evans
5bd5c8b9e5 Quick fix for breakage of ext2fs link counts as reported by stat(2) by
the soft updates changes: only report the link count to be i_effnlink
in ufs_getattr() for file systems that maintain i_effnlink.

Tested by:	Mike Dracopoulos <mdraco@math.uoa.gr>
1999-11-03 12:05:39 +00:00
Mike Smith
6d14782861 Newline-terminate the complaint message about not being able to find
the root vnode pointer.
1999-11-01 23:57:28 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Poul-Henning Kamp
b89392e703 Remove the D_NOCLUSTER[RW] options which were added because vn had
problems.  Now that Matt has fixed vn, this can go.  The vn driver
should have used d_maxio (now si_iosize_max) anyway.
1999-09-30 07:11:30 +00:00
Poul-Henning Kamp
1b5464ef9d Remove v_maxio from struct vnode.
Replace it with mnt_iosize_max in struct mount.

Nits from:	bde
1999-09-29 20:05:33 +00:00
Alfred Perlstein
c24fda81c9 Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from:	NetBSD
1999-09-11 00:46:08 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Poul-Henning Kamp
41d2e3e09e Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness. 1999-08-25 12:24:39 +00:00
Poul-Henning Kamp
7dc5cd047f The bdevsw() and cdevsw() are now identical, so kill the former. 1999-08-13 10:29:38 +00:00
Poul-Henning Kamp
0ef1c82630 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
Poul-Henning Kamp
68de329e34 Use the fsid from the superblock, unless it looks bogus or has already
been taken by some other filesystem.
1999-07-11 19:16:50 +00:00
Poul-Henning Kamp
2447bec829 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
Poul-Henning Kamp
4be2eb8c49 I got tired of seeing all the cdevsw[major(foo)] all over the place.
Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.
1999-05-08 06:40:31 +00:00
Poul-Henning Kamp
46eede0058 Continue where Julian left off in July 1998:
Virtualize bdevsw[] from cdevsw.  bdevsw() is now an (inline)
        function.

        Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
        to the order of the cmaj/bmaj arguments!)

        Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
        (ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
1999-05-07 10:11:40 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
Bruce Evans
de5d1ba57c Don't pass unused unused timestamp args to UFS_UPDATE() or waste
time initializing them.  This almost finishes centralizing (in-core)
timestamp updates in ufs_itimes().
1999-01-07 16:14:19 +00:00
Eivind Eklund
fb1167777a Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
Eivind Eklund
a777e82019 Remove the last clients of vfs_object_create(..., waslocked=1);
waslocked will go away shortly.

Reviewed by:	dg
1999-01-02 01:32:36 +00:00
Peter Wemm
40c8cfe552 Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
Bruce Evans
b5ee16407f Oops, the redundant tests for major numbers weren't redundant here.
They checked for the magic major number for the "device" behind mfs
mount points.  Use a more obvious check for this device.

Debugged by:		Andrew Gallatin <gallatin@cs.duke.edu>
1998-10-27 11:47:08 +00:00
Bruce Evans
9c0619dace Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse.  We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.

Removed redundant `major(dev) < nblkdev' tests instead of updating them.
1998-10-25 19:02:48 +00:00
Poul-Henning Kamp
f5ef029e92 Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.
1998-10-25 17:44:59 +00:00
Bruce Evans
0922cce61c Fixed clean flag handling:
- don't set the clean flag on unmount of an unclean filesystem that was
  (forcibly) mounted rw.
- set the clean flag on rw -> ro update of a mounted initially-clean
  filesystem.
- fixed some style bugs (mostly long lines).

This uses the fs_flags field and FS_UNCLEAN state bit which were
introduced in the softdep changes.  NetBSD uses extra state bits in
fs_clean.

Reviewed by:	luoqui
1998-09-26 04:59:42 +00:00
Søren Schmidt
d024c95599 Remove the SLICE code.
This clearly needs alot more thought, and we dont need this to hunt
us down in 3.0-RELEASE.
1998-09-14 19:56:42 +00:00
Bruce Evans
8994ca3ce9 Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
1998-09-07 13:17:06 +00:00
Bruce Evans
0492d857d1 Removed unused includes. 1998-08-17 19:09:36 +00:00
Julian Elischer
bcbd6c6fdd Don't update superblock if mounted readonly,
also fixes some problems with softupdates on root.
More cleanups are needed here..
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
1998-07-08 23:52:27 +00:00
Doug Rabson
8435e0aef5 Use size_t instead of u_int for sizes. 1998-06-04 17:21:39 +00:00
Julian Elischer
c11d29814e try stop the user from using mount -u to set the async flag on
a filesystem currently using soft updates.
Also needs a new copy of ffs_softdep.c to complete the fix.
1998-05-18 06:38:18 +00:00
Mike Smith
79cc756d8b As described by the submitter:
Reverse the VFS_VRELE patch.  Reference counting of vnodes does not need
to be done per-fs.  I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-06 05:29:41 +00:00
Julian Elischer
c0bab11dfe Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)
1998-04-20 03:57:41 +00:00
Julian Elischer
3e425b968d Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistance. (My head is still hurting from the last time we discussed this)
ATAPI flopies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitray and dynamically assigned.
1998-04-19 23:32:49 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
Peter Wemm
26cf9c3b75 Enable the use of soft updates on the root filesystem. Previously, the
softdep mode could only be activated on the initial mount of a filesystem
and then only if it was a read-write mount.  A 'mount -r' (as done in the
rootfs mount) followed by a 'mount -u' to convert to read-write didn't
start softdep mode.
1998-03-27 14:20:57 +00:00