Commit Graph

55 Commits

Author SHA1 Message Date
Kirk McKusick
37e2ebfdba This patch fixes a bug on an active filesystem on which a snapshot
is being taken from panicing with either "freeing free block" or
"freeing free inode". The problem arises when the snapshot code
is scanning the filesystem looking for inodes with a reference
count of zero (e.g., unlinked but still open) so that it can
expunge them from its view. If it encounters a reclaimed vnode
and has to restart its scan, then it will panic if it encounters
and tries to free an inode that it has already processed. The fix
is to check each candidate inode to see if it has already been
processed before trying to delete it from the snapshot image.

Sponsored by:   DARPA & NAI Labs.
2003-02-22 00:29:51 +00:00
Poul-Henning Kamp
de6ba7c016 Move the allocation of the inode contents into ffs_vfsops.c rather than
passing malloc types around.
2002-12-27 10:23:03 +00:00
Poul-Henning Kamp
975512a907 Make ffs_mountfs() static.
Remove the malloctype from the ufs mount structure, instead add a callback
to the storage method for freeing inodes: UFS_IFREE().

Add vfs_ifree() method function which frees an inode.

Unvariablelize the malloc type used for allocating inodes.
2002-12-27 10:06:37 +00:00
Poul-Henning Kamp
9bf1a75697 Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems.  This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by:	DARPA and NAI Labs.
2002-08-13 10:05:50 +00:00
Poul-Henning Kamp
17b1994bbe Move ffs_isfreeblock() to ffs_alloc.c and make it static.
Sponsored by: DARPA & NAI Labs.
2002-07-30 11:54:48 +00:00
Kirk McKusick
7aca6291e3 Add support to UFS2 to provide storage for extended attributes.
As this code is not actually used by any of the existing
interfaces, it seems unlikely to break anything (famous
last words).

The internal kernel interface to manipulate these attributes
is invoked using two new IO_ flags: IO_NORMAL and IO_EXT.
These flags may be specified in the ioflags word of VOP_READ,
VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that
you want to do I/O to the normal data part of the file and
IO_EXT means that you want to do I/O to the extended attributes
part of the file. IO_NORMAL and IO_EXT are mutually exclusive
for VOP_READ and VOP_WRITE, but may be specified individually
or together in the case of VOP_TRUNCATE. For example, when
removing a file, VOP_TRUNCATE is called with both IO_NORMAL
and IO_EXT set. For backward compatibility, if neither IO_NORMAL
nor IO_EXT is set, then IO_NORMAL is assumed.

Note that the BA_ and IO_ flags have been `merged' so that they
may both be used in the same flags word. This merger is possible
by assigning the IO_ flags to the low sixteen bits and the BA_
flags the high sixteen bits. This works because the high sixteen
bits of the IO_ word is reserved for read-ahead and help with
write clustering so will never be used for flags. This merge
lets us get away from code of the form:

        if (ioflags & IO_SYNC)
                flags |= BA_SYNC;

For the future, I have considered adding a new field to the
vattr structure, va_extsize. This addition could then be
exported through the stat structure to allow applications to
find out the size of the extended attribute storage and also
would provide a more standard interface for truncating them
(via VOP_SETATTR rather than VOP_TRUNCATE).

I am also contemplating adding a pathconf parameter (for
concreteness, lets call it _PC_MAX_EXTSIZE) which would
let an application determine the maximum size of the extended
atribute storage.

Sponsored by:	DARPA & NAI Labs.
2002-07-19 07:29:39 +00:00
Ian Dowse
5346934fe7 Add the ffs bits necessary to support unloading of the ufs kernel
module. This adds an ffs_uninit() function that calls ufs_uninit()
and also calls a new softdep_uninitialize() function. Add a stub
for softdep_uninitialize() to cover the non-SOFTUPDATES case.

Reviewed by:	mckusick
2002-07-01 11:00:47 +00:00
Kirk McKusick
1c85e6a35d This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
Alfred Perlstein
6f1e855112 Remove __P. 2002-03-19 22:40:48 +00:00
Kirk McKusick
a0595d0249 Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
2002-03-17 01:25:47 +00:00
Kirk McKusick
c9f96392c7 When taking a snapshot, we must check for active files that have
been unlinked (e.g., with a zero link count). We have to expunge
all trace of these files from the snapshot so that they are neither
reclaimed prematurely by fsck nor saved unnecessarily by dump.
2002-02-02 01:42:44 +00:00
Kirk McKusick
03a2057a5b This patch fixes a long standing complaint with soft updates in
which small and/or nearly full filesystems would fail with `file
system full' messages when trying to replace a number of existing
files (for example during a system installation). When the allocation
routines are about to fail with a file system full condition, they
make a call to softdep_request_cleanup() which attempts to accelerate
the flushing of pending deletion requests in an effort to free up
space. In the face of filesystem I/O requests that exceed the
available disk transfer capacity, the cleanup request could take
an unbounded amount of time. Thus, the softdep_request_cleanup()
routine will only try for tickdelay seconds (default 2 seconds)
before giving up and returning a filesystem full error. Under typical
conditions, the softdep_request_cleanup() routine is able to free
up space in under fifty milliseconds.
2002-01-22 06:17:22 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
John Baldwin
55d132317c Forward declare struct cg to quiet a warning.
Submitted by:	bde
2001-05-30 23:08:40 +00:00
Kirk McKusick
23371b2f22 Refinement to revision 1.16 of ufs/ffs/ffs_snapshot.c to reduce
the amount of time that the filesystem must be suspended. The
current snapshot is elided as well as the earlier snapshots.
2001-05-04 05:49:28 +00:00
Poul-Henning Kamp
855aa097af VOP_BALLOC was never really a VOP in the first place, so convert it
to UFS_BALLOC like the other "between UFS and FFS function interfaces".
2001-04-29 12:36:52 +00:00
Poul-Henning Kamp
0c25dbeb17 Remove faint traces of non-existant ffs_bmap(). 2001-04-29 10:23:32 +00:00
Kirk McKusick
812b1d416c Add kernel support for running fsck on active filesystems. 2001-03-21 04:09:01 +00:00
Kirk McKusick
589c7af992 Fixes to track snapshot copy-on-write checking in the specinfo
structure rather than assuming that the device vnode would reside
in the FFS filesystem (which is obviously a broken assumption with
the device filesystem).
2001-03-07 07:09:55 +00:00
Kirk McKusick
48d617487d Several small but important fixes for snapshots:
1) Be more tolerant of missing snapshot files by only trying to decrement
   their reference count if they are registered as active.

2) Fix for snapshots of filesystems with block sizes larger than 8K
   (from Ollivier Robert <roberto@eurocontrol.fr>).

3) Fix to avoid losing last block in snapshot file when calculating blocks
   that need to be copied (from Don Coleman <coleman@coleman.org>).
2000-12-19 04:41:09 +00:00
Adrian Chadd
0b0c10b48d Initial commit of IFS - a inode-namespaced FFS. Here is a short
description:

How it works:
--

Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.)
I didn't see the need in duplicating all of sys/ufs/ffs to get this
off the ground.

File creation is done through a special file - 'newfile' . When newfile
is called, the system allocates and returns an inode. Note that newfile
is done in a cloning fashion:

fd = open("newfile", O_CREAT|O_RDWR, 0644);
fstat(fd, &st);

printf("new file is %d\n", (int)st.st_ino);

Once you have created a file, you can open() and unlink() it by its returned
inode number retrieved from the stat call, ie:

fd = open("5", O_RDWR);

The creation permissions depend entirely if you have write access to the
root directory of the filesystem.

To get the list of currently allocated inodes, VOP_READDIR has been added
which returns a directory listing of those currently allocated.

--

What this entails:

* patching conf/files and conf/options to include IFS as a new compile
  option (and since ifs depends upon FFS, include the FFS routines)

* An entry in i386/conf/NOTES indicating IFS exists and where to go for
  an explanation

* Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS
  routines require (ffs_mount() and ffs_reload())

* a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS
  routines. IFS replaces some of the vfsops, and a handful of vnops -
  most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR().
  Any other directory operation is marked as invalid.

What this results in:

* an IFS partition's create permissions are controlled by the perm/ownership of
  the root mount point, just like a normal directory

* Each inode has perm and ownership too

* IFS does *NOT* mean an FFS partition can be opened per inode. This is a
  completely seperate filesystem here

* Softupdates doesn't work with IFS, and really I don't think it needs it.
  Besides, fsck's are FAST. (Try it :-)

* Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC).
  Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against
  this particular inode, and unravelling THAT code isn't trivial. Therefore,
  useful inodes start at 3.

Enjoy, and feedback is definitely appreciated!
2000-10-14 03:02:30 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Kirk McKusick
d4c1816924 Clean up warning about undeclared function by declaring softdep_fsync
in mount.h instead of ffs_extern.h. The correct solution is to use
an indirect function pointer so that the kernel does not have to be
built with options FFS, but that will be left for another day.
2000-07-11 19:28:26 +00:00
Poul-Henning Kamp
7523681895 ARGH! I have too many source trees :-(
Fix prototype errors in last commit.
2000-06-16 13:00:33 +00:00
Kirk McKusick
83aaf63ab2 Make static non-exported functions from soft updates. 2000-01-09 22:40:09 +00:00
Eivind Eklund
b2f2b704d0 We do not have ffs_checkexp, so remove the prototype 1999-11-20 16:44:44 +00:00
Alfred Perlstein
c24fda81c9 Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from:	NetBSD
1999-09-11 00:46:08 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Kirk McKusick
c2606ec5c6 Add a hook to ffs_fsync to allow soft updates to get first chance at doing
a sync on the block device for the filesystem. That allows it to push the
bitmap blocks before the inode blocks which greatly reduces the number of
inode rollbacks that need to be done.
1999-05-14 01:26:46 +00:00
Bruce Evans
de5d1ba57c Don't pass unused unused timestamp args to UFS_UPDATE() or waste
time initializing them.  This almost finishes centralizing (in-core)
timestamp updates in ufs_itimes().
1999-01-07 16:14:19 +00:00
Bruce Evans
3d2d6cc3d8 Forward declare even more structs to restore some self-sufficiency.
Didn't fix new dependence on <ufs/ufs/inode.h> and its prerequisites.
1998-03-23 14:12:37 +00:00
Julian Elischer
b1897c197c Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  WHistle development tree
1998-03-08 09:59:44 +00:00
Bruce Evans
9cf2c3e77a Forward declare some structs so that this file is more self-sufficient. 1998-02-03 21:52:02 +00:00
Bruce Evans
cb451ebdbd Staticized. 1997-11-22 08:35:46 +00:00
Poul-Henning Kamp
987f569678 Another VFS cleanup "kilo commit"
1.  Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS}
    intereface function, and now lives in the ufsmount structure.

2.  Remove VOP_SEEK, it was unused.

3.  Add mode default vops:

    VOP_ADVLOCK          vop_einval
    VOP_CLOSE            vop_null
    VOP_FSYNC            vop_null
    VOP_IOCTL            vop_enotty
    VOP_MMAP             vop_einval
    VOP_OPEN             vop_null
    VOP_PATHCONF         vop_einval
    VOP_READLINK         vop_einval
    VOP_REALLOCBLKS      vop_eopnotsupp

    And remove identical functionality from filesystems

4.   Add vop_stdpathconf, which returns the canonical stuff.  Use
     it in the filesystems.  (XXX: It's probably wrong that specfs
     and fifofs sets this vop, shouldn't it come from the "host"
     filesystem, for instance ufs or cd9660 ?)

5.   Try to make system wide VOP functions have vop_* names.

6.   Initialize the um_* vectors in LFS.

(Recompile your LKMS!!!)
1997-10-16 20:32:40 +00:00
Poul-Henning Kamp
cec0f20ce7 VFS mega cleanup commit (x/N)
1.  Add new file "sys/kern/vfs_default.c" where default actions for
    VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
    POLL, REVOKE and STRATEGY.  Various stuff spread over the entire
    tree belongs here.

2.  Change VOP_BLKATOFF to a normal function in cd9660.

3.  Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC.  These
    are private interface functions between UFS and the underlying
    storage manager layer (FFS/LFS/MFS/EXT2FS).  The functions now
    live in struct ufsmount instead.

4.  Remove a kludge of VOP_ functions in all filesystems, that did
    nothing but obscure the simplicity and break the expandability.
    If a filesystem doesn't implement VOP_FOO, it shouldn't have an
    entry for it in its vnops table.  The system will try to DTRT
    if it is not implemented.  There are still some cruft left, but
    the bulk of it is done.

5.  Fix another VCALL in vfs_cache.c (thanks Bruce!)
1997-10-16 10:50:27 +00:00
Poul-Henning Kamp
0be6b890f9 Add type arg to ffs_mountfs and avoid examining v_tag to find out
if MFS is getting a free ride.

Use generic ufs_reclaim().
1997-10-10 18:17:00 +00:00
KATO Takenori
81bca6ddae Clustered read and write are switched at mount-option level.
1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
   bits of the mnt_flag.  The sysctl variables, vfs.foo.doclusterread
   and vfs.foo.doclusterwrite are deleted.  Only mount option can
   control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
   and D_CLUSTERW bits of the d_flags member in the block device switch
   table.  If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
   MNT_NOCLUSTERW bits will be set.  In this case, MNT_NOCLUSTERR and
   MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.

Reviewed by:	bde
1997-09-27 13:40:20 +00:00
Garrett Wollman
57bf258e3d Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs.  (Socket buffers are the one exception.)  A number
of kernel APIs needed to get fixed in order to make this happen.  Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead.  Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
1997-08-16 19:16:27 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
Mike Pritchard
724ab19569 Add function prototypes for most of the new Lite2 functions.
Also made a few of the miscfs routines static to be
consistent.  Some modules simply required some additional
#includes to remove -Wall warnings.
1997-02-12 06:52:51 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
Bruce Evans
2043dc9a22 Removed bogus _BEGIN_DECLS/_END_DECLS.
Removed unused struct tag declarations in cloned code.

Added or cleaned up idempotency ifdefs.
1996-05-01 02:16:17 +00:00
Bruce Evans
e6302eab11 Removed vestigial support for the obsolete FIFO option. In ext2fs
it caused null pointer panics for all fifo operations unless FIFO
was defined.
1996-02-25 20:12:36 +00:00
Poul-Henning Kamp
b8dce649f1 Staticize. 1995-12-17 21:14:36 +00:00
Bruce Evans
f57e65478d Introduced a type `vop_t' for vnode operation functions and used
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.

The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter.  Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.
1995-11-09 08:17:23 +00:00
John Dyson
3441c37e0b Added indirect pointer for ffs_getpages, and added external declaration. 1995-09-06 05:41:17 +00:00
Bruce Evans
3aa12267a5 Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) that I didn't notice when I fixed
"all" such warnings before.
1995-03-28 07:58:53 +00:00