Commit Graph

275 Commits

Author SHA1 Message Date
Poul-Henning Kamp
46aa3347cb Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning.  The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by:   various.
Significant brucifications by:  bde
2000-10-27 11:45:49 +00:00
Eivind Eklund
7eb9fca557 Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)
2000-10-09 17:31:39 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Boris Popov
e500f61fde ext2fs depends on ufs code, so update it to properly handle v_lock field.
Noticed by:	bde
2000-09-26 01:31:46 +00:00
Boris Popov
67e871664b Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
Bruce Evans
96cae770d3 Fixed some serious bugs in ext2_readdir():
The cookie buffer was usually overrun by a large amount whenever
cookies were used.  Cookies are used by nfs and the Linuxulator, so
this bug usually caused panics whenever an ext2fs filesystem was nfs
mounted or a Linux utility that calls readdir() was run on an ext2fs
filesystem.

The directory buffer was sometimes overrun by a small amount.  This
sometimes caused panics and wrong results even for FreeBSD utilities,
but it was usually harmless because FreeBSD utilities use a large
enough buffer size (4K).  Linux utilities usually triggered the bug
since they use a too-small buffer size (512 bytes), at least with the
old RedHat utilities that I tested with.

PR:	19407 (this fix is incomplete or for a slightly different bug)
2000-09-12 17:10:39 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Alexander Langer
0cca1cc078 Fix typo (accessable --> accessible).
PR:		18588
Submitted by:	Anatoly Vorobey <mellon@pobox.com>
Reviewed by:	asmodai
2000-06-14 17:53:40 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Poul-Henning Kamp
87150cb06d s/biowait/bufwait/g
Prodded by: several.
2000-04-29 16:25:22 +00:00
Poul-Henning Kamp
3389ae9350 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
Robert Watson
7ef47eec4c ext2fs relies on UFS support code, and as a result also requires
extattr.h to be included.  This fixes the broken ext2fs build as of
the import of extattr code.

Also added $FreeBSD: $ to a couple of files that didn't have them,
without which I couldn't commit this fix.

Reported by:    "George W. Dinolt" <gdinolt@pacbell.net>
2000-04-15 17:14:22 +00:00
Robert Watson
a64ed08955 Introduce extended attribute support for FFS, allowing arbitrary
(name, value) pairs to be associated with inodes.  This support is
used for ACLs, MAC labels, and Capabilities in the TrustedBSD
security extensions, which are currently under development.

In this implementation, attributes are backed to data vnodes in the
style of the quota support in FFS.  Support for FFS extended
attributes may be enabled using the FFS_EXTATTR kernel option
(disabled by default).  Userland utilities and man pages will be
committed in the next batch.  VFS interfaces and man pages have
been in the repo since 4.0-RELEASE and are unchanged.

o ufs/ufs/extattr.h: UFS-specific extattr defines
o ufs/ufs/ufs_extattr.c: bulk of support routines
o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes
o contrib/softupdates/ffs_softdep.c: extattr.h includes
o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR

o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h
(This should not be the case, and will be fixed in a future commit)

Currently attributes are not supported in MFS.  This will be fixed.

Reviewed by:	adrian, bp, freebsd-fs, other unthanked souls
Obtained from:	TrustedBSD Project
2000-04-15 03:34:27 +00:00
Poul-Henning Kamp
c244d2de43 Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
Matthew Dillon
e4649cfac3 Change the write-behind code to take more care when starting
async I/O's.  The sequential read heuristic has been extended to
    cover writes as well.  We continue to call cluster_write() normally,
    thus blocks in the file will still be reallocated for large (but still
    random) I/O's, but I/O will only be initiated for truely sequential
    writes.

    This solves a number of annoying situations, especially with DBM (hash
    method) writes, and also has the side effect of fixing a number of
    (stupid) benchmarks.

Reviewed-by: mckusick
2000-04-02 00:55:28 +00:00
Poul-Henning Kamp
b99c307a21 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
Poul-Henning Kamp
21144e3bf1 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
Kirk McKusick
15e549f668 Bug fixes for currently harmless bugs that could rise to bite
the unwary if the code were called in slightly different ways.

1) In ufs_bmaparray() the code for calculating 'runb' will stop one block
short of the first entry in an indirect block. i.e. if an indirect block
contains N block numbers b[0]..b[N-1] then the code will never check if
b[0] and b[1] are sequential. For reference, compare with the equivalent
code that deals with direct blocks.

2) In ufs_lookup() there is an off-by-one error in the test that checks
if dp->i_diroff is outside the range of the the current directory size.
This is completely harmless, since the following while-loop condition
'dp->i_offset < endsearch' is never met, so the code immediately
does a second pass starting at dp->i_offset = 0.

3) Again in ufs_lookup(), the condition in a sanity check is wrong
for directories that are longer than one block. This bug means that
the sanity check is only effective for small directories.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2000-03-15 07:18:15 +00:00
Bruce Evans
d1a417f17e Don't forget to check for unsupported features when updating. It was
possible to defeat the check for rw incompatibilty by mounting ro and
updating to rw.

Approved by:	jkh
2000-03-09 05:21:10 +00:00
Bruce Evans
fc47f29c64 MFS (ext2_lookup.c 1.17.2.2, ext2_vnops.c 1.42.2.2: fix "filetype" support).
Approved by:	jkh
2000-03-03 08:00:27 +00:00
Poul-Henning Kamp
ba4ad1fcea Give vn_isdisk() a second argument where it can return a suitable errno.
Suggested by:	bde
2000-01-10 12:04:27 +00:00
Bruce Evans
b9b652d2f6 Support filesystems with the not-so-new "filetype" feature. This
feature gives the d_type field for struct dirent.  We used to panic
in ext2_readdir() for filesystems with this feature.
2000-01-05 19:31:26 +00:00
Bruce Evans
6291b96b03 Don't allow mounting (or mounting R/W) of filesystems with unsupported
features (except for file types in directory entries, which will be
supported soon).

Centralized the magic number and compatibility checking.

Dropped support for ancient (pre-0.2b) filesystems, as in the Linux
version.  Our "support" consisted of printing more details in the error
message before failing at mount time.
2000-01-02 17:40:02 +00:00
Bruce Evans
d68084cd89 Merged changes in ext2_fs.h between Linux 1.2.2 and Linux 2.3.35. The
main changes are:
- many things are more dynamic; e.g., the inode size is a new parameter
  in the superblock instead of a constant.
- extensions are controlled by new flags in the superblock.
- directory entries may have a file type field.
These changes are not used yet, except for a spelling change which affects
ext2_cnv.c
2000-01-01 17:39:21 +00:00
Bruce Evans
c9fbb5bc2c Merged cosmetic changes from the initial import on the vendor branch
(mainly things that were lost or misformatted in a different way by
moving them to ext2_fs_i.h and back, and ifdefs for user mode that
were excessively edited).
2000-01-01 16:26:43 +00:00
Bruce Evans
8e19715af3 Use an ifdef in ext2_fs.h instead of a bogus separate file (ext2_fs_i.h)
to avoid the namespace problems caused by <ufs/ufs/inode.h> #defining
i_mode, etc.

ext2_fs_i.h had nothing to do with the Linux version.  It was a small
part of the Linux version of ext2_fs.h (the part that declares extra
in-core fields for an inode).  We don't need it because we use the
ufs in-core inode for the extra fields.
2000-01-01 14:43:20 +00:00
Bruce Evans
5c8b462df8 Updated/corrected the list of GPL'ed files. 2000-01-01 11:27:50 +00:00
Peter Wemm
c447342094 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistant with the other
BSD's who made this change quite some time ago.  More commits to come.
1999-12-29 05:07:58 +00:00
Robert Watson
91f37dcba1 Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by:	eivind
1999-12-19 06:08:07 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Poul-Henning Kamp
0429e37ade struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ.  Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly  mp != (void*)&mountlist  comparisons.

Requested by:   phk
Submitted by:   Jake Burkholder jake@checker.org
PR:             14967
1999-11-20 10:00:46 +00:00
David E. O'Brien
594017f90d Fix __asm__ clobber list abuse.
Submitted by:	bde
1999-11-15 23:16:06 +00:00
Eivind Eklund
dd8c04f4c7 Remove WILLRELE from VOP_SYMLINK
Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.
1999-11-13 20:58:17 +00:00
Eivind Eklund
edfe736df9 Remove WILLRELE from VOP_RENAME 1999-11-12 03:34:28 +00:00
Poul-Henning Kamp
698f9cf828 Next step in the device cleanup process.
Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.

Unify spec_open() for bdev and cdev cases.

Remove the disabled bdev specific read/write code.
1999-11-09 14:15:33 +00:00
Bruce Evans
5bd5c8b9e5 Quick fix for breakage of ext2fs link counts as reported by stat(2) by
the soft updates changes: only report the link count to be i_effnlink
in ufs_getattr() for file systems that maintain i_effnlink.

Tested by:	Mike Dracopoulos <mdraco@math.uoa.gr>
1999-11-03 12:05:39 +00:00
Mike Smith
6d14782861 Newline-terminate the complaint message about not being able to find
the root vnode pointer.
1999-11-01 23:57:28 +00:00
Poul-Henning Kamp
b89392e703 Remove the D_NOCLUSTER[RW] options which were added because vn had
problems.  Now that Matt has fixed vn, this can go.  The vn driver
should have used d_maxio (now si_iosize_max) anyway.
1999-09-30 07:11:30 +00:00
Poul-Henning Kamp
1b5464ef9d Remove v_maxio from struct vnode.
Replace it with mnt_iosize_max in struct mount.

Nits from:	bde
1999-09-29 20:05:33 +00:00
Matthew Dillon
67ddfcaf69 More removals of vnode->v_lastr, replaced by preexisting seqcount
heuristic to detect sequential operation.

    VM-related forced clustering code removed from ufs in preparation for a
    commit to vm/vm_fault.c that does it more generally.

Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
1999-09-20 23:27:58 +00:00
Poul-Henning Kamp
faad302913 Fix a harmless bug I introduced, simplify a bit more while here. 1999-09-20 21:14:43 +00:00
Poul-Henning Kamp
fae03f66d1 Step one of replacing devsw->d_maxio with si_bsize_max.
Rename dev->si_bsize_max to si_iosize_max and set it in spec_open
if the device didn't.

Set vp->v_maxio from dev->si_bsize_max in spec_open rather than
in ufs_bmap.c
1999-09-20 19:57:28 +00:00
Alfred Perlstein
c24fda81c9 Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from:	NetBSD
1999-09-11 00:46:08 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Poul-Henning Kamp
41d2e3e09e Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness. 1999-08-25 12:24:39 +00:00
Bruce Evans
feb54dc506 Oops, the previous commit was missing a new include. 1999-08-23 22:05:49 +00:00
Bruce Evans
939cb7521a Initialise fsids with (user) device numbers again. Bitrot when dev_t's
were changed to pointers was obscured by casting dev_t's to longs.
fsids haven't even been comprised of longs since the Lite2 merge.
1999-08-23 21:07:13 +00:00
Bruce Evans
d918320517 Use devtoname() to print dev_t's instead of casting them to long or u_long
for misprinting in %lx format.
1999-08-23 20:35:21 +00:00
Poul-Henning Kamp
7dc5cd047f The bdevsw() and cdevsw() are now identical, so kill the former. 1999-08-13 10:29:38 +00:00
Poul-Henning Kamp
0ef1c82630 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
Bruce Evans
2ac6e74655 Don't set IN_ACCESS for requests to read 0 bytes or for unsuccessful reads.
Translated from: similar fixes in ufs_readwrite.c rev.1.61.  Things
are simpler (but annoyingly different) here because there are no
vm optimisations.
1999-07-25 02:56:17 +00:00
Kirk McKusick
4dc0c8f521 Create the macro DOINGASYNC to check whether the MNT_ASYNC flag has
been set for a mount point. Insert missing checks to ensure that all
write operations are done asynchronously when the MNT_ASYNC option
has been requested.

Submitted by:	Craig A Soules <soules+@andrew.cmu.edu>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-07-13 18:20:13 +00:00
Kirk McKusick
67812eacd7 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
Kirk McKusick
f9c8cab591 Add a vnode argument to VOP_BWRITE to get rid of the last vnode
operator special case. Delete special case code from vnode_if.sh,
vnode_if.src, umap_vnops.c, and null_vnops.c.
1999-06-16 23:27:55 +00:00
Poul-Henning Kamp
2447bec829 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
Bruce Evans
22b6b1cd1e Fixed printing of a dev_t in a panic message. Fixed the function name
in this message.
1999-05-13 06:27:51 +00:00
Poul-Henning Kamp
4be2eb8c49 I got tired of seeing all the cdevsw[major(foo)] all over the place.
Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.
1999-05-08 06:40:31 +00:00
Poul-Henning Kamp
46eede0058 Continue where Julian left off in July 1998:
Virtualize bdevsw[] from cdevsw.  bdevsw() is now an (inline)
        function.

        Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
        to the order of the cmaj/bmaj arguments!)

        Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
        (ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
1999-05-07 10:11:40 +00:00
Alan Cox
4221e284a3 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
Poul-Henning Kamp
75c1354190 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Bruce Evans
44f332052d Don't depend on <ufs/ufs/quota.h> or another (old) prerequisite including
<sys/queue.h>.  This fixes my recent breakage of biosboot by unpolluting
<ufs/ufs/quota.h> in the !KERNEL case.
1999-03-06 05:21:09 +00:00
Warner Losh
5369eb85ca Merge patch to ufs_vnops.c's ufs_rename to the copy of ufs_rename that
lives in ext2_vnops.c for ext2fs.  Also remove cast from comparision.
Bruce pointed out that it was bogus since we'd force a signed
comparision when we really wanted an unsigned comparison.
1999-03-02 05:31:47 +00:00
Bruce Evans
a5c9bce777 Added a used #include (don't depend on "vnode_if.h" including <sys/buf.h>). 1999-02-25 15:54:06 +00:00
Bruce Evans
ae4d334421 Fixed parenthesization botch in previous commit. Async update of inodes
was broken.
1999-01-29 15:36:05 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
Matthew Dillon
fe08c21a53 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile.

    This commit includes significant work to proper handle const arguments
    for the DDB symbol routines.
1999-01-27 23:45:44 +00:00
Matthew Dillon
d254af07a1 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 21:50:00 +00:00
Eivind Eklund
65c0c7b08e Avoid warning for unused variable. 1999-01-11 23:32:35 +00:00
Bruce Evans
de5d1ba57c Don't pass unused unused timestamp args to UFS_UPDATE() or waste
time initializing them.  This almost finishes centralizing (in-core)
timestamp updates in ufs_itimes().
1999-01-07 16:14:19 +00:00
Bruce Evans
4591d9bb7e UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value
MNT_WAIT when we mean boolean `true' or check for that value not being
passed.  There was no problem in practice because MNT_WAIT had the
magic value of 1.
1999-01-06 18:18:06 +00:00
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Bruce Evans
b54e74eb87 Fixed a misspelling of boolean true as MNT_WAIT. 1998-11-15 15:46:33 +00:00
Peter Wemm
40c8cfe552 Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
Peter Wemm
91ecc00e71 error return assignment was less than ideal. Fix the part that caused
warnings to be the same as the ffs code.  Previously, any error from
the UFS_UPDATE() call was lost (I think).
1998-10-29 09:44:12 +00:00
Peter Wemm
f6020599aa Use vtruncbuf() to clean out cached blocks on a file shorten rather than
the more expensive vinvalbuf(), based on the FFS version of the same
routine.  I don't have any ext2fs filesystems to test this on.
1998-10-29 09:30:52 +00:00
Bruce Evans
b5ee16407f Oops, the redundant tests for major numbers weren't redundant here.
They checked for the magic major number for the "device" behind mfs
mount points.  Use a more obvious check for this device.

Debugged by:		Andrew Gallatin <gallatin@cs.duke.edu>
1998-10-27 11:47:08 +00:00
Bruce Evans
569555b969 Removed redundant bitrotted checks for major numbers instead of updating
them.
1998-10-26 08:53:13 +00:00
Bruce Evans
65baf8f06b Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse.  We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.
1998-10-25 19:26:18 +00:00
Bruce Evans
d2165c2f7d Fixed bloatage of `struct inode'. We used 5 "spare" fields for ext2fs,
but when i_effnlink was added to support soft updates, there was only
room for 4 spares.  The number of spares was not reduced, so the inode
size became 260 (on i386's), or 512 after rounding up by malloc().
Use one spare field in `struct dinode' instead of the 5th spare field
in the inode and reduced to 4 spares in the inode so that the size is
256 again.

Changed the types of the spares in the inode from int to u_int32_t
so that the inode size has more chance of being <= 256 under other
arches, and downdated ext2fs to match (it was broken to use ints
before rev.1.1).
1998-10-13 15:45:43 +00:00
Bruce Evans
8f359bc68c Quick fix for not being able to sync all the buffers in boot() if
an ext2fs file system is mounted.  The soft update changes added
a check for B_DELWRI buffers.  This exposed the complete brokenness
of the previous quick fix for failing syncs (PR 3571, committed on
1997/08/04).  Use a new buffer flag B_DIRTY and don't abuse B_DELWRI.
B_DIRTY buffers are still written too late, as broken in the previous
fix.  This is fairly harmless, because B_DIRTY is only used for
bitmap buffers and fsck.ext2 can fix up the bitmaps perfectly.

Fixed a race in ULCK_BUF() (bremfree() was outside of the splbio()
section).
1998-10-03 16:19:28 +00:00
Bruce Evans
9702cd0422 Fixed initialization of new inodes. ext2fs doesn't clear inodes when
they are deleted, so inodes must be cleared when they are reused, but
we didn't clear the indirect blocks.  This caused serious filesystem
corruption.
1998-09-29 08:07:32 +00:00
Bruce Evans
6674be30f6 Updated ext2_reload() and ext2_sync(). Locking was broken, and MNT_LAZY
syncs weren't optimized properly (they probably still aren't, but are bug
for bug compatible with ffs).  These fixes are mostly academic, since
ext2fs is too broken to mount read-write (it apparently doesn't clear
indirect blocks).

Obtained from:	mostly from Lite2
1998-09-26 12:42:17 +00:00
Bruce Evans
7cff8977ca Fixed missing newlines in messages in ext2_check_descriptors().
Fixed vnode and memory leaks after an unlikely (?) error in
ext2_mountfs().

Fixed an unconditional memory leak in ext2_unmount().
1998-09-26 07:16:41 +00:00
Bruce Evans
a094db128f Fixed clean flag handling:
Fixes for bugs not shared with ffs:
- don't mount unclean filesystems rw unless forced to.
- accept EXT2_ERROR_FS (treat it like !EXT2_VALID_FS).  We still don't set
  this or honour the maximal mount count.
- don't attempt to print the name of the mount point when mounting an
  unclean file system, since the name of the previous mount point is
  unknown and the name of the current mount point is still "".

Fixes for bugs shared with ffs until recently:
- don't set the clean flag on unmount of an initially-unclean filesystem
  that was (forcibly) mounted rw.
- set the clean flag on rw -> ro update of a mounted initially-clean
  filesystem.
- fixed some style bugs (mostly long lines).

The fixes are slightly simpler than for ffs, because the relevant on-disk
state is not a simple boolean variable, and the superblock has a core-only
extension.

Obtained from:	parts from ffs_vfsops.c, parts from NetBSD
1998-09-26 06:18:59 +00:00
Bruce Evans
5d207357db Fixed the usual missing permissions checks in mount(). As for cd9660,
the damage was limited by the default of 0 for vfs.usermount.

Obtained from:	Lite2 via the -current ffs_vfsops.c
1998-09-09 20:21:18 +00:00
Bruce Evans
05d46b3cd6 Don't forget to initialize the inode lock. This bug caused
surprisingly few problems.  Most fields were initialized to the
correct values by bzero(), but lk_prio was 0 instead of PINOD (=8),
the lk_wmsg was NULL instead of "ext2in", and lk_lockholder was 0
instead of -1.

Obtained from:	Lite2 via the -current ffs_vfsops.c
1998-09-09 13:09:24 +00:00
Bruce Evans
d6c54caabe Support compiling with `gcc -pedantic' (don't use hard newlines in
(asm) string constants).
1998-09-09 12:22:17 +00:00
Bruce Evans
8994ca3ce9 Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
1998-09-07 13:17:06 +00:00
Bruce Evans
1874ef935c Quick fix for breakage of read clustering on non-IDE drives. Read
clustering is obsolescent technology so hardly anyone noticed.  On
a DORS 32160 SCSI drive with 4 tags, read clustering makes very
little difference even for huge sequential reads.  However, on a
ZIP SCSI drive with 0 tags, the minimum overhead per block is about
40 msec, so very large clusters must be used to get anywhere near
the maximum transfer rate.  Using clusters consisting of 1 8K block
reduces the transfer rate to about 250K/sec.  Under msdosfs, missing
read clustering is normal and a cluster size of 1 512 byte block
reduces the transfer rate to about 25K/sec.

Broken in:	rev.1.18
1998-08-18 03:54:39 +00:00
Mike Smith
f01beb610a "The releaseing of the reference and lock is not temporary and belongs
where it is.  The reference and lock(s) are acquired just above the
 code in VREF() and relookup()."

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-08-12 21:42:54 +00:00
Bruce Evans
85badd7eba Fixed printf format errors. 1998-07-30 17:12:39 +00:00
Julian Elischer
49cc016a39 add anti-panic workaround from chris radek (cradek@in221.inetnebr.com)
Not sure why this is needed but but does stop crashes.
1998-07-30 03:22:52 +00:00
Bruce Evans
ac1e407b32 Fixed printf format errors. 1998-07-11 07:46:16 +00:00
Julian Elischer
6deaf84b1f Catch a few corner cases where FreeBSD differs enough from BSD 4.4 to
confuse Soft updates..
Should solve several "dangling deps" panics.
1998-07-08 01:04:33 +00:00
Julian Elischer
fd5d1124e2 VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00