Commit Graph

479 Commits

Author SHA1 Message Date
Alan Cox
4221e284a3 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
Poul-Henning Kamp
75c1354190 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
Mike Smith
f4711b2df4 Simplify the tunefs example, since tunefs uses getfsfile(). Lots of
people complain about working out what device their filesystems are
mounted on.
1999-04-27 21:11:19 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Dmitrij Tejblum
8d81b5d631 Change type of a variable from u_int to size_t, so that pointer to it may be
used as a last argument to copyinstr().
1999-04-21 09:41:07 +00:00
Eivind Eklund
ee45a71480 Correct typo in panic message 1999-04-11 02:28:32 +00:00
Peter Wemm
30c56d468c Hold the mfs process's upages in-core with PHOLD rather than P_NOSWAP. 1999-04-06 03:08:43 +00:00
Julian Elischer
8d17e69460 Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
    vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
    ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>
1999-04-05 19:38:30 +00:00
Peter Wemm
fa8e1794b5 There's not much point in the EXPORTMFS #ifdef. I've had this sitting
in my tree for 12+ months, and I just noticed that NetBSD have (I think,
I've just seen the commit, not the change) just zapped it there.
It wasn't in the options files or LINT either.
1999-04-05 06:39:10 +00:00
Julian Elischer
51df594922 Stop the mfs from trying to swap out crucial bits of the mfs
as this can lead to deadlock.
Submitted by: Mat dillon <dillon@freebsd.org>
1999-03-12 00:44:03 +00:00
Bruce Evans
44f332052d Don't depend on <ufs/ufs/quota.h> or another (old) prerequisite including
<sys/queue.h>.  This fixes my recent breakage of biosboot by unpolluting
<ufs/ufs/quota.h> in the !KERNEL case.
1999-03-06 05:21:09 +00:00
Bruce Evans
fdc79256c1 Moved kernel declarations inside the KERNEL ifdef, and removed
include of <sys/queue.h> in the !KERNEL case.  The prerequisites
for <ufs/ufs/quota.h> were broken in Lite2 by converting some of
the kernel declarations to use queue macros without including
<sys/queue.h>.  <sys/queue.h> was included in applications in
/usr/src instead.  We polluted this file instead of merging the
changes in the applications.

Include <sys/queue.h> in the KERNEL case, and forward-declare all
structs that are used in prototypes, so that this file is almost
self-sufficient even in the kernel.

Obtained from:	mostly from NetBSD
1999-03-05 11:25:31 +00:00
Bruce Evans
abd022381d Changed the type of quotactl()'s 4th arg from char *' to void *'
so that non-sloppy applications can call it without using disgusting
casts to avoid warnings.  The 4th arg is sort of varargs -- it must
sometimes represent a filename, sometimes a struct pointer, and is
sometimes unused.  The arg type is still caddr_t in the kernel.

Obtained from:	mostly from NetBSD
1999-03-05 09:28:33 +00:00
Kirk McKusick
38e28fd66b Reorganize locking to avoid holding the lock during calls to bdwrite
and brelse (which may sleep in some systems).

Obtained from:	Matthew Dillon <dillon@apollo.backplane.com>
1999-03-02 06:38:07 +00:00
Warner Losh
5369eb85ca Merge patch to ufs_vnops.c's ufs_rename to the copy of ufs_rename that
lives in ext2_vnops.c for ext2fs.  Also remove cast from comparision.
Bruce pointed out that it was bogus since we'd force a signed
comparision when we really wanted an unsigned comparison.
1999-03-02 05:31:47 +00:00
Kirk McKusick
eef33ce9bd When fsync'ing a file on a filesystem using soft updates, we first try
to write all the dirty blocks. If some of those blocks have dependencies,
they will be remarked dirty when the I/O completes. On systems with
really fast I/O systems, it is possible to get in an infinite loop trying
to flush the buffers, because the I/O finishes before we can get all the
dirty buffers off the v_dirtyblkhd list and into the I/O queue. (The
previous algorithm looped over the v_dirtyblkhd list writing out buffers
until the list emptied.) So, now we mark each buffer that we try to
write so that we can distinguish the ones that are being remarked dirty
from those that we have not yet tried to flush. Once we have tried to
push every buffer once, we then push any associated metadata that is
causing the remaining buffers to be redirtied.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-03-02 04:04:31 +00:00
Kirk McKusick
4cbb89d95d Ensure that softdep_sync_metadata can handle bmsafemap and mkdir entries
if they ever arise (which should not happen as softdep_sync_metadata is
currently used).
1999-03-02 00:19:47 +00:00
Warner Losh
00db131a60 Fix last commit based on feedback from Guido, Bruce and Terry.
Specifically, the test was in the wrong place, lacked a cast, didn't
unlock the node, and exited to bad rather than abortit.  Now we don't
allow renaming of a file with LINK_MAX references.  Move the test to
earlier in the code as it is closer to where ip is obtained, as that
is the style of the rest of the function.

Didn't fix the problems bruce pointed out in the rename man page to
include EMLINK, nor address his complaints about how the whole idea of
incrementing the link count during a rename is potentially asking for
trouble.

Also didn't try to correct potential problem Terry pointed out with
decrements not being similarly protected against underflow.
1999-02-26 05:34:16 +00:00
Warner Losh
b82396269c Add missing check for LINK_MAX in ufs_rename. Since ip->i_effnlink and
ip->nlink were different types, there was a masked overflow.

Reported by: Mark Slemco <marcs@znep.com>
1999-02-25 09:52:46 +00:00
Matthew Dillon
06f7b4ebd1 Update ufs_vnops code to use new specinfo fields rather then guess.
This is part of general specinfo / d_parms() commit.
1999-02-25 05:35:53 +00:00
Kirk McKusick
133ff2619a fix double LIST_REMOVE; other cosmetic changes to match version 9.32.
Obtained from: Jeffrey Hsu <hsu@FreeBSD.ORG>
1999-02-17 20:01:20 +00:00
Matthew Dillon
89a01116cf Remove XXX comment in regarsd to why NFS doesn't use VOP_ABORT(). NFS
is being fixed now.
1999-02-13 08:38:28 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
Matthew Dillon
5e24f1a2f6 Remove unintended trigraph sequences in comments for -Wall 1999-01-27 18:19:53 +00:00
David Greenman
8ab2fa0073 Gutted softdep_deallocate_dependencies and replaced it with a panic. It
turns out to not be useful to unwind the dependencies and continue in
the face of a fatal error.
Also changed the log() to a printf() in softdep_error() so that it will
be output in the case of a impending panic.
Submitted by:	Kirk McKusick <mckusick@mckusick.com>
1999-01-22 09:07:32 +00:00
Matthew Dillon
50c7d0d513 Added support for VOP_FREEBLKS(), reducing MFS's impact on swap and
increasing performance by deallocating at least some of the backing
    store when files are removed.

    Protect mfsp->buf_queue access at splbio().
1999-01-21 09:27:03 +00:00
Matthew Dillon
3fe2487dfc Access to mfsp->buf_queue must be protected at splbio(). Other minor
adjustments also made, such as passing mfsp to mfs_doio() directly.
1999-01-21 09:24:46 +00:00
Matthew Dillon
1c7c3c6a86 This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
Eivind Eklund
5b1b6c5859 Silence warning about unused debug function. (I'll turn this function
into a DDB command in my next staticization sweep).
1999-01-12 11:42:41 +00:00
Eivind Eklund
a862221fa0 Add a warning about the copyright restraints. 1999-01-08 16:03:12 +00:00
Bruce Evans
de5d1ba57c Don't pass unused unused timestamp args to UFS_UPDATE() or waste
time initializing them.  This almost finishes centralizing (in-core)
timestamp updates in ufs_itimes().
1999-01-07 16:14:19 +00:00
Bruce Evans
4591d9bb7e UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value
MNT_WAIT when we mean boolean `true' or check for that value not being
passed.  There was no problem in practice because MNT_WAIT had the
magic value of 1.
1999-01-06 18:18:06 +00:00
Bruce Evans
d64dbc8719 Ifdefed the conditionally used variable `prtrealloc'. Declare it
as volatile so that there is no chance that the code that it controls
is optimised away.
1999-01-06 17:04:33 +00:00
Bruce Evans
5991fd0370 Backed out rev.1.47. It just broke my optimisations for lazy syncing
of timestamps in rev.1.45.  The soft updates bug was elsewhere.

Forgotten by:	luoqi
1999-01-06 16:52:38 +00:00
Eivind Eklund
fb1167777a Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
Bruce Evans
289bdf33d3 Ifdefed conditionally used simplock variables. 1999-01-02 11:34:57 +00:00
Eivind Eklund
a777e82019 Remove the last clients of vfs_object_create(..., waslocked=1);
waslocked will go away shortly.

Reviewed by:	dg
1999-01-02 01:32:36 +00:00
Matthew Dillon
2ae122f64a The mount_mfs process that stays in a supervisor context handling MFS
I/O requests must be marked P_SYSTEM because if it isn't and the system
    decides to swap it or (god forbid) kill it, the system stands a good
    chance of locking up.
1999-01-01 04:14:11 +00:00
Bruce Evans
d26105a9ce Fixed null pointer panics which I introduced in rev.1.86. Vnodes
may be revoked, so vnop routines must be careful about accessing
the vnode if they may have blocked.

Fixed marking for update after successfully reading or writing 0
bytes.  In this case, POSIX.1 specifies marking if and only if the
requested count is nonzero, but rev.1.86 never marked.
1998-12-24 09:45:10 +00:00
Bruce Evans
450fefa9f3 Remove unused file. It seems to have been a vestige of when mfs did its
own memory allocation.
1998-12-20 17:05:54 +00:00
Doug Rabson
6839466b30 In ufs_setattr(), if only one of va_atime or va_mtime are != VNOVAL, then
the code set the other field in the inode to VNOVAL.  This can happen
sometimes on an NFS server.
1998-12-20 12:36:01 +00:00
Julian Elischer
feb17f5acb Add comments to code that I was trying to understand.
Hopefully will save others time.

Someone who understands this better might check for correctness.
1998-12-15 03:29:52 +00:00
Matthew Dillon
f7bb75c92a Fix -Wuninitialized warning regarding zero-length var-args ctl element.
( this isn't really an error, but I think it is important to fix the
    warning ).
1998-12-14 05:37:37 +00:00
Julian Elischer
1f35e8c8da Remove some compiler warnings. 1998-12-10 20:11:47 +00:00
Eivind Eklund
bf51e54f46 Make compare correct with unsigned types. (Problem introduced by Lite/2). 1998-12-09 02:06:27 +00:00
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Bruce Evans
672be20b9f Don't use the strange null pointer constant `(ufs_daddr_t)0' in a call
to VOP_BMAP().  Don't use uncast NULLs in the same call.
1998-11-29 03:12:06 +00:00
David Greenman
1c680b45a2 Restored the "reallocblks" code to its former glory. What this does is
basically do a on-the-fly defragmentation of the FFS filesystem, changing
file block allocations to make them contiguous. Thanks to Kirk McKusick
for providing hints on what needed to be done to get this working.
1998-11-13 01:01:44 +00:00
Peter Wemm
1c5bb3eaa1 add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE() 1998-11-10 09:16:29 +00:00
Peter Wemm
2ec07c6614 Change dirty block list handling to use TAILQ macros. 1998-10-31 15:33:32 +00:00