Commit Graph

223 Commits

Author SHA1 Message Date
Poul-Henning Kamp
75c1354190 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Poul-Henning Kamp
cc7532aaf0 Add a sysctl variable which can help stop chroot(2) escapes.
kern.chroot_allow_open_directories = 0
	chroot(2) fails if there are open directories.

kern.chroot_allow_open_directories = 1 (default)
	chroot(2) fails if there are open directories and the process
	is subject of a previous chroot(2).

kern.chroot_allow_open_directories = anything else
	filedescriptors are not checked.  (old behaviour).

I'm very interested in reports about software which breaks when
running with the default setting.
1999-03-23 14:26:40 +00:00
Julian Elischer
cb11191c01 Slight cleanup of code resurected for union mounts..
Submitted by: Tony Finch <dot@dotat.at>
1999-03-03 02:35:51 +00:00
Julian Elischer
1871f6cdd2 Fix code for union mounts
Accidentally deleted by peter when he extracted the unionfs stuff in 1.109

Submitted by: Tony Finch <dot@dotat.at>
1999-02-27 07:06:05 +00:00
Bruce Evans
a5c9bce777 Added a used #include (don't depend on "vnode_if.h" including <sys/buf.h>). 1999-02-25 15:54:06 +00:00
Doug Rabson
ce02431ffa * Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
1999-02-16 10:49:55 +00:00
Poul-Henning Kamp
4e48a6bfe0 Use suser() to determine super-user-ness.
Collapse some duplicated checks.

Reviewed by:	bde
1999-01-30 12:27:00 +00:00
Matthew Dillon
697457a133 Fix warnings related to -Wall -Wcast-qual 1999-01-28 17:32:05 +00:00
Matthew Dillon
d254af07a1 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 21:50:00 +00:00
Bruce Evans
73a6265d68 Go back to only supporting revoke() for bdevs and cdevs. It is very
buggy for fifos, and no one seems to have investigated its behaviour
on other types of files.  It has been broken since the Lite2 merge
in rev.1.54.

Nagged about by:	Brian Feldman (green@unixhelp.org)
1999-01-24 06:28:37 +00:00
Eivind Eklund
fb1167777a Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
Matthew Dillon
4c01697599 PR: kern/8965
Obtained from: Stephen Clawson <sclawson@cs.utah.edu>

    Wakeup anyone waiting on a mount point prior to returning from umount,
    whether an error occurs or not.  Fixes a stat/NFS-umount race and other
    potential future problems.  Fix taken from bug/pr which also indicated
    that the same fix has already been applied to OpenBSD and NetBSD.
1998-12-12 21:07:09 +00:00
Peter Wemm
02fc72dbe5 make mount(2) automatically kldload modules if the requested filesystem
isn't present.
1998-11-03 14:29:09 +00:00
Peter Wemm
8c14bf40a1 Change the #ifdef UNION code into a callable hook. Arrange to have this
set up when unionfs is present, either statically or as a kld module.
1998-11-03 08:01:48 +00:00
Peter Wemm
b421db370b The last argument to vm_object_page_clean() are now bit flags, rather than
the old true/false.

While here, have vfs_msync() only call vm_object_page_clean() with
OBJPC_SYNC if called with MNT_WAIT flags.  vfs_msync() is called at unmount
time (with MNT_WAIT) and from the syncer process (formerly update).
This should make dirty mmap writebacks a little less nasty.

I have tested this a little with SOFTUPDATES enabled, but I don't normally
use it since I've been badly burned too many times.
1998-10-31 07:42:04 +00:00
Luoqi Chen
e266594c25 Eliminate a race in VOP_FSYNC() when softupdates is enabled.
Submitted by:	Kirk McKusick	<mckusick@McKusick.COM>
Two minor changes are also included,
1. Remove gratuitious checks for error return from vn_lock with LK_RETRY set,
   vn_lock should always succeed in these cases.
2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount
   a little more unstable. It also keeps us in sync with other BSDs.
Suggested by:	Bruce Evans	<bde@zeta.org.au>
1998-09-24 15:02:46 +00:00
Tor Egge
a58915fc10 Don't keep the underlying directory locked while performing the file
system specific VFS_MOUNT operation.
PR:		1067
1998-09-10 02:27:52 +00:00
Bruce Evans
a23d65bfc8 Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively.  Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
1998-07-15 02:32:35 +00:00
David Greenman
e25169f239 Reset MNT_ASYNC flag if needed if unmount() should fail.
Submitted by:	Paul Saab <paul@mu.org>
1998-07-03 03:47:24 +00:00
John Dyson
0d3dd8fbc5 Remove some junk left over from a previous commit.
Submitted by:	phk
1998-06-08 18:18:28 +00:00
Doug Rabson
ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
John Dyson
1f56217280 Fix the futimes/undelete/utrace conflict with other BSD's. Note that
the only common  usage of utrace (the possible problem with this
commit) is with malloc, so this should be a real problem.  Add
the various NetBSD syscalls that allow full emulation of their
development environment.
1998-05-11 03:55:28 +00:00
Mike Smith
7be2d30077 In the words of the submitter:
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it.  This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp.  vop_rename
still releases 4 vnode arguments before it returns.  These remaining cases
will be corrected in the next set of patches.
---------

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-07 04:58:58 +00:00
Dag-Erling Smørgrav
59bad7c53b Backed out lseek changes. 1998-04-19 22:20:32 +00:00
Dag-Erling Smørgrav
25096724e8 Return EINVAL and do not change file pointer if resulting offset is negative.
PR:		kern/6184
1998-04-18 19:24:44 +00:00
Wolfram Schneider
5ddc8ded1d New mount option nosymfollow. If enabled, the kernel lookup()
function will not follow symbolic links on the mounted
file system and return EACCES (Permission denied).
1998-04-08 18:31:59 +00:00
John Dyson
006b9b7df9 Correct a significant problem with the softupdates port. Allow fsync
to work properly within the softupdates framework, and thereby eliminate
some unfortunate panics.
1998-03-29 18:23:44 +00:00
Julian Elischer
b1897c197c Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  WHistle development tree
1998-03-08 09:59:44 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
John Dyson
9f24f214c3 Make the rootdir handling more consistent. Now, processes always
have a root vnode associated with them, and no special checks for
the null case are needed.
Submitted by:	terry@freebsd.org
1998-02-15 04:17:09 +00:00
John Dyson
3217023e7c Fix a problem with vn_lock in fsync. 1998-02-08 01:41:33 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
John Dyson
95e5e988e0 Make our v_usecount vnode reference count work identically to the
original BSD code.  The association between the vnode and the vm_object
no longer includes reference counts.  The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also.  The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

When vnodes are now normally placed onto the free queue with an object still
attached.  The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code.  There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.
1998-01-06 05:26:17 +00:00
John Dyson
2be70f79f6 Lots of improvements, including restructring the caching and management
of vnodes and objects.  There are some metadata performance improvements
that come along with this.  There are also a few prototypes added when
the need is noticed.  Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.
1997-12-29 00:25:11 +00:00
Bruce Evans
675ea6f083 Unspammed nested include of <vm/vm_zone.h>. 1997-12-27 02:56:39 +00:00
Eivind Eklund
5591b823d1 Make COMPAT_43 and COMPAT_SUNOS new-style options. 1997-12-16 17:40:42 +00:00
Bruce Evans
52aef196f7 Cleaned up __getcwd(). This should be cosmetic except disabled calls
are now counted.

Reviewed by:	phk
1997-12-02 10:32:21 +00:00
Bruce Evans
865737f450 Staticized.
Use OID_AUTO instead of a magic number for the debug.syncprt sysctl.
(This sysctl doesn't actually work.  FreeBSD nuked it, but parts
of it were mismerged from Lite2.  It is not very good, but better
than nothing.)
1997-11-22 06:41:21 +00:00
Bruce Evans
d02601f8cf Fixed rev.1.81. mp->mnt_kern_flag was restored in the non-error case of
`mount -u'.  This only matters for `mount -u' competing with unmounts.
If I understand the locking correctly: if mount() blocks, then unmount()
may run and set mp->kern_flag for the same mp.  Then unmount() blocks
waiting for mount() to finish.  When unmount() continues, its MNTK flags
(MNTK_UNMOUNT and MNTK_MWAIT) may have been clobbered.

Didn't fix old bugs:
- restoring mp->mnt_kern_flag is wrong for the same reasons in the error
  case.
- the error case of unmount() seems to be broken too:
  (a) MNTK_UNMOUNT gets clobbered, although another unmount() may have
      set it.  Perhaps it shouldn't be set until after the full lock is
      aquired.
  (b) MNTK_MWAIT isn't honoured.

Fixed a nearby style bug.
1997-11-22 06:10:36 +00:00
Julian Elischer
52bf64c787 Reviewed by: hackers@freebsd.org in general
Obtained from: Whistle Communications tree

Add an option to the way UFS works dependent on the SUID bit of directories
This changes makes things a whole lot simpler on systems running as
fileservers for PCs and MACS. to enable the new code you must
1/ enable option SUIDDIR on the kernel.
2/ mount the filesystem with option suiddir.
hopefully this makes it difficult enough for people to
do this accidentally.
see the new chmod(2) man page for detailed info.
1997-11-13 00:28:51 +00:00
Julian Elischer
b1f4a44b03 Reviewed by: various.
Ever since I first say the way the mount flags were used I've hated the
fact that modes, and events, internal and exported, and short-term
and long term flags are all thrown together. Finally it's annoyed me enough..
This patch to the entire FreeBSD tree adds a second mount flag word
to the mount struct. it is not exported to userspace. I have moved
some of the non exported flags over to this word. this means that we now
have 8 free bits in the mount flags. There are another two that might
well move over, but which I'm not sure about.
The only user visible change would have been in pstat -v, except
that davidg has disabled it anyhow.
I'd still like to move the state flags and the 'command' flags
apart from each other.. e.g. MNT_FORCE really doesn't have the
same semantics as MNT_RDONLY, but that's left  for another day.
1997-11-12 05:42:33 +00:00
Poul-Henning Kamp
cb226aaa62 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
1997-11-06 19:29:57 +00:00
Bruce Evans
1315bcf7d4 Fixed style bugs in open() fix. 1997-10-28 10:29:55 +00:00
KATO Takenori
1c1ff2947b Disallow non-root mount. If you want to allow non-root mount, change
vfs.usermount into 1 with sysctl.
1997-10-23 09:29:09 +00:00
Joerg Wunsch
2094bd7342 Reject attempts to call open() with an illegal combination of O_RDONLY,
O_WRONLY, O_RDWR.
1997-10-22 07:28:51 +00:00
Poul-Henning Kamp
a1c995b626 Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
Poul-Henning Kamp
ad324c8891 Fix handling of nested mountpoints in __getcwd()
Detected by:	Simon Shapiro <Shimon@i-Connect.Net>
1997-09-28 06:37:02 +00:00
KATO Takenori
81bca6ddae Clustered read and write are switched at mount-option level.
1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
   bits of the mnt_flag.  The sysctl variables, vfs.foo.doclusterread
   and vfs.foo.doclusterwrite are deleted.  Only mount option can
   control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
   and D_CLUSTERW bits of the d_flags member in the block device switch
   table.  If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
   MNT_NOCLUSTERW bits will be set.  In this case, MNT_NOCLUSTERR and
   MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.

Reviewed by:	bde
1997-09-27 13:40:20 +00:00
Poul-Henning Kamp
0054419366 A couple of handles to tweak, more statistics. 1997-09-24 07:46:54 +00:00
John Dyson
99448ed11d Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half.  The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs.  Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
1997-09-21 04:24:27 +00:00
Poul-Henning Kamp
044839fb8b Don't leak memory, from sef.
Stylistic nits and a blunder, from bde.
1997-09-16 08:05:09 +00:00
Poul-Henning Kamp
7874d7a3bb Solve race-condition, return path in normal order.
A couple of stylistic nits from Bruce.

If your libc contains version 1.11 or 1.12 of getcwd.c, (ie: if
you recompiled libc one of the last couple of days):
>>> Recompile LIBC before you boot a new kernel <<<
A new libc will deal with both old and new kernels.
1997-09-15 19:11:07 +00:00
Poul-Henning Kamp
d56f6402d5 Deal more correctly with mountpoints. 1997-09-15 08:25:43 +00:00
Poul-Henning Kamp
7822f1c624 Add a __getcwd() syscall. This is intentionally undocumented, but all
it does is to try to figure the pwd out from the vfs namecache, and
return a reversed string to it.  libc:getcwd() is responsible for
flipping it back.
1997-09-14 16:51:31 +00:00
Bruce Evans
e4ba6a82b0 Removed unused #includes. 1997-09-02 20:06:59 +00:00
Doug Rabson
f6b4c28555 Merge WebNFS support from NetBSD
Obtained from:	NetBSD
1997-07-17 07:17:33 +00:00
Doug Rabson
42146e3747 [Previous comment was incorrect for these files]
Added calls to VFS lock debugging macros to make fixing filesystems' locking
easier.
1997-04-04 17:47:43 +00:00
Doug Rabson
de15ef6aef Add a function vop_sharedlock which a copy of vop_nolock without the
implementation #ifdef out.  This can be used for now by NFS.  As soon
as all the other filesystems' locking is fixed, this can go away.

Print the vnode address in vprint for easier debugging.
1997-04-04 17:46:21 +00:00
Peter Wemm
57862eed22 Code to do lchown(2), copied from chown(2) except it's NOFOLLOW in ND_INIT
instead of FOLLOW.
1997-03-31 12:21:37 +00:00
Peter Wemm
6c14d95d0d Treat symlinks as first class citizens with their own uid/gid rather than
as shadows of their containing directory.  This should solve the problem
of users not being able to delete their symlinks from /tmp once and for
all.

Symlinks do not have modes though, they are accessable to everything that
can read the directory (as before).  They are made to show this fact at
lstat time (they appear as mode 0777 always, since that's how the the
lookup routines in the kernel treat them).

More commits will follow, eg: add a real lchown() syscall and man pages.
1997-03-31 12:02:53 +00:00
Guido van Rooij
8f89943eda Add generation number randomization. Newly created filesystems wil now
automatically have random generation numbers. The kenel way of handling those
also changed. Further it is advised to run fsirand on all your nfs exported
filesystems. the code is mostly copied from OpenBSD, with the randomization
chanegd to use /dev/urandom
Reviewed by:	Garrett
Obtained from: OpenBSD
1997-03-23 20:08:22 +00:00
Bruce Evans
3ac4d1ef0c Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place.  Most things don't depend on file.h stuff at all.
1997-03-23 03:37:54 +00:00
Mike Smith
3a558f83dd Check that vp->v_mount is non-null in fsync() before dereferencing it to
obtain the mountpoint's MNT_ASYNC flag.

This is a Very Definite Last-Minute 2.2 Bugfix Candidate.

Reviewed by:	sef
1997-03-05 01:42:14 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
Mike Pritchard
61f84e5b27 Don't depend on FIFO being defined to enable mkfifo.
It is now always compiled.

Submitted by:	bde
1997-02-12 16:55:32 +00:00
Mike Pritchard
72a5ee14de Add function protypes for the new Lite2 unionfs functions. 1997-02-12 07:54:22 +00:00
Mike Pritchard
820d8cf44a Comment out a call to the #ifdef DIAGNOSTIC routine
vfs_bufstats().  This routine was not imported in the
Lite2 merge.
1997-02-12 06:46:11 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
Bruce Evans
bb65f5a1cc Fixed lseek() on named pipes. It always succeeded but should always fail.
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno).  This will be fixed later.

The fix involves negative logic.  Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).
1996-12-19 19:42:37 +00:00
Nate Williams
030e2e9ebb In sys/time.h, struct timespec is defined as:
/*
         * Structure defined by POSIX.4 to be like a timeval.
         */
        struct timespec {
                time_t  ts_sec;         /* seconds */
                long    ts_nsec;        /* and nanoseconds */
        };

        The correct names of the fields are tv_sec and tv_nsec.

Reminded by:	James Drobina <jdrobina@infinet.com>
1996-09-19 18:21:32 +00:00
Bruce Evans
b71fec07db Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.
1996-09-03 14:25:27 +00:00
David Greenman
9e04304259 Implemented kernel side of MNT_NOATIME mount option. This option disables
the file access time update on reads and can be useful in reducing
filesystem overhead in cases where the access time is not important (like
Usenet news spools).
1996-09-03 07:09:11 +00:00
Peter Wemm
472fe5e4db Dont allow directories to be link()ed or unlink()ed, even for root
(returns EPERM always, the errno is specified by POSIX).

If you really have a desperate need to link or unlink a directory, you
can use fsdb. :-)

This should stop any chance of ftpd, rdist, "rm -rf", etc from
bugging out and damaging the filesystem structure or loosing races
with malicious users.

Reviewed by: davidg, bde
1996-05-24 16:19:23 +00:00
Bruce Evans
d03b40173c Hide options for emulators and static file systems in opt_dontuse.h.
These options only apply at config time.  Using them at compile time
would break the corresponding lkms.
1996-05-11 04:39:53 +00:00
David Greenman
c128d2157b Make sure the mountpoint is marked busy before doing operations on it.
This fixes a panic that freefall suffered last night.

Obtained partially from 4.4-lite2, but minus the new bug that it introduced
1996-01-16 13:07:14 +00:00
Garrett Wollman
692910e615 convert FDESC, KERNFS, NULLFS, PORTAL, UMAPFS, and UNION to the new
style of options.
1996-01-05 17:46:14 +00:00
Poul-Henning Kamp
27a0b398a7 Staticize.
Unstaticize a function in scsi/scsi_base that was used, with an undocumented
option.
My last count on the LINT kernel shows:
Total symbols:  3647
unref symbols:   463
undef symbols:     4
1 ref symbols:  1751
2 ref symbols:   485
Approaching the pain threshold now.
1995-12-17 21:23:44 +00:00
John Dyson
a316d390bd Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.
1995-12-11 04:58:34 +00:00
David Greenman
efeaf95a41 Untangled the vm.h include file spaghetti. 1995-12-07 12:48:31 +00:00
Bruce Evans
75a85811c1 Fixed the errno returned by rename("dir1", "dir2/."). It was EISDIR
(duh); translate it to EINVAL which is the errno for other renames
to ".".
1995-11-18 11:35:05 +00:00
Poul-Henning Kamp
395e673587 Change some of the debug sysctl vars. The semantics of these will change. 1995-11-14 09:19:16 +00:00
Bruce Evans
c8cbd8682f Fixed a cast in olseek().
Fixed confusing order of declarations of getvnode()'s args.
1995-11-13 08:22:21 +00:00
Bruce Evans
d2d3e8751c Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs.  It's
convenient to have this visible but they are hard to maintain.  Some
are already different from the central declarations.  4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.
1995-11-12 06:43:28 +00:00
John Dyson
4dc4e0d228 Make MNT_ASYNC more effective for UFS. It should not be too much more
dangerous than the original MNT_ASYNC.  There might be some minor
security considerations due to data writes not being posted as promptly
as before.  Meta-data operations are still not quite as fast as Linux,
but streaming I/O is still higher.
1995-11-05 21:01:15 +00:00
Bruce Evans
046bc05396 Prototype getvnode() in the right place (where ibcs2_stat.c can see it). 1995-11-04 10:35:26 +00:00
David Greenman
d68a41903e Moved the filesystem read-only check out of the syscalls and into the
filesystem layer, as was done in lite-2. Merged in some other cosmetic
changes while I was at it. Rewrote most of msdosfs_access() to be more
like ufs_access() and to include the FS read-only check.

Obtained from: partially from 4.4BSD-lite2
1995-10-22 09:32:48 +00:00
Steven Wallace
ad7507e248 Remove prototype definitions from <sys/systm.h>.
Prototypes are located in <sys/sysproto.h>.

Add appropriate #include <sys/sysproto.h> to files that needed
protos from systm.h.

Add structure definitions to appropriate files that relied on sys/systm.h,
right before system call definition, as in the rest of the kernel source.

In kern_prot.c, instead of using the dummy structure "args", create
individual dummy structures named <syscall>_args.  This makes
life easier for prototype generation.
1995-10-08 00:06:22 +00:00
Julian Elischer
2b14f991e6 Reviewed by: julian with quick glances by bruce and others
Submitted by:	terry (terry lambert)
This is  a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.
1995-08-28 09:19:25 +00:00
Bruce Evans
cf2455a3ec The cred' and proc' args were missing for some VOP_OPEN() and VOP_CLOSE()
calls.

Found by: gcc -Wstrict-prototypes after I supplied some of the 5000+
missing prototypes.  Now I have 9000+ lines of warnings and errors
about bogus conversions of function pointers.
1995-08-17 11:53:51 +00:00
David Greenman
628641f8a6 Converted mountlist to a CIRCLEQ.
Partially obtained from: 4.4BSD-Lite2
1995-08-11 11:31:18 +00:00
David Greenman
4777741358 Removed my special-case hack for VOP_LINK and fixed the problem with the
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.
1995-08-01 18:51:02 +00:00
Bruce Evans
70eec7420c Ignore trailing slashes in pathnames that "refer to a directory",
as is required to be POSIXLY_CORRECT and "right".  I interpret
"referring to a directory" as being a directory or becoming a
directory.  E.g., the trailing slashes in mkdir("/nonesuch/"),
rename("/tmp", /nonesuch/") and link("/tmp", "/root_can_like_dirs/")
are ignored because the target will become a directory if the
syscall succeeds.  A trailing slash on a symlink causes the symlink
to be followed (this is a bug if the symlink doesn't point to a
directory; fix later).
1995-07-31 00:35:58 +00:00
David Greenman
24a1cce34f NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
   haspage, and sync operations are supported. The haspage interface now
   provides information about clusterability. All pager routines now take
   struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
   confusion caused by pagers being both a data structure ("allocate a
   pager") and a collection of routines. The idea of a pager structure has
   escentially been eliminated. Objects now have types, and this type is
   used to index the appropriate pager. In most cases, items in the pager
   structure were duplicated in the object data structure and thus were
   unnecessary. In the few cases that remained, a un_pager structure union
   was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
   be removed. For instance, vm_object_enter(), vm_object_lookup(),
   vm_object_remove(), and the associated object hash list were some of the
   things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
   SMP locking primitives used in the VM system aren't likely the mechanism
   that we'll be adopting. Even if it were, the locking that was in the code
   was very inadequate and would have to be mostly re-done anyway. The
   locking in a uni-processor kernel was a no-op but went a long way toward
   making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
   thread support have been fixed to reflect the reality that we are really
   dealing with processes, not threads. The VM system didn't have complete
   thread support, so the comments and mis-named routines were just wrong.
   We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
   pager_alloc routines. Most of the pager_allocs have been rewritten and
   are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
   now tries harder to output an even number of pages before and after
   the requested page. This is sort of the reverse of the ideal pagein
   algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
   have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
    The fact that the vm_object data structure escentially had this
    backwards really confused things. The use of "shadow" and "backing
    object" throughout the code is now internally consistent and correct
    in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
    0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
    of objects to the "swap" type. The previous checks throughout the code
    for swp->pg_data != NULL were really ugly. This change also provides
    the rudiments for future backing of "anonymous" memory by something
    other than the swap pager (via the vnode pager, for example), and it
    allows the decision about which of these pagers to use to be made
    dynamically (although will need some additional decision code to do
    this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
    object" code has been removed. MAP_COPY was undocumented and non-
    standard. It was furthermore broken in several ways which caused its
    behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
    continue to work correctly, but via the slightly different semantics
    of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
    threads design can be worked around in other ways. Both #12 and #13
    were done to simplify the code and improve readability and maintain-
    ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
   this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
   information provided by the new haspage pager interface. This will
   substantially reduce the overhead by eliminating a large number of
   VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
   improved to provide both a "behind" and "ahead" indication of
   contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
   It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
   via a much more general mechanism that could also be used for disk
   striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
   fact that it makes calls into the swap pager and knows too much about
   how the swap pager operates really bothers me. It also doesn't allow
   for collapsing of non-swap pager objects ("unnamed" objects backed by
   other pagers).
1995-07-13 08:48:48 +00:00
David Greenman
aa2cabb958 1) Converted v_vmdata to v_object.
2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs
   after vnode_pager_alloc() calls - the object is already guaranteed to be
   persistent.
3) Removed some gratuitous casts.
1995-06-28 12:01:13 +00:00
David Greenman
9879652657 Fixed VOP_LINK argument order botch. 1995-06-28 07:06:55 +00:00
David Greenman
61f5d51062 Changes to fix the following bugs:
1) Files weren't properly synced on filesystems other than UFS. In some
   cases, this lead to lost data. Most likely would be noticed on NFS.
   The fix is to make the VM page sync/object_clean general rather than
   in each filesystem.
2) Mixing regular and mmaped file I/O on NFS was very broken. It caused
   chunks of files to end up as zeroes rather than the intended contents.
   The fix was to fix several race conditions and to kludge up the
   "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention
   to page modifications that occurred via the mmapping.

Reviewed by:	David Greenman
Submitted by:	John Dyson
1995-05-21 21:39:31 +00:00
David Greenman
1469eec81e Fixed incompleteness that would allow dirty filesystems to get mounted
when the single user shell was terminated. These changes disallow mounting
or R/W upgrading filesystems that are dirty unless "-f" (force) option
is used with mount. /etc/rc has been modified to abort the startup if
one or more non-nfs partitions fail to mount.

Reviewed by:	Poul-Henning Kamp, Rod Grimes
1995-05-15 08:39:37 +00:00
David Greenman
c9ae46b1ad Removed unused variable caused by last commit. 1995-05-02 09:06:04 +00:00
David Greenman
beef0195c9 Fix for sync() to close a potential panic with accessing a mount struct
that had been freed.

Submitted by:	John Dyson
1995-05-02 08:44:31 +00:00
David Greenman
2547597bf4 Added a set of braces to make the compiler happy. 1995-03-29 11:54:02 +00:00
David Greenman
ab828ab8bd Moved call to vnode_pager_uncache in rename() to before the VOP_RENAME.
It was previously after the VOP_RENAME and the reference and lock on
the vnode had already been lost, allowing interesting internel
inconsistencies. This is one of the two reasons why freefall was crashing
every hour or two (the other being nullfs bugs).
Don't call vnode_pager_uncache in revoke(). revoke() is only allowed on
VCHR and VBLK vnodes.
1995-03-19 11:16:58 +00:00
Bruce Evans
b5e8ce9f12 Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'.  Fix all the bugs found.  There were no serious
ones.
1995-03-16 18:17:34 +00:00
David Greenman
519b3d1aa6 Do a vnode_pager_uncache after the VOP_RENAME to lose the remaining
reference to the old vnode.

Suggested by:	Bruce Evans
1995-02-28 02:52:48 +00:00
David Greenman
2655f62632 In sync(), don't dereference the proc pointer if it's NULL. Should fix
most or all of the problems with calling sync() without a curproc (which
can happen in machdep.c during a panic sync).
1995-02-13 13:45:04 +00:00
David Greenman
c4a7b7e10c From tim@cs.city.ac.uk (Tim Wilkinson):
Find enclosed a short bugfix to get the union filesystem up and running
in FreeBSD-current.  We don't think we've got all the problems yet but
these fixes sort out the major ones (which mostly concert bad locking
of vnodes), no doubt we'll post others as necessary.  Known problems
include the inability of the umount command (not the system call) to unmount
unions in certain circumstances (this is due the way "realpath" works),
and the failure of direntries to always get all available files in
unioned subdirectories.  We are, as they say, working on it.

Submitted by:	tim@cs.city.ac.uk (Tim Wilkinson)
1994-11-04 14:41:46 +00:00
Garrett Wollman
091b0456f4 Make my ALLDEVS kernel compile (basically, LINT minus a lot of options).
This involves fixing a few things I broke last time.
1994-10-21 01:19:28 +00:00
Poul-Henning Kamp
17b9f9f4a1 Fix the problem with panics when mounting on nonexistant directories. Probably
my fault in the first place...
1994-10-15 02:53:26 +00:00
Søren Schmidt
99ec0d5b44 Removed static declaration of getvnode() (used in ibcs2) 1994-10-11 20:40:12 +00:00
Poul-Henning Kamp
dcd01eb305 Cosmetics: added ()'s and fixed prinf-formats to make gcc silent. 1994-10-08 22:33:43 +00:00
David Greenman
8e58bf6875 Stuff object into v_vmdata rather than pager. Not important which at
the moment, but will be in the future. Other changes mostly cosmetic,
but are made for future VMIO considerations.

Submitted by:	John Dyson
1994-10-05 09:48:45 +00:00
Poul-Henning Kamp
797f2d22f0 All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.
1994-10-02 17:35:40 +00:00
Doug Rabson
9abf4d6ee0 Make NFS ask the filesystems for directory cookies instead of making them
itself.
1994-09-28 16:45:22 +00:00
Garrett Wollman
c9b1d6048d More loadable VFS changes:
- Make a number of filesystems work again when they are statically compiled
  (blush)

- FIFOs are no longer optional; ``options FIFO'' removed from distributed
  config files.
1994-09-22 19:38:41 +00:00
Garrett Wollman
c901836c14 Implemented loadable VFS modules, and made most existing filesystems
loadable.  (NFS is a notable exception.)
1994-09-21 03:47:43 +00:00
David Greenman
8fceb1ba2d Disallow truncating to negative file sizes. Doing so causes ffs_truncate()
and perhaps other fs truncate's to go crazy and panic the machine or worse.
This fixes the truncate bug reported by Michael Class.
1994-09-02 10:23:43 +00:00
David Greenman
2fc62994a0 Make olstat() consistent with lstat() - so they both return the same
owner..

Submitted by:	Kirk McKusick
1994-09-02 04:14:44 +00:00
David Greenman
e0e9c42112 Implemented filesystem clean bit via:
machdep.c:
	Changed printf's a little and call vfs_unmountall() if the sync was
	successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
	Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
	Removed unused rootfs global.

vfs_subr.c:
	Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
	are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
	Toggle clean bit at the appropriate places. Print warning if an
	unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
	Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
	Disallow dismounting root FS via umount syscall.
1994-08-20 16:03:26 +00:00
David Greenman
3c4dd3568f Added $Id$ 1994-08-02 07:55:43 +00:00
Rodney W. Grimes
26f9a76710 The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by:	Rodney W. Grimes
Submitted by:	John Dyson and David Greenman
1994-05-25 09:21:21 +00:00
Rodney W. Grimes
df8bae1de4 BSD 4.4 Lite Kernel Sources 1994-05-24 10:09:53 +00:00