Commit Graph

500 Commits

Author SHA1 Message Date
phk
3237babd8b Only initialize f_data and f_ops if nobody else did so already. 2004-06-19 11:41:45 +00:00
phk
86602fc06c Deorbit COMPAT_SUNOS.
We inherited this from the sparc32 port of BSD4.4-Lite1.  We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.
2004-06-11 11:16:26 +00:00
pjd
c66d0ff628 Remove unused code.
Submitted by:	Bjoern A. Zeeb
2004-06-07 12:19:55 +00:00
tjr
5c5d136c33 Remove a stale comment. 2004-06-04 11:00:22 +00:00
tjr
e167ef630d Eliminate a memory leak in kern_symlink() that could occur if
vn_start_write() failed.
2004-05-11 10:42:02 +00:00
pjd
cf34985420 Always use nd.ni_vp->v_mount as an argument for VFS_QUOTACTL(), just like
in RELENG_4.

Pointed out by:	Alex Lyashkov <umka@sevinter.net>
2004-04-26 15:44:42 +00:00
pjd
f8b184242a Look out! vn_start_write() is able to return 0 and NULL 'mp'.
Submitted by:	Alex Lyashkov <shadow@psoft.net>
2004-04-22 15:40:27 +00:00
bde
122259ad35 Removed some less than useful comments:
- don't say what a small subset of the options includes are for.
- don't mark up functions which use all their args with /* ARGSUSED */.
  The markup should have been removed when the unused retval parameter
  was removed.
- don't comment on what routine suser() checks do.  Removed nearby
  excessive vertical whitespace.
2004-04-06 10:05:02 +00:00
imp
74cf37bd00 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
dfr
c76397ed09 Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks. 2004-04-05 10:15:53 +00:00
dwmalone
116755fef7 Nudge Giant as far as I can into kern_open(). Mark open() as MPSAFE.
Use kern_open() to implement creat() rather than taking the long route
through open(). Mark creat as MPSAFE.

While I'm at it, mark nosys() (syscall 0) as MPSAFE, for all the
difference it will make.
2004-03-16 10:46:42 +00:00
pjd
b2d1dcd936 Add two new sysctls:
- security.bsd.hardlink_check_uid, when set, means, that unprivileged
		users are not permitted to create hard links to files not
		owned by them,
	- security.bsd.hardlink_check_gid, when set, means, that unprivileged
		users are not permitted to create hard links to files owned
		by group they don't belong to.

OK'ed by:	rwatson
2004-03-08 20:37:25 +00:00
dwmalone
35850f8433 Correct a comment.
Reviewed by:	alfred, tanimura
2004-02-17 12:30:32 +00:00
rwatson
8caf918eda By default, when a process in jail calls getfsstat(), only return the
data for the file system on which the jail's root vnode is located.
Previous behavior (show data for all mountpoints) can be restored
by setting security.jail.getfsstatroot_only to 0.  Note: this also
has the effect of hiding other mounts inside a jail, such as /dev,
/tmp, and /proc, but errs on the side of leaking less information.
2004-02-14 18:31:11 +00:00
des
f270054cd3 New file descriptor allocation code, derived from similar code introduced
in OpenBSD by Niels Provos.  The patch introduces a bitmap of allocated
file descriptors which is used to locate available descriptors when a new
one is needed.  It also moves the task of growing the file descriptor table
out of fdalloc(), reducing complexity in both fdalloc() and do_dup().

Debts of gratitude are owed to tjr@ (who provided the original patch on
which this work is based), grog@ (for the gdb(4) man page) and rwatson@
(for assistance with pxeboot(8)).
2004-01-15 10:15:04 +00:00
des
e77f1737d6 Mechanical whitespace cleanup; parenthesize return values; other minor
style nits.  The #ifdefs in this file give me a headache...
2004-01-11 19:52:10 +00:00
rwatson
fc37f21a15 Document that when we are addressing an open()/close() race, the reason
we call vn_close() manually rather than letting fdrop() take care of it
is that we haven't yet hooked up the various 'struct file' fields.
2003-12-24 17:13:01 +00:00
mckusick
6a4c30bccd Update the statfs structure with 64-bit fields to allow
accurate reporting of multi-terabyte filesystem sizes.

You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Tim Robbins <tjr@freebsd.org>
Reviewed by:	Julian Elischer <julian@elischer.org>
Reviewed by:	the hoards of <arch@freebsd.org>
Sponsored by:   DARPA & NAI Labs.
2003-11-12 08:01:40 +00:00
dwmalone
be405a4cbd falloc allocates a file structure and adds it to the file descriptor
table, acquiring the necessary locks as it works. It usually returns
two references to the new descriptor: one in the descriptor table
and one via a pointer argument.

As falloc releases the FILEDESC lock before returning, there is a
potential for a process to close the reference in the file descriptor
table before falloc's caller gets to use the file. I don't think this
can happen in practice at the moment, because Giant indirectly protects
closes.

To stop the file being completly closed in this situation, this change
makes falloc set the refcount to two when both references are returned.
This makes life easier for several of falloc's callers, because the
first thing they previously did was grab an extra reference on the
file.

Reviewed by:	iedowse
Idea run past:	jhb
2003-10-19 20:41:07 +00:00
rwatson
6f522a9e52 Add mac_check_vnode_deleteextattr() and mac_check_vnode_listextattr():
explicit access control checks to delete and list extended attributes
on a vnode, rather than implicitly combining with the setextattr and
getextattr checks.  This reflects EA API changes in the kernel made
recently, including the move to explicit VOP's for both of these
operations.

Obtained from:	TrustedBSD PRoject
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-21 13:53:01 +00:00
jhb
af302d132f td_dupfd just needs to be less than 0, it does not have to hold the
negative value of the index of the new file, so just use -1.
2003-08-07 17:08:26 +00:00
iedowse
7bf5fa9caf In the mknod(), mkfifo(), link(), symlink() and undelete() syscalls,
use vrele() instead of vput() on the parent directory vnode returned
by namei() in the case where it is equal to the target vnode. This
handles namei()'s somewhat strange (but documented) behaviour of
not locking either vnode when the two vnodes are equal and LOCKPARENT
but not LOCKLEAF is specified.

Note that since a vnode double-unlock is not currently fatal, these
coding errors were effectively harmless.

Spotted by:	Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
Reviewed by:	mckusick
2003-08-05 00:26:51 +00:00
rwatson
d2f7ae9f88 Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the
kernel ACL interfaces and system call names.

Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-07-28 18:53:29 +00:00
phk
e457974b5d Pass the file descriptor index down to vn_open.
If the method vector was replaced and we got the "special return code"
smile and trust that whatever happened below DTRT.
2003-07-27 20:09:13 +00:00
phk
d4d7ca154a Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
phk
6221ef9078 Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.
2003-07-26 07:32:23 +00:00
phk
1e92468572 Use the f_vnode field to tell which file descriptors have a vnode. 2003-07-04 12:20:27 +00:00
rwatson
45b727fb41 Prefer the vop_rmextattr() vnode operation for removing extended
attributes from objects over vop_setextattr() with a NULL uio; if
the file system doesn't support the vop_rmextattr() method, fall
back to the vop_setextattr() method.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 23:03:07 +00:00
phk
c81c59299b Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
2003-06-22 08:41:43 +00:00
phk
b64d71d8c8 Don't (re)initialize f_gcflag to zero.
Move initialization of DTYPE_VNODE specific field f_seqcount into
the DTYPE_VNODE specific code.
2003-06-20 08:02:30 +00:00
truckman
84188f1f4f FILE_LOCK() uses a pool mutex, as does the vnode v_vnlock. Since pool
mutexes are supposed to only be used as leaf mutexes, and what appear
to be separate pool mutexes could be aliased together, it is bad idea
for a thread to attempt to hold two pool mutexes at the same time.

Slightly rearrange the code in kern_open() so that FILE_UNLOCK() is
called before calling VOP_GETVOBJECT(), which will grab the v_vnlock
mutex.
2003-06-19 04:10:56 +00:00
phk
a81d7fdac7 Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that
rather than assume that only DTYPE_VNODE is seekable.
2003-06-18 19:53:59 +00:00
obrien
3b8fff9e4c Use __FBSDID(). 2003-06-11 00:56:59 +00:00
rwatson
6b8a71ea4a If a system call comes in requesting to retrieve an attribute named
"", temporarily map it to a call to extattr_list_vp() to provide
compatibility for older applications using the "" API to retrieve
EA lists.

Use VOP_LISTEXTATTR() to support extattr_list_vp() rather than
VOP_GETEXTATTR(..., "", ...).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Asssociates Laboratories
2003-06-05 05:55:34 +00:00
rwatson
df2db1a4c2 Implementations of extattr_list_fd(), extattr_list_file(), and
extattr_list_link() system calls, which return a least of extended
attributes defined for a vnode referenced by a file descriptor
or path name.  Currently, we just invoke VOP_GETEXTATTR() since
it will convert a request for an empty name into a query for a
name list, which was the old (more hackish) API.  At some point
in the near future, we'll push the distinction between get and
list down to the vnode operation layer, but this provides access
to the new API for applications in the short term.

Pointed out by:	Dominic Giampaolo <dbg@apple.com>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-04 03:57:28 +00:00
phk
2048912526 Remove unused variable(s).
Found by:       FlexeLint
2003-05-31 20:29:34 +00:00
kan
9468fdaf14 Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
alc
87da2c3cf3 - Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
   whether its caller has locked the vm_object.  (This is a temporary
   measure to bootstrap vm_object locking.)
2003-04-24 04:31:25 +00:00
mike
75859ca578 o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
  fields (pr_path for reporting and pr_root vnode instance) to store
  the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
  vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
  to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
  struct xprison instances that represent a snapshot of active jails.

Reviewed by:	rwatson, tjr
2003-04-09 02:55:18 +00:00
rwatson
3158a8710a Move the initialization of the vattr flags field in setfflags() to
before the MAC check so that we pass the flags field into the MAC
check properly initialized.  This didn't affect any current MAC
modules since they didn't care what the flags argument was (as
they were primarily interested in the fact that it was a meta-data
write, not the contents of the write), but would be relevant to
future modules relying on that field.

Submitted by:	Mike Halderman <mrh@spawar.navy.mil>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-05 23:15:23 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
hsu
762f64befa Remove extraneous FILEDESC_LOCK around atomic read. 2003-02-16 02:15:15 +00:00
rwatson
e9e0de5ca4 Correct handling of locking for chroot() and chdir() cases: rather
than having change_dir() release the vnode lock on success, hold the
lock so that we can use it later when invoking MAC checks and
VOP_ACCESS() in the chroot() code.  Update the comment to reflect
this calling convention.  Update callers to unlock the vnode
lock.  Correct a typo regarding vnode naming in the MAC case that
crept in via the previous patch applied.
2003-01-31 21:13:25 +00:00
rwatson
21c1d8195b Clean up vnode handling on return from chroot() in certain error
cases: we might multiply vrele() a vnode when certain classes of
failures occur.  This appears to stem from earlier Giant/file
descriptor lock pushdown and restructuring.

Submitted by:	maxim
2003-01-31 18:57:04 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
dillon
ccd5574cc6 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
dillon
ddf9ef103e Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
nectar
266526442d Correct file descriptor leaks in lseek and do_dup.
The leak in lseek was introduced in vfs_syscalls.c revision 1.218.
The leak in do_dup was introduced in kern_descrip.c revision 1.158.

Submitted by:	iedowse
2003-01-06 13:19:05 +00:00
alfred
5512694b25 unwrap lines made short enough by SCARGS removal 2002-12-14 08:18:06 +00:00
alfred
7b7fa13344 remove syscallarg().
Suggested by: peter
2002-12-14 02:07:32 +00:00
alfred
d070c0a52d SCARGS removal take II. 2002-12-14 01:56:26 +00:00
alfred
4f48184fb2 Backout removal SCARGS, the code freeze is only "selectively" over. 2002-12-13 22:41:47 +00:00
alfred
d19b4e039d Remove SCARGS.
Reviewed by: md5
2002-12-13 22:27:25 +00:00
iedowse
092b51aeec Fix a case in kern_rename() where a vn_finished_write() call was
missed. This bug has been present since the vn_start_write() and
vn_finished_write() calls were first added in revision 1.159. When
the case is triggered, any attempts to create snapshots on the
filesystem will deadlock and also prevent further write activity
on that filesystem.
2002-10-27 23:23:51 +00:00
wollman
7e9d4df21f Change the way support for asynchronous I/O is indicated to applications
to conform to 1003.1-2001.  Make it possible for applications to actually
tell whether or not asynchronous I/O is supported.

Since FreeBSD's aio implementation works on all descriptor types, don't
call down into file or vnode ops when [f]pathconf() is asked about
_PC_ASYNC_IO; this avoids the need for every file and vnode op to know about
it.
2002-10-27 18:07:41 +00:00
rwatson
91dee1ecbb Hook up most of the MAC entry points relating to file/directory/node
creation, deletion, and rename.  There are one or two other stray
cases I'll catch in follow-up commits (such as unix domain socket
creation); this permits MAC policy modules to limit the ability to
perform these operations based on existing UNIX credential / vnode
attributes, extended attributes, and security labels.  In the rename
case using MAC, we now have to lock the from directory and file
vnodes for the MAC check, but this is done only in the MAC case,
and the locks are immediately released so that the remainder of the
rename implementation remains the same.  Because the create check
takes a vattr to know object type information, we now initialize
additional fields in the VATTR passed to VOP_SYMLINK() in the MAC
case.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 20:25:57 +00:00
rwatson
683f085f65 Incremental style improvements: more consistently avoid assignments
in conditionals; remove some excess vertical whitespace; remove a
bug in the return handling of the delete_vp() case for MAC.

Spotted by:	bde
2002-10-10 13:59:58 +00:00
rwatson
f1296e0875 Explore new heights in alphabetization for _file and _fd variations on
the extended attribute system calls.
2002-10-10 00:32:08 +00:00
rwatson
48070e93ee Implement extattr_{delete,get,set}_link() system calls: extended attribute
operations that do not follow links.  Sync to MAC tree.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-09 21:48:22 +00:00
iedowse
cbd79f434a Add back a fdrop() call at the end of kern_open() that got lost in
revision 1.218. This bug caused a "struct file" reference to be
leaked if VOP_ADVLOCK(), vn_start_write(), or mac_check_vnode_write()
failed during the open operation.

PR:		kern/43739
Reported by:	Arne Woerner <woerner@mediabase-gmbh.de>
2002-10-07 20:49:22 +00:00
rwatson
abda58cc1e Merge support for mac_check_vnode_link(), a MAC framework/policy entry
point that instruments the creation of hard links.  Policy implementations
to follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-05 18:11:36 +00:00
phk
76d8452fbf Fix mis-indentation.
Spotted by:	FlexeLint
2002-10-02 09:09:25 +00:00
jeff
881a59ab9e - Properly lock v_vflags in getdirents(). 2002-09-25 02:13:38 +00:00
truckman
f280782003 VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link()
wasn't doing.  Rather than just lock and unlock the vnode around the call
to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode
in kern_link() before calling VOP_LINK(), since the other filesystems
also locked the file vnode right away in their link methods.  Remove the
locking and and unlocking from the leaf filesystem link methods.

Reviewed by:	rwatson, bde  (except for the unionfs_link() changes)
2002-09-19 13:32:45 +00:00
bde
8aa3df4eb2 vfs_syscalls.c:
Changed rename(2) to follow the letter of the POSIX spec.  POSIX
requires rename() to have no effect if its args "resolve to the same
existing file".  I think "file" can only reasonably be read as referring
to the inode, although the rationale and "resolve" seem to say that
sameness is at the level of (resolved) directory entries.

ext2fs_vnops.c, ufs_vnops.c:
Replaced code that gave the historical BSD behaviour of removing one
link name by checks that this code is now unreachable.  This fixes
some races.  All vnodes needed to be unlocked for the removal, and
locking at another level using something like IN_RENAME was not even
attempted, so it was possible for rename(x, y) to return with both x
and y removed even without any unlink(2) syscalls (one process can
remove x using rename(x, y) and another process can remove y using
rename(y, x)).

Prodded by:	alfred
MFC after:	8 weeks
PR:		42617
2002-09-10 11:09:13 +00:00
iedowse
be17b12cb6 Split out a number of mostly VFS and signal related syscalls into
a kernel-internal kern_*() version and a wrapper that is called via
the syscall vector table. For paths and structure pointers, the
internal version either takes a uio_seg parameter or requires the
caller to copyin() the data to kernel memory as appropiate. This
will permit emulation layers to use these syscalls without having
to copy out translated arguments to the stack gap.

Discussed on:		-arch
Review/suggestions:	bde, jhb, peter, marcel
2002-09-01 20:37:28 +00:00
jeff
a9972cd35a - Hold the vnode lock across unlink() so that the v_vflag check is safe.
- Fix the long broken error handling for VV_ROOT and VDIR.
2002-08-21 03:55:35 +00:00
rwatson
a1cb1e3bed Pass active_cred and file_cred into the MAC framework explicitly
for mac_check_vnode_{poll,read,stat,write}().  Pass in fp->f_cred
when calling these checks with a struct file available.  Otherwise,
pass NOCRED.  All currently MAC policies use active_cred, but
could now offer the cached credential semantic used for the base
system security model.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 19:04:53 +00:00
rwatson
1a7cd1a210 Break out mac_check_vnode_op() into three seperate checks:
mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write().
This improves the consistency with other existing vnode checks, and
allows policies to avoid implementing switch statements to determine
what operations they do and do not want to authorize.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 16:43:25 +00:00
rwatson
2b82cd24f1 Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential.  Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential.  Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument.  This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
  MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
  than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
  to vn_stat().  Pass active_cred to MAC and fp->f_cred to VOP_POLL()
  to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
  and consumers so that this distinction is maintained at the VFS
  as well as 'struct file' layer.  Pass active_cred instead of
  td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
  credential is properly initialized and can be used in the socket
  code if desired.  Pass ap->a_td->td_ucred as the active
  credential to soo_poll().  If we teach the vnop interface about
  the distinction between file and active credentials, we would use
  the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained.  It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-16 12:52:03 +00:00
jeff
02517b6731 - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
rwatson
eac603fb18 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC framework entry points to authorize readdir()
operations in the native ABI.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 20:44:52 +00:00
rwatson
a5dcc1fd3d Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by:	bde
2002-08-01 17:47:56 +00:00
rwatson
7af111191c Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points to authorize the following
operations:

        truncate on open()                      (write)
        access()                                (access)
        readlink()                              (readlink)
        chflags(), lchflags(), fchflags()       (setflag)
        chmod(), fchmod(), lchmod()             (setmode)
        chown(), fchown(), lchown()             (setowner)
        utimes(), lutimes(), futimes()          (setutimes)
        truncate(), ftrunfcate()                (write)
        revoke()                                (revoke)
        fhopen()                                (open)
        truncate on fhopen()                    (write)
        extattr_set_fd, extattr_set_file()      (setextattr)
        extattr_get_fd, extattr_get_file()      (getextattr)
        extattr_delete_fd(), extattr_delete_file() (setextattr)

These entry points permit MAC policies to enforce a variety of
protections on vnodes.  More vnode checks to come, especially in
non-native ABIs.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 15:37:12 +00:00
rwatson
fff16f04c3 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument chdir() and chroot()-related system calls to invoke
appropriate MAC entry points to authorize the two operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:50:08 +00:00
rwatson
6d0d48759b Improve formatting and variable use consistency in extattr system
calls.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:29:03 +00:00
rwatson
14cc38f1e8 Simplify the logic to enter VFS_EXTATTRCTL().
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:26:07 +00:00
rwatson
4d5d66e7e4 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement MAC framework access control entry points relating to
operations on mountpoints.  Currently, this consists only of
access control on mountpoint listing using the various statfs()
variations.  In the future, it might also be desirable to
implement checks on mount() and unmount().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:27:33 +00:00
rwatson
5f36208df2 When referencing nd_cnp after namei(), always pass SAVENAME into
NDINIT() operation flags.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 18:48:25 +00:00
rwatson
fcb9022bbd Set VAPPEND in open mode when O_APPEND is specified as an argument to
open() of fhopen().  Currently this has no actual affect due to the
treatment of VAPPEND in vaccess() and vaccess_acl() as a subset of
VWRITE, but when MAC comes in, MAC will distinguish the two.  Note:
if any file systems are cutting their own permission models, they
may wish to now take this into account.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-22 12:51:06 +00:00
mckusick
3abb526f86 Change utimes to set the file creation time (for filesystems that
support creation times such as UFS2) to the value of the
modification time if the value of the modification time is older
than the current creation time. See utimes(2) for further details.

Sponsored by:	DARPA & NAI Labs.
2002-07-17 02:03:19 +00:00
mckusick
a3ff90696f Change the name of st_createtime to st_birthtime. This change is
made to reduce confusion between st_ctime and st_createtime.

Submitted by:	Eric Allman <eric@sendmail.org>
Sponsored by:	DARPA & NAI Labs.
2002-07-16 22:36:00 +00:00
jhb
11a8bfb923 - Change chroot_refuse_vdir_fds() to require that the passed in struct
filedesc is already locked rather than having chroot() unlock the
  filedesc so chroot_refuse_vdir_fds() can immediately relock it.
- Reorder chroot() a bitso that we do the namei lookup before checking
  the process's struct filedesc.  This closes at least one potential race
  and allows us to only acquire the filedsec lock once in chroot().
- Push down Giant slightly into chroot().
2002-07-13 04:07:12 +00:00
mux
eb5a0f4a7e Move every code related to mount(2) in a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and help reducing
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by:	rwatson
Repo-copy by:	peter
2002-07-02 17:09:22 +00:00
iedowse
4416f82706 Use indirect function pointer hooks instead of #ifdef SOFTUPDATES
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.

Reviewed by:	mckusick
2002-07-01 17:59:40 +00:00
alfred
ae44f24f87 Remove unneeded casts to caddr_t. 2002-06-28 23:02:38 +00:00
iedowse
792737cf4d In vn_mkdir(), use vrele() instead of vput() on the parent directory
vnode in the case that the target exists and is the same vnode as
the parent (i.e. "mkdir ."). The namei() call does not leave the
vnode locked in this case even though you might expect it to.

This bug was mostly harmless in practice because unlocking an already
unlocked vnode currently does not trigger any panics or warnings.

Reviewed by:	jeff
2002-06-28 20:06:47 +00:00
mckusick
7d70f5926f Use proper size in bzero of stat structure.
Submitted by:	Jake Burkholder <jake@locore.ca>
Sponsored by:	DARPA & NAI Labs.
2002-06-24 07:14:44 +00:00
mckusick
f09d0a42d9 This patch fixes a size problem with the stat structure for
64-bit architectures that was introduced in the UFS2 code
merge two days ago. The stat structure change that caused
the problem was the addition of the file create time.

Submitted by:	Bruce Evans <bde@zeta.org.au>
Sponsored by:	DARPA & NAI Labs.
2002-06-22 22:01:13 +00:00
mux
24aca74f2d o Remove the initialization of unused fields in the struct
uio now that we don't use uiomove() anymore.
o Enforce stricter checks on the length of the iov's in
  nmount(2) since we now malloc() them individually and
  corrupted iov's could make the kernel crash in malloc()
  with "kmem_map too small".

Reviewed by:	phk
2002-06-22 18:07:05 +00:00
mckusick
88d85c15ef This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
mux
3770ca4156 Change the way we internally store the mount options to
a linked list.  This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.

There are no functional differences in this commit.

Reviewed by:	phk
2002-06-20 20:03:42 +00:00
mux
8ba975c439 Remove a duplicated vfs_freeopts() that I introduced in last
revision.
2002-05-28 13:27:55 +00:00
mux
334d1908ec Style nit, no functional changes. 2002-05-23 23:22:22 +00:00
mux
67080508a8 Slightly change the way we pass mount options to the filesystem
VFS_NMOUNT operations.

Reviewed by:	phk
2002-05-23 23:02:19 +00:00
mux
85aa3f836d Change two vput() that should have been vrele().
Submitted by:	iedowse
2002-05-20 14:59:43 +00:00
trhodes
28d42899b7 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
jeff
ba85b0e087 Disable the shared locking namei() code for now. It breaks several stacking
filesystems.  This is on hold until the rest of VFS Locking is reviewed and
deemed safe.  It can be enabled with 'options LOOKUP_SHARED'.
2002-05-14 21:59:49 +00:00
mux
b2f5ccfa53 Add the lchflags(2) syscall.
Reviewed by:	rwatson
2002-05-05 23:47:41 +00:00
jeff
4323e678da Move a KASSERT() in open() prior to unlocking the vnode. It's not safe to
call VOP_GETVOBJECT without a lock.
2002-05-05 23:17:13 +00:00
mux
7856f6d21c Fix a typo.
Submitted by:	dwmalone
2002-05-04 19:50:09 +00:00
rwatson
780f32f693 Slightly restructure extattr_get_vp() so that there's only one entry point
to VOP_GETEXTATTR().  This simplifies code flow when inserting MAC hooks.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-23 01:27:38 +00:00
rwatson
30744d9c56 Improve style consistency of vfs_syscalls.c by converting the style used
in various extattr_*() calls to match the rest of the file.  Originally,
these bits at the end looked more like style(9).  This patch was submitted
by green by way of the TrustedBSD MAC tree, and I fixed a few problems
with it on the way through.  Someone with more time on their hands should
convert the entire file to style(9); this commit is for diff reduction
purposes.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-20 01:37:08 +00:00
iedowse
64322dabea The recent NFS forced unmount improvements introduced a side-effect
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.

Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.

Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.

Reported by:	simokawa
MFC after:	1 week
2002-04-17 01:07:29 +00:00
jeff
0b5e15cef7 Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this
behavior by default.  Also, change the options line to reflect this.

If there are no problems reported this will become the only behavior and the
knob will be removed in a month or so.

Demanded by:	obrien
2002-04-09 05:14:17 +00:00
mux
bf7d877bcd The fourth parameter to copystr() is a size_t, not an int.
Approved by:	peter
2002-04-08 21:14:19 +00:00
mux
00b6b34450 o Change kernel_vmount() interface to be more convenient : pass two
separate strings instead of passing "foo=bar".
o Don't forget to clear the VMOUNT flag on the vnode when vfs_nmount()
  fails because the fs doesn't implement VFS_NMOUNT (and in vfs_mount()
  when the fs doesn't implement VFS_MOUNT) ; also decrement the vfs
  refcount in the !MNT_UPDATE case.
2002-04-07 13:22:47 +00:00
mux
9effffd331 Add two forgotten vfs_unbusy() calls, in vfs_mount() and vfs_nmount().
Reviewed by:	phk
2002-04-03 12:19:03 +00:00
jhb
dc2e474f79 Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
mux
ac4c018837 - Properly sync vfs_nmount() with changes that have be already done
in vfs_mount(), in particular revisions 1.215, 1.227 and 1.240.
- flag2 is a low quality variable name, change it to kern_flag.
- strncpy NUL-terminates f_fstypename and f_mntonname since the strings
  have length <= <buffer length> - 1, so the explicit NUL-termination is
  bogus.
- M_ZERO'ing space for fstype and fspath is stupid since we never use the
  space beyond the end of the string.
- Do various style(9) cleanups in both functions.

Submitted by:	bde
Reviewed by:	phk
2002-03-28 13:47:32 +00:00
arr
da9c75ac68 - Fixup a few style nits:
- return error -> return (error);
  - move a declaration to the top of the function.
  - become bug for bug compatible with if (error) lines.

Submitted by: bde
2002-03-26 18:07:10 +00:00
mux
124c6d3a26 As discussed in -arch, add the new nmount(2) system call and the
new vfs_getopt()/vfs_copyopt() API.  This is intended to be used
later, when there will be filesystems implementing the VFS_NMOUNT
operation.  The mount(2) system call will disappear when all
filesystems will be converted to the new API.  Documentation will
be committed in a while.

Reviewed by:	phk
2002-03-26 15:33:44 +00:00
arr
db4f882c76 - Recommit the securelevel_gt() calls removed by commits rev. 1.84 of
kern_linker.c and rev. 1.237 of vfs_syscalls.c since these are not the
  source of the recent panics occuring around kldloading file system
  support modules.

Requested by: rwatson
2002-03-25 18:26:34 +00:00
arr
fc49faf982 - Back out the commit to make the linker_load_file() securelevel check
made aware in jail environments.  Supposedly something is broken, so
  this should be backed out until further investigation proves otherwise,
  or a proper fix can be provided.
2002-03-22 04:56:09 +00:00
arr
68e226a99e - Fix a logic error in checking the securelevel that was introduced in the
previous commit.

Pointy hats to: arr, rwatson
2002-03-21 15:27:39 +00:00
arr
fc9167c193 - Change a check of securelevel to securelevel_gt() call in order to help
against users within a jail attempting to load kernel modules.
- Add a check of securelevel_gt() to vfs_mount() in order to chop some
  low hanging fruit for the repair of securelevel checking of linking and
  unlinking files from within jails.  There is more to be done here.

Reviewed by: rwatson
2002-03-20 16:03:42 +00:00
jeff
318cbeeecf Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
alfred
357e37e023 Remove __P. 2002-03-19 21:25:46 +00:00
alfred
21fc25cfdf Close a race when vfs_syscalls.c:checkdirs() runs.
To do this protect the filedesc pointer in the proc with PROC_LOCK
in both checkdirs() and kern_descrip.c:fdfree().
2002-03-19 04:30:04 +00:00
jeff
e6d26e8880 This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs.
The stat() and open() calls have been changed to make use of this new functionality.  Using shared locks in
these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes.  Also,
this reduces the number of exclusive locks that are floating around in the system, which helps reduce the
number of deadlocks that occur.

A new kernel option "LOOKUP_SHARED" has been added.  It defaults to off so this patch can be turned on for
testing, and should eventually go away once it is proven to be stable.  I have personally been running this
patch for over a year now, so it is believed to be fully stable.

Reviewed by:	jake, obrien
Approved by:	jake
2002-03-12 04:00:11 +00:00
rwatson
8d5b7b21f3 Three p_ucred -> td_ucred's missed in jhb's earlier pass; all appear to
be safe.
2002-03-05 19:45:45 +00:00
rwatson
191712b565 The change from td->td_proc->p_ucred to td->td_ucred has shortened some
lines: more agressively line wrap under those circumstances.
2002-03-05 19:31:25 +00:00
jhb
56094fdb72 - Change namei() to use td_ucred instead of p_ucred.
- Change the hack in access() that uses a temporary credential to set
  td_ucred to the temp cred instead of p_ucred.
2002-02-27 19:15:29 +00:00
jhb
3706cd3509 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
rwatson
2bbae54d18 Make sure to hold vnode lock when calling into VOP_GETATTR().
Discussed with:	mckusick, phk
2002-02-10 21:44:30 +00:00
rwatson
aa54f85939 Make sure to grab vnode lock on a vnode before calling VOP_GETATTR()
to perform an ownership test in revoke().  This is also required for
MAC hooks so that the vnode lock is held during a call to the MAC
framework.  Release the lock before calling VOP_REVOKE().

Discussed with:	phk, mckusick
2002-02-10 20:45:43 +00:00
rwatson
eef638ac93 Remove a stray 'const' that slept into extattr_set_vp(), and could
result in compiler warnings.
2002-02-10 05:31:55 +00:00
rwatson
6ca91a055c Part I: Update extended attribute API and ABI:
o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
  as not to use the scatter gather API (which appeared not to be used
  by any consumers, and be less portable), rather, accepts 'data'
  and 'nbytes' in the style of other simple read/write interfaces.
  This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
  a size_t.  When performing a read, the number of bytes read will
  be returned, unless the data pointer is NULL, in which case the
  number of bytes of data are returned.  This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
  argument so as to return the size, if desirable.  If set to NULL,
  the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable.  More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-10 04:43:22 +00:00
rwatson
d45677811a o Merge various recent fixes from the MAC branch relating to extattrctl():
- Fix null-pointer dereference introduced when snapshotting
	  was introduced.  This occured because unlike the previous code,
	  vn_start_write() doesn't always return a non-NULL mp, as
	  filesystems may not support the VOP_GETWRITEMOUNT() call.  For
	  now, rely on two pointers, so that vn_finished_write() works
	  properly.
	- Fix locking problems on exit, introduced at some past time,
	  some when snapshots came in, where a vnode might not be
	  unlocked before being vrele'd in various error situations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-08 05:58:41 +00:00
julian
b5eb64d6f0 Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
alfred
bfbf894c82 Don't recurse on filedesc lock in chroot_refuse_vdir_fds().
Noticed by: Michael Nottebrock <michaelnottebrock@gmx.net>
2002-02-01 18:27:16 +00:00
alfred
1f82bc18d1 Replace ffind_* with fget calls.
Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().
2002-01-14 00:13:45 +00:00
alfred
844237b396 SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
iedowse
83b07d10e7 Change dounmount() to return EBUSY in the non-MNT_FORCE case if we
can't acquire the mnt_lock without blocking. Normally non-forced
unmount attempts return EBUSY quickly if any vnodes are active, so
this just extends that behaviour to cover the per-mount mnt_lock
too.
2002-01-10 01:59:30 +00:00
se
4f45ba2c53 Return EBADF in case some vnode field has been reset to a NULL pointer.
(There has been some discussion, whether ENOENT or EBADF is more
appropriate. I choose the latter, since the operation is not supported
on the file descriptor at that time, even if it was, immediately before.)

PR:		32681
Reviewed by:	dillon, iedowse, ...
Approved by:	nectar
MFC after:	3 days
		(pending RE approval)
2002-01-03 09:54:24 +00:00
phk
235f3ed483 Define a new mount flag "MNT_JAILDEVFS"
Collect the magic combination of flags which can be updated into
a macro in sys/mount.h rather than inlining them (twice!) in
vfs_syscalls.c
2001-11-05 10:33:45 +00:00
dillon
c9a56085ce Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather then piecemeal later on.  mnt_nvnodelist
currently holds all the vnodes under the mount point.  This will eventually
be split into a 'dirty' and 'clean' list.  This way we only break kld's once
rather then twice.  nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.
2001-11-04 18:55:42 +00:00
rwatson
339957cbbd o Remove the local temporary variable "struct proc *p" from vfs_mount()
in vfs_syscalls.c.  Although it did save some indirection, many of
  those savings will be obscured with the impending commit of suser()
  changes, and the result is increased code complexity.  Also, once
  p->p_ucred and td->td_ucred are distinguished, this will make
  vfs_mount() use the correct thread credential, rather than the
  process credential.
2001-11-02 21:11:41 +00:00
phk
9682ea2877 Argh!
patch added the nmount at the bottom first time around.

Take 3!
2001-11-02 19:12:06 +00:00
phk
ea1c496f9a Add empty shell for nmount syscall (take 2!) 2001-11-02 18:35:54 +00:00
phk
a7cf9dee8f Add nmount() stub function and regenerate the syscall-glue which should
not need to check in generated files.
2001-11-02 17:59:23 +00:00
dillon
f421c786e5 unwind v_writecount in fhopen() if we are unable to allocate the
descriptor.

MFC after:	3 days
2001-10-24 18:32:17 +00:00
dillon
45a6fabe87 Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after:	3 days
2001-10-23 01:21:29 +00:00
rwatson
75a3f18a2a o Complete the migration from suser error checking in the following form
in vfs_syscalls.c:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid &&
        (error = suser_td(td)) != 0) {
            unwrap_lots_of_stuff();
            return (error);
    }

  to:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid) {
            error = suser_td(td);
            if (error) {
                unwrap_lots_of_stuff();
                return (error);
            }
    }

  This makes the code more readable when complex clauses are in use,
  and minimizes conflicts for large outstanding patchsets modifying the
  kernel authorization code (of which I have several), especially where
  existing authorization and context code are combined in the same if()
  conditional.

Obtained from:	TrustedBSD Project
2001-10-01 20:01:07 +00:00
rwatson
f12cf1d430 o vpaccess() -> vn_access() -- Peter reminds me that there is already
a convention for vnop helper routines of this sort.

Submitted by:	Mr Wemm <peter>
2001-09-22 03:07:41 +00:00
rwatson
e925d3281c o Introduce eaccess(2), a version of access(2) that uses the effective
credentials rather than the real credentials.  This is useful for
  implementing GUI's which need to modify icons based on access rights,
  but where use of open(2) is too expensive, use of stat(2) doesn't
  reflect the file system's real protection model, and use of
  access() suffers from real/effective credential confusion.  This
  implementation provides the same semantics as the call of the same
  name on SCO OpenServer.  Note: using this call improperly can
  leave you subject to some of the same races present in the
  access(2) call.
o To implement this, break out the basic logic of access(2) into
  vpaccess(), which accepts a passed credential to perform the
  invocation of VOP_ACCESS().  Add eaccess(2) to invoke vpaccess(),
  and modify access(2) to use vpaccess().

Obtained from:	TrustedBSD Project
2001-09-21 21:33:22 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
ache
2718e60089 lseek: simplify overflow checks 2001-08-29 18:35:53 +00:00
ache
c71ba5eea8 Cosmetique & style fixes from bde 2001-08-26 10:23:49 +00:00
ache
1dabef8845 lseek: fix check for vattr.va_size overflow. Check suggested by bde simple not
works with unsigned types.
2001-08-23 17:01:25 +00:00
ache
29e4dde471 Cosmetique: more <sys/*> into one group, separate include families by
blank line
2001-08-23 13:51:17 +00:00
ache
5b25e70c26 Make lseek() POSIXed: for non character special files
1) handle off_t overflow with EOVERFLOW
2) handle negative offsets with EINVAL

Reviewed by:	arch discussion
2001-08-21 21:20:42 +00:00
iedowse
22613ae234 Avoid sleeping while holding a mutex in dounmount(). This problem
has existed for a long time, but I made it worse a few months ago
by by adding calls to VFS_ROOT() and checkdirs() in revision 1.179.

Also, remove the LK_REENABLE flag in the lockmgr() call; this flag
has been ignored by the lockmgr code for 4 years. This was the only
remaining mention of it apart from its definition.

Reviewed by:	jhb
2001-08-20 19:16:31 +00:00
iedowse
b3168db212 Arbitrarily limit to 64k the number of bytes that can be read at
a time using the ogetdirentries() compatibility syscall. This is a
hack to ensure that rediculous values don't get passed to MALLOC().

Reviewed by:	kris
2001-08-10 22:14:18 +00:00
des
1b82a02868 Constify the fstype argument to vfs_mount(). This eliminates at least one
"call discards qualifier" warning (in sys/compat/linux/linux_file.c).
2001-07-09 19:11:51 +00:00
dillon
e028603b7e With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
tmm
0e4c21f7c1 Fix an instance of NDINIT in the extattrctl syscall: LOCKLEAF was or'ed
to the operation parameter, not to the flags as it should be.

Reviewed by:	rwatson
2001-06-06 23:34:38 +00:00
rwatson
f504530d9f o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
jhb
2441ff245e Don't release Giant around vm_oject_page_clean() in fsync() as the pager
putpages called will need Giant.
2001-05-23 22:55:13 +00:00
ru
35437d86aa - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
  fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
  FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
  Makefiles.
2001-05-23 09:42:29 +00:00
alfred
a3f0842419 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
grog
4b9d9cbaac Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
grog
1f5de30718 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
rwatson
cb4f33ea31 o Introduce extattr_{delete,get,set}_fd() to allow extended attribute
operations on file descriptors, which complement the existing set of
  calls, extattr_{delete,get,set}_file() which act on paths.  In doing
  so, restructure the system call implementation such that the two sets
  of functions share most of the relevant code, rather than duplicating
  it.  This pushes the vnode locking into the shared code, but keeps
  the copying in of some arguments in the system call code.  Allowing
  access via file descriptors reduces the opportunity for race
  conditions when managing extended attributes.

Obtained from:	TrustedBSD Project
2001-03-31 16:20:05 +00:00
jhb
79cf991a6b Convert the allproc and proctree locks from lockmgr locks to sx locks. 2001-03-28 11:52:56 +00:00
bde
a18da66f92 Fixed breakage of access() in rev.1.164. Wrong credentials were used for
the final path component.
2001-03-20 09:38:05 +00:00
rwatson
0012887962 o Rename "namespace" argument to "attrnamespace" as namespace is a C++
reserved word.

Submitted by:	jkh
Obtained from:	TrustedBSD Project
2001-03-19 05:44:15 +00:00
rwatson
f773ff5a87 o Change the API and ABI of the Extended Attribute kernel interfaces to
introduce a new argument, "namespace", rather than relying on a first-
  character namespace indicator.  This is in line with more recent
  thinking on EA interfaces on various mailing lists, including the
  posix1e, Linux acl-devel, and trustedbsd-discuss forums.  Two namespaces
  are defined by default, EXTATTR_NAMESPACE_SYSTEM and
  EXTATTR_NAMESPACE_USER, where the primary distinction lies in the
  access control model: user EAs are accessible based on the normal
  MAC and DAC file/directory protections, and system attributes are
  limited to kernel-originated or appropriately privileged userland
  requests.

o These API changes occur at several levels: the namespace argument is
  introduced in the extattr_{get,set}_file() system call interfaces,
  at the vnode operation level in the vop_{get,set}extattr() interfaces,
  and in the UFS extended attribute implementation.  Changes are also
  introduced in the VFS extattrctl() interface (system call, VFS,
  and UFS implementation), where the arguments are modified to include
  a namespace field, as well as modified to advoid direct access to
  userspace variables from below the VFS layer (in the style of recent
  changes to mount by adrian@FreeBSD.org).  This required some cleanup
  and bug fixing regarding VFS locks and the VFS interface, as a vnode
  pointer may now be optionally submitted to the VFS_EXTATTRCTL()
  call.  Updated documentation for the VFS interface will be committed
  shortly.

o In the near future, the auto-starting feature will be updated to
  search two sub-directories to the ".attribute" directory in appropriate
  file systems: "user" and "system" to locate attributes intended for
  those namespaces, as the single filename is no longer sufficient
  to indicate what namespace the attribute is intended for.  Until this
  is committed, all attributes auto-started by UFS will be placed in
  the EXTATTR_NAMESPACE_SYSTEM namespace.

o The default POSIX.1e attribute names for ACLs and Capabilities have
  been updated to no longer include the '$' in their filename.  As such,
  if you're using these features, you'll need to rename the attribute
  backing files to the same names without '$' symbols in front.

o Note that these changes will require changes in userland, which will
  be committed shortly.  These include modifications to the extended
  attribute utilities, as well as to libutil for new namespace
  string conversion routines.  Once the matching userland changes are
  committed, a buildworld is recommended to update all the necessary
  include files and verify that the kernel and userland environments
  are in sync.  Note: If you do not use extended attributes (most people
  won't), upgrading is not imperative although since the system call
  API has changed, the new userland extended attribute code will no longer
  compile with old include files.

o Couple of minor cleanups while I'm there: make more code compilation
  conditional on FFS_EXTATTR, which should recover a bit of space on
  kernels running without EA's, as well as update copyright dates.

Obtained from:	TrustedBSD Project
2001-03-15 02:54:29 +00:00
jhb
c7d1d65499 Check to see if p_fd is NULL before derferencing it in checkdirs(). It's
possible for us to see a process in the early stages of fork before p_fd
has been initialized.  Ideally, we wouldn't stick a process on the allproc
list until it was fully created however.
2001-03-07 02:25:13 +00:00
adrian
bf6a51f986 Mismatched MFSNAMELEN and MNAMELEN with fstype / fspath.
Submitted by:	Naoki Kobayashi <shibata@geo.titech.ac.jp>
2001-03-02 14:05:49 +00:00
adrian
4018955334 Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
  kernel variables in vfs_mount(). `data' remains a pointer into
  userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
  aren't too long.

* rework mount() and linux_mount() to take the userland parameters
  (besides data, as mentioned) and pass kernel variables to vfs_mount().
  (linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
  since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
  filesystem.  This variable is generally initialised with `path', and
  each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
2001-03-01 21:00:17 +00:00
iedowse
63324ae546 The kernel did not hold a vnode reference associated with the
`rootvnode' pointer, but vfs_syscalls.c's checkdirs() assumed that
it did. This bug reliably caused a panic at reboot time if any
filesystem had been mounted directly over /.

The checkdirs() function is called at mount time to find any process
fd_cdir or fd_rdir pointers referencing the covered mountpoint
vnode. It transfers these to point at the root of the new filesystem.
However, this process was not reversed at unmount time, so processes
with a cwd/root at a mount point would unexpectedly lose their
cwd/root following a mount-unmount cycle at that mountpoint.

This change should fix both of the above issues. Start_init() now
holds an extra vnode reference corresponding to `rootvnode', and
dounmount() releases this reference when the root filesystem is
unmounted just before reboot. Dounmount() now undoes the actions
taken by checkdirs() at mount time; any process cdir/rdir pointers
that reference the root vnode of the unmounted filesystem are
transferred to the now-uncovered vnode.

Reviewed by:	bde, phk
2001-02-28 20:54:28 +00:00
rwatson
ab5676fc87 o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project
2001-02-21 06:39:57 +00:00
jlemon
c7ba1f9694 Introduce copyinfrom and copyinstrfrom, which can copy data from either
user or kernel space.  This will allow layering of os-compat (e.g.: linux)
system calls.  Apply the changes to mount.
2001-02-16 14:31:49 +00:00
bmilekic
f364d4ac36 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
jake
a4ad237eaa - Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr.  Also provides macros for the flags
  pased to specify shared, exclusive or release which map to the
  lockmgr flags.  This is so that the use of lockmgr can be easily
  replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
2000-12-13 00:17:05 +00:00
dwmalone
dd75d1d73b Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
jake
0c0be4e826 Protect the following with a lockmgr lock:
allproc
	zombproc
	pidhashtbl
	proc.p_list
	proc.p_hash
	nextpid

Reviewed by:	jhb
Obtained from:	BSD/OS and netbsd
2000-11-22 07:42:04 +00:00
dillon
15a44d16ca This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
phk
4e063f5534 Take VBLK devices further out of their missery.
This should fix the panic I introduced in my previous commit on this topic.
2000-11-02 21:14:13 +00:00
jhb
d944886e4d Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
jasone
4e290e67b7 Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
eivind
e6b9704a88 Add function comments for functions missing them 2000-09-14 19:13:59 +00:00
eivind
957c331f7f Blow away COMPAT_43 support for mount 2000-09-14 18:11:44 +00:00
bp
a7bc78c86d Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.
They will be used by nullfs and other stacked filesystems to support full
cache coherency.

Reviewed in general by:	mckusick, dillon
2000-09-12 09:49:08 +00:00
rwatson
15609dd6ef o Remove commented out code which modified return values from
extattr_{get,set} syscalls in the face of partial reads or writes.

Obtained from:	TrustedBSD Project
2000-09-05 02:13:14 +00:00
truckman
bc29ebd97d access() shouldn't diddle with the contents of a potentially shared
credential.  Create a temporary copy of the current credential and
modify the copy.

Submitted by:	tegge
2000-09-02 12:31:55 +00:00
tegge
31d32eb7b1 Don't set flags on the mount structure before all permission checks have
been done.

Don't allow multiple mount operations with MNT_UPDATE at the same
time on the same mount point.  When the first mount operation
completed, MNT_UPDATE was cleared in the mount structure, causing
the second to complete as if it was a no-update mount operation
with the following bad side effects:

        - mount structure inserted multiple times onto the mountlist
        - vp->v_mountedhere incorrectly set, causing next namei
          operation walking into the mountpoint to crash with
          a locking against myself panic.

Plug a vnode leak in case vinvalbuf fails.
2000-08-09 01:57:11 +00:00
rwatson
bf8742e138 o Modify extattr_{set,get}() syscalls so that partial reads and writes
with an error condition such as EINTR, EWOULDBLOCK, and ERESTART,
  are reported to the application, not silently conceal.  This
  behavior was copied from the {read,write}v() syscalls, and is
  appropriate there but not here.
o Correct a bug in extattr_delete() wherein the LOCKLEAF flag is
  passed to the wrong argument in namei(), resulting in some
  unexpected errors during name resolution, and passing in an unlocked
  vnode.

Obtained from:	TrustedBSD Project
2000-07-28 19:52:38 +00:00
rwatson
97118d11db o Lock vnode before calling extattr_* VOP's, and modify vnode spec to
allow for that.
o Remember to call NDFREE() if exiting as a result of a failed
  vn_start_write() when snapshotting.

Reviewed by:	mckusick
Obtained from:	TrustedBSD Project
2000-07-26 20:29:20 +00:00
mckusick
d439206673 Do not need vrele(nd.ni_vp) as that is done by NDFREE(&nd, 0);
Submitted by:	Peter Holm <pho@freebsd.org>
2000-07-25 05:38:54 +00:00
mckusick
a3d0c189ea Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
mckusick
040e64cd97 Move the truncation code out of vn_open and into the open system call
after the acquisition of any advisory locks. This fix corrects a case
in which a process tries to open a file with a non-blocking exclusive
lock. Even if it fails to get the lock it would still truncate the
file even though its open failed. With this change, the truncation
is done only after the lock is successfully acquired.

Obtained from:	 BSD/OS
2000-07-04 03:34:11 +00:00
phk
2a91a9dd04 Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES,
that is way cleaner than using the softupdates_stub stunt, which
should be killed when convenient.

Discussed with:	mckusick
2000-07-03 13:26:54 +00:00
archie
0e6c8a1f1b Move the securelevel check before loading KLD's into linker_load_file(),
instead of requiring every caller of linker_load_file() to perform the
check itself. This avoids netgraph loading KLD's when securelevel > 0,
not to mention any future code that may call linker_load_file().

Reviewed by:	dfr
2000-06-29 17:57:04 +00:00
phk
cb90cb2b60 Revert part of my bioops change which implemented panic(8). 2000-06-16 14:32:13 +00:00
phk
4ec91666fa Virtualizes & untangles the bioops operations vector.
Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
2000-06-16 08:48:51 +00:00
phk
36c3965ff9 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
dillon
689641c1ea Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward.  A system call may select whether it needs the MP
    lock or not (the default being that it does need it).

    A great deal of conditional SMP code for various deadended experiments
    has been removed.  'cil' and 'cml' have been removed entirely, and the
    locking around the cpl has been removed.  The conditional
    separately-locked fast-interrupt code has been removed, meaning that
    interrupts must hold the CPL now (but they pretty much had to anyway).
    Another reason for doing this is that the original separate-lock for
    interrupts just doesn't apply to the interrupt thread mechanism being
    contemplated.

    Modifications to the cpl may now ONLY occur while holding the MP
    lock.  For example, if an otherwise MP safe syscall needs to mess with
    the cpl, it must hold the MP lock for the duration and must (as usual)
    save/restore the cpl in a nested fashion.

    This is precursor work for the real meat coming later: avoiding having
    to hold the MP lock for common syscalls and I/O's and interrupt threads.
    It is expected that the spl mechanisms and new interrupt threading
    mechanisms will be able to run in tandem, allowing a slow piecemeal
    transition to occur.

    This patch should result in a moderate performance improvement due to
    the considerable amount of code that has been removed from the critical
    path, especially the simplification of the spl*() calls.  The real
    performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch
2000-03-28 07:16:37 +00:00
mckusick
2f9951ffbd Add bwillwrite to all system calls that create things in the filesystem.
Benchmarks that create huge trees of empty files overwhelm the buffer cache.
2000-01-10 00:08:53 +00:00