Commit Graph

466 Commits

Author SHA1 Message Date
Jeff Roberson
94a9458501 - Change all vfs syscalls to use VFS_LOCK_GIANT(), and MPSAFE nds.
- Move Giant acquisition into the few vfs syscalls that weren't already
   directly acquiring it.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:25:44 +00:00
Poul-Henning Kamp
e39db32ab0 Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.
2005-01-13 12:25:19 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Poul-Henning Kamp
9bb4281603 Eliminate pointless goto. 2004-11-16 08:22:06 +00:00
Poul-Henning Kamp
48ab5b2d21 Forgot to remove now unused variable in last commit. 2004-11-15 21:28:00 +00:00
Poul-Henning Kamp
136211e58e It is not necessary to hold vn_start_write/vn_finished_write around VOP_REVOKE. 2004-11-15 21:27:06 +00:00
Poul-Henning Kamp
718fe8e2bf Next FILEDESC_LOCK properly around FILE_LOCK 2004-11-15 21:26:13 +00:00
Poul-Henning Kamp
124e4c3be8 Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.
Use this in all the places where sleeping with the lock held is not
an issue.

The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.
2004-11-13 11:53:02 +00:00
Poul-Henning Kamp
ef11fbd7c4 Introduce fdclose() which will clean an entry in a filedesc.
Replace homerolled versions with call to fdclose().

Make fdunused() static to kern_descrip.c
2004-11-07 22:16:07 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Alfred Perlstein
f257b7a54b Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
Robert Watson
4f3bf9b9b4 Don't cuddle else's so much as we removed additional parts of each
block.
2004-06-24 17:22:29 +00:00
Robert Watson
5e11031e05 Remove temporary API bandage that allowed applications speaking the
older API to list attributes on a file (zero-length attribute name)
to function.  extattr_list_*() are now the only available APIs to
use when listing attributes.
2004-06-24 17:14:28 +00:00
Robert Watson
9260798fd7 Acquire Giant in link() so that the system call can be marked
MPSAFE.  Don't want to acquire Giant in kern_link() sync linux
compat code performs actions requiring Giant prior to calling
kern_link().
2004-06-22 04:34:05 +00:00
Robert Watson
694b21cf7b Acquire Giant in link() so that we can mark it as MSTD in
syscalls.master.  Don't want to do it in kern_link() since the
Linux emulation code calls kern_link() after performing other
actions requiring Giant.
2004-06-22 04:29:07 +00:00
Poul-Henning Kamp
d7086f313a Only initialize f_data and f_ops if nobody else did so already. 2004-06-19 11:41:45 +00:00
Poul-Henning Kamp
1930e303cf Deorbit COMPAT_SUNOS.
We inherited this from the sparc32 port of BSD4.4-Lite1.  We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.
2004-06-11 11:16:26 +00:00
Pawel Jakub Dawidek
79db0f1cbf Remove unused code.
Submitted by:	Bjoern A. Zeeb
2004-06-07 12:19:55 +00:00
Tim J. Robbins
c4d85674d5 Remove a stale comment. 2004-06-04 11:00:22 +00:00
Tim J. Robbins
f52e2ef29f Eliminate a memory leak in kern_symlink() that could occur if
vn_start_write() failed.
2004-05-11 10:42:02 +00:00
Pawel Jakub Dawidek
6c0ad4a77a Always use nd.ni_vp->v_mount as an argument for VFS_QUOTACTL(), just like
in RELENG_4.

Pointed out by:	Alex Lyashkov <umka@sevinter.net>
2004-04-26 15:44:42 +00:00
Pawel Jakub Dawidek
0c0c597faa Look out! vn_start_write() is able to return 0 and NULL 'mp'.
Submitted by:	Alex Lyashkov <shadow@psoft.net>
2004-04-22 15:40:27 +00:00
Bruce Evans
295ed75297 Removed some less than useful comments:
- don't say what a small subset of the options includes are for.
- don't mark up functions which use all their args with /* ARGSUSED */.
  The markup should have been removed when the unused retval parameter
  was removed.
- don't comment on what routine suser() checks do.  Removed nearby
  excessive vertical whitespace.
2004-04-06 10:05:02 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Doug Rabson
0b0a60fb43 Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks. 2004-04-05 10:15:53 +00:00
David Malone
31c7e8b05b Nudge Giant as far as I can into kern_open(). Mark open() as MPSAFE.
Use kern_open() to implement creat() rather than taking the long route
through open(). Mark creat as MPSAFE.

While I'm at it, mark nosys() (syscall 0) as MPSAFE, for all the
difference it will make.
2004-03-16 10:46:42 +00:00
Pawel Jakub Dawidek
dd604e2647 Add two new sysctls:
- security.bsd.hardlink_check_uid, when set, means, that unprivileged
		users are not permitted to create hard links to files not
		owned by them,
	- security.bsd.hardlink_check_gid, when set, means, that unprivileged
		users are not permitted to create hard links to files owned
		by group they don't belong to.

OK'ed by:	rwatson
2004-03-08 20:37:25 +00:00
David Malone
a1cc6206fb Correct a comment.
Reviewed by:	alfred, tanimura
2004-02-17 12:30:32 +00:00
Robert Watson
f08df373a3 By default, when a process in jail calls getfsstat(), only return the
data for the file system on which the jail's root vnode is located.
Previous behavior (show data for all mountpoints) can be restored
by setting security.jail.getfsstatroot_only to 0.  Note: this also
has the effect of hiding other mounts inside a jail, such as /dev,
/tmp, and /proc, but errs on the side of leaking less information.
2004-02-14 18:31:11 +00:00
Dag-Erling Smørgrav
a2fe44e8cf New file descriptor allocation code, derived from similar code introduced
in OpenBSD by Niels Provos.  The patch introduces a bitmap of allocated
file descriptors which is used to locate available descriptors when a new
one is needed.  It also moves the task of growing the file descriptor table
out of fdalloc(), reducing complexity in both fdalloc() and do_dup().

Debts of gratitude are owed to tjr@ (who provided the original patch on
which this work is based), grog@ (for the gdb(4) man page) and rwatson@
(for assistance with pxeboot(8)).
2004-01-15 10:15:04 +00:00
Dag-Erling Smørgrav
05c3c5c8b6 Mechanical whitespace cleanup; parenthesize return values; other minor
style nits.  The #ifdefs in this file give me a headache...
2004-01-11 19:52:10 +00:00
Robert Watson
69546b2fbb Document that when we are addressing an open()/close() race, the reason
we call vn_close() manually rather than letting fdrop() take care of it
is that we haven't yet hooked up the various 'struct file' fields.
2003-12-24 17:13:01 +00:00
Kirk McKusick
fde81c7d8e Update the statfs structure with 64-bit fields to allow
accurate reporting of multi-terabyte filesystem sizes.

You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Tim Robbins <tjr@freebsd.org>
Reviewed by:	Julian Elischer <julian@elischer.org>
Reviewed by:	the hoards of <arch@freebsd.org>
Sponsored by:   DARPA & NAI Labs.
2003-11-12 08:01:40 +00:00
David Malone
e1419c08e2 falloc allocates a file structure and adds it to the file descriptor
table, acquiring the necessary locks as it works. It usually returns
two references to the new descriptor: one in the descriptor table
and one via a pointer argument.

As falloc releases the FILEDESC lock before returning, there is a
potential for a process to close the reference in the file descriptor
table before falloc's caller gets to use the file. I don't think this
can happen in practice at the moment, because Giant indirectly protects
closes.

To stop the file being completly closed in this situation, this change
makes falloc set the refcount to two when both references are returned.
This makes life easier for several of falloc's callers, because the
first thing they previously did was grab an extra reference on the
file.

Reviewed by:	iedowse
Idea run past:	jhb
2003-10-19 20:41:07 +00:00
Robert Watson
c096756c00 Add mac_check_vnode_deleteextattr() and mac_check_vnode_listextattr():
explicit access control checks to delete and list extended attributes
on a vnode, rather than implicitly combining with the setextattr and
getextattr checks.  This reflects EA API changes in the kernel made
recently, including the move to explicit VOP's for both of these
operations.

Obtained from:	TrustedBSD PRoject
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-21 13:53:01 +00:00
John Baldwin
b2db7dc658 td_dupfd just needs to be less than 0, it does not have to hold the
negative value of the index of the new file, so just use -1.
2003-08-07 17:08:26 +00:00
Ian Dowse
76bd23557b In the mknod(), mkfifo(), link(), symlink() and undelete() syscalls,
use vrele() instead of vput() on the parent directory vnode returned
by namei() in the case where it is equal to the target vnode. This
handles namei()'s somewhat strange (but documented) behaviour of
not locking either vnode when the two vnodes are equal and LOCKPARENT
but not LOCKLEAF is specified.

Note that since a vnode double-unlock is not currently fatal, these
coding errors were effectively harmless.

Spotted by:	Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
Reviewed by:	mckusick
2003-08-05 00:26:51 +00:00
Robert Watson
9080ff25cf Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the
kernel ACL interfaces and system call names.

Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-07-28 18:53:29 +00:00
Poul-Henning Kamp
cf7742997a Pass the file descriptor index down to vn_open.
If the method vector was replaced and we got the "special return code"
smile and trust that whatever happened below DTRT.
2003-07-27 20:09:13 +00:00
Poul-Henning Kamp
7c89f162bc Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
Poul-Henning Kamp
a8d43c90af Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.
2003-07-26 07:32:23 +00:00
Poul-Henning Kamp
1226914c17 Use the f_vnode field to tell which file descriptors have a vnode. 2003-07-04 12:20:27 +00:00
Robert Watson
6b42f0a2eb Prefer the vop_rmextattr() vnode operation for removing extended
attributes from objects over vop_setextattr() with a NULL uio; if
the file system doesn't support the vop_rmextattr() method, fall
back to the vop_setextattr() method.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 23:03:07 +00:00
Poul-Henning Kamp
3b6d965263 Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
2003-06-22 08:41:43 +00:00
Poul-Henning Kamp
eaaca5deee Don't (re)initialize f_gcflag to zero.
Move initialization of DTYPE_VNODE specific field f_seqcount into
the DTYPE_VNODE specific code.
2003-06-20 08:02:30 +00:00
Don Lewis
6084b6c9d5 FILE_LOCK() uses a pool mutex, as does the vnode v_vnlock. Since pool
mutexes are supposed to only be used as leaf mutexes, and what appear
to be separate pool mutexes could be aliased together, it is bad idea
for a thread to attempt to hold two pool mutexes at the same time.

Slightly rearrange the code in kern_open() so that FILE_UNLOCK() is
called before calling VOP_GETVOBJECT(), which will grab the v_vnlock
mutex.
2003-06-19 04:10:56 +00:00
Poul-Henning Kamp
2db4b023bb Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that
rather than assume that only DTYPE_VNODE is seekable.
2003-06-18 19:53:59 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Robert Watson
777621799b If a system call comes in requesting to retrieve an attribute named
"", temporarily map it to a call to extattr_list_vp() to provide
compatibility for older applications using the "" API to retrieve
EA lists.

Use VOP_LISTEXTATTR() to support extattr_list_vp() rather than
VOP_GETEXTATTR(..., "", ...).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Asssociates Laboratories
2003-06-05 05:55:34 +00:00
Robert Watson
8bebbb1a32 Implementations of extattr_list_fd(), extattr_list_file(), and
extattr_list_link() system calls, which return a least of extended
attributes defined for a vnode referenced by a file descriptor
or path name.  Currently, we just invoke VOP_GETEXTATTR() since
it will convert a request for an empty name into a query for a
name list, which was the old (more hackish) API.  At some point
in the near future, we'll push the distinction between get and
list down to the vnode operation layer, but this provides access
to the new API for applications in the short term.

Pointed out by:	Dominic Giampaolo <dbg@apple.com>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-04 03:57:28 +00:00
Poul-Henning Kamp
670966596b Remove unused variable(s).
Found by:       FlexeLint
2003-05-31 20:29:34 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
Alan Cox
b6e48e0372 - Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
   whether its caller has locked the vm_object.  (This is a temporary
   measure to bootstrap vm_object locking.)
2003-04-24 04:31:25 +00:00
Mike Barcroft
fd7a8150fb o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
  fields (pr_path for reporting and pr_root vnode instance) to store
  the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
  vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
  to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
  struct xprison instances that represent a snapshot of active jails.

Reviewed by:	rwatson, tjr
2003-04-09 02:55:18 +00:00
Robert Watson
a184d471e2 Move the initialization of the vattr flags field in setfflags() to
before the MAC check so that we pass the flags field into the MAC
check properly initialized.  This didn't affect any current MAC
modules since they didn't care what the flags argument was (as
they were primarily interested in the fact that it was a meta-data
write, not the contents of the write), but would be relevant to
future modules relying on that field.

Submitted by:	Mike Halderman <mrh@spawar.navy.mil>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-05 23:15:23 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Jeffrey Hsu
a44009e07d Remove extraneous FILEDESC_LOCK around atomic read. 2003-02-16 02:15:15 +00:00
Robert Watson
565211b27f Correct handling of locking for chroot() and chdir() cases: rather
than having change_dir() release the vnode lock on success, hold the
lock so that we can use it later when invoking MAC checks and
VOP_ACCESS() in the chroot() code.  Update the comment to reflect
this calling convention.  Update callers to unlock the vnode
lock.  Correct a typo regarding vnode naming in the MAC case that
crept in via the previous patch applied.
2003-01-31 21:13:25 +00:00
Robert Watson
7278944df1 Clean up vnode handling on return from chroot() in certain error
cases: we might multiply vrele() a vnode when certain classes of
failures occur.  This appears to stem from earlier Giant/file
descriptor lock pushdown and restructuring.

Submitted by:	maxim
2003-01-31 18:57:04 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Jacques Vidrine
f0c093284d Correct file descriptor leaks in lseek and do_dup.
The leak in lseek was introduced in vfs_syscalls.c revision 1.218.
The leak in do_dup was introduced in kern_descrip.c revision 1.158.

Submitted by:	iedowse
2003-01-06 13:19:05 +00:00
Alfred Perlstein
f97182acf8 unwrap lines made short enough by SCARGS removal 2002-12-14 08:18:06 +00:00
Alfred Perlstein
b80521fee5 remove syscallarg().
Suggested by: peter
2002-12-14 02:07:32 +00:00
Alfred Perlstein
d1e405c5ce SCARGS removal take II. 2002-12-14 01:56:26 +00:00
Alfred Perlstein
bc9e75d7ca Backout removal SCARGS, the code freeze is only "selectively" over. 2002-12-13 22:41:47 +00:00
Alfred Perlstein
0bbe7292e1 Remove SCARGS.
Reviewed by: md5
2002-12-13 22:27:25 +00:00
Ian Dowse
4e08ccb2ff Fix a case in kern_rename() where a vn_finished_write() call was
missed. This bug has been present since the vn_start_write() and
vn_finished_write() calls were first added in revision 1.159. When
the case is triggered, any attempts to create snapshots on the
filesystem will deadlock and also prevent further write activity
on that filesystem.
2002-10-27 23:23:51 +00:00
Garrett Wollman
c7047e5204 Change the way support for asynchronous I/O is indicated to applications
to conform to 1003.1-2001.  Make it possible for applications to actually
tell whether or not asynchronous I/O is supported.

Since FreeBSD's aio implementation works on all descriptor types, don't
call down into file or vnode ops when [f]pathconf() is asked about
_PC_ASYNC_IO; this avoids the need for every file and vnode op to know about
it.
2002-10-27 18:07:41 +00:00
Robert Watson
7587203c2f Hook up most of the MAC entry points relating to file/directory/node
creation, deletion, and rename.  There are one or two other stray
cases I'll catch in follow-up commits (such as unix domain socket
creation); this permits MAC policy modules to limit the ability to
perform these operations based on existing UNIX credential / vnode
attributes, extended attributes, and security labels.  In the rename
case using MAC, we now have to lock the from directory and file
vnodes for the MAC check, but this is done only in the MAC case,
and the locks are immediately released so that the remainder of the
rename implementation remains the same.  Because the create check
takes a vattr to know object type information, we now initialize
additional fields in the VATTR passed to VOP_SYMLINK() in the MAC
case.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 20:25:57 +00:00
Robert Watson
2dba710ddb Incremental style improvements: more consistently avoid assignments
in conditionals; remove some excess vertical whitespace; remove a
bug in the return handling of the delete_vp() case for MAC.

Spotted by:	bde
2002-10-10 13:59:58 +00:00
Robert Watson
b101411be1 Explore new heights in alphabetization for _file and _fd variations on
the extended attribute system calls.
2002-10-10 00:32:08 +00:00
Robert Watson
6f90723cad Implement extattr_{delete,get,set}_link() system calls: extended attribute
operations that do not follow links.  Sync to MAC tree.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-09 21:48:22 +00:00
Ian Dowse
197b023b1b Add back a fdrop() call at the end of kern_open() that got lost in
revision 1.218. This bug caused a "struct file" reference to be
leaked if VOP_ADVLOCK(), vn_start_write(), or mac_check_vnode_write()
failed during the open operation.

PR:		kern/43739
Reported by:	Arne Woerner <woerner@mediabase-gmbh.de>
2002-10-07 20:49:22 +00:00
Robert Watson
0a69419678 Merge support for mac_check_vnode_link(), a MAC framework/policy entry
point that instruments the creation of hard links.  Policy implementations
to follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-05 18:11:36 +00:00
Poul-Henning Kamp
8c5d013757 Fix mis-indentation.
Spotted by:	FlexeLint
2002-10-02 09:09:25 +00:00
Jeff Roberson
d40a8125f5 - Properly lock v_vflags in getdirents(). 2002-09-25 02:13:38 +00:00
Don Lewis
fa288043e2 VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link()
wasn't doing.  Rather than just lock and unlock the vnode around the call
to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode
in kern_link() before calling VOP_LINK(), since the other filesystems
also locked the file vnode right away in their link methods.  Remove the
locking and and unlocking from the leaf filesystem link methods.

Reviewed by:	rwatson, bde  (except for the unionfs_link() changes)
2002-09-19 13:32:45 +00:00
Bruce Evans
d3a7b5e70e vfs_syscalls.c:
Changed rename(2) to follow the letter of the POSIX spec.  POSIX
requires rename() to have no effect if its args "resolve to the same
existing file".  I think "file" can only reasonably be read as referring
to the inode, although the rationale and "resolve" seem to say that
sameness is at the level of (resolved) directory entries.

ext2fs_vnops.c, ufs_vnops.c:
Replaced code that gave the historical BSD behaviour of removing one
link name by checks that this code is now unreachable.  This fixes
some races.  All vnodes needed to be unlocked for the removal, and
locking at another level using something like IN_RENAME was not even
attempted, so it was possible for rename(x, y) to return with both x
and y removed even without any unlink(2) syscalls (one process can
remove x using rename(x, y) and another process can remove y using
rename(y, x)).

Prodded by:	alfred
MFC after:	8 weeks
PR:		42617
2002-09-10 11:09:13 +00:00
Ian Dowse
8f19eb88df Split out a number of mostly VFS and signal related syscalls into
a kernel-internal kern_*() version and a wrapper that is called via
the syscall vector table. For paths and structure pointers, the
internal version either takes a uio_seg parameter or requires the
caller to copyin() the data to kernel memory as appropiate. This
will permit emulation layers to use these syscalls without having
to copy out translated arguments to the stack gap.

Discussed on:		-arch
Review/suggestions:	bde, jhb, peter, marcel
2002-09-01 20:37:28 +00:00
Jeff Roberson
856d3a056f - Hold the vnode lock across unlink() so that the v_vflag check is safe.
- Fix the long broken error handling for VV_ROOT and VDIR.
2002-08-21 03:55:35 +00:00
Robert Watson
177142e458 Pass active_cred and file_cred into the MAC framework explicitly
for mac_check_vnode_{poll,read,stat,write}().  Pass in fp->f_cred
when calling these checks with a struct file available.  Otherwise,
pass NOCRED.  All currently MAC policies use active_cred, but
could now offer the cached credential semantic used for the base
system security model.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 19:04:53 +00:00
Robert Watson
7f724f8b51 Break out mac_check_vnode_op() into three seperate checks:
mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write().
This improves the consistency with other existing vnode checks, and
allows policies to avoid implementing switch statements to determine
what operations they do and do not want to authorize.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 16:43:25 +00:00
Robert Watson
ea6027a8e1 Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential.  Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential.  Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument.  This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
  MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
  than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
  to vn_stat().  Pass active_cred to MAC and fp->f_cred to VOP_POLL()
  to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
  and consumers so that this distinction is maintained at the VFS
  as well as 'struct file' layer.  Pass active_cred instead of
  td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
  credential is properly initialized and can be used in the socket
  code if desired.  Pass ap->a_td->td_ucred as the active
  credential to soo_poll().  If we teach the vnop interface about
  the distinction between file and active credentials, we would use
  the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained.  It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-16 12:52:03 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Robert Watson
18b770b2fb Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC framework entry points to authorize readdir()
operations in the native ABI.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 20:44:52 +00:00
Robert Watson
f9d0d52459 Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by:	bde
2002-08-01 17:47:56 +00:00
Robert Watson
f4d2cfdda6 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points to authorize the following
operations:

        truncate on open()                      (write)
        access()                                (access)
        readlink()                              (readlink)
        chflags(), lchflags(), fchflags()       (setflag)
        chmod(), fchmod(), lchmod()             (setmode)
        chown(), fchown(), lchown()             (setowner)
        utimes(), lutimes(), futimes()          (setutimes)
        truncate(), ftrunfcate()                (write)
        revoke()                                (revoke)
        fhopen()                                (open)
        truncate on fhopen()                    (write)
        extattr_set_fd, extattr_set_file()      (setextattr)
        extattr_get_fd, extattr_get_file()      (getextattr)
        extattr_delete_fd(), extattr_delete_file() (setextattr)

These entry points permit MAC policies to enforce a variety of
protections on vnodes.  More vnode checks to come, especially in
non-native ABIs.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 15:37:12 +00:00
Robert Watson
b3e13e1c3f Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument chdir() and chroot()-related system calls to invoke
appropriate MAC entry points to authorize the two operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:50:08 +00:00
Robert Watson
b285e7f9a8 Improve formatting and variable use consistency in extattr system
calls.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:29:03 +00:00
Robert Watson
956fc3f8a5 Simplify the logic to enter VFS_EXTATTRCTL().
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:26:07 +00:00
Robert Watson
2712d0ee89 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement MAC framework access control entry points relating to
operations on mountpoints.  Currently, this consists only of
access control on mountpoint listing using the various statfs()
variations.  In the future, it might also be desirable to
implement checks on mount() and unmount().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:27:33 +00:00
Robert Watson
e66c87b70e When referencing nd_cnp after namei(), always pass SAVENAME into
NDINIT() operation flags.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 18:48:25 +00:00
Robert Watson
0b1040cb88 Set VAPPEND in open mode when O_APPEND is specified as an argument to
open() of fhopen().  Currently this has no actual affect due to the
treatment of VAPPEND in vaccess() and vaccess_acl() as a subset of
VWRITE, but when MAC comes in, MAC will distinguish the two.  Note:
if any file systems are cutting their own permission models, they
may wish to now take this into account.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-22 12:51:06 +00:00
Kirk McKusick
fb36a3d847 Change utimes to set the file creation time (for filesystems that
support creation times such as UFS2) to the value of the
modification time if the value of the modification time is older
than the current creation time. See utimes(2) for further details.

Sponsored by:	DARPA & NAI Labs.
2002-07-17 02:03:19 +00:00
Kirk McKusick
faab4e2722 Change the name of st_createtime to st_birthtime. This change is
made to reduce confusion between st_ctime and st_createtime.

Submitted by:	Eric Allman <eric@sendmail.org>
Sponsored by:	DARPA & NAI Labs.
2002-07-16 22:36:00 +00:00
John Baldwin
03d7a9fffb - Change chroot_refuse_vdir_fds() to require that the passed in struct
filedesc is already locked rather than having chroot() unlock the
  filedesc so chroot_refuse_vdir_fds() can immediately relock it.
- Reorder chroot() a bitso that we do the namei lookup before checking
  the process's struct filedesc.  This closes at least one potential race
  and allows us to only acquire the filedsec lock once in chroot().
- Push down Giant slightly into chroot().
2002-07-13 04:07:12 +00:00
Maxime Henrion
2b4edb69f1 Move every code related to mount(2) in a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and help reducing
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by:	rwatson
Repo-copy by:	peter
2002-07-02 17:09:22 +00:00
Ian Dowse
6bd521df93 Use indirect function pointer hooks instead of #ifdef SOFTUPDATES
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.

Reviewed by:	mckusick
2002-07-01 17:59:40 +00:00
Alfred Perlstein
a788442584 Remove unneeded casts to caddr_t. 2002-06-28 23:02:38 +00:00
Ian Dowse
84b2995b2f In vn_mkdir(), use vrele() instead of vput() on the parent directory
vnode in the case that the target exists and is the same vnode as
the parent (i.e. "mkdir ."). The namei() call does not leave the
vnode locked in this case even though you might expect it to.

This bug was mostly harmless in practice because unlocking an already
unlocked vnode currently does not trigger any panics or warnings.

Reviewed by:	jeff
2002-06-28 20:06:47 +00:00
Kirk McKusick
d374764fd3 Use proper size in bzero of stat structure.
Submitted by:	Jake Burkholder <jake@locore.ca>
Sponsored by:	DARPA & NAI Labs.
2002-06-24 07:14:44 +00:00
Kirk McKusick
6524dddcd5 This patch fixes a size problem with the stat structure for
64-bit architectures that was introduced in the UFS2 code
merge two days ago. The stat structure change that caused
the problem was the addition of the file create time.

Submitted by:	Bruce Evans <bde@zeta.org.au>
Sponsored by:	DARPA & NAI Labs.
2002-06-22 22:01:13 +00:00
Maxime Henrion
cacd1c9b49 o Remove the initialization of unused fields in the struct
uio now that we don't use uiomove() anymore.
o Enforce stricter checks on the length of the iov's in
  nmount(2) since we now malloc() them individually and
  corrupted iov's could make the kernel crash in malloc()
  with "kmem_map too small".

Reviewed by:	phk
2002-06-22 18:07:05 +00:00
Kirk McKusick
1c85e6a35d This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
Maxime Henrion
7d2d440991 Change the way we internally store the mount options to
a linked list.  This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.

There are no functional differences in this commit.

Reviewed by:	phk
2002-06-20 20:03:42 +00:00
Maxime Henrion
8eb0098f4c Remove a duplicated vfs_freeopts() that I introduced in last
revision.
2002-05-28 13:27:55 +00:00
Maxime Henrion
2274ec995c Style nit, no functional changes. 2002-05-23 23:22:22 +00:00
Maxime Henrion
9ee6bf717f Slightly change the way we pass mount options to the filesystem
VFS_NMOUNT operations.

Reviewed by:	phk
2002-05-23 23:02:19 +00:00
Maxime Henrion
e9e705b0df Change two vput() that should have been vrele().
Submitted by:	iedowse
2002-05-20 14:59:43 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Jeff Roberson
0e2d6cc899 Disable the shared locking namei() code for now. It breaks several stacking
filesystems.  This is on hold until the rest of VFS Locking is reviewed and
deemed safe.  It can be enabled with 'options LOOKUP_SHARED'.
2002-05-14 21:59:49 +00:00
Maxime Henrion
9d997d8be8 Add the lchflags(2) syscall.
Reviewed by:	rwatson
2002-05-05 23:47:41 +00:00
Jeff Roberson
576365ba36 Move a KASSERT() in open() prior to unlocking the vnode. It's not safe to
call VOP_GETVOBJECT without a lock.
2002-05-05 23:17:13 +00:00
Maxime Henrion
afd458b0fa Fix a typo.
Submitted by:	dwmalone
2002-05-04 19:50:09 +00:00
Robert Watson
7a0776e477 Slightly restructure extattr_get_vp() so that there's only one entry point
to VOP_GETEXTATTR().  This simplifies code flow when inserting MAC hooks.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-23 01:27:38 +00:00
Robert Watson
0510317039 Improve style consistency of vfs_syscalls.c by converting the style used
in various extattr_*() calls to match the rest of the file.  Originally,
these bits at the end looked more like style(9).  This patch was submitted
by green by way of the TrustedBSD MAC tree, and I fixed a few problems
with it on the way through.  Someone with more time on their hands should
convert the entire file to style(9); this commit is for diff reduction
purposes.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-20 01:37:08 +00:00
Ian Dowse
df99ca52f1 The recent NFS forced unmount improvements introduced a side-effect
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.

Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.

Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.

Reported by:	simokawa
MFC after:	1 week
2002-04-17 01:07:29 +00:00
Jeff Roberson
a59f8b9e6c Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this
behavior by default.  Also, change the options line to reflect this.

If there are no problems reported this will become the only behavior and the
knob will be removed in a month or so.

Demanded by:	obrien
2002-04-09 05:14:17 +00:00
Maxime Henrion
a48ca36999 The fourth parameter to copystr() is a size_t, not an int.
Approved by:	peter
2002-04-08 21:14:19 +00:00
Maxime Henrion
9d8353732e o Change kernel_vmount() interface to be more convenient : pass two
separate strings instead of passing "foo=bar".
o Don't forget to clear the VMOUNT flag on the vnode when vfs_nmount()
  fails because the fs doesn't implement VFS_NMOUNT (and in vfs_mount()
  when the fs doesn't implement VFS_MOUNT) ; also decrement the vfs
  refcount in the !MNT_UPDATE case.
2002-04-07 13:22:47 +00:00
Maxime Henrion
bcc931752f Add two forgotten vfs_unbusy() calls, in vfs_mount() and vfs_nmount().
Reviewed by:	phk
2002-04-03 12:19:03 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Maxime Henrion
daab5e2472 - Properly sync vfs_nmount() with changes that have be already done
in vfs_mount(), in particular revisions 1.215, 1.227 and 1.240.
- flag2 is a low quality variable name, change it to kern_flag.
- strncpy NUL-terminates f_fstypename and f_mntonname since the strings
  have length <= <buffer length> - 1, so the explicit NUL-termination is
  bogus.
- M_ZERO'ing space for fstype and fspath is stupid since we never use the
  space beyond the end of the string.
- Do various style(9) cleanups in both functions.

Submitted by:	bde
Reviewed by:	phk
2002-03-28 13:47:32 +00:00
Andrew R. Reiter
dcce8874eb - Fixup a few style nits:
- return error -> return (error);
  - move a declaration to the top of the function.
  - become bug for bug compatible with if (error) lines.

Submitted by: bde
2002-03-26 18:07:10 +00:00
Maxime Henrion
17594b936b As discussed in -arch, add the new nmount(2) system call and the
new vfs_getopt()/vfs_copyopt() API.  This is intended to be used
later, when there will be filesystems implementing the VFS_NMOUNT
operation.  The mount(2) system call will disappear when all
filesystems will be converted to the new API.  Documentation will
be committed in a while.

Reviewed by:	phk
2002-03-26 15:33:44 +00:00
Andrew R. Reiter
517f30c2c1 - Recommit the securelevel_gt() calls removed by commits rev. 1.84 of
kern_linker.c and rev. 1.237 of vfs_syscalls.c since these are not the
  source of the recent panics occuring around kldloading file system
  support modules.

Requested by: rwatson
2002-03-25 18:26:34 +00:00
Andrew R. Reiter
fe3240e9aa - Back out the commit to make the linker_load_file() securelevel check
made aware in jail environments.  Supposedly something is broken, so
  this should be backed out until further investigation proves otherwise,
  or a proper fix can be provided.
2002-03-22 04:56:09 +00:00
Andrew R. Reiter
e85b9ae9ac - Fix a logic error in checking the securelevel that was introduced in the
previous commit.

Pointy hats to: arr, rwatson
2002-03-21 15:27:39 +00:00
Andrew R. Reiter
c457a4403a - Change a check of securelevel to securelevel_gt() call in order to help
against users within a jail attempting to load kernel modules.
- Add a check of securelevel_gt() to vfs_mount() in order to chop some
  low hanging fruit for the repair of securelevel checking of linking and
  unlinking files from within jails.  There is more to be done here.

Reviewed by: rwatson
2002-03-20 16:03:42 +00:00
Jeff Roberson
c897b81311 Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Alfred Perlstein
4a950215ef Close a race when vfs_syscalls.c:checkdirs() runs.
To do this protect the filedesc pointer in the proc with PROC_LOCK
in both checkdirs() and kern_descrip.c:fdfree().
2002-03-19 04:30:04 +00:00
Jeff Roberson
8de00f4a87 This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs.
The stat() and open() calls have been changed to make use of this new functionality.  Using shared locks in
these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes.  Also,
this reduces the number of exclusive locks that are floating around in the system, which helps reduce the
number of deadlocks that occur.

A new kernel option "LOOKUP_SHARED" has been added.  It defaults to off so this patch can be turned on for
testing, and should eventually go away once it is proven to be stable.  I have personally been running this
patch for over a year now, so it is believed to be fully stable.

Reviewed by:	jake, obrien
Approved by:	jake
2002-03-12 04:00:11 +00:00
Robert Watson
89e1164ee2 Three p_ucred -> td_ucred's missed in jhb's earlier pass; all appear to
be safe.
2002-03-05 19:45:45 +00:00
Robert Watson
b0ad6e203a The change from td->td_proc->p_ucred to td->td_ucred has shortened some
lines: more agressively line wrap under those circumstances.
2002-03-05 19:31:25 +00:00
John Baldwin
bdd67d483c - Change namei() to use td_ucred instead of p_ucred.
- Change the hack in access() that uses a temporary credential to set
  td_ucred to the temp cred instead of p_ucred.
2002-02-27 19:15:29 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Robert Watson
1ea030d8fe Make sure to hold vnode lock when calling into VOP_GETATTR().
Discussed with:	mckusick, phk
2002-02-10 21:44:30 +00:00
Robert Watson
c0a9dc83c8 Make sure to grab vnode lock on a vnode before calling VOP_GETATTR()
to perform an ownership test in revoke().  This is also required for
MAC hooks so that the vnode lock is held during a call to the MAC
framework.  Release the lock before calling VOP_REVOKE().

Discussed with:	phk, mckusick
2002-02-10 20:45:43 +00:00
Robert Watson
56e04d01c0 Remove a stray 'const' that slept into extattr_set_vp(), and could
result in compiler warnings.
2002-02-10 05:31:55 +00:00
Robert Watson
74237f55b0 Part I: Update extended attribute API and ABI:
o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
  as not to use the scatter gather API (which appeared not to be used
  by any consumers, and be less portable), rather, accepts 'data'
  and 'nbytes' in the style of other simple read/write interfaces.
  This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
  a size_t.  When performing a read, the number of bytes read will
  be returned, unless the data pointer is NULL, in which case the
  number of bytes of data are returned.  This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
  argument so as to return the size, if desirable.  If set to NULL,
  the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable.  More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-10 04:43:22 +00:00
Robert Watson
143bb598d0 o Merge various recent fixes from the MAC branch relating to extattrctl():
- Fix null-pointer dereference introduced when snapshotting
	  was introduced.  This occured because unlike the previous code,
	  vn_start_write() doesn't always return a non-NULL mp, as
	  filesystems may not support the VOP_GETWRITEMOUNT() call.  For
	  now, rely on two pointers, so that vn_finished_write() works
	  properly.
	- Fix locking problems on exit, introduced at some past time,
	  some when snapshots came in, where a vnode might not be
	  unlocked before being vrele'd in various error situations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-08 05:58:41 +00:00
Julian Elischer
079b7badea Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
Alfred Perlstein
b7184973ed Don't recurse on filedesc lock in chroot_refuse_vdir_fds().
Noticed by: Michael Nottebrock <michaelnottebrock@gmx.net>
2002-02-01 18:27:16 +00:00
Alfred Perlstein
a4db49537b Replace ffind_* with fget calls.
Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().
2002-01-14 00:13:45 +00:00
Alfred Perlstein
426da3bcfb SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
Ian Dowse
1f493270a1 Change dounmount() to return EBUSY in the non-MNT_FORCE case if we
can't acquire the mnt_lock without blocking. Normally non-forced
unmount attempts return EBUSY quickly if any vnodes are active, so
this just extends that behaviour to cover the per-mount mnt_lock
too.
2002-01-10 01:59:30 +00:00