Commit Graph

335 Commits

Author SHA1 Message Date
Ian Dowse
8f19eb88df Split out a number of mostly VFS and signal related syscalls into
a kernel-internal kern_*() version and a wrapper that is called via
the syscall vector table. For paths and structure pointers, the
internal version either takes a uio_seg parameter or requires the
caller to copyin() the data to kernel memory as appropiate. This
will permit emulation layers to use these syscalls without having
to copy out translated arguments to the stack gap.

Discussed on:		-arch
Review/suggestions:	bde, jhb, peter, marcel
2002-09-01 20:37:28 +00:00
Jeff Roberson
856d3a056f - Hold the vnode lock across unlink() so that the v_vflag check is safe.
- Fix the long broken error handling for VV_ROOT and VDIR.
2002-08-21 03:55:35 +00:00
Robert Watson
177142e458 Pass active_cred and file_cred into the MAC framework explicitly
for mac_check_vnode_{poll,read,stat,write}().  Pass in fp->f_cred
when calling these checks with a struct file available.  Otherwise,
pass NOCRED.  All currently MAC policies use active_cred, but
could now offer the cached credential semantic used for the base
system security model.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 19:04:53 +00:00
Robert Watson
7f724f8b51 Break out mac_check_vnode_op() into three seperate checks:
mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write().
This improves the consistency with other existing vnode checks, and
allows policies to avoid implementing switch statements to determine
what operations they do and do not want to authorize.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 16:43:25 +00:00
Robert Watson
ea6027a8e1 Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential.  Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential.  Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument.  This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
  MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
  than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
  to vn_stat().  Pass active_cred to MAC and fp->f_cred to VOP_POLL()
  to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
  and consumers so that this distinction is maintained at the VFS
  as well as 'struct file' layer.  Pass active_cred instead of
  td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
  credential is properly initialized and can be used in the socket
  code if desired.  Pass ap->a_td->td_ucred as the active
  credential to soo_poll().  If we teach the vnop interface about
  the distinction between file and active credentials, we would use
  the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained.  It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-16 12:52:03 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Robert Watson
18b770b2fb Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC framework entry points to authorize readdir()
operations in the native ABI.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 20:44:52 +00:00
Robert Watson
f9d0d52459 Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by:	bde
2002-08-01 17:47:56 +00:00
Robert Watson
f4d2cfdda6 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points to authorize the following
operations:

        truncate on open()                      (write)
        access()                                (access)
        readlink()                              (readlink)
        chflags(), lchflags(), fchflags()       (setflag)
        chmod(), fchmod(), lchmod()             (setmode)
        chown(), fchown(), lchown()             (setowner)
        utimes(), lutimes(), futimes()          (setutimes)
        truncate(), ftrunfcate()                (write)
        revoke()                                (revoke)
        fhopen()                                (open)
        truncate on fhopen()                    (write)
        extattr_set_fd, extattr_set_file()      (setextattr)
        extattr_get_fd, extattr_get_file()      (getextattr)
        extattr_delete_fd(), extattr_delete_file() (setextattr)

These entry points permit MAC policies to enforce a variety of
protections on vnodes.  More vnode checks to come, especially in
non-native ABIs.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 15:37:12 +00:00
Robert Watson
b3e13e1c3f Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument chdir() and chroot()-related system calls to invoke
appropriate MAC entry points to authorize the two operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:50:08 +00:00
Robert Watson
b285e7f9a8 Improve formatting and variable use consistency in extattr system
calls.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:29:03 +00:00
Robert Watson
956fc3f8a5 Simplify the logic to enter VFS_EXTATTRCTL().
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:26:07 +00:00
Robert Watson
2712d0ee89 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement MAC framework access control entry points relating to
operations on mountpoints.  Currently, this consists only of
access control on mountpoint listing using the various statfs()
variations.  In the future, it might also be desirable to
implement checks on mount() and unmount().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:27:33 +00:00
Robert Watson
e66c87b70e When referencing nd_cnp after namei(), always pass SAVENAME into
NDINIT() operation flags.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 18:48:25 +00:00
Robert Watson
0b1040cb88 Set VAPPEND in open mode when O_APPEND is specified as an argument to
open() of fhopen().  Currently this has no actual affect due to the
treatment of VAPPEND in vaccess() and vaccess_acl() as a subset of
VWRITE, but when MAC comes in, MAC will distinguish the two.  Note:
if any file systems are cutting their own permission models, they
may wish to now take this into account.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-22 12:51:06 +00:00
Kirk McKusick
fb36a3d847 Change utimes to set the file creation time (for filesystems that
support creation times such as UFS2) to the value of the
modification time if the value of the modification time is older
than the current creation time. See utimes(2) for further details.

Sponsored by:	DARPA & NAI Labs.
2002-07-17 02:03:19 +00:00
Kirk McKusick
faab4e2722 Change the name of st_createtime to st_birthtime. This change is
made to reduce confusion between st_ctime and st_createtime.

Submitted by:	Eric Allman <eric@sendmail.org>
Sponsored by:	DARPA & NAI Labs.
2002-07-16 22:36:00 +00:00
John Baldwin
03d7a9fffb - Change chroot_refuse_vdir_fds() to require that the passed in struct
filedesc is already locked rather than having chroot() unlock the
  filedesc so chroot_refuse_vdir_fds() can immediately relock it.
- Reorder chroot() a bitso that we do the namei lookup before checking
  the process's struct filedesc.  This closes at least one potential race
  and allows us to only acquire the filedsec lock once in chroot().
- Push down Giant slightly into chroot().
2002-07-13 04:07:12 +00:00
Maxime Henrion
2b4edb69f1 Move every code related to mount(2) in a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and help reducing
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by:	rwatson
Repo-copy by:	peter
2002-07-02 17:09:22 +00:00
Ian Dowse
6bd521df93 Use indirect function pointer hooks instead of #ifdef SOFTUPDATES
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.

Reviewed by:	mckusick
2002-07-01 17:59:40 +00:00
Alfred Perlstein
a788442584 Remove unneeded casts to caddr_t. 2002-06-28 23:02:38 +00:00
Ian Dowse
84b2995b2f In vn_mkdir(), use vrele() instead of vput() on the parent directory
vnode in the case that the target exists and is the same vnode as
the parent (i.e. "mkdir ."). The namei() call does not leave the
vnode locked in this case even though you might expect it to.

This bug was mostly harmless in practice because unlocking an already
unlocked vnode currently does not trigger any panics or warnings.

Reviewed by:	jeff
2002-06-28 20:06:47 +00:00
Kirk McKusick
d374764fd3 Use proper size in bzero of stat structure.
Submitted by:	Jake Burkholder <jake@locore.ca>
Sponsored by:	DARPA & NAI Labs.
2002-06-24 07:14:44 +00:00
Kirk McKusick
6524dddcd5 This patch fixes a size problem with the stat structure for
64-bit architectures that was introduced in the UFS2 code
merge two days ago. The stat structure change that caused
the problem was the addition of the file create time.

Submitted by:	Bruce Evans <bde@zeta.org.au>
Sponsored by:	DARPA & NAI Labs.
2002-06-22 22:01:13 +00:00
Maxime Henrion
cacd1c9b49 o Remove the initialization of unused fields in the struct
uio now that we don't use uiomove() anymore.
o Enforce stricter checks on the length of the iov's in
  nmount(2) since we now malloc() them individually and
  corrupted iov's could make the kernel crash in malloc()
  with "kmem_map too small".

Reviewed by:	phk
2002-06-22 18:07:05 +00:00
Kirk McKusick
1c85e6a35d This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
Maxime Henrion
7d2d440991 Change the way we internally store the mount options to
a linked list.  This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.

There are no functional differences in this commit.

Reviewed by:	phk
2002-06-20 20:03:42 +00:00
Maxime Henrion
8eb0098f4c Remove a duplicated vfs_freeopts() that I introduced in last
revision.
2002-05-28 13:27:55 +00:00
Maxime Henrion
2274ec995c Style nit, no functional changes. 2002-05-23 23:22:22 +00:00
Maxime Henrion
9ee6bf717f Slightly change the way we pass mount options to the filesystem
VFS_NMOUNT operations.

Reviewed by:	phk
2002-05-23 23:02:19 +00:00
Maxime Henrion
e9e705b0df Change two vput() that should have been vrele().
Submitted by:	iedowse
2002-05-20 14:59:43 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Jeff Roberson
0e2d6cc899 Disable the shared locking namei() code for now. It breaks several stacking
filesystems.  This is on hold until the rest of VFS Locking is reviewed and
deemed safe.  It can be enabled with 'options LOOKUP_SHARED'.
2002-05-14 21:59:49 +00:00
Maxime Henrion
9d997d8be8 Add the lchflags(2) syscall.
Reviewed by:	rwatson
2002-05-05 23:47:41 +00:00
Jeff Roberson
576365ba36 Move a KASSERT() in open() prior to unlocking the vnode. It's not safe to
call VOP_GETVOBJECT without a lock.
2002-05-05 23:17:13 +00:00
Maxime Henrion
afd458b0fa Fix a typo.
Submitted by:	dwmalone
2002-05-04 19:50:09 +00:00
Robert Watson
7a0776e477 Slightly restructure extattr_get_vp() so that there's only one entry point
to VOP_GETEXTATTR().  This simplifies code flow when inserting MAC hooks.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-23 01:27:38 +00:00
Robert Watson
0510317039 Improve style consistency of vfs_syscalls.c by converting the style used
in various extattr_*() calls to match the rest of the file.  Originally,
these bits at the end looked more like style(9).  This patch was submitted
by green by way of the TrustedBSD MAC tree, and I fixed a few problems
with it on the way through.  Someone with more time on their hands should
convert the entire file to style(9); this commit is for diff reduction
purposes.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-20 01:37:08 +00:00
Ian Dowse
df99ca52f1 The recent NFS forced unmount improvements introduced a side-effect
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.

Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.

Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.

Reported by:	simokawa
MFC after:	1 week
2002-04-17 01:07:29 +00:00
Jeff Roberson
a59f8b9e6c Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this
behavior by default.  Also, change the options line to reflect this.

If there are no problems reported this will become the only behavior and the
knob will be removed in a month or so.

Demanded by:	obrien
2002-04-09 05:14:17 +00:00
Maxime Henrion
a48ca36999 The fourth parameter to copystr() is a size_t, not an int.
Approved by:	peter
2002-04-08 21:14:19 +00:00
Maxime Henrion
9d8353732e o Change kernel_vmount() interface to be more convenient : pass two
separate strings instead of passing "foo=bar".
o Don't forget to clear the VMOUNT flag on the vnode when vfs_nmount()
  fails because the fs doesn't implement VFS_NMOUNT (and in vfs_mount()
  when the fs doesn't implement VFS_MOUNT) ; also decrement the vfs
  refcount in the !MNT_UPDATE case.
2002-04-07 13:22:47 +00:00
Maxime Henrion
bcc931752f Add two forgotten vfs_unbusy() calls, in vfs_mount() and vfs_nmount().
Reviewed by:	phk
2002-04-03 12:19:03 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Maxime Henrion
daab5e2472 - Properly sync vfs_nmount() with changes that have be already done
in vfs_mount(), in particular revisions 1.215, 1.227 and 1.240.
- flag2 is a low quality variable name, change it to kern_flag.
- strncpy NUL-terminates f_fstypename and f_mntonname since the strings
  have length <= <buffer length> - 1, so the explicit NUL-termination is
  bogus.
- M_ZERO'ing space for fstype and fspath is stupid since we never use the
  space beyond the end of the string.
- Do various style(9) cleanups in both functions.

Submitted by:	bde
Reviewed by:	phk
2002-03-28 13:47:32 +00:00
Andrew R. Reiter
dcce8874eb - Fixup a few style nits:
- return error -> return (error);
  - move a declaration to the top of the function.
  - become bug for bug compatible with if (error) lines.

Submitted by: bde
2002-03-26 18:07:10 +00:00
Maxime Henrion
17594b936b As discussed in -arch, add the new nmount(2) system call and the
new vfs_getopt()/vfs_copyopt() API.  This is intended to be used
later, when there will be filesystems implementing the VFS_NMOUNT
operation.  The mount(2) system call will disappear when all
filesystems will be converted to the new API.  Documentation will
be committed in a while.

Reviewed by:	phk
2002-03-26 15:33:44 +00:00
Andrew R. Reiter
517f30c2c1 - Recommit the securelevel_gt() calls removed by commits rev. 1.84 of
kern_linker.c and rev. 1.237 of vfs_syscalls.c since these are not the
  source of the recent panics occuring around kldloading file system
  support modules.

Requested by: rwatson
2002-03-25 18:26:34 +00:00
Andrew R. Reiter
fe3240e9aa - Back out the commit to make the linker_load_file() securelevel check
made aware in jail environments.  Supposedly something is broken, so
  this should be backed out until further investigation proves otherwise,
  or a proper fix can be provided.
2002-03-22 04:56:09 +00:00
Andrew R. Reiter
e85b9ae9ac - Fix a logic error in checking the securelevel that was introduced in the
previous commit.

Pointy hats to: arr, rwatson
2002-03-21 15:27:39 +00:00
Andrew R. Reiter
c457a4403a - Change a check of securelevel to securelevel_gt() call in order to help
against users within a jail attempting to load kernel modules.
- Add a check of securelevel_gt() to vfs_mount() in order to chop some
  low hanging fruit for the repair of securelevel checking of linking and
  unlinking files from within jails.  There is more to be done here.

Reviewed by: rwatson
2002-03-20 16:03:42 +00:00
Jeff Roberson
c897b81311 Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Alfred Perlstein
4a950215ef Close a race when vfs_syscalls.c:checkdirs() runs.
To do this protect the filedesc pointer in the proc with PROC_LOCK
in both checkdirs() and kern_descrip.c:fdfree().
2002-03-19 04:30:04 +00:00
Jeff Roberson
8de00f4a87 This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs.
The stat() and open() calls have been changed to make use of this new functionality.  Using shared locks in
these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes.  Also,
this reduces the number of exclusive locks that are floating around in the system, which helps reduce the
number of deadlocks that occur.

A new kernel option "LOOKUP_SHARED" has been added.  It defaults to off so this patch can be turned on for
testing, and should eventually go away once it is proven to be stable.  I have personally been running this
patch for over a year now, so it is believed to be fully stable.

Reviewed by:	jake, obrien
Approved by:	jake
2002-03-12 04:00:11 +00:00
Robert Watson
89e1164ee2 Three p_ucred -> td_ucred's missed in jhb's earlier pass; all appear to
be safe.
2002-03-05 19:45:45 +00:00
Robert Watson
b0ad6e203a The change from td->td_proc->p_ucred to td->td_ucred has shortened some
lines: more agressively line wrap under those circumstances.
2002-03-05 19:31:25 +00:00
John Baldwin
bdd67d483c - Change namei() to use td_ucred instead of p_ucred.
- Change the hack in access() that uses a temporary credential to set
  td_ucred to the temp cred instead of p_ucred.
2002-02-27 19:15:29 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Robert Watson
1ea030d8fe Make sure to hold vnode lock when calling into VOP_GETATTR().
Discussed with:	mckusick, phk
2002-02-10 21:44:30 +00:00
Robert Watson
c0a9dc83c8 Make sure to grab vnode lock on a vnode before calling VOP_GETATTR()
to perform an ownership test in revoke().  This is also required for
MAC hooks so that the vnode lock is held during a call to the MAC
framework.  Release the lock before calling VOP_REVOKE().

Discussed with:	phk, mckusick
2002-02-10 20:45:43 +00:00
Robert Watson
56e04d01c0 Remove a stray 'const' that slept into extattr_set_vp(), and could
result in compiler warnings.
2002-02-10 05:31:55 +00:00
Robert Watson
74237f55b0 Part I: Update extended attribute API and ABI:
o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
  as not to use the scatter gather API (which appeared not to be used
  by any consumers, and be less portable), rather, accepts 'data'
  and 'nbytes' in the style of other simple read/write interfaces.
  This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
  a size_t.  When performing a read, the number of bytes read will
  be returned, unless the data pointer is NULL, in which case the
  number of bytes of data are returned.  This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
  argument so as to return the size, if desirable.  If set to NULL,
  the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable.  More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-10 04:43:22 +00:00
Robert Watson
143bb598d0 o Merge various recent fixes from the MAC branch relating to extattrctl():
- Fix null-pointer dereference introduced when snapshotting
	  was introduced.  This occured because unlike the previous code,
	  vn_start_write() doesn't always return a non-NULL mp, as
	  filesystems may not support the VOP_GETWRITEMOUNT() call.  For
	  now, rely on two pointers, so that vn_finished_write() works
	  properly.
	- Fix locking problems on exit, introduced at some past time,
	  some when snapshots came in, where a vnode might not be
	  unlocked before being vrele'd in various error situations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-08 05:58:41 +00:00
Julian Elischer
079b7badea Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
Alfred Perlstein
b7184973ed Don't recurse on filedesc lock in chroot_refuse_vdir_fds().
Noticed by: Michael Nottebrock <michaelnottebrock@gmx.net>
2002-02-01 18:27:16 +00:00
Alfred Perlstein
a4db49537b Replace ffind_* with fget calls.
Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().
2002-01-14 00:13:45 +00:00
Alfred Perlstein
426da3bcfb SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
Ian Dowse
1f493270a1 Change dounmount() to return EBUSY in the non-MNT_FORCE case if we
can't acquire the mnt_lock without blocking. Normally non-forced
unmount attempts return EBUSY quickly if any vnodes are active, so
this just extends that behaviour to cover the per-mount mnt_lock
too.
2002-01-10 01:59:30 +00:00
Stefan Eßer
10cc6dff87 Return EBADF in case some vnode field has been reset to a NULL pointer.
(There has been some discussion, whether ENOENT or EBADF is more
appropriate. I choose the latter, since the operation is not supported
on the file descriptor at that time, even if it was, immediately before.)

PR:		32681
Reviewed by:	dillon, iedowse, ...
Approved by:	nectar
MFC after:	3 days
		(pending RE approval)
2002-01-03 09:54:24 +00:00
Poul-Henning Kamp
751a2cd05b Define a new mount flag "MNT_JAILDEVFS"
Collect the magic combination of flags which can be updated into
a macro in sys/mount.h rather than inlining them (twice!) in
vfs_syscalls.c
2001-11-05 10:33:45 +00:00
Matthew Dillon
6b8bd2efc1 Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather then piecemeal later on.  mnt_nvnodelist
currently holds all the vnodes under the mount point.  This will eventually
be split into a 'dirty' and 'clean' list.  This way we only break kld's once
rather then twice.  nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.
2001-11-04 18:55:42 +00:00
Robert Watson
cd778f0244 o Remove the local temporary variable "struct proc *p" from vfs_mount()
in vfs_syscalls.c.  Although it did save some indirection, many of
  those savings will be obscured with the impending commit of suser()
  changes, and the result is increased code complexity.  Also, once
  p->p_ucred and td->td_ucred are distinguished, this will make
  vfs_mount() use the correct thread credential, rather than the
  process credential.
2001-11-02 21:11:41 +00:00
Poul-Henning Kamp
0bd1a2d087 Argh!
patch added the nmount at the bottom first time around.

Take 3!
2001-11-02 19:12:06 +00:00
Poul-Henning Kamp
bad699770a Add empty shell for nmount syscall (take 2!) 2001-11-02 18:35:54 +00:00
Poul-Henning Kamp
06d133c475 Add nmount() stub function and regenerate the syscall-glue which should
not need to check in generated files.
2001-11-02 17:59:23 +00:00
Matthew Dillon
a06fe5111e unwind v_writecount in fhopen() if we are unable to allocate the
descriptor.

MFC after:	3 days
2001-10-24 18:32:17 +00:00
Matthew Dillon
c72ccd014d Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after:	3 days
2001-10-23 01:21:29 +00:00
Robert Watson
c6ab2f6b4e o Complete the migration from suser error checking in the following form
in vfs_syscalls.c:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid &&
        (error = suser_td(td)) != 0) {
            unwrap_lots_of_stuff();
            return (error);
    }

  to:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid) {
            error = suser_td(td);
            if (error) {
                unwrap_lots_of_stuff();
                return (error);
            }
    }

  This makes the code more readable when complex clauses are in use,
  and minimizes conflicts for large outstanding patchsets modifying the
  kernel authorization code (of which I have several), especially where
  existing authorization and context code are combined in the same if()
  conditional.

Obtained from:	TrustedBSD Project
2001-10-01 20:01:07 +00:00
Robert Watson
b4799065ef o vpaccess() -> vn_access() -- Peter reminds me that there is already
a convention for vnop helper routines of this sort.

Submitted by:	Mr Wemm <peter>
2001-09-22 03:07:41 +00:00
Robert Watson
9c94f7731e o Introduce eaccess(2), a version of access(2) that uses the effective
credentials rather than the real credentials.  This is useful for
  implementing GUI's which need to modify icons based on access rights,
  but where use of open(2) is too expensive, use of stat(2) doesn't
  reflect the file system's real protection model, and use of
  access() suffers from real/effective credential confusion.  This
  implementation provides the same semantics as the call of the same
  name on SCO OpenServer.  Note: using this call improperly can
  leave you subject to some of the same races present in the
  access(2) call.
o To implement this, break out the basic logic of access(2) into
  vpaccess(), which accepts a passed credential to perform the
  invocation of VOP_ACCESS().  Add eaccess(2) to invoke vpaccess(),
  and modify access(2) to use vpaccess().

Obtained from:	TrustedBSD Project
2001-09-21 21:33:22 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Andrey A. Chernov
63347f1e8f lseek: simplify overflow checks 2001-08-29 18:35:53 +00:00
Andrey A. Chernov
c4778eed9f Cosmetique & style fixes from bde 2001-08-26 10:23:49 +00:00
Andrey A. Chernov
db106eff39 lseek: fix check for vattr.va_size overflow. Check suggested by bde simple not
works with unsigned types.
2001-08-23 17:01:25 +00:00
Andrey A. Chernov
b82f5b624c Cosmetique: more <sys/*> into one group, separate include families by
blank line
2001-08-23 13:51:17 +00:00
Andrey A. Chernov
383f169d4a Make lseek() POSIXed: for non character special files
1) handle off_t overflow with EOVERFLOW
2) handle negative offsets with EINVAL

Reviewed by:	arch discussion
2001-08-21 21:20:42 +00:00
Ian Dowse
8774836bf8 Avoid sleeping while holding a mutex in dounmount(). This problem
has existed for a long time, but I made it worse a few months ago
by by adding calls to VFS_ROOT() and checkdirs() in revision 1.179.

Also, remove the LK_REENABLE flag in the lockmgr() call; this flag
has been ignored by the lockmgr code for 4 years. This was the only
remaining mention of it apart from its definition.

Reviewed by:	jhb
2001-08-20 19:16:31 +00:00
Ian Dowse
a9a8ba3d71 Arbitrarily limit to 64k the number of bytes that can be read at
a time using the ogetdirentries() compatibility syscall. This is a
hack to ensure that rediculous values don't get passed to MALLOC().

Reviewed by:	kris
2001-08-10 22:14:18 +00:00
Dag-Erling Smørgrav
f0cc1c6f81 Constify the fstype argument to vfs_mount(). This eliminates at least one
"call discards qualifier" warning (in sys/compat/linux/linux_file.c).
2001-07-09 19:11:51 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
Thomas Moestl
c0a0fb85e2 Fix an instance of NDINIT in the extattrctl syscall: LOCKLEAF was or'ed
to the operation parameter, not to the flags as it should be.

Reviewed by:	rwatson
2001-06-06 23:34:38 +00:00
Robert Watson
b1fc0ec1a7 o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
John Baldwin
bdc60f5bd3 Don't release Giant around vm_oject_page_clean() in fsync() as the pager
putpages called will need Giant.
2001-05-23 22:55:13 +00:00
Ruslan Ermilov
99d300a1ec - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
  fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
  FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
  Makefiles.
2001-05-23 09:42:29 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Robert Watson
fec605c882 o Introduce extattr_{delete,get,set}_fd() to allow extended attribute
operations on file descriptors, which complement the existing set of
  calls, extattr_{delete,get,set}_file() which act on paths.  In doing
  so, restructure the system call implementation such that the two sets
  of functions share most of the relevant code, rather than duplicating
  it.  This pushes the vnode locking into the shared code, but keeps
  the copying in of some arguments in the system call code.  Allowing
  access via file descriptors reduces the opportunity for race
  conditions when managing extended attributes.

Obtained from:	TrustedBSD Project
2001-03-31 16:20:05 +00:00
John Baldwin
1005a129e5 Convert the allproc and proctree locks from lockmgr locks to sx locks. 2001-03-28 11:52:56 +00:00