237 Commits

Author SHA1 Message Date
arr
fc49faf982 - Back out the commit to make the linker_load_file() securelevel check
made aware in jail environments.  Supposedly something is broken, so
  this should be backed out until further investigation proves otherwise,
  or a proper fix can be provided.
2002-03-22 04:56:09 +00:00
arr
68e226a99e - Fix a logic error in checking the securelevel that was introduced in the
previous commit.

Pointy hats to: arr, rwatson
2002-03-21 15:27:39 +00:00
arr
fc9167c193 - Change a check of securelevel to securelevel_gt() call in order to help
against users within a jail attempting to load kernel modules.
- Add a check of securelevel_gt() to vfs_mount() in order to chop some
  low hanging fruit for the repair of securelevel checking of linking and
  unlinking files from within jails.  There is more to be done here.

Reviewed by: rwatson
2002-03-20 16:03:42 +00:00
jeff
318cbeeecf Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
alfred
357e37e023 Remove __P. 2002-03-19 21:25:46 +00:00
alfred
21fc25cfdf Close a race when vfs_syscalls.c:checkdirs() runs.
To do this protect the filedesc pointer in the proc with PROC_LOCK
in both checkdirs() and kern_descrip.c:fdfree().
2002-03-19 04:30:04 +00:00
jeff
e6d26e8880 This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs.
The stat() and open() calls have been changed to make use of this new functionality.  Using shared locks in
these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes.  Also,
this reduces the number of exclusive locks that are floating around in the system, which helps reduce the
number of deadlocks that occur.

A new kernel option "LOOKUP_SHARED" has been added.  It defaults to off so this patch can be turned on for
testing, and should eventually go away once it is proven to be stable.  I have personally been running this
patch for over a year now, so it is believed to be fully stable.

Reviewed by:	jake, obrien
Approved by:	jake
2002-03-12 04:00:11 +00:00
rwatson
8d5b7b21f3 Three p_ucred -> td_ucred's missed in jhb's earlier pass; all appear to
be safe.
2002-03-05 19:45:45 +00:00
rwatson
191712b565 The change from td->td_proc->p_ucred to td->td_ucred has shortened some
lines: more agressively line wrap under those circumstances.
2002-03-05 19:31:25 +00:00
jhb
56094fdb72 - Change namei() to use td_ucred instead of p_ucred.
- Change the hack in access() that uses a temporary credential to set
  td_ucred to the temp cred instead of p_ucred.
2002-02-27 19:15:29 +00:00
jhb
3706cd3509 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
rwatson
2bbae54d18 Make sure to hold vnode lock when calling into VOP_GETATTR().
Discussed with:	mckusick, phk
2002-02-10 21:44:30 +00:00
rwatson
aa54f85939 Make sure to grab vnode lock on a vnode before calling VOP_GETATTR()
to perform an ownership test in revoke().  This is also required for
MAC hooks so that the vnode lock is held during a call to the MAC
framework.  Release the lock before calling VOP_REVOKE().

Discussed with:	phk, mckusick
2002-02-10 20:45:43 +00:00
rwatson
eef638ac93 Remove a stray 'const' that slept into extattr_set_vp(), and could
result in compiler warnings.
2002-02-10 05:31:55 +00:00
rwatson
6ca91a055c Part I: Update extended attribute API and ABI:
o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
  as not to use the scatter gather API (which appeared not to be used
  by any consumers, and be less portable), rather, accepts 'data'
  and 'nbytes' in the style of other simple read/write interfaces.
  This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
  a size_t.  When performing a read, the number of bytes read will
  be returned, unless the data pointer is NULL, in which case the
  number of bytes of data are returned.  This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
  argument so as to return the size, if desirable.  If set to NULL,
  the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable.  More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-10 04:43:22 +00:00
rwatson
d45677811a o Merge various recent fixes from the MAC branch relating to extattrctl():
- Fix null-pointer dereference introduced when snapshotting
	  was introduced.  This occured because unlike the previous code,
	  vn_start_write() doesn't always return a non-NULL mp, as
	  filesystems may not support the VOP_GETWRITEMOUNT() call.  For
	  now, rely on two pointers, so that vn_finished_write() works
	  properly.
	- Fix locking problems on exit, introduced at some past time,
	  some when snapshots came in, where a vnode might not be
	  unlocked before being vrele'd in various error situations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-08 05:58:41 +00:00
julian
b5eb64d6f0 Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
alfred
bfbf894c82 Don't recurse on filedesc lock in chroot_refuse_vdir_fds().
Noticed by: Michael Nottebrock <michaelnottebrock@gmx.net>
2002-02-01 18:27:16 +00:00
alfred
1f82bc18d1 Replace ffind_* with fget calls.
Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().
2002-01-14 00:13:45 +00:00
alfred
844237b396 SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
iedowse
83b07d10e7 Change dounmount() to return EBUSY in the non-MNT_FORCE case if we
can't acquire the mnt_lock without blocking. Normally non-forced
unmount attempts return EBUSY quickly if any vnodes are active, so
this just extends that behaviour to cover the per-mount mnt_lock
too.
2002-01-10 01:59:30 +00:00
se
4f45ba2c53 Return EBADF in case some vnode field has been reset to a NULL pointer.
(There has been some discussion, whether ENOENT or EBADF is more
appropriate. I choose the latter, since the operation is not supported
on the file descriptor at that time, even if it was, immediately before.)

PR:		32681
Reviewed by:	dillon, iedowse, ...
Approved by:	nectar
MFC after:	3 days
		(pending RE approval)
2002-01-03 09:54:24 +00:00
phk
235f3ed483 Define a new mount flag "MNT_JAILDEVFS"
Collect the magic combination of flags which can be updated into
a macro in sys/mount.h rather than inlining them (twice!) in
vfs_syscalls.c
2001-11-05 10:33:45 +00:00
dillon
c9a56085ce Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather then piecemeal later on.  mnt_nvnodelist
currently holds all the vnodes under the mount point.  This will eventually
be split into a 'dirty' and 'clean' list.  This way we only break kld's once
rather then twice.  nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.
2001-11-04 18:55:42 +00:00
rwatson
339957cbbd o Remove the local temporary variable "struct proc *p" from vfs_mount()
in vfs_syscalls.c.  Although it did save some indirection, many of
  those savings will be obscured with the impending commit of suser()
  changes, and the result is increased code complexity.  Also, once
  p->p_ucred and td->td_ucred are distinguished, this will make
  vfs_mount() use the correct thread credential, rather than the
  process credential.
2001-11-02 21:11:41 +00:00
phk
9682ea2877 Argh!
patch added the nmount at the bottom first time around.

Take 3!
2001-11-02 19:12:06 +00:00
phk
ea1c496f9a Add empty shell for nmount syscall (take 2!) 2001-11-02 18:35:54 +00:00
phk
a7cf9dee8f Add nmount() stub function and regenerate the syscall-glue which should
not need to check in generated files.
2001-11-02 17:59:23 +00:00
dillon
f421c786e5 unwind v_writecount in fhopen() if we are unable to allocate the
descriptor.

MFC after:	3 days
2001-10-24 18:32:17 +00:00
dillon
45a6fabe87 Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after:	3 days
2001-10-23 01:21:29 +00:00
rwatson
75a3f18a2a o Complete the migration from suser error checking in the following form
in vfs_syscalls.c:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid &&
        (error = suser_td(td)) != 0) {
            unwrap_lots_of_stuff();
            return (error);
    }

  to:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid) {
            error = suser_td(td);
            if (error) {
                unwrap_lots_of_stuff();
                return (error);
            }
    }

  This makes the code more readable when complex clauses are in use,
  and minimizes conflicts for large outstanding patchsets modifying the
  kernel authorization code (of which I have several), especially where
  existing authorization and context code are combined in the same if()
  conditional.

Obtained from:	TrustedBSD Project
2001-10-01 20:01:07 +00:00
rwatson
f12cf1d430 o vpaccess() -> vn_access() -- Peter reminds me that there is already
a convention for vnop helper routines of this sort.

Submitted by:	Mr Wemm <peter>
2001-09-22 03:07:41 +00:00
rwatson
e925d3281c o Introduce eaccess(2), a version of access(2) that uses the effective
credentials rather than the real credentials.  This is useful for
  implementing GUI's which need to modify icons based on access rights,
  but where use of open(2) is too expensive, use of stat(2) doesn't
  reflect the file system's real protection model, and use of
  access() suffers from real/effective credential confusion.  This
  implementation provides the same semantics as the call of the same
  name on SCO OpenServer.  Note: using this call improperly can
  leave you subject to some of the same races present in the
  access(2) call.
o To implement this, break out the basic logic of access(2) into
  vpaccess(), which accepts a passed credential to perform the
  invocation of VOP_ACCESS().  Add eaccess(2) to invoke vpaccess(),
  and modify access(2) to use vpaccess().

Obtained from:	TrustedBSD Project
2001-09-21 21:33:22 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
ache
2718e60089 lseek: simplify overflow checks 2001-08-29 18:35:53 +00:00
ache
c71ba5eea8 Cosmetique & style fixes from bde 2001-08-26 10:23:49 +00:00
ache
1dabef8845 lseek: fix check for vattr.va_size overflow. Check suggested by bde simple not
works with unsigned types.
2001-08-23 17:01:25 +00:00
ache
29e4dde471 Cosmetique: more <sys/*> into one group, separate include families by
blank line
2001-08-23 13:51:17 +00:00
ache
5b25e70c26 Make lseek() POSIXed: for non character special files
1) handle off_t overflow with EOVERFLOW
2) handle negative offsets with EINVAL

Reviewed by:	arch discussion
2001-08-21 21:20:42 +00:00
iedowse
22613ae234 Avoid sleeping while holding a mutex in dounmount(). This problem
has existed for a long time, but I made it worse a few months ago
by by adding calls to VFS_ROOT() and checkdirs() in revision 1.179.

Also, remove the LK_REENABLE flag in the lockmgr() call; this flag
has been ignored by the lockmgr code for 4 years. This was the only
remaining mention of it apart from its definition.

Reviewed by:	jhb
2001-08-20 19:16:31 +00:00
iedowse
b3168db212 Arbitrarily limit to 64k the number of bytes that can be read at
a time using the ogetdirentries() compatibility syscall. This is a
hack to ensure that rediculous values don't get passed to MALLOC().

Reviewed by:	kris
2001-08-10 22:14:18 +00:00
des
1b82a02868 Constify the fstype argument to vfs_mount(). This eliminates at least one
"call discards qualifier" warning (in sys/compat/linux/linux_file.c).
2001-07-09 19:11:51 +00:00
dillon
e028603b7e With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
tmm
0e4c21f7c1 Fix an instance of NDINIT in the extattrctl syscall: LOCKLEAF was or'ed
to the operation parameter, not to the flags as it should be.

Reviewed by:	rwatson
2001-06-06 23:34:38 +00:00
rwatson
f504530d9f o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
jhb
2441ff245e Don't release Giant around vm_oject_page_clean() in fsync() as the pager
putpages called will need Giant.
2001-05-23 22:55:13 +00:00
ru
35437d86aa - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
  fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
  FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
  Makefiles.
2001-05-23 09:42:29 +00:00
alfred
a3f0842419 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
grog
4b9d9cbaac Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
grog
1f5de30718 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00