Commit Graph

5463 Commits

Author SHA1 Message Date
Matthew Dillon
21c2d0479c Alright, fix the problems with the elf loader for the Alpha. It turns
out that there is no easy way to discern the difference between a text
segment and a data segment through the read-only OR execute attribute
in the elf segment header, so revert the algorithm to what it was before.

Neither can we account for multiple data load segments in the vmspace
structure (at least not without more work), due to assumptions obreak()
makes in regards to the data start and data size fields.

Retain RLIMIT_VMEM checking by using a local variable to track the
total bytes of data being loaded.

Reviewed by:	peter
X-MFC after:	ASAP
2002-09-04 04:42:12 +00:00
Peter Wemm
9782ecbab0 Make the text segment locating heuristics from rev 1.121 more reliable
so that it works on the Alpha.  This defines the segment that the entry
point exists in as 'text' and any others (usually one) as data.

Submitted by: tmm
Tested on: i386, alpha
2002-09-03 21:18:17 +00:00
John Baldwin
5fc3031366 - Change falloc() to acquire an fd from the process table last so that
it can do it w/o needing to hold the filelist_lock sx lock.
- fdalloc() doesn't need Giant to call free() anymore.  It also doesn't
  need to drop and reacquire the filedesc lock around free() now as a
  result.
- Try to make the code that copies fd tables when extending the fd table in
  fdalloc() a bit more readable by performing assignments in separate
  statements.  This is still a bit ugly though.
- Use max() instead of an if statement so to figure out the starting point
  in the search-for-a-free-fd loop in fdalloc() so it reads better next to
  the min() in the previous line.
- Don't grow nfiles in steps up to the size needed if we dup2() to some
  really large number.  Go ahead and double 'nfiles' in a loop prior
  to doing the malloc().
- malloc() doesn't need Giant now.
- Use malloc() and free() instead of MALLOC() and FREE() in fdalloc().
- Check to see if the size we are going to grow to is too big, not if the
  current size of the fd table is too big in the loop in fdalloc().  This
  means if we are out of space or if dup2() requests too high of a fd,
  then we will return an error before we go off and try to allocate some
  huge table and copy the existing table into it.
- Move all of the logic for dup'ing a file descriptor into do_dup() instead
  of putting some of it in do_dup() and duplicating other parts in four
  different places.  This makes dup(), dup2(), and fcntl(F_DUPFD) basically
  wrappers of do_dup now.  fcntl() still has an extra check since it uses
  a different error return value in one case then the other functions.
- Add a KASSERT() for an assertion that may not always be true where the
  fdcheckstd() function assumes that falloc() returns the fd requested and
  not some other fd.  I think that the assertion is always true because we
  are always single-threaded when we get to this point, but if one was
  using rfork() and another process sharing the fd table were playing with
  the fd table, there might could be a problem.
- To handle the problem of a file descriptor we are dup()'ing being closed
  out from under us in dup() in general, do_dup() now obtains a reference
  on the file in question before calling fdalloc().  If after the call to
  fdalloc() the file for the fd we are dup'ing is a different file, then
  we drop our reference on the original file and return EBADF.  This
  race was only handled in the dup2() case before and would just retry
  the operation.  The error return allows the user to know they are being
  stupid since they have a locking bug in their app instead of dup'ing
  some other descriptor and returning it to them.

Tested on:	i386, alpha, sparc64
2002-09-03 20:16:31 +00:00
John Baldwin
0d975d6341 Add some KASSERT()'s to ensure that we don't perform spin mutex ops on
sleep mutexes and vice versa.  WITNESS normally should catch this but
not everyone uses WITNESS so this is a fallback to catch nasty but easy
to do bugs.
2002-09-03 18:25:16 +00:00
David Xu
35c32a76f9 In the kernel code, we have the tsleep() call with the PCATCH argument.
PCATCH means 'if we get a signal, interrupt me!" and tsleep returns
either EINTR or ERESTART depending on the circumstances.  ERESTART is
"special" because it causes the system call to fail, but right as it
returns back to userland it tells the trap handler to move %eip back a
bit so that userland will immediately re-run the syscall.
This is a syscall restart. It only works for things like read() etc where
nothing has changed yet. Note that *userland* is tricked into restarting
the syscall by the kernel. The kernel doesn't actually do the restart. It
is deadly for things like select, poll, nanosleep etc where it might cause
the elapsed time to be reset and start again from scratch.  So those
syscalls do this to prevent userland rerunning the syscall:
  if (error == ERESTART) error = EINTR;

Fake "signals" like SIGTSTP from ^Z etc do not normally invoke userland
signal handlers. But, in -current, the PCATCH *is* being triggered and
tsleep is returning ERESTART, and the syscall is aborted even though no
userland signal handler was run.
That is the fault here.  We're triggering the PCATCH in cases that we
shouldn't.  ie: it is being triggered on *any* signal processing, rather
than the case where the signal is posted to userland.
	--- Peter

The work of psignal() is a patchwork of special case required by the process
debugging and job-control facilities...
	--- Kirk McKusick
	"The design and impelementation of the 4.4BSD Operating system"
	Page 105

in STABLE source, when psignal is posting a STOP signal to sleeping
process and the signal action of the process is SIG_DFL, system will
directly change the process state from SSLEEP to SSTOP, and when
SIGCONT is posted to the stopped process, if it finds that the process
is still on sleep queue, the process state will be restored to SSLEEP,
and won't wakeup the process.

this commit mimics the behaviour in STABLE source tree.

Reviewed by: Jon Mini, Tim Robbins, Peter Wemm
Approved by: julian@freebsd.org (mentor)
2002-09-03 12:56:01 +00:00
Ian Dowse
48b52b7a32 Split up __getcwd so that kernel callers of the internal version
can specify whether the buffer is in user or system space.
2002-09-02 22:40:30 +00:00
Ian Dowse
49c2ff159f Split fcntl() into a wrapper and a kernel-callable kern_fcntl()
implementation. The wrapper is responsible for copying additional
structure arguments (struct flock) to and from userland.
2002-09-02 22:24:14 +00:00
Matthew Dillon
05ef87980a Grammer cleanup 2002-09-02 17:27:30 +00:00
David Xu
67bdda9718 fix bogus CTR3 message.
Reviewed by: julian@freebsd.org (mentor)
2002-09-02 07:55:06 +00:00
Jake Burkholder
5fe3ed629a Moved elf brand identification into a function. Fully identify the
brand early in the process of loading an elf file, so that we can
identify the sysentvec, and so that we do not continue if we do not
have a brand (and thus a sysentvec).  Use the values in the sysentvec
for the page size and vm ranges unconditionally, since they are all
filled in now.
2002-09-02 04:50:57 +00:00
Alan Cox
8a59b15cd4 o Synchronize updates to struct vm_page::cow with the page queues lock. 2002-09-02 04:04:12 +00:00
Jake Burkholder
8cf034521b Fixed more indentation bugs. 2002-09-02 02:41:26 +00:00
Jake Burkholder
f36ba45234 Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec.  Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places.  Provided stack
fixup routines for emulations that previously used the default.
2002-09-01 21:41:24 +00:00
Ian Dowse
8f19eb88df Split out a number of mostly VFS and signal related syscalls into
a kernel-internal kern_*() version and a wrapper that is called via
the syscall vector table. For paths and structure pointers, the
internal version either takes a uio_seg parameter or requires the
caller to copyin() the data to kernel memory as appropiate. This
will permit emulation layers to use these syscalls without having
to copy out translated arguments to the stack gap.

Discussed on:		-arch
Review/suggestions:	bde, jhb, peter, marcel
2002-09-01 20:37:28 +00:00
Matthew Dillon
cac4515267 Implement data, text, and vmem limit checking in the elf loader and svr4
compat code.  Clean up accounting for multiple segments.  Part 1/2.

Submitted by:	Andrey Alekseyev <uitm@zenon.net> (with some modifications)
MFC after:	3 days
2002-08-30 18:09:46 +00:00
Peter Wemm
447b3772dc Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int.  Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness.  Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody.  This
comes for free on 32 bit machines, so why not?
2002-08-30 04:04:37 +00:00
Julian Elischer
472be95807 Rejig the code to figure out estcpu and work out how long a KSEGRP has been
idle. What was there before was surprisingly ALMOST correct.

Peter and I fried our brains on this for a couple of hours figuring out
what this actually means in the context of multiple threads.

Reviewed by:	peter@freebsd.org
2002-08-30 00:25:49 +00:00
Peter Wemm
ee92a1ab51 Actually remove the a.out kld loader. While I am not 100% sure, I believe
it is broken.  It certainly has been suffering neglect.  It is not needed
because we never shipped a.out kld's and they never really worked right.
2002-08-29 23:04:05 +00:00
Julian Elischer
88151aa3f5 Fix crack-smoking code that was panicing on the quad xeon:
- If either of proc or kse are NULL during thread_exit(), then
          the kernel is going to fault because parts of the function
          assume they aren't NULL.  Instead, just assert they aren't NULL
          (as well as the kse group) and assume they are in all of the
          code.  It doesn't make sense for them to be NULL here anyways.
        - Move the PROC_UNLOCK(p) up above clearing td_proc, etc. since
          otherwise we will panic if the proc's lock is contested.

Submitted by:	jhb@freebsd.org
2002-08-29 19:49:53 +00:00
Mitsuru IWASAKI
3aea1e1405 Add sanity check seeing if adjusted start address exceeds end address
after boundary and alignment adjustment.
2002-08-29 12:39:21 +00:00
Jake Burkholder
bafbd49201 Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.
2002-08-29 06:17:48 +00:00
Jake Burkholder
f3bec5d746 Don't require that sysentvec.sv_szsigcode be non-NULL. 2002-08-29 01:28:27 +00:00
Jake Burkholder
b17c50db93 Unrot SPARSE_MAPPING code (vm_map_pageable -> vm_map_wire). 2002-08-29 01:16:14 +00:00
Peter Wemm
d13947c3b0 updatepri() works on a ksegrp (where the scheduling parameters are), so
directly give it the ksegrp instead of the thread.  The only thing it used
to use in the thread was the ksegrp.

Reviewed by:	julian
2002-08-28 23:45:15 +00:00
Archie Cobbs
f2f03122c3 accept(2) on a socket that has been shutdown(2) normally returns
ECONNABORTED. Make this happen in the non-blocking case as well.
The previous behavior was to return EAGAIN, which (a) is not
consistent with the blocking case and (b) causes the application
to think the socket is still valid.

PR:		bin/42100
Reviewed by:	freebsd-net
MFC after:	3 days
2002-08-28 20:56:01 +00:00
Bruce Evans
8302d183f3 Include <sys/lockmgr.h> for the definitions of the locking interfaces that
are implemented here instead of depending on namespace pollution in
<sys/lock.h>.  Fixed nearby include messes (1 disordered include and 1
unused include).
2002-08-27 09:59:47 +00:00
Ian Dowse
02bd1bcd2a Add a new KTR type KTR_CONTENTION, and use it in the mutex code to
log the start and end of periods during which mtx_lock() is waiting
to acquire a sleep mutex. The log message includes the file and
line of both the waiter and the holder.

Reviewed by:	jhb, jake
2002-08-26 18:39:38 +00:00
Ian Dowse
9261400aa2 Add WITNESS_FILE() and WITNESS_LINE(), which allow users of witness
to print out the file and line from the lock object. These will be
used shortly by CTR() calls in the mutex code.

Reviewed by:	jhb, jake
2002-08-26 18:31:26 +00:00
Julian Elischer
b39f32841b move the assert to cover more cases 2002-08-26 05:02:56 +00:00
Jake Burkholder
81f223ca02 Fixed most indentation bugs. 2002-08-25 22:36:52 +00:00
Jake Burkholder
ca0387ef9f Fixed placement of operators. Wrapped long lines. 2002-08-25 20:48:45 +00:00
Philippe Charnier
93b0017f88 Replace various spelling with FALLTHROUGH which is lint()able 2002-08-25 13:23:09 +00:00
Jake Burkholder
fd559a8a39 Fixed white space around operators, casts and reserved words.
Reviewed by:	md5
2002-08-24 22:55:16 +00:00
Jake Burkholder
a7cddfed7f return x; -> return (x);
return(x); -> return (x);

Reviewed by:	md5
2002-08-24 22:01:40 +00:00
Marcel Moolenaar
5cf8741861 Work around a GCC optimization bug on ia64: In link_elf_symbol_values(),
a pointer to a symbol is given and we have to find the containing symbol
table. We do this by bounds checking. For some strange reason (ie I
haven't found the root cause) the first test succeeded for said symbol,
implying that the symbol came from the .dynsym table. In reality however
the symbol actually resided in the .symtab table. Needless to say that
all that was returned was junk.

The upper bounds check was: (symptr - baseptr) < symtab_size
This has been rewritten to: symptr < (baseptr + symtab_size)

As a side-effect, slightly more optimal (and still correct :-) code can
be generated on ia64.
2002-08-24 05:01:33 +00:00
Peter Wemm
2149c527f5 Move the TAILQ_INIT(&td->td_selq) before the retry: label. Otherwise in
some circumstances when we get a select collision, we can end up with
cases where we do not clear some sip->si_thread on the way out, leading to
page faults in selwakeup().  This should solve the problem where postfix
can crash the kernel during select collisions.

Reviewed by: alfred
2002-08-23 22:43:28 +00:00
Julian Elischer
d9d6e34fd0 Don't re-lock the sched lock if we didn't unlock it.
Original error by: David Xu <bsddiy@yahoo.com>
Fix by:	David Xu <bsddiy@yahoo.com>
Completely failed to spot it: Julian Elischer <julian@freebsd.org>
2002-08-23 07:23:44 +00:00
Jeff Roberson
ad32f726db - Fix a mistake in my last few commits. The PDROP flag stops msleep from
re-acquiring the mutex.

Pointy hat to:	me
Noticed by:	tegge
2002-08-23 00:32:03 +00:00
Peter Wemm
c6d6cf1772 s/sus/sys/ in the a.out kernel case.
Submitted by:	julian
2002-08-22 22:01:53 +00:00
Julian Elischer
49539972e9 slight cleanup of single-threading code for KSE processes 2002-08-22 21:45:58 +00:00
Archie Cobbs
4a6a94d8d8 Replace (ab)uses of "NULL" where "0" is really meant. 2002-08-22 21:24:01 +00:00
Peter Wemm
3e4517beb6 Instead of grabbing the userland a.out.h/link.h (or worse, from
/usr/include!), use sys/nlist_aout.h, machine/reloc.h, sys/imgact_aout.h
and sys/link_aout.h.
2002-08-22 20:43:07 +00:00
Peter Wemm
f99803876e Instead of nlist.h and link.h, use sys/nlist_aout.h and sys/link_elf.h
This avoids reaching out into userland sources (or worse: /usr/include!)
for building the kernel.
2002-08-22 20:39:30 +00:00
Robert Watson
1c39a77468 Spell proprly properly:
failed to set signal flags proprly for ast()
  failed to set signal flags proprly for ast()
  failed to set signal flags proprly for ast()
  failed to set signal flags proprly for ast()
2002-08-22 14:36:03 +00:00
Bruce Evans
5fd65482e0 Include <sys/systm.h> for the declarations of many things instead of
depending on namespace pollution in <sys/mumble.h>.
2002-08-22 12:47:22 +00:00
Alan Cox
0a179f8025 o Remove the AIOCBLIST_ASYNCFREE flag and related code. It's never set.
Submitted by:	Romer Gil <rgil@cs.rice.edu>
2002-08-22 08:50:15 +00:00
Jeff Roberson
4b6049cafa - Closer inspection revealed a possible deadlock situation in vn_lock() that
was introduced by my last commit but not caught by stress testing.  Fix
   that and slightly restructure the code so that it is more readable.
2002-08-22 07:57:43 +00:00
Jeff Roberson
9abf54f032 - Make vn_lock() vget() and VOP_LOCK() all behave the same way WRT
LK_INTERLOCK.  The interlock will never be held on return from these
   functions even when there is an error.  Errors typically only occur when
   the XLOCK is held which means this isn't the vnode we want anyway.  Almost
   all users of these interfaces expected this behavior even though it was
   not provided before.
2002-08-22 07:44:45 +00:00
Jeff Roberson
510939d089 - Return two shared locks to exclusive locks. This was premature.
- Document the problems that prevent us from using shared locks.
2002-08-22 07:26:18 +00:00
Jeff Roberson
6c54a1f5f0 - Fix interlock handling in vn_lock(). Previously, vn_lock() could return
with interlock held in error conditions when the caller did not specify
   LK_INTERLOCK.
 - Add several comments to vn_lock() describing the rational behind the code
   flow since it was not immediately obvious.
2002-08-22 06:58:11 +00:00
Jeff Roberson
183158485a - Fix interlock handling in vn_lock(). Previously, vn_lock() could return
with interlock held in error conditions when the caller did not specify
   LK_INTERLOCK.
 - Add several comments to vn_lock() describing the rational behind the code
   flow since it was not immediately obvious.
2002-08-22 06:51:06 +00:00
Archie Cobbs
55f7c614fd Don't use "NULL" when "0" is really meant. 2002-08-21 23:39:52 +00:00
Julian Elischer
721e591067 Revert some suspension/sleep/signal code from KSE-III
We need to rethink a bit of this and it doesn't matter if
we break the KSE test program for now as long
as non-KSE programs act as expected.

Submitted by:	David Xu <bsddiy@yahoo.com>
	(this guy's just asking to get hit with a commit bit..)
2002-08-21 20:03:55 +00:00
Jeff Roberson
0b600db425 - Document two cases, one in vget and the other in vn_lock, where the state
of interlock on exit is not consistent.  There are probably several bugs
   relating to this.
2002-08-21 08:34:48 +00:00
Jeff Roberson
88cf6b94bd - If vn_lock fails with the LK_INTERLOCK flag set, interlock will not be
released.  vcanrecycle() failed to unlock interlock under this condition.
 - Remove an extra VOP_UNLOCK from a failure case in vcanrecycle().

Pointed out by:	rwatson
2002-08-21 06:40:34 +00:00
Jeff Roberson
71ea4ba57c - Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED
- Use the new VI asserts in place of the old mtx_assert checks.
 - Add the VI asserts to the automated lock checking in the VOP calls.  The
   interlock should not be held across vops with a few exceptions.
 - Add the vop_(un)lock_{pre,post} functions to assert that interlock is held
   when LK_INTERLOCK is set.
2002-08-21 06:19:29 +00:00
Jeff Roberson
856d3a056f - Hold the vnode lock across unlink() so that the v_vflag check is safe.
- Fix the long broken error handling for VV_ROOT and VDIR.
2002-08-21 03:55:35 +00:00
Robert Watson
e5cb5e37d4 Close a race in process label changing opened due to dropping the
proc locking when revoking access to mmaps.  Instead, perform this
later once we've changed the process label (hold onto a reference
to the new cred so that we don't lose it when we release the
process lock if another thread changes the credential).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 20:26:32 +00:00
Robert Watson
8815d2e899 Regen. 2002-08-19 20:02:29 +00:00
Robert Watson
f61b85492c mac_syscall is now implemented, switch to MSTD.
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 20:01:31 +00:00
Robert Watson
177142e458 Pass active_cred and file_cred into the MAC framework explicitly
for mac_check_vnode_{poll,read,stat,write}().  Pass in fp->f_cred
when calling these checks with a struct file available.  Otherwise,
pass NOCRED.  All currently MAC policies use active_cred, but
could now offer the cached credential semantic used for the base
system security model.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 19:04:53 +00:00
Robert Watson
27f2eac7f3 Provide an implementation of mac_syscall() so that security modules
can offer new services without reserving system call numbers, or
augmented versions of existing services.  User code requests a
target policy by name, and specifies the policy-specific API plus
target.  This is required in particular for our port of SELinux/FLASK
to the MAC framework since it offers additional security services.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 17:59:48 +00:00
Robert Watson
c024c3eeb1 Break out mac_check_pipe_op() into component check entry points:
mac_check_pipe_poll(), mac_check_pipe_read(), mac_check_pipe_stat(),
and mac_check_pipe_write().  This is improves consistency with other
access control entry points and permits security modules to only
control the object methods that they are interested in, avoiding
switch statements.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 16:59:37 +00:00
Robert Watson
7f724f8b51 Break out mac_check_vnode_op() into three seperate checks:
mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write().
This improves the consistency with other existing vnode checks, and
allows policies to avoid implementing switch statements to determine
what operations they do and do not want to authorize.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 16:43:25 +00:00
Robert Watson
b12baf55a4 Assert process locks in proces-related access control checks.
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 15:30:30 +00:00
Robert Watson
851704bbd0 Add a missing vnode assertion for the exec() check.
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 15:28:39 +00:00
Poul-Henning Kamp
fee7d450d8 Keep a copy of the credential used to mount filesystems around so
we can check and use it later on.

Change the pieces of code which relied on mount->mnt_stat.f_owner
to check which user mounted the filesystem.

This became needed as the EA code needs to be able to allocate
blocks for "system" EA users like ACLs.

There seems to be some half-baked (probably only quarter- actually)
notion that the superuser for a given filesystem is the user who
mounted it, but this has far from been carried through.  It is
unclear if it should be.

Sponsored by: DARPA & NAI Labs.
2002-08-19 06:52:21 +00:00
Poul-Henning Kamp
91afe0874d A side effect of some debugging: prototypify and deregister. 2002-08-18 21:24:22 +00:00
Maxim Sobolev
62f7648682 Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by:	-hackers, -net
2002-08-18 07:05:00 +00:00
Robert Watson
d49fa1ca6e In continuation of early fileop credential changes, modify fo_ioctl() to
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.

- Change fo_ioctl() to accept active_cred; change consumers of the
  fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
  invocations of soo_ioctl() are provided access to the calling f_cred.
  Pass ap->a_td->td_ucred as the active_cred, but note that this is
  required because we don't yet distinguish file_cred and active_cred
  in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
  than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
  td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-17 02:36:16 +00:00
David Greenman
79cb7eb41c Further improved the performance of sbreserve() by moving the calculation
of the adjusted sb_max into a sysctl handler for sb_max and assigning it to
a variable that is used instead. This eliminates the 32bit multiply and
divide from the fast path that was being done previously.
2002-08-16 18:41:48 +00:00
Robert Watson
f050add5c1 Wrap maintenance of varios nmac{objectname} counters in MAC_DEBUG so we
can avoid the cost of a large number of atomic operations if we're not
interested in the object count statistics.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-16 14:21:38 +00:00
Robert Watson
49cde51dfd Correct white space nits that crept in during my recent merges of
trustedbsd_mac material.
2002-08-16 14:12:40 +00:00
Robert Watson
ea6027a8e1 Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential.  Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential.  Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument.  This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
  MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
  than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
  to vn_stat().  Pass active_cred to MAC and fp->f_cred to VOP_POLL()
  to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
  and consumers so that this distinction is maintained at the VFS
  as well as 'struct file' layer.  Pass active_cred instead of
  td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
  credential is properly initialized and can be used in the socket
  code if desired.  Pass ap->a_td->td_ucred as the active
  credential to soo_poll().  If we teach the vnop interface about
  the distinction between file and active credentials, we would use
  the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained.  It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-16 12:52:03 +00:00
David Greenman
8c71ce8a4e Rewrote the space check algorithm in sbreserve() so that the extremely
expensive (!) 64bit multiply, divide, and comparison aren't necessary
(this came in originally from rev 1.19 to fix an overflow with large
sb_max or MCLBYTES).
The 64bit math in this function was measured in some kernel profiles as
being as much as 5-8% of the total overhead of the TCP/IP stack and
is eliminated with this commit. There is a harmless rounding error (of
about .4% with the standard values) introduced with this change,
however this is in the conservative direction (downward toward a
slightly smaller maximum socket buffer size).

MFC after:	3 days
2002-08-16 05:08:46 +00:00
Robert Watson
9ca435893b In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
  "cred", and change the semantics of consumers of fo_read() and
  fo_write() to pass the active credential of the thread requesting
  an operation rather than the cached file cred.  The cached file
  cred is still available in fo_read() and fo_write() consumers
  via fp->f_cred.  These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
  pipe_read/write() now authorize MAC using active_cred rather
  than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
  VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred.  Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not.  If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 20:55:08 +00:00
Robert Watson
d61198e422 Rename mac_check_socket_receive() to mac_check_socket_deliver() so that
we can use the names _receive() and _send() for the receive() and send()
checks.  Rename related constants, policy implementations, etc.

PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
MFC after:
2002-08-15 18:51:26 +00:00
Robert Watson
4b9c2fa1fb Fix return case for negative namelen by jumping to normal exit processing
rather than immediately returning, or we may not unlock necessary locks.

Noticed by:	Mike Heffner <mheffner@acm.vt.edu>
2002-08-15 17:34:03 +00:00
Bosko Milekic
5fee904c3c Make m_flags an int instead of a short, this is consistent with the
type of the 'flags' argument m_getcl() was using anyway; m_extadd()
needed to be changed to accept an int instead of a short for 'flags.'
This makes things more consistent and also gives us more bits to
use for m_flags in the future (we have almost run out).

Requested by: sam (Sam Leffler)
2002-08-15 14:09:16 +00:00
Robert Watson
99fa64f863 Sync to trustedbsd_mac tree: default to sigsegv rather than copy-on-write
during a label change resulting in an mmap removal.  This is "fail stop"
behavior, which is preferred, although it offers slightly less
transparency.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 02:28:32 +00:00
Alfred Perlstein
b618bb96f0 return foo -> return (foo) 2002-08-15 02:10:12 +00:00
David Greenman
9e63574ea4 Moved sf_buf_alloc and sf_buf_free function declarations to sys/socketvar.h
so that they can be seen by external callers.
2002-08-13 19:03:19 +00:00
David Greenman
a370c70055 Remove obsolete comment about sf_buf_* functions being static. They were
made un-static in rev 1.114.
2002-08-13 18:20:08 +00:00
Poul-Henning Kamp
6f21160218 Remember to unlock the (optional) vnode in vfs_stdextattrctl(). Failing
to do this made the following script hang:

	#!/bin/sh
	set -ex

	extattrctl start /tmp
	extattrctl initattr 64 /tmp/EA00
	extattrctl enable /tmp user ea00 /tmp/EA00
	extattrctl showattr /tmp/EA00

if the filesystem backing /tmp did not support EAs.

The real solution is probably to have the extattrctl syscall do the
unlocking rather than depend on the filesystem to do it.  Considering
that extattrctl is going to be made obsolete anyway, this has dogwash
priority.

Sponsored by:	DARPA & NAI Labs.
2002-08-13 11:11:51 +00:00
Poul-Henning Kamp
7f52a691f0 Add a #include for <sys/mount.h> 2002-08-13 10:07:05 +00:00
Alfred Perlstein
149004e99d Make SYSVSEM mpsafe. Each semaphore set gets its own lock, however
there is a global lock over the undo structures because of the way
they are managed.

Switch to using SLIST instead of rolling our own linked list.

Fix several races where a permission check was done before a
copyin/copyout, if the copy happened to fault it may have been
possible to race for access to a semaphore set that one shouldn't
have access to.

Requested by: rwatson
Tested by: NetBSD regression suite.
2002-08-13 08:47:17 +00:00
Alfred Perlstein
4b6ef3a176 Make SYSVMSG mpsafe. Right now there is a global lock over the
entire subsystem, we could move to per-message queue locks, however
the messages themselves seem to come from a global pool and to avoid
over-locking this code (locking individual queues, then the global
pool) I've opted to just do it this way.

Requested by: rwatson
Tested by: NetBSD's regression suite.
2002-08-13 08:00:36 +00:00
Jeff Roberson
619eb6e579 - Hold the vnode lock throughout execve.
- Set VV_TEXT in the top level execve code.
 - Fixup the image activators to deal with the newly locked vnode.
2002-08-13 06:55:28 +00:00
Jeff Roberson
055c012332 - Extend the vnode_free_list_mtx to cover numvnodes and freevnodes. This
was done only some of the time before, and now it is uniformly applied.
2002-08-13 05:29:48 +00:00
Robert Watson
925860774d Introduce support for labeling and access control of pipe objects
as part of the TrustedBSD MAC framework.  Instrument the creation
and destruction of pipes, as well as relevant operations, with
necessary calls to the MAC framework.  Note that the locking
here is probably not quite right yet, but fixes will be forthcoming.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-13 02:47:13 +00:00
Robert Watson
5c5384fe80 Use the credential authorizing the socket creation operation to perform
the jail check and the MAC socket labeling in socreate().  This handles
socket creation using a cached credential better (such as in the NFS
client code when rebuilding a socket following a disconnect: the new
socket should be created using the nfsmount cached cred, not the cred
of the thread causing the socket to be rebuilt).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-12 16:49:03 +00:00
Robert Watson
818d7e6d8a Enforce MAC policy in cttyread() as well as the other operations
already instrumented.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-12 16:45:19 +00:00
Robert Watson
0231c03df4 Implement IO_NOMACCHECK in vn_rdwr() -- perform MAC checks (assuming
'options MAC') as long as IO_NOMACCHECK is not set in the IO flags.
If IO_NOMACCHECK is set, bypass MAC checks in vn_rdwr().  This allows
vn_rdwr() to be used as a utility function inside of file systems
where MAC checks have already been performed, or where the operation
is being done on behalf of the kernel not the user.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI LAbs
2002-08-12 16:15:34 +00:00
Robert Watson
7ba28492c5 Declare a module service "kernel_mac_support" when MAC support is
enabled and the kernel provides the MAC registration and entry point
service.  Declare a dependency on that module service for any
MAC module registered using mac_policy.h.  For now, hard code the
version as 1, but once we've come up with a versioning policy, we'll
move to a #define of some sort.  In the mean time, this will prevent
loading a MAC module when 'options MAC' isn't present, which (due to
a bug in the kernel linker) can result if the MAC module is preloaded
via loader.conf.

This particular evil recommended by:	peter
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI LAbs
2002-08-12 02:00:21 +00:00
Semen Ustimenko
87df4f8f18 Fix sendfile(), who was calling vn_rdwr() without aresid parameter and
thus hiting EIO at the end of file. This is believed to be a feature
(not a bug) of vn_rdwr(), so we turn it off by supplying aresid param.

Reviewed by:	rwatson, dg
2002-08-11 20:33:11 +00:00
Alan Cox
ad49abc087 o Make a correction to the last change: In aio_cancel(2) return AIO_ALLDONE
instead of EINVAL if p->p_aioinfo is NULL.
2002-08-11 19:04:17 +00:00
David Malone
af338bea64 Make kern.log_console_output a tuneable aswell as a sysctl.
MFC after:	1 week
2002-08-11 18:47:42 +00:00
Jens Schweikhardt
2b239dd118 Fix typos; each file has at least one s/seperat/separat/
(I skipped those in contrib/, gnu/ and crypto/)
While I was at it, fixed a lot more found by ispell that I
could identify with certainty to be errors. All of these
were in comments or text, not in actual code.

Suggested by:	bde
MFC after:	3 days
2002-08-11 13:05:30 +00:00
Alan Cox
b6c1f1efa2 o In aio_cancel(2), make sure that p->p_aioinfo isn't NULL before
dereferencing it.

Submitted by:	saureen <sshah@apple.com>
2002-08-11 04:09:14 +00:00
Maxime Henrion
5965373e69 - Introduce a new struct xvfsconf, the userland version of struct vfsconf.
- Make getvfsbyname() take a struct xvfsconf *.
- Convert several consumers of getvfsbyname() to use struct xvfsconf.
- Correct the getvfsbyname.3 manpage.
- Create a new vfs.conflist sysctl to dump all the struct xvfsconf in the
  kernel, and rewrite getvfsbyname() to use this instead of the weird
  existing API.
- Convert some {set,get,end}vfsent() consumers to use the new vfs.conflist
  sysctl.
- Convert a vfsload() call in nfsiod.c to kldload() and remove the useless
  vfsisloadable() and endvfsent() calls.
- Add a warning printf() in vfs_sysctl() to tell people they are using
  an old userland.

After these changes, it's possible to modify struct vfsconf without
breaking the binary compatibility.  Please note that these changes don't
break this compatibility either.

When bp will have updated mount_smbfs(8) with the patch I sent him, there
will be no more consumers of the {set,get,end}vfsent(), vfsisloadable()
and vfsload() API, and I will promptly delete it.
2002-08-10 20:19:04 +00:00
Maxime Henrion
306e6b8393 Introduce a new sysctl flag, CTLFLAG_SKIP, which will cause
sysctl_sysctl_next() to skip this sysctl.  The sysctl is
still available, but doesn't appear in a "sysctl -a".

This is especially useful when you want to deprecate a sysctl,
and add a warning into it to warn users that they are using
an old interface.  Without this flag, the warning would get
echoed when running "sysctl -a" (which happens at boot).
2002-08-10 19:56:45 +00:00
Jacques Vidrine
5b770403b5 While we're at it, add range checks similar to those in previous commit to
getsockname() and getpeername(), too.
2002-08-09 12:58:11 +00:00
Robert Watson
82d9ad331a Add additional range checks for copyout targets.
Submitted by:	Silvio Cesare <silvio@qualys.com>
2002-08-09 05:50:32 +00:00
Bosko Milekic
850be9af25 Only my brain can fart while fixing a previous brain fart. 2002-08-08 13:31:57 +00:00
Bosko Milekic
0584320e56 YIKES, I take the pointy-hat for a really big braino here. I
appologize to those of you who may have been seeing crashes in
code that uses sendfile(2) or other types of external buffers
with mbufs.

Pointed out by, and provided trace:
    Niels Chr. Bank-Pedersen <ncbp at bank-pedersen.dk>
2002-08-08 13:29:32 +00:00
Robert Watson
92e35b6006 Due to layering problems, remove the MAC checks from vn_rdwr() -- this
VOP wrapper is called from within file systems so can result in odd
loopback effects when MAC enforcement is use with the active (as
opposed to saved) credential.  These checks will be moved elsewhere.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-08 12:45:30 +00:00
Julian Elischer
6933e3c12b Do some work on keeping better track of stopped/continued state.
I'm not sure what happenned to the original setting of the P_CONTINUED
flag. it appears to have been lost in the paper shuffling...

Submitted by:	David Xu <bsddiy@yahoo.com>
2002-08-08 06:18:41 +00:00
Robert Watson
55ac5e1861 Correct a bug introduced in 1.26: M_PKTHDR is set in the 'flags'
argument, not the 'type' argument.  As a result of the buf, the
MAC label on some packet header mbufs might not be set in mbufs
allocated using m_getcl(), resulting in a page fault.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-07 20:15:29 +00:00
Thomas Moestl
d88d37d604 Use the CPU_* OID constants instead of OID_AUTO for the clock-related
sysctls for compatability with old applications.
2002-08-07 19:43:54 +00:00
Robert Watson
2d70161756 Cache the credential provided during accton() for use in later accounting
vnode operations.  This permits the rights of the user (typically root)
used to turn on accounting to be used when writing out accounting entries,
rather than the credentials of the process generating the accounting
record.  This fixes accounting in a number of environments, including
file systems that offer revocation support, MAC environments, some
securelevel scenarios, and in some NFS environments.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-07 19:30:16 +00:00
Robert Watson
4d1a4bb79f Refresh the credential on the first initproc thread following divorcing
the initproc credential from the proc0 credential.  Otherwise, the
proc0 credential is used instead of initproc's credentil when authorizing
start_init() activities prior to initproc hitting userland for the
first time.  This could result in the incorrect credential being used
to authorize mounting of the root file system, which could in turn cause
problems for NFS when used in combination with uid/gid ipfw rules, or
with MAC.

Discussed with:	julian
2002-08-07 17:53:31 +00:00
Matthew N. Dodd
df95311a10 Move code block added in 1.157 to a safer part of fork1().
Submitted by:	 jake
2002-08-07 11:31:45 +00:00
Alan Cox
b46f1c55f9 Set the ident field of the struct kevent that is registered by _aio_aqueue()
to the address of the user's aiocb rather than the kernel's aiocb.  (In other
words, prior to this change, the ident field returned by kevent(2) on
completion of an AIO was effectively garbage.)

Submitted by:	Romer Gil <rgil@cs.rice.edu>
2002-08-06 19:01:08 +00:00
Jake Burkholder
a520b73cdc Remove new console devices with cnremove before initializing them in
cninit.  This allows a console driver to replace the existing console
by calling cninit again, eg during the device probe.  Otherwise the
multiple console code sends output to both, which is unfortunate if
they're using the same hardware.
2002-08-06 18:56:41 +00:00
Bruce Evans
1c530be49c Try harder to "set signal flags proprly [sic] for ast()". See rev.1.154. 2002-08-06 15:22:09 +00:00
Robert Watson
d2118dfaba Regen. 2002-08-06 15:16:55 +00:00
Robert Watson
280f0785e8 Rename mac_policy() to mac_syscall() to be more reflective of its
purpose.

Submitted by:	cvance@tislabs.com
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-06 15:15:53 +00:00
Don Lewis
b74f2c1878 Don't automagically call vslock() from SYSCTL_OUT(). Instead, complain
about calls to SYSCTL_OUT() made with locks held if the buffer has not
been pre-wired.  SYSCTL_OUT() should not be called while holding locks,
but if this is not possible, the buffer should be wired by calling
sysctl_wire_old_buffer() before grabbing any locks.
2002-08-06 11:28:09 +00:00
Alan Cox
20fb589d13 o The introduction of kevent() broke lio_listio(): _aio_aqueue() thought
that LIO_READ and LIO_WRITE were requests for kevent()-based
   notification of completion.  Modify _aio_aqueue() to recognize LIO_READ
   and LIO_WRITE.

Notes: (1) The patch provided by the PR perpetuates a second bug in this
code, a direct access to user-space memory.  This change fixes that bug
as well. (2) This change is to code that implements a deprecated interface.
It should probably be removed after an MFC.

PR:		kern/39556
2002-08-05 19:14:27 +00:00
Dag-Erling Smørgrav
ea4c8f8ca1 Check the far end before registering an EVFILT_WRITE filter on a pipe. 2002-08-05 15:03:03 +00:00
Jeff Roberson
8947be9ba0 - Move some logic from getnewvnode() to a new function vcanrecycle()
- Unlock the free list mutex around vcanrecycle to prevent a lock order
   reversal.
2002-08-05 10:15:56 +00:00
Jeff Roberson
18c6acee26 - Move a VOP assert to the right place.
Spotted by:	i386 tinderbox
2002-08-05 08:55:53 +00:00
Alfred Perlstein
4442e4a436 Cleanup:
Fix line wrapping.
Remove 'register'.
malloc(9) with M_WAITOK can't fail, so remove checks for that.
2002-08-05 05:16:09 +00:00
Luigi Rizzo
fc1c73c21a Temporarily disable polling when no processes are active, while I
investigate the problem described below.

I am seeing some strange livelock on recent -current sources with
a slow box under heavy load, which disappears with this change.
This might suggest some kind of problem (either insufficient locking,
or mishandling of priorities) in the poll_idle thread.
2002-08-04 21:00:49 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Alan Cox
4453ada654 o Convert a vm_page_sleep_busy() into a vm_page_sleep_if_busy()
with appropriate page queue locking.
2002-08-04 06:27:37 +00:00
Matthew N. Dodd
9ccba881d9 Kernel modifications necessary to allow to follow fork()ed children.
PR:		 bin/25587 (in part)
MFC after:	 3 weeks
2002-08-04 01:07:02 +00:00
Alan Cox
3327872297 o Convert two instances of vm_page_sleep_busy() to vm_page_sleep_if_busy()
with appropriate page queue locking.
2002-08-03 18:59:19 +00:00
Maxime Henrion
f2b17113cf Make the consumers of the linker_load_file() function use
linker_load_module() instead.

This fixes a bug where the kernel was unable to properly locate and
load a kernel module in vfs_mount() (and probably in the netgraph
code as well since it was using the same function).  This is because
the linker_load_file() does not properly search the module path.

Problem found by:	peter
Reviewed by:		peter
Thanks to:		peter
2002-08-02 20:56:07 +00:00
Robert Watson
18b770b2fb Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC framework entry points to authorize readdir()
operations in the native ABI.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 20:44:52 +00:00
Julian Elischer
67759b33f6 Fix a comment. 2002-08-01 19:10:40 +00:00
Julian Elischer
04774f2357 Slight cleanup of some comments/whitespace.
Make idle process state more consistant.
Add an assert on thread state.
Clean up idleproc/mi_switch() interaction.
Use a local instead of referencing curthread 7 times in a row
(I've been told curthread can be expensive on some architectures)
Remove some commented out code.
Add a little commented out code (completion coming soon)

Reviewed by:	jhb@freebsd.org
2002-08-01 18:45:10 +00:00
Robert Watson
ee0812f320 Since we have the struct file data pointer cached in vp, use that
instead when invoking VOP_POLL().
2002-08-01 18:29:30 +00:00
Alan Cox
46086ddf91 o Acquire the page queues lock before calling vm_page_io_finish().
o Assert that the page queues lock is held in vm_page_io_finish().
2002-08-01 17:57:42 +00:00
Robert Watson
f9d0d52459 Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by:	bde
2002-08-01 17:47:56 +00:00
Robert Watson
4a58340e98 Introduce support for Mandatory Access Control and extensible
kernel access control

Invoke appropriate MAC framework entry points to authorize a number
of vnode operations, including read, write, stat, poll.  This permits
MAC policies to revoke access to files following label changes,
and to limit information spread about the file to user processes.

Note: currently the file cached credential is used for some of
these authorization check.  We will need to expand some of the
MAC entry point APIs to permit multiple creds to be passed to
the access control check to allow diverse policy behavior.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 17:23:22 +00:00
Robert Watson
37bde6c0a3 Introduce support for Mandatory Access Control and extensible
kernel access control.

Restructure the vn_open_cred() access control checks to invoke
the MAC entry point for open authorization.  Note that MAC can
reject open requests where existing DAC code skips the open
authorization check due to O_CREAT.  However, the failure mode
here is the same as other failure modes following creation,
wherein an empty file may be left behind.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 17:14:28 +00:00
Robert Watson
f4d2cfdda6 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points to authorize the following
operations:

        truncate on open()                      (write)
        access()                                (access)
        readlink()                              (readlink)
        chflags(), lchflags(), fchflags()       (setflag)
        chmod(), fchmod(), lchmod()             (setmode)
        chown(), fchown(), lchown()             (setowner)
        utimes(), lutimes(), futimes()          (setutimes)
        truncate(), ftrunfcate()                (write)
        revoke()                                (revoke)
        fhopen()                                (open)
        truncate on fhopen()                    (write)
        extattr_set_fd, extattr_set_file()      (setextattr)
        extattr_get_fd, extattr_get_file()      (getextattr)
        extattr_delete_fd(), extattr_delete_file() (setextattr)

These entry points permit MAC policies to enforce a variety of
protections on vnodes.  More vnode checks to come, especially in
non-native ABIs.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 15:37:12 +00:00
Robert Watson
339b79b939 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke an appropriate MAC entry point to authorize execution of
a file by a process.  The check is placed slightly differently
than it appears in the trustedbsd_mac tree so that it prevents
a little more information leakage about the target of the execve()
operation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 14:31:58 +00:00
Bosko Milekic
abc1263a51 Move the MAC label init/destroy stuff to more appropriate places so that
the inits/destroys are done without the cache locks held even in the
persistent-lock calls.  I may be cheating a little by using the MAC
"already initialized" flag for now.
2002-08-01 14:24:41 +00:00
John Baldwin
12240b1159 Revert previous revision which accidentally snuck in with another commit.
It just removed a comment that doesn't make sense to me personally.
2002-08-01 13:44:33 +00:00
John Baldwin
0711ca46c5 Revert previous revision which was accidentally committed and has not been
tested yet.
2002-08-01 13:39:33 +00:00
John Baldwin
fbd140c786 If we fail to write to a vnode during a ktrace write, then we drop all
other references to that vnode as a trace vnode in other processes as well
as in any pending requests on the todo list.  Thus, it is possible for a
ktrace request structure to have a NULL ktr_vp when it is destroyed in
ktr_freerequest().  We shouldn't call vrele() on the vnode in that case.

Reported by:	bde
2002-08-01 13:35:38 +00:00
Robert Watson
b3e13e1c3f Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument chdir() and chroot()-related system calls to invoke
appropriate MAC entry points to authorize the two operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:50:08 +00:00
Robert Watson
b827919594 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement two IOCTLs at the socket level to retrieve the primary
and peer labels from a socket.  Note that this user process interface
will be changing to improve multi-policy support.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:45:40 +00:00
Robert Watson
b285e7f9a8 Improve formatting and variable use consistency in extattr system
calls.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:29:03 +00:00
Robert Watson
956fc3f8a5 Simplify the logic to enter VFS_EXTATTRCTL().
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:26:07 +00:00
Robert Watson
d03db4290d Introduce support for Mandatory Access Control and extensible
kernel access control.

Authorize vop_readlink() and vop_lookup() activities during recursive
path lookup via namei() via calls to appropriate MAC entry points.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:21:40 +00:00
Robert Watson
6ea48a903c Introduce support for Mandatory Access Control and extensible
kernel access control.

Authorize the creation of UNIX domain sockets in the file system
namespace via an appropriate invocation a MAC framework entry
point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:18:42 +00:00
Robert Watson
b65f6f6b69 When invoking NDINIT() in preparation for CREATE, set SAVENAME since
we'll use nd.ni_cnp later.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:16:22 +00:00
Robert Watson
62b24bcc26 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument ctty driver invocations of various vnode operations on the
terminal controlling tty to perform appropriate MAC framework
authorization checks.

Note: VOP_IOCTL() on the ctty appears to be authorized using NOCRED in
the existing code rather than td->td_ucred.  Why?

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:09:54 +00:00
Robert Watson
467a273ca0 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the ktrace write operation so that it invokes the MAC
framework's vnode write authorization check.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:07:03 +00:00
Robert Watson
c86ca022eb Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the kernel ACL retrieval and modification system calls
to invoke MAC framework entry points to authorize these operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:04:16 +00:00
Robert Watson
62f5f684fb Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument connect(), listen(), and bind() system calls to invoke
MAC framework entry points to permit policies to authorize these
requests.  This can be useful for policies that want to limit
the activity of processes involving particular types of IPC and
network activity.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:39:49 +00:00
Dag-Erling Smørgrav
aefe27a25c Have the kern.file sysctl export xfiles rather than files. The truth is
out there!

Sponsored by:	DARPA, NAI Labs
2002-07-31 12:26:52 +00:00
Dag-Erling Smørgrav
3072197229 Nit in previous commit: the correct sysctl type is "S,xvnode" 2002-07-31 12:25:28 +00:00
Dag-Erling Smørgrav
217b2a0b61 Initialize v_cachedid to -1 in getnewvnode().
Reintroduce the kern.vnode sysctl and make it export xvnodes rather than
vnodes.

Sponsored by:	DARPA, NAI Labs
2002-07-31 12:24:35 +00:00
Dag-Erling Smørgrav
4eee8de77c Introduce struct xvnode, which will be used instead of struct vnode for
sysctl purposes.  Also add two fields to struct vnode, v_cachedfs and
v_cachedid, which hold the vnode's device and file id and are filled in
by vn_open_cred() and vn_stat().

Sponsored by:	DARPA, NAI Labs
2002-07-31 12:19:49 +00:00
Alan Cox
67c1fae92e o Lock page accesses by vm_page_io_start() with the page queues lock.
o Assert that the page queues lock is held in vm_page_io_start().
2002-07-31 07:27:08 +00:00
Robert Watson
335654d73e Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on sockets.
In particular, invoke entry points during socket allocation and
destruction, as well as creation by a process or during an
accept-scenario (sonewconn).  For UNIX domain sockets, also assign
a peer label.  As the socket code isn't locked down yet, locking
interactions are not yet clear.  Various protocol stack socket
operations (such as peer label assignment for IPv4) will follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 03:03:22 +00:00
Robert Watson
07bdba7e2d Note that the privilege indicating flag to vaccess() originally used
by the process accounting system is now deprecated.
2002-07-31 02:05:12 +00:00
Robert Watson
a0ee6ed1c0 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on vnodes.
In particular, initialize the label when the vnode is allocated or
reused, and destroy the label when the vnode is going to be released,
or reused.  Wow, an object where there really is exactly one place
where it's allocated, and one other where it's freed.  Amazing.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 02:03:46 +00:00
Robert Watson
e32a5b94d8 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke additional MAC entry points when an mbuf packet header is
copied to another mbuf: release the old label if any, reinitialize
the new header, and ask the MAC framework to copy the header label
data.  Note that this requires a potential allocation operation,
but m_copy_pkthdr() is not permitted to fail, so we must block.
Since we now use interrupt threads, this is possible, but not
desirable.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:51:34 +00:00
Robert Watson
a3abeda755 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on header
mbufs.  In particular, invoke entry points during the two mbuf
header allocation cases, and the mbuf freeing case.  Pass the "how"
argument at allocation time to the MAC framework so that it can
determine if it is permitted to block (as with policy modules),
and permit the initialization entry point to fail if it needs to
allocate memory but is not permitted to, failing the mbuf
allocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:42:19 +00:00
Robert Watson
2712d0ee89 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement MAC framework access control entry points relating to
operations on mountpoints.  Currently, this consists only of
access control on mountpoint listing using the various statfs()
variations.  In the future, it might also be desirable to
implement checks on mount() and unmount().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:27:33 +00:00
Robert Watson
a87cdf8335 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on
mount structures.  In particular, invoke entry points for
intialization and destruction in various scenarios (root,
non-root).  Also introduce an entry point in the boot procedure
following the mount of the root file system, but prior to the
start of the userland init process to permit policies to
perform further initialization.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:11:29 +00:00
Robert Watson
8a1d977d66 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement inter-process access control entry points for the MAC
framework.  This permits policy modules to augment the decision
making process for process and socket visibility, process debugging,
re-scheduling, and signaling.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 00:48:24 +00:00
Robert Watson
4024496496 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on
process credentials.  In particular, invoke entry points for
the initialization and destruction of struct ucred, the copying
of struct ucred, and permit the initial labels to be set for
both process 0 (parent of all kernel processes) and process 1
(parent of all user processes).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 00:39:19 +00:00
Robert Watson
47ac133d33 Regen. 2002-07-31 00:16:58 +00:00
Robert Watson
55fb783052 Introduce support for Mandatory Access Control and extensible
kernel access control.

Replace 'void *' with 'struct mac *' now that mac.h is in the base
tree.  The current POSIX.1e-derived userland MAC interface is
schedule for replacement, but will act as a functional placeholder
until the replacement is done.  These system calls allow userland
processes to get and set labels on both the current process, as well
as file system objects and file descriptor backed objects.
2002-07-30 22:43:20 +00:00
Robert Watson
f8ef020e2e Begin committing support for Mandatory Access Control and extensible
kernel access control.  The MAC framework permits loadable kernel
modules to link to the kernel at compile-time, boot-time, or run-time,
and augment the system security policy.  This commit includes the
initial kernel implementation, although the interface with the userland
components of the operating system is still under work, and not all
kernel subsystems are supported.  Later in this commit sequence,
documentation of which kernel subsystems will not work correctly with
a kernel compiled with MAC support will be added.

Introduce two node vnode operations required to support MAC.  First,
VOP_REFRESHLABEL(), which will be invoked by callers requiring that
vp->v_label be sufficiently "fresh" for access control purposes.
Second, VOP_SETLABEL(), which be invoked by callers requiring that
the passed label contents be updated.  The file system is responsible
for updating v_label if appropriate in coordination with the MAC
framework, as well as committing to disk.  File systems that are
not MAC-aware need not implement these VOPs, as the MAC framework
will default to maintaining a single label for all vnodes based
on the label on the file system mount point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 22:15:09 +00:00
Robert Watson
95fab37ea8 Begin committing support for Mandatory Access Control and extensible
kernel access control.  The MAC framework permits loadable kernel
modules to link to the kernel at compile-time, boot-time, or run-time,
and augment the system security policy.  This commit includes the
initial kernel implementation, although the interface with the userland
components of the oeprating system is still under work, and not all
kernel subsystems are supported.  Later in this commit sequence,
documentation of which kernel subsystems will not work correctly with
a kernel compiled with MAC support will be added.

kern_mac.c contains the body of the MAC framework.  Kernel and
user APIs defined in mac.h are implemented here, providing a front end
to loaded security modules.  This code implements a module registration
service, state (label) management, security configuration and policy
composition.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 21:36:05 +00:00
Julian Elischer
4d492b4369 Don't need to hold schedlock specifically for stop() ans it calls wakeup()
that locks it anyhow.

Reviewed by: jhb@freebsd.org
2002-07-30 21:13:48 +00:00
Bosko Milekic
c89137ff90 Make reference counting for mbuf clusters [only] work like in RELENG_4.
While I don't think this is the best solution, it certainly is the
fastest and in trying to find bottlenecks in network related code
I want this out of the way, so that I don't have to think about it.
What this means, for mbuf clusters anyway is:
- one less malloc() to do for every cluster allocation (replaced with
  a relatively quick calculation + assignment)
- no more free() in the cluster free case (replaced with empty space) :-)

This can offer a substantial throughput improvement, but it may not for
all cases.  Particularly noticable for larger buffer sends/recvs.
See http://people.freebsd.org/~bmilekic/code/measure2.txt for a rough
idea.
2002-07-30 21:06:27 +00:00
Alan Cox
1812190d09 o Replace vm_page_sleep_busy() with vm_page_sleep_if_busy()
in vfs_busy_pages().
2002-07-30 20:41:10 +00:00
Julian Elischer
b8e45df779 Remove code that removes thread from sleep queue before
adding it to a condvar wait.
We do not have asleep() any more so this can not happen.
2002-07-30 20:34:30 +00:00
Alan Cox
1161b86a15 o In do_sendfile(), replace vm_page_sleep_busy() by vm_page_sleep_if_busy()
and extend the scope of the page queues lock to cover all accesses
   to the page's flags and busy fields.
2002-07-30 18:51:07 +00:00
Robert Watson
e66c87b70e When referencing nd_cnp after namei(), always pass SAVENAME into
NDINIT() operation flags.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 18:48:25 +00:00
Robert Watson
e37b1fcdee Make M_COPY_PKTHDR() macro into a wrapper for a m_copy_pkthdr()
function.  This permits conditionally compiled extensions to the
packet header copying semantic, such as extensions to copy MAC
labels.

Reviewed by:	bmilekic
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 18:28:58 +00:00
Robert Watson
4266d0d0ce Regen. 2002-07-30 16:52:22 +00:00
Robert Watson
aedbd622fe Introduce a mac_policy() system call that will provide MAC policies
with a general purpose front end entry point for user applications
to invoke.  The MAC framework will route the system call to the
appropriate policy by name.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 16:50:25 +00:00
Jacques Vidrine
89ab930718 For processes which are set-user-ID or set-group-ID, the kernel performs a few
special actions for safety.  One of these is to make sure that file descriptors
0..2 are in use, by opening /dev/null for those that are not already open.
Another is to close any file descriptors 0..2 that reference procfs.  However,
these checks were made out of order, so that it was still possible for a
set-user-ID or set-group-ID process to be started with some of the file
descriptors 0..2 unused.

Submitted by:	Georgi Guninski <guninski@guninski.com>
2002-07-30 15:38:29 +00:00
Seigo Tanimura
133267776c In endtsleep() and cv_timedwait_end(), a thread marked TDF_TIMEOUT may
be swapped out.  Do not put such the thread directly back to the run
queue.

Spotted by:	David Xu <davidx@viasoft.com.cn>

While I am here, s/PS_TIMEOUT/TDF_TIMEOUT/.
2002-07-30 10:12:11 +00:00
Jeff Roberson
1e4c7a1368 - Acknowledge recursive vnode locks in the vop_unlock specification. The
vnode may not be unlocked even if the operation succeeded.
2002-07-30 08:50:52 +00:00
Seigo Tanimura
9eb881f804 - Optimize wakeup() and its friends; if a thread waken up is being
swapped in, we do not have to ask for the scheduler thread to do
  that.

- Assert that a process is not swapped out in runq functions and
  swapout().

- Introduce thread_safetoswapout() for readability.

- In swapout_procs(), perform a test that may block (check of a
  thread working on its vm map) first.  This lets us call swapout()
  with the sched_lock held, providing a better atomicity.
2002-07-30 06:54:05 +00:00
Mike Silbersack
c4441bc769 Update docs to reflect change in count of procs reserved for root
from 1 to 10.

PR:             kern/40515
Submitted by:   David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after:      1 day
2002-07-30 05:37:00 +00:00
Robert Watson
03a719dcd1 Rebuild of files generated from syscalls.master.
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 02:09:24 +00:00
Robert Watson
5d37d00afc Prototype function arguments, only with MAC-specific structures
replaced with void until we bring in the actual structure definitions.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 02:06:34 +00:00
Robert Watson
7bc8250003 Stubs for the TrustedBSD MAC system calls to permit TrustedBSD MAC
userland code to operate on kernel's from the main tree.  Not much
in this file yet.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 02:04:05 +00:00
Julian Elischer
1d7b9ed2e6 Create a new thread state to describe threads that would be ready to run
except for the fact tha they are presently swapped out. Also add a process
flag to indicate that the process has started the struggle to swap
back in. This will be  needed for the case where multiple threads
start the swapin action top a collision. Also add code to stop
a process fropm being swapped out if one of the threads in this
process is actually off running on another CPU.. that might hurt...

Submitted by:	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
2002-07-29 18:33:32 +00:00
Jeff Roberson
a562685f65 - Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here
were hiding the real problem of the missing unlock in sync_inactive.
 - Add the missing unlock in sync_inactive.

Submitted by:	iedowse
2002-07-29 06:26:55 +00:00
Don Lewis
9e74cba35a Make a temporary copy of the output data in the generic sysctl handlers
so that the data is less likely to be inconsistent if SYSCTL_OUT() blocks.
If the data is large, wire the output buffer instead.

This is somewhat less than optimal, since the handler could skip the copy
if it knew that the data was static.

If the data is dynamic, we are still not guaranteed to get a consistent
copy since another processor could change the data while the copy is in
progress because the data is not locked.  This problem could be solved if
the generic handlers had the ability to grab the proper lock before the
copy and release it afterwards.

This may duplicate work done in other sysctl handlers in the kernel which
also copy the data, possibly while a lock is held, before calling they call
a generic handler to output the data.  These handlers should probably call
SYSCTL_OUT() directly.
2002-07-28 21:06:14 +00:00
Don Lewis
5c38b6dbce Wire the sysctl output buffer before grabbing any locks to prevent
SYSCTL_OUT() from blocking while locks are held.  This should
only be done when it would be inconvenient to make a temporary copy of
the data and defer calling SYSCTL_OUT() until after the locks are
released.
2002-07-28 19:59:31 +00:00
David Malone
25dec7474c If a socket is disconnected for some reason (like a TCP connection
not responding) then drop any data on the outgoing queue in
soisdisconnected because there is no way to get it to its destination
any longer.

The only objection to this patch I got on -net was from Terry, who
wasn't sure that the condition in question could arise, so I provided
some example code.
2002-07-27 23:06:52 +00:00
Robert Watson
d06c0d4d40 Slight restructuring of the logic for credential change case identification
during execve() to use a 'credential_changing' variable.  This makes it
easier to have outstanding patchsets against this code, as well as to
add conditionally defined clauses.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-27 18:06:49 +00:00
John Baldwin
ce39e722ec Disable optimization of spinlocks on UP kernels w/o debugging for now
since it breaks mtx_owned() on spin mutexes when used outside of
mtx_assert().  Unfortunately we currently use it in the i386 MD code
and in the sio(4) driver.

Reported by:	bde
2002-07-27 16:54:23 +00:00
Jeff Roberson
0ea5e55265 - The default for lock, unlock, and islocked is now std* instead of no*. 2002-07-27 05:16:20 +00:00
Robert Drehmel
e25dadb05d Fix -Werror build for sparc64: Use the appropriate conversion
specifier for an 'unsigned int' argument.
2002-07-26 12:57:57 +00:00
Julian Elischer
8625e3721f get suspension counting right.
fix an error message

Submitted by:	David Xu <bsddiy@yahoo.com>
2002-07-25 03:21:35 +00:00
Julian Elischer
e3b9bf7198 fix some style problems and remove a mis-merged assert. 2002-07-25 00:27:39 +00:00