specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA
ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(),
and svr4_sys_getdents64() similar to that in getdirentries().
- Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(),
linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and
svr4_sys_getdents64() MPSAFE.
ibcs2_[gs]etgroups() rather than using the stackgap. This also makes
ibcs2_[gs]etgroups() MPSAFE. Also, it cleans up one bit of weirdness in
the old setgroups() where it allocated an entire credential just so it had
a place to copy the group list into. Now setgroups just allocates a
NGROUPS_MAX array on the stack that it copies into and then passes to
kern_setgroups().
the callers if the exec either succeeds or fails early.
- Move the code to call exit1() if the exec fails after the vmspace is
gone to the bottom of kern_execve() to cut down on some code duplication.
the semantics in that the returned filename to use is now a kernel
pointer rather than a user space pointer. This required changing the
arguments to the CHECKALT*() macros some and changing the various system
calls that used pathnames to use the kern_foo() functions that can accept
kernel space filename pointers instead of calling the system call
directly.
- Use kern_open(), kern_access(), kern_execve(), kern_mkfifo(), kern_mknod(),
kern_setitimer(), kern_getrusage(), kern_utimes(), kern_unlink(),
kern_chdir(), kern_chmod(), kern_chown(), kern_symlink(), kern_readlink(),
kern_select(), kern_statfs(), kern_fstatfs(), kern_stat(), kern_lstat(),
kern_fstat().
- Drop the unused 'uap' argument from spx_open().
- Replace a stale duplication of vn_access() in xenix_access() lacking
recent additions such as MAC checks, etc. with a call to kern_access().
options, status pointer and rusage pointer as arguments. It is up to
the caller to copyout the status and rusage to userland if needed. This
lets us axe the 'compat' argument and hide all that functionality in
owait(), by the way. This also cleans up some locking in kern_wait()
since it no longer has to drop locks around copyout() since all the
copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
use kern_wait() rather than wait1() or wait4(). This removes a bit
more stackgap usage.
Tested on: i386
Compiled on: i386, alpha, amd64
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in ibcs2_emul_find();
other calls to stackgap_alloc() have not been changed since they
are small fixed-size allocations.
- Replace use of strcpy() with strlcpy() in exec_coff_imgact()
to avoid buffer overflow
- Use strlcat() instead of strcat() to avoid a one byte buffer
overflow in ibcs2_setipdomainname()
- Use copyinstr() instead of copyin() in ibcs2_setipdomainname()
to ensure that the string is null-terminated
- Avoid integer overflow in ibcs2_setgroups() and ibcs2_setgroups()
by checking that gidsetsize argument is non-negative and
no larger than NGROUPS_MAX.
- Range-check signal numbers in ibcs2_wait(), ibcs2_sigaction(),
ibcs2_sigsys() and ibcs2_kill() to avoid accessing array past
the end (or before the start)
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.
By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.
At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.
access control: as with SVR4, very few changes required since almost
all services are implemented by wrapping existing native FreeBSD
system calls. Only readdir() calls need additional instrumentation.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and
adapting it for KSE.
Locks:
1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.
1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp);
/* increments reference count on a file */
struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */
struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
-----------------------------
The compatibility code and/or emulators have been updated:
iBCS2 now mostly uses the older syscalls. SVR4 now properly
handles all signals. This has been achieved by using the
new sigset_t throughout the emulator. The Linuxulator has
been severely updated. Internally the new Linux sigset_t is
made the default. These are then mapped to and from the
new FreeBSD sigset_t.
Also, rt_sigsuspend has been implemented in the Linuxulator.
Implementing this syscall basicly caused all this sigset_t
changing in the first place and the syscall has been used
throughout the change as a means for testing. It basicly is
too much work to undo the implementation so that it can
later be added again.
A special note on the use of sv_sigtbl and sv_sigsize in
struct sysentvec:
Every signal larger than sv_sigsize is not translated and is
passed on to the signal handler unmodified. Signals in the
range 1 upto and including sv_sigsize are translated.
The rationale is that only the system defined signals need to
be translated.
The emulators also have been updated so that the translation
tables are only indexed for valid (system defined) signals.
This change also fixes the translation bug already in the
SVR4 emulator.
1:
s/suser/suser_xxx/
2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with
later.
There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
not the necessarily the same as the seconds part of getmicrotime()
yet, and anyway, we should have used `time_second' if we only wanted
a sloppy value for the seconds part. There is no point in making
ibcs2's time(2) more efficient than FreeBSD's time(3).
it in struct proc instead.
This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
were returning EFAULT, when it is a completely acceptable thing to do.
Also, at the same time, be a *bit* optimizing and don't allocate any
"stackgrap" memory if we're not going to use it.
This is another Oracle-discovered problem.
Submitted by: Steven Wallace
struct direct, not using UFS' definition of DIRBLKSIZ, using directory
seek cookies to make reading non-UFS directories reliable
(e.g. cd9660, ext2fs).
A special thanks to Robert Eckardt for providing an ISC binary of GNU
ls so that I could test these changes.