audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used. The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.
Submitted by: wsalamon
Obtained from: TrustedBSD Project
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
vnodes and anonymous memory. Note that mmaping a cdev directly does not
currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
cdev the ioctl is acting on rather than trying to find a suitable vnode
to map from.
Reviewed by: alc, arch@
svr4_do_getmsg(). In principle this bug could disclose data from
kernel memory, but in practice, the SVR4 emulation layer is probably
not functional enough to cause the relevant code path to be executed.
In any case, the emulator has been disconnected from the build since
5.0-RELEASE.
Found by: Coverity Prevent analysis tool
o change the mapping arrays to have a zero offset rather than base 1;
this eliminates lots of signo adjustments and brings the code
back inline with the original netbsd code
o purge use of SVR4_SIGTBLZ; SVR4_NSIG is the only definition for
how big a mapping array is
o change the mapping loops to explicitly ignore signal 0
o purge some bogus code from bsd_to_svr4_sigset
o adjust svr4_sysentvec to deal with the mapping table change
Enticed into fixing by: Coverity Prevent analysis tool
Glanced at by: marcel, jhb
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.
Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.
As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.
Requested (sorta) by: rwatson
MFC after: 2 weeks
the semantics in that the returned filename to use is now a kernel
pointer rather than a user space pointer. This required changing the
arguments to the CHECKALT*() macros some and changing the various system
calls that used pathnames to use the kern_foo() functions that can accept
kernel space filename pointers instead of calling the system call
directly.
- Use kern_open(), kern_access(), kern_msgctl(), kern_execve(),
kern_mkfifo(), kern_mknod(), kern_statfs(), kern_fstatfs(),
kern_setitimer(), kern_stat(), kern_lstat(), kern_fstat(), kern_utimes(),
kern_pathconf(), and kern_unlink().
copies arguments into the kernel space and one that operates
completely in the kernel space;
o use kernel-only version of execve(2) to kill another stackgap in
linuxlator/i386.
Obtained from: DragonFlyBSD (partially)
MFC after: 2 weeks
Use this in all the places where sleeping with the lock held is not
an issue.
The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.
the raw values including for child process statistics and only compute the
system and user timevals on demand.
- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.
Submitted by: bde (mostly)
MFC after: 1 month
directly. This removes a few more users of the stackgap and also marks
the syscalls using these wrappers MP safe where appropriate.
Tested on: i386 with linux acroread5
Compiled on: i386, alpha LINT
for unknown events.
A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec. Caller frees.
Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.
Add cloneuio() which returns a malloc'ed copy. Caller frees.
Use them throughout.
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
SOCK_LOCK(so):
- Hold socket lock over calls to MAC entry points reading or
manipulating socket labels.
- Assert socket lock in MAC entry point implementations.
- When externalizing the socket label, first make a thread-local
copy while holding the socket lock, then release the socket lock
to externalize to userspace.
options, status pointer and rusage pointer as arguments. It is up to
the caller to copyout the status and rusage to userland if needed. This
lets us axe the 'compat' argument and hide all that functionality in
owait(), by the way. This also cleans up some locking in kern_wait()
since it no longer has to drop locks around copyout() since all the
copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
use kern_wait() rather than wait1() or wait4(). This removes a bit
more stackgap usage.
Tested on: i386
Compiled on: i386, alpha, amd64
nodes, or if they did, they're now locked away on the Kurt Gdel
memorial home for the numerically confused:
Don't cast a kernel pointer (from makedev(9)) to an integer (maj+minor combo).
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64
Presumably, at some point, you had to include jail.h if you included
proc.h, but that is no longer required.
Result of: self injury involving adding something to struct prison
purpose and the resulting vattr structure was ignored. In addition,
the VOP_GETATTR call was made with no vnode lock held, resulting in
vnode locking violation panic with debug kernels.
Reported by: truckman
Approved by: re@ (rwatson)
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in svr4_emul_find(),
and clean_pipe().
- Avoid integer overflow on large nfds argument in svr4_sys_poll()
- Reject negative nbytes argument in svr4_sys_getdents()
- Don't copy out past the end of the struct componentname
pathname buffer in svr4_sys_resolvepath()
- Reject out-of-range signal numbers in svr4_sys_sigaction(),
svr4_sys_signal(), and svr4_sys_kill().
- Don't malloc() user-specified lengths in show_ioc() and
show_strbuf(), place arbitrary limits instead.
- Range-check lengths in si_listen(), ti_getinfo(), ti_bind(),
svr4_do_putmsg(), svr4_do_getmsg(), svr4_stream_ti_ioctl().
Some fixes obtain from OpenBSD.
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.
By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.
At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.
Reviewed by: arch@
Approved by: re (rwatson)
kern_sigprocmask() in the various binary compatibility emulators.
- Replace calls to sigsuspend(), sigaltstack(), sigaction(), and
sigprocmask() that used the stackgap with calls to the corresponding
kern_sig*() functions instead without using the stackgap.