proctree lock and the process lock are held when updating p_pptr and
p_oppid. When we are just reaading p_pptr we only need the proc lock and
not a proctree lock as well.
user space. It has already been copied in and mp->mnt_stat.f_mntonname has
already been initialised by the caller.
This fixes a panic on the alpha caused by the fact that the variable
'size' wasn't initialised because the call to copyinstr() bailed out with
an EFAULT error.
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.
Notes:
o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.
Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
rather than implementing its own {uid,gid,other} checks against vnode
mode. Similar change to linprocfs currently under review.
Obtained from: TrustedBSD Project
int p_can(p1, p2, operation, privused)
which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.
o Commented out capabilities entries are included for some checks.
o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.
o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.
o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnigns (approved by
bde).
o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.
Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project
There's no excuse to have code in synthetic filestores that allows direct
references to the textvp anymore.
Feature requested by: msmith
Feature agreed to by: warner
Move requested by: phk
Move agreed to by: bde
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.
maps onto the upages. We used to use this extensively, particularly
for ps and gdb. Both of these have been "fixed". ps gets the p_stats
via eproc along with all the other stats, and gdb uses the regs, fpregs
etc files.
Once apon a time the UPAGES were mapped here, but that changed back
in January '96. This essentially kills my revisions 1.16 and 1.17.
The 2-page "hole" above the stack can be reclaimed now.
p_trespass(struct proc *p1, struct proc *p2)
which returns zero or an errno depending on the legality of p1 trespassing
on p2.
Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one
extra signal related check.
Replace procfs.h:CHECKIO() macros with calls to p_trespass().
Only show command lines to process which can trespass on the target
process.
file object. Also explain some possible directions to re-implement it --
I'm not sure it should be, given the minimal application use. (Other
than having the debugger automatically access the symbols for a process,
the main use I'd found was with some minor accounting ability, but _that_
depends on it being in the filesystem space; an ioctl access method would
be useless in that case.)
This is a code-less change; only a comment has been added.
continue doing it despite objections by me (the principal author).
Note that this doesn't fix the real problem -- the real problem is generally
bad setup by ignorant users, and education is the right way to fix it.
So while this doesn't actually solve the prolem mentioned in the complaint
(since it's still possible to do it via other methods, although they mostly
involve a bit more complicity), and there are better methods to do this,
nobody was willing or able to provide me with a real world example that
couldn't be worked around using the existing permissions and group
mechanism. And therefore, security by removing features is the method of
the day.
I only had three applications that used it, in any event. One of them would
have made debugging easier, but I still haven't finished it, and won't
now, so it doesn't really matter.
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.
This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
-----------------------------
The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.
The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.
struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.
signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.
Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.
NOTE: kdump (and port linux_kdump) must be recompiled.
Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.
reasonable defaults.
This avoids confusing and ugly casting to eopnotsupp or making dummy functions.
Bogus casting of filesystem sysctls to eopnotsupp() have been removed.
This should make *_vfsops.c more readable and reduce bloat.
Reviewed by: msmith, eivind
Approved by: phk
Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>