record queue, so move the offset field from the per-record
audit_pipe_entry structure to the audit_pipe structure.
Now that we support reading more than one record at a time, add a
new summary field to audit_pipe, ap_qbyteslen, which tracks the
total number of bytes present in a pipe, and return that (minus
the current offset) via FIONREAD and kqueue's data variable for
the pending byte count rather than the number of bytes remaining
in only the first record.
Add a number of asserts to confirm that these counts and offsets
following the expected rules.
MFC after: 2 months
Sponsored by: Apple, Inc.
read(2), which meant that records longer than the buffer passed to read(2)
were dropped. Instead take the approach of allowing partial reads to be
continued across multiple system calls more in the style of streaming
character device.
This means retaining a record on the per-pipe queue in a partially read
state, so maintain a current offset into the record. Keep the record on
the queue during a read, so add a new lock, ap_sx, to serialize removal
of records from the queue by either read(2) or ioctl(2) requesting a pipe
flush. Modify the kqueue handler to return bytes left in the current
record rather than simply the size of the current record.
It is now possible to use praudit, which used the standard FILE * buffer
sizes, to track much larger record sizes from /dev/auditpipe, such as
very long command lines to execve(2).
MFC after: 2 months
Sponsored by: Apple, Inc.
pipe has overflowed, drop the newest, rather than oldest, record. This
makes overflow drop behavior consistent with memory allocation failure
leading to drop, avoids touching the consumer end of the queue from a
producer, and lowers the CPU overhead of dropping a record by dropping
before memory allocation and copying.
Obtained from: Apple, Inc.
MFC after: 2 months
protecting the list of audit pipes, and a per-pipe mutex protecting the
queue.
Likewise, replace the single global condition variable used to signal
delivery of a record to one or more pipes, and add a per-pipe condition
variable to avoid spurious wakeups when event subscriptions differ
across multiple pipes.
This slightly increases the cost of delivering to audit pipes, but should
reduce lock contention in the presence of multiple readers as only the
per-pipe lock is required to read from a pipe, as well as avoid
overheading when different pipes are used in different ways.
MFC after: 2 months
Sponsored by: Apple, Inc.
mutex, as it's rarely changed but frequently accessed read-only from
multiple threads, so a potentially significant source of contention.
MFC after: 1 month
Sponsored by: Apple, Inc.
access control checks in mac_bsdextended are not in the same
namespace as the MBI_ flags used in ugidfw policies, so add an
explicit conversion routine to get from one to the other.
Obtained from: TrustedBSD Project
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
memory mappings when the MAC label on a process changes, to
mac_proc_vm_revoke(),
It now also acquires its own credential reference directly from the
affected process rather than accepting one passed by the the caller,
simplifying the API and consumer code.
Obtained from: TrustedBSD Project
that they operate directly on credentials: mac_proc_create_swapper(),
mac_proc_create_init(), and mac_proc_associate_nfsd(). Update policies.
Obtained from: TrustedBSD Project
be a no-op request, and why this might have to change if we want to allow
leaving a partition someday.
Obtained from: TrustedBSD Project
MFC after: 3 days
control logic and policy registration remaining in that file, and access
control checks broken out into other files by class of check.
Obtained from: TrustedBSD Project
fragment reassembly queues.
This allows policies to label reassembly queues, perform access
control checks when matching fragments to a queue, update a queue
label when fragments are matched, and label the resulting
reassembled datagram.
Obtained from: TrustedBSD Project
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.
We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().
Reviewed by: kib
years by the priv_check(9) interface and just very few places are left.
Note that compatibility stub with older FreeBSD version
(all above the 8 limit though) are left in order to reduce diffs against
old versions. It is responsibility of the maintainers for any module, if
they think it is the case, to axe out such cases.
This patch breaks KPI so __FreeBSD_version will be bumped into a later
commit.
This patch needs to be credited 50-50 with rwatson@ as he found time to
explain me how the priv_check() works in detail and to review patches.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reviewed by: rwatson
appropriate even if Solaris doesn't document it (E2BIG) or use it
(EOVERFLOW).
Submitted by: nectar at apple dot com
Sponsored by: Apple, Inc.
MFC after: 3 days
(1) Abstract interpreter vnode labeling in execve(2) and mac_execve(2)
so that the general exec code isn't aware of the details of
allocating, copying, and freeing labels, rather, simply passes in
a void pointer to start and stop functions that will be used by
the framework. This change will be MFC'd.
(2) Introduce a new flags field to the MAC_POLICY_SET(9) interface
allowing policies to declare which types of objects require label
allocation, initialization, and destruction, and define a set of
flags covering various supported object types (MPC_OBJECT_PROC,
MPC_OBJECT_VNODE, MPC_OBJECT_INPCB, ...). This change reduces the
overhead of compiling the MAC Framework into the kernel if policies
aren't loaded, or if policies require labels on only a small number
or even no object types. Each time a policy is loaded or unloaded,
we recalculate a mask of labeled object types across all policies
present in the system. Eliminate MAC_ALWAYS_LABEL_MBUF option as it
is no longer required.
MFC after: 1 week ((1) only)
Reviewed by: csjp
Obtained from: TrustedBSD Project
Sponsored by: Apple, Inc.
space provided by its argument structure, return EOVERFLOW instead of
E2BIG. The latter is documented in Solaris's man page, but the
former is implemented. In either case, the caller should use
getaudit_addr(2) to return the IPv6 address.
Submitted by: sson
Obtained from: Apple, Inc.
MFC after: 3 days
It is possible that the audit pipe(s) have different preselection configs
then the global preselection mask.
Spotted by: Vincenzo Iozzo
MFC after: 2 weeks
return success if the passed vnode pointer is NULL (rather than
panicking). This can occur if either audit or accounting are
disabled while the policy is running.
Since the swapoff control has no real relevance to this policy,
which is concerned about intent to write rather than water under the
bridge, remove it.
PR: kern/126100
Reported by: Alan Amesbury <amesbury at umn dot edu>
MFC after: 3 days
processes are not producing absolute pathname tokens. It is required
that audited pathnames are generated relative to the global root mount
point. This modification changes our implementation of audit_canon_path(9)
and introduces a new function: vn_fullpath_global(9) which performs a
vnode -> pathname translation relative to the global mount point based
on the contents of the name cache. Much like vn_fullpath,
vn_fullpath_global is a wrapper function which called vn_fullpath1.
Further, the string parsing routines have been converted to use the
sbuf(9) framework. This change also removes the conditional acquisition
of Giant, since the vn_fullpath1 method will not dip into file system
dependent code.
The vnode locking was modified to use vhold()/vdrop() instead the vref()
and vrele(). This will modify the hold count instead of modifying the
user count. This makes more sense since it's the kernel that requires
the reference to the vnode. This also makes sure that the vnode does not
get recycled we hold the reference to it. [1]
Discussed with: rwatson
Reviewed by: kib [1]
MFC after: 2 weeks
semaphores. Specifically, semaphores are now represented as new file
descriptor type that is set to close on exec. This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely. It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.
Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
the sem_unlink() operation. Prior to this patch, if a semaphore's name
was removed, valid handles from sem_open() would get EINVAL errors from
sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
process exited or exec'd. They were only cleaned up if the process
did an explicit sem_destroy(). This could result in a leak of semaphore
objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
'struct ksem' of an unnamed semaphore (created via sem_init)) and had
write access to the semaphore based on UID/GID checks, then that other
process could manipulate the semaphore via sem_destroy(), sem_post(),
sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the proces
creating the semaphore was not honored. Thus if your umask denied group
read/write access but the explicit mode in the sem_init() call allowed
it, the semaphore would be readable/writable by other users in the
same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
then it might have deregistered one or more of the semaphore system
calls before it noticed that there was a problem. I'm not sure if
this actually happened as the order that modules are discovered by the
kernel linker depends on how the actual .ko file is linked. One can
make the order deterministic by using a single module with a mod_event
handler that explicitly registers syscalls (and deregisters during
unload after any checks). This also fixes a race where even if the
sem_module unloaded first it would have destroyed locks that the
syscalls might be trying to access if they are still executing when
they are unloaded.
XXX: By the way, deregistering system calls doesn't do any blocking
to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init()
isn't documented to return ENFILE or EMFILE if we run out of semaphores
the way that sem_open() can. Instead, it should return ENOSPC in that
case.
Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
named semaphores nearly in a similar fashion to the POSIX shared memory
object file descriptors. Kernel semaphores can now also have names
longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
MAC checks for POSIX semaphores accept both a file credential and an
active credential. There is also a new posixsem_check_stat() since it
is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
in src/tools/regression/posixsem.
Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month
same as the global variable defined in ip_input.c. Instead, adopt the name
'q' as found in about 1/2 of uses in ip_input.c, preventing a collision on
the name. This is non-harmful, but means that search and replace on the
global works less well (as in the virtualization work), as well as indexing
tools.
MFC after: 1 week
Reported by: julian
Except for the case where we use the cloner library (clone_create() and
friends), there is no reason to enforce a unique device minor number
policy. There are various drivers in the source tree that allocate unr
pools and such to provide minor numbers, without using them themselves.
Because we still need to support unique device minor numbers for the
cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's
that are used in combination with the cloner library should be marked
with this flag to make the cloning work.
This means drivers can now freely use si_drv0 to store their own flags
and state, making it effectively the same as si_drv1 and si_drv2. We
still keep the minor() and dev2unit() routines around to make drivers
happy.
The NTFS code also used the minor number in its hash table. We should
not do this anymore. If the si_drv0 field would be changed, it would no
longer end up in the same list.
Approved by: philip (mentor)
which label mbufs. This leak can occur if one policy successfully allocates
label storage and subsequent allocations from other policies fail.
Spotted by: rwatson
MFC after: 1 week
than checking whether audit is enabled globally, instead check whether
the current thread has an audit record. This avoids entering the audit
code to collect argument data if auditing is enabled but the current
system call is not of interest to audit.
MFC after: 1 week
Sponsored by: Apple, Inc.
exit requires entering the audit code. The result is much the same,
but they mean different things.
MFC afer: 3 days
Submitted by: Diego Giagio <dgiagio at gmail dot com>
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock main exclusive, and all instances of inpcb lock acquisition
are exclusive.
This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.
MFC after: 3 months
Tested by: kris (superset of committered patch)
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.
MFC after: 1 month
Discussed with: imp, rink
returns EINVAL. Right now we return 0 or success for invalid commands,
which could be quite problematic in certain conditions.
MFC after: 1 week
Discussed with: rwatson
a queue entry field, just copy out the unsigned int that is the trigger
message. In practice, auditd always requested sizeof(unsigned int), so
the extra bytes were ignored, but copying them out was not the intent.
MFC after: 1 month
global audit mutex and condition variables, with an sx lock which protects
the trail vnode and credential while in use, and is acquired by the system
call code when rotating the trail. Previously, a "message" would be sent
to the kernel audit worker, which did the rotation, but the new code is
simpler and (hopefully) less error-prone.
Obtained from: TrustedBSD Project
MFC after: 1 month
This makes sure that process tokens credentials with un-initialized
audit contexts are handled correctly. Currently, when invariants are
enabled, this change fixes a panic by ensuring that we have a valid
termid family. Also, this fixes token generation for process tokens
making sure that userspace is always getting a valid token.
This is consistent with what Solaris does when an audit context is
un-initialized.
Obtained from: TrustedBSD Project
MFC after: 1 week
relabel check for MLS rather than returning 0 directly.
This problem didn't result in a vulnerability currently as the central
implementation of ifnet relabeling also checks for UNIX privilege, and
we currently don't guarantee containment for the root user in mac_mls,
but we should be using the MLS definition of privilege as well as the
UNIX definition in anticipation of supporting root containment at some
point.
MFC after: 3 days
Submitted by: Zhouyi Zhou <zhouzhouyi at gmail dot com>
Sponsored by: Google SoC 2007
"BSM conversion requested for unknown event 43140"
It should be noted that we need to audit the fd argument for this system
call.
Obtained from: TrustedBSD Project
MFC after: 1 week
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.
Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.
Assign approximate why values to all current consumers of the
kdb_enter() interface.
would be properly disposed of, but the global label structure for the
semaphore wouldn't be freed.
MFC after: 3 days
Reported by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi
supports the removal of hard-coded audit class constants in OpenBSM
1.0. All audit classes are now dynamically configured via the
audit_class database.
Obtained from: TrustedBSD Project
entry point, which is no longer required now that we don't support
old-style multicast tunnels. This removes the last mbuf object class
entry point that isn't init/copy/destroy.
Obtained from: TrustedBSD Project
Framework by moving from mac_mbuf_create_netlayer() to more specific
entry points for specific network services:
- mac_netinet_firewall_reply() to be used when replying to in-bound TCP
segments in pf and ipfw (etc).
- Rename mac_netinet_icmp_reply() to mac_netinet_icmp_replyinplace() and
add mac_netinet_icmp_reply(), reflecting that in some cases we overwrite
a label in place, but in others we apply the label to a new mbuf.
Obtained from: TrustedBSD Project
in the TrustedBSD MAC Framework:
- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send()
for AARP packet labeling, rather than using a generic link layer
entry point.
- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send()
for ND6 packet labeling, rather than using a generic link layer entry
point.
- Add expliict entry point mac_netinet_arp_send() for ARP packet
labeling, and mac_netinet_igmp_send() for IGMP packet labeling,
rather than using a generic link layer entry point.
- Remove previous genering link layer entry point,
mac_mbuf_create_linklayer() as it is no longer used.
- Add implementations of new entry points to various policies, largely
by replicating the existing link layer entry point for them; remove
old link layer entry point implementation.
- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global
to the MAC Framework rather than static to mac_net.c as it is now
needed outside of mac_net.c.
Obtained from: TrustedBSD Project
we move towards netinet as a pseudo-object for the MAC Framework.
Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to
reflect general object-first ordering preference.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event. When a process
dumps a core, it could be security relevant. It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.
The record that is generated looks like this:
header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111
- We allocate a completely new record to make sure we arent clobbering
the audit data associated with the syscall that produced the core
(assuming the core is being generated in response to SIGABRT and not
an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
beginning of the coredump call. Make sure we free the storage referenced
by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts
Obtained from: TrustedBSD Project
Reviewed by: rwatson
MFC after: 1 month
primary object type, and then by secondarily by method name. This sorts
entry points relating to particular objects, such as pipes, sockets, and
vnodes together.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
from mac_vfs.c to mac_process.c to join other functions that setup up
process labels for specific purposes. Unlike the two proc create calls,
this call is intended to run after creation when a process registers as
the NFS daemon, so remains an _associate_ call..
Obtained from: TrustedBSD Project
than mac_<policy>_whatever, as this shortens the names and makes the code
a bit easier to read.
When dealing with label structures, name variables 'mb', 'ml', 'mm rather
than the longer 'mac_biba', 'mac_lomac', and 'mac_mls', likewise making
the code a little easier to read.
Obtained from: TrustedBSD Project
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:
mac_<object>_<method/action>
mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.
All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
state is stored in an extended subject token now. Make sure
that we are using the extended data. This fixes the termID
for process tokens.
Obtained from: TrustedBSD Project
Discussed with: rwatson
MFC after: 1 week
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.
on multiple different audit pipes. The old method used cv_signal()
which would result in only one thread being woken up after we
appended a record to it's queue. This resulted in un-timely wake-ups
when processing audit records real-time.
- Assign PSOCK priority to threads that have been sleeping on a read(2).
This is the same priority threads are woken up with when they select(2)
or poll(2). This yields fairness between various forms of sleep on
the audit pipes.
Obtained from: TrustedBSD Project
Discussed with: rwatson
MFC after: 1 week
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.
Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)
point to mac_check_vnode_unlink(), reflecting UNIX naming conventions.
This is the first of several commits to synchronize the MAC Framework
in FreeBSD 7.0 with the MAC Framework as it will appear in Mac OS X
Leopard.
Reveiwed by: csjp, Samy Bahra <sbahra at gwu dot edu>
Submitted by: Jacques Vidrine <nectar at apple dot com>
Obtained from: Apple Computer, Inc.
Sponsored by: SPARTA, SPAWAR
Approved by: re (bmah)
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.
Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)
vnode label for a check rather than the directory vnode label a second
time.
MFC after: 3 days
Submitted by: Zhouyi ZHOU <zhouzhouyi at FreeBSD dot org>
Reviewed by: csjp
Sponsored by: Google Summer of Code 2007
Approved by: re (bmah)
- Sort copyrights by date.
- Re-wrap, and in some cases, fix comments.
- Fix tabbing, white space, remove extra blank lines.
- Remove commented out debugging printfs.
Approved by: re (kensmith)
and replace with software-testable sysctl node (security.audit) that
can be used to detect kernel audit support.
Obtained from: TrustedBSD Project
Approved by: re (kensmith)
- In audit_bsm.c, make sure all the arguments: ARG_AUID, ARG_ASID, ARG_AMASK,
and ARG_TERMID{_ADDR} are valid before auditing their arguments. (This is done
for both setaudit and setaudit_addr.
- Audit the arguments passed to setaudit_addr(2)
- AF_INET6 does not equate to AU_IPv6. Change this in au_to_in_addr_ex() so the
audit token is created with the correct type. This fixes the processing of the
in_addr_ex token in users pace.
- Change the size of the token (as generated by the kernel) from 5*4 bytes to
4*4 bytes (the correct size of an ip6 address)
- Correct regression from ucred work which resulted in getaudit() not returning
E2BIG if the subject had an ip6 termid
- Correct slight regression in getaudit(2) which resulted in the size of a pointer
being passed instead of the size of the structure. (This resulted in invalid
auditinfo data being returned via getaudit(2))
Reviewed by: rwatson
Approved by: re@ (kensmith)
Obtained from: TrustedBSD Project
MFC after: 1 month
mpo_check_proc_setaudit_addr to be used when controlling use of
setaudit_addr(), rather than mpo_check_proc_setaudit(), which takes a
different argument type.
Reviewed by: csjp
Approved by: re (kensmith)
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp
Obtained from: TrustedBSD Project
embedded storage in struct ucred. This allows audit state to be cached
with the thread, avoiding locking operations with each system call, and
makes it available in asynchronous execution contexts, such as deep in
the network stack or VFS.
Reviewed by: csjp
Approved by: re (kensmith)
Obtained from: TrustedBSD Project
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
argument from being file descriptor index into the pointer to struct file:
part 2. Convert calls missed in the first big commit.
Noted by: rwatson
Pointy hat to: kib
remove associated comments.
Slip audit_file_rotate_wait assignment in audit_rotate_vnode() before
the drop of the global audit mutex.
Obtained from: TrustedBSD Project
inline it when needed already, and the symbol is also required outside of
audit.c. This silences a new gcc warning on the topic of using __inline__
instead of __inline.
MFC after: 3 days
where similar data structures exist to support devfs and the MAC
Framework, but are named differently.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
variable name conventions for arguments passed into the framework --
for example, name network interfaces 'ifp', sockets 'so', mounts 'mp',
mbufs 'm', processes 'p', etc, wherever possible. Previously there
was significant variation in this regard.
Normalize copyright lists to ranges where sensible.
labels: the mount label (label of the mountpoint) and the fs label (label
of the file system). In practice, policies appear to only ever use one,
and the distinction is not helpful.
Combine mnt_mntlabel and mnt_fslabel into a single mnt_label, and
eliminate extra machinery required to maintain the additional label.
Update policies to reflect removal of extra entry points and label.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting. These entry points exactly aligned with privileges and
provided no additional security context:
- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()
Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies. These mostly, but not
entirely, align with the set of privileges granted in jails.
Obtained from: TrustedBSD Project
- Redistribute counter declarations to where they are used, rather than at
the file header, so it's more clear where we do (and don't) have
counters.
- Add many more counters, one per policy entry point, so that many
individual access controls and object life cycle events are tracked.
- Perform counter increments for label destruction explicitly in entry
point functions rather than in LABEL_DESTROY().
- Use LABEL_INIT() instead of SLOT_SET() directly in label init functions
to be symmetric with destruction.
- Align counter names more carefully with entry point names.
- More constant and variable name normalization.
Obtained from: TrustedBSD Project
- Add a more detailed comment describing the mac_test policy.
- Add COUNTER_DECL() and COUNTER_INC() macros to declare and manage
various test counters, reducing the verbosity of the test policy
quite a bit.
- Add LABEL_CHECK() macro to abbreviate normal validation of labels.
Unlike the previous check macros, this checks for a NULL label and
doesn't test NULL labels. This means that optionally passed labels
will now be handled automatically, although in the case of optional
credentials, NULL-checks are still required.
- Add LABEL_DESTROY() macro to abbreviate the handling of label
validation and tear-down.
- Add LABEL_NOTFREE() macro to abbreviate check for non-free labels.
- Normalize the names of counters, magic values.
- Remove unused policy "enabled" flag.
Obtained from: TrustedBSD Project
calls. Add MAC Framework entry points and MAC policy entry points for
audit(), auditctl(), auditon(), setaudit(), aud setauid().
MAC Framework entry points are only added for audit system calls where
additional argument context may be useful for policy decision-making; other
audit system calls without arguments may be controlled via the priv(9)
entry points.
Update various policy modules to implement audit-related checks, and in
some cases, other missing system-related checks.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
tokens. Currently, we do not support the set{get}audit_addr(2) system
calls which allows processes like sshd to set extended or ip6
information for subject tokens.
The approach that was taken was to change the process audit state
slightly to use an extended terminal ID in the kernel. This allows
us to store both IPv4 IPv6 addresses. In the case that an IPv4 address
is in use, we convert the terminal ID from an struct auditinfo_addr to
a struct auditinfo.
If getaudit(2) is called when the subject is bound to an ip6 address,
we return E2BIG.
- Change the internal audit record to store an extended terminal ID
- Introduce ARG_TERMID_ADDR
- Change the kaudit <-> BSM conversion process so that we are using
the appropriate subject token. If the address associated with the
subject is IPv4, we use the standard subject32 token. If the subject
has an IPv6 address associated with them, we use an extended subject32
token.
- Fix a couple of endian issues where we do a couple of byte swaps when
we shouldn't be. IP addresses are already in the correct byte order,
so reading the ip6 address 4 bytes at a time and swapping them results
in in-correct address data. It should be noted that the same issue was
found in the openbsm library and it has been changed there too on the
vendor branch
- Change A_GETPINFO to use the appropriate structures
- Implement A_GETPINFO_ADDR which basically does what A_GETPINFO does,
but can also handle ip6 addresses
- Adjust get{set}audit(2) syscalls to convert the data
auditinfo <-> auditinfo_addr
- Fully implement set{get}audit_addr(2)
NOTE: This adds the ability for processes to correctly set extended subject
information. The appropriate userspace utilities still need to be updated.
MFC after: 1 month
Reviewed by: rwatson
Obtained from: TrustedBSD
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular
acquisisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).
- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).
Tested by: kris
Discussed with: jhb, kris, attilio, jeff
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.
Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
it isn't used in the access control decision. This became visible to
Coverity with the change to a function call retrieving label values.
Coverity CID: 1723
LABEL_TO_SLOT() macro used by policy modules to query and set label data
in struct label. Instead of using a union, store an intptr_t, simplifying
the API.
Update policies: in most cases this required only small tweaks to current
wrapper macros. In two cases, a single wrapper macros had to be split into
separate get and set macros.
Move struct label definition from _label.h to mac_internal.h and remove
_label.h. With this change, policies may now treat struct label * as
opaque, allowing us to change the layout of struct label without breaking
the policy module ABI. For example, we could make the maximum number of
policies with labels modifiable at boot-time rather than just at
compile-time.
Obtained from: TrustedBSD Project
Don't perform a nested include of _label.h in mac.h, as mac.h now
describes only the user API to MAC, and _label.h defines the in-kernel
representation of MAC labels.
Remove mac.h includes from policies and MAC framework components that do
not use userspace MAC API definitions.
Add _KERNEL inclusion checks to mac_internal.h and mac_policy.h, as these
are kernel-only include files
Obtained from: TrustedBSD Project
(due to an early reset or the like), remember to unlock the socket lock.
This will not occur in 7-CURRENT, but could in theory occur in 6-STABLE.
MFC after: 1 week
been introduced to the MAC framework:
mpo_associate_nfsd_label
mpo_create_mbuf_from_firewall
mpo_check_system_nfsd
mpo_check_vnode_mmap_downgrade
mpo_check_vnode_mprotect
mpo_init_syncache_label
mpo_destroy_syncache_label
mpo_init_syncache_from_inpcb
mpo_create_mbuf_from_syncache
MFC after: 2 weeks [1]
[1] The syncache related entry points will NOT be MFCed as the changes in
the syncache subsystem are not present in RELENG_6 yet.
exclusive access if there is at least one thread waiting for it to
become available. This may significantly reduce overhead by reducing
the number of unnecessary wakeups issued whenever the framework becomes
idle.
Annotate that we still signal the CV more than necessary and should
fix this.
Obtained from: TrustedBSD Project
Reviewed by: csjp
Tested by: csjp
manipulation is visible to the subject process. Remove XXX comments
suggesting this.
Convert one XXX on a difference from Darwin into a note: it's not a
bug, it's a feature.
Obtained from: TrustedBSD Project
- Replace XXX with Note: in several cases where observations are made about
future functionality rather than problems or bugs.
- Remove an XXX comment about byte order and au_to_ip() -- IP headers must
be submitted in network byte order. Add a comment to this effect.
- Mention that we don't implement select/poll for /dev/audit.
Obtained from: TrustedBSD Project
kernel<->policy ABI version. Add a comment to the definition describing
it and listing known versions. Modify MAC_POLICY_SET() to reference the
current kernel version by name rather than by number.
Staticize mac_late, which is used only in mac_framework.c.
Obtained from: TrustedBSD Project
mac_framework.c Contains basic MAC Framework functions, policy
registration, sysinits, etc.
mac_syscalls.c Contains implementations of various MAC system calls,
including ENOSYS stubs when compiling without options
MAC.
Obtained from: TrustedBSD Project
consumes and implements, as well as the location of the framework and
policy modules.
Refactor MAC Framework versioning a bit so that the current ABI version can
be exported via a read-only sysctl.
Further update comments relating to locking/synchronization.
Update copyright to take into account these and other recent changes.
Obtained from: TrustedBSD Project
Framework and security modules, to src/sys/security/mac/mac_policy.h,
completing the removal of kernel-only MAC Framework include files from
src/sys/sys. Update the MAC Framework and MAC policy modules. Delete
the old mac_policy.h.
Third party policy modules will need similar updating.
Obtained from: TrustedBSD Project
subsystems will be a property of policy modules, which may require
access control check entry points to be invoked even when not actively
enforcing (i.e., to track information flow without providing
protection).
Obtained from: TrustedBSD Project
Suggested by: Christopher dot Vance at sparta dot com
than from the slab, but don't.
Document mac_mbuf_to_label(), mac_copy_mbuf_tag().
Clean up white space/wrapping for other comments.
Obtained from: TrustedBSD Project
Exapnd comments on System V IPC labeling methods, which could use improved
consistency with respect to other object types.
Obtained from: TrustedBSD Project
the ifnet itself. The stack copy has been made while holding the mutex
protecting ifnet labels, so copying from the ifnet copy could result in
an inconsistent version being copied out.
Reported by: Todd.Miller@sparta.com
Obtained from: TrustedBSD Project
MFC after: 3 weeks
kernel. This LOR snuck in with some of the recent syncache changes. To
fix this, the inpcb handling was changed:
- Hang a MAC label off the syncache object
- When the syncache entry is initially created, we pickup the PCB lock
is held because we extract information from it while initializing the
syncache entry. While we do this, copy the MAC label associated with
the PCB and use it for the syncache entry.
- When the packet is transmitted, copy the label from the syncache entry
to the mbuf so it can be processed by security policies which analyze
mbuf labels.
This change required that the MAC framework be extended to support the
label copy operations from the PCB to the syncache entry, and then from
the syncache entry to the mbuf.
These functions really should be referencing the syncache structure instead
of the label. However, due to some of the complexities associated with
exposing this syncache structure we operate directly on it's label pointer.
This should be OK since we aren't making any access control decisions within
this code directly, we are merely allocating and copying label storage so
we can properly initialize mbuf labels for any packets the syncache code
might create.
This also has a nice side effect of caching. Prior to this change, the
PCB would be looked up/locked for each packet transmitted. Now the label
is cached at the time the syncache entry is initialized.
Submitted by: andre [1]
Discussed with: rwatson
[1] andre submitted the tcp_syncache.c changes
specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
privilege for threads and credentials. Unlike the existing suser(9)
interface, priv(9) exposes a named privilege identifier to the privilege
checking code, allowing more complex policies regarding the granting of
privilege to be expressed. Two interfaces are provided, replacing the
existing suser(9) interface:
suser(td) -> priv_check(td, priv)
suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags)
A comprehensive list of currently available kernel privileges may be
found in priv.h. New privileges are easily added as required, but the
comments on adding privileges found in priv.h and priv(9) should be read
before doing so.
The new privilege interface exposed sufficient information to the
privilege checking routine that it will now be possible for jail to
determine whether a particular privilege is granted in the check routine,
rather than relying on hints from the calling context via the
SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail
check function, prison_priv_check(), is exposed from kern_jail.c and used
by the privilege check routine to determine if the privilege is permitted
in jail. As a result, a centralized list of privileges permitted in jail
is now present in kern_jail.c.
The MAC Framework is now also able to instrument privilege checks, both
to deny privileges otherwise granted (mac_priv_check()), and to grant
privileges otherwise denied (mac_priv_grant()), permitting MAC Policy
modules to implement privilege models, as well as control a much broader
range of system behavior in order to constrain processes running with
root privilege.
The suser() and suser_cred() functions remain implemented, now in terms
of priv_check() and the PRIV_ROOT privilege, for use during the transition
and possibly continuing use by third party kernel modules that have not
been updated. The PRIV_DRIVER privilege exists to allow device drivers to
check privilege without adopting a more specific privilege identifier.
This change does not modify the actual security policy, rather, it
modifies the interface for privilege checks so changes to the security
policy become more feasible.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
sockaddr_storage. This structure is defined in RFC 2553 and is a more
semantically correct structure for holding IP and IP6 sockaddr information.
struct sockaddr is not big enough to hold all the required information for
IP6, resulting in truncated addresses et al when auditing IP6 sockaddr
information.
We also need to assume that the sa->sa_len has been validated before the call to
audit_arg_sockaddr() is made, otherwise it could result in a buffer overflow.
This is being done to accommodate auditing of network related arguments (like
connect, bind et al) that will be added soon.
Discussed with: rwatson
Obtained from: TrustedBSD Project
MFC after: 2 weeks
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA
not trust jails enough to execute audit related system calls. An example of
this is with su(1), or login(1) within prisons. So, if the syscall request
comes from a jail return ENOSYS. This will cause these utilities to operate
as if audit is not present in the kernel.
Looking forward, this problem will be remedied by allowing non privileged
users to maintain and their own audit streams, but the details on exactly how
this will be implemented needs to be worked out.
This change should fix situations when options AUDIT has been compiled into
the kernel, and utilities like su(1), or login(1) fail due to audit system
call failures within jails.
This is a RELENG_6 candidate.
Reported by: Christian Brueffer
Discussed with: rwatson
MFC after: 3 days
written to the audit trail file:
- audit_record_write() now returns void, and all file system specific
error handling occurs inside this function. This pushes error handling
complexity out of the record demux routine that hands off to both the
trail and audit pipes, and makes trail behavior more consistent with
pipes as a record destination.
- Rate limit kernel printfs associated with running low on space. Rate
limit audit triggers for low space. Rate limit printfs for fail stop
events. Rate limit audit worker write error printfs.
- Document in detail the types of limits and space checks we perform, and
combine common cases.
This improves the audit subsystems tolerance to low space conditions by
avoiding toasting the console with printfs are waking up the audit daemon
continuously.
MFC after: 3 days
Obtained from: TrustedBSD Project
other problems while labels were first being added to various kernel
objects. They have outlived their usefulness.
MFC after: 1 month
Suggested by: Christopher dot Vance at SPARTA dot com
Obtained from: TrustedBSD Project
when allocating the record in the first place, allocate the final buffer
when closing the BSM record. At that point, more size information is
available, so a sufficiently large buffer can be allocated.
This allows the kernel to generate audit records in excess of
MAXAUDITDATA bytes, but is consistent with Solaris's behavior. This only
comes up when auditing command line arguments, in which case we presume
the administrator really does want the data as they have specified the
policy flag to gather them.
Obtained from: TrustedBSD Project
MFC after: 3 days
with other commonly used sysctl name spaces, rather than declaring them
all over the place.
MFC after: 1 month
Sponsored by: nCircle Network Security, Inc.
audit pipes. If the kernel record was not selected for the trail or the pipe,
any user supplied record attached to it would be tossed away, resulting in
otherwise selected events being lost.
- Introduce two new masks: AR_PRESELECT_USER_TRAIL AR_PRESELECT_USER_PIPE,
currently we have AR_PRESELECT_TRAIL and AR_PRESELECT_PIPE, which tells
the audit worker that we are interested in the kernel record, with
the additional masks we can determine if either the pipe or trail is
interested in seeing the kernel or user record.
- In audit(2), we unconditionally set the AR_PRESELECT_USER_TRAIL and
AR_PRESELECT_USER_PIPE masks under the assumption that userspace has
done the preselection [1].
Currently, there is work being done that allows the kernel to parse and
preselect user supplied records, so in the future preselection could occur
in either layer. But there is still a few details to work out here.
[1] At some point we need to teach au_preselect(3) about the interests of
all the individual audit pipes.
This is a RELENG_6 candidate.
Reviewed by: rwatson
Obtained from: TrustedBSD Project
MFC after: 1 week
exists to allow the mandatory access control policy to properly initialize
mbufs generated by the firewall. An example where this might happen is keep
alive packets, or ICMP error packets in response to other packets.
This takes care of kernel panics associated with un-initialize mbuf labels
when the firewall generates packets.
[1] I modified this patch from it's original version, the initial patch
introduced a number of entry points which were programmatically
equivalent. So I introduced only one. Instead, we should leverage
mac_create_mbuf_netlayer() which is used for similar situations,
an example being icmp_error()
This will minimize the impact associated with the MFC
Submitted by: mlaier [1]
MFC after: 1 week
This is a RELENG_6 candidate
Add the argument auditing functions for argv and env.
Add kernel-specific versions of the tokenizer functions for the
arg and env represented as a char array.
Implement the AUDIT_ARGV and AUDIT_ARGE audit policy commands to
enable/disable argv/env auditing.
Call the argument auditing from the exec system calls.
Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
is loaded. This problem stems from the fact that the policy is not properly
initializing the mac label associated with the NFS daemon.
Obtained from: TrustedBSD Project
Discussed with: rwatson
audit record size at run-time, which can be used by the user
process to size the user space buffer it reads into from the audit
pipe.
Perforce change: 105098
Obtained from: TrustedBSD Project
progress the kernel audit code in CVS is considered authoritative.
This will ease $P4$-related merging issues during the CVS loopback.
Obtained from: TrustedBSD Project