Commit Graph

864 Commits

Author SHA1 Message Date
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides room for 114 rights (50 more than the previous
cap_rights_t), and it can grow to hold at least 285 rights; it can be made
even larger if 285 rights turn out not to be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain the
total number of elements in the array minus 2. This means that if those two
bits are equal to 0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain the array index. Only one of
these bits is set, and its position within the five-bit range defines the
array index. This means there can be at most five array elements in the future.

To define a new right, the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine a few rights, but the rights have to
belong to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is a new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights are passed to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions as a comma-separated
list, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belong to the same array element this way is
correct, but is not advised. It should only be used for defining aliases.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Konstantin Belousov
940cb0e2bb Implement read(2)/write(2) and necessary lseek(2) for posix shmfd.
Add MAC framework entries for posix shm read and write.

Do not allow implicit extension of the underlying memory segment past
the limit set by ftruncate(2) by either of the syscalls.  read(2) and
write(2) return short I/O; lseek(2) fails with EINVAL when the resulting
offset does not fit into the limit.

Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-21 17:45:00 +00:00
Andriy Gapon
a01669ea96 audit_proc_coredump: check return value of audit_new
audit_new may return NULL if audit is disabled or suspended.

Sponsored by:	HybridCluster
MFC after:	7 days
2013-07-09 09:03:01 +00:00
Alan Cox
a42159f0ee Relax the vm object locking in mac_proc_vm_revoke_recurse(). A read lock
suffices in one place.

Sponsored by:	EMC / Isilon Storage Division
2013-06-04 17:23:09 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock. This will enable
further optimizations in the future, where the vm_object lock is held
in read mode most of the time that the page cache resident pool of pages
is accessed for reading.

The change is mostly mechanical, but a few points are worth noting:
* The KPI changes as follows:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* To avoid namespace pollution, vm/vm_pager.h does not pull in the lock
  header itself; consumers were already required to include sys/mutex.h
  directly to cater for its inline functions using VM_OBJECT_LOCK().
  All vm/vm_pager.h consumers must now also include sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks in
  the compat layer, because the name clash between the FreeBSD and Solaris
  versions must be avoided.
  For this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit.  Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Pawel Jakub Dawidek
7493f24ee6 - Implement two new system calls:

	int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
	int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

  which allow binding and connecting, respectively, to a UNIX domain socket
  with a path relative to the directory associated with the given file
  descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that have to be present
  on the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
  in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson, jilles, kib, des
2013-03-02 21:11:30 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- A capability is no longer a separate descriptor type. Now every descriptor
  has its own set of capability rights.

- The cap_new(2) system call remains, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2), which limits the capability rights
  of the given descriptor without creating a new one, should be used
  instead of cap_new(2).

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If the CAP_IOCTL capability right is present we can further reduce the
  list of allowed ioctls with the new cap_ioctls_limit(2) syscall. The list
  of allowed ioctls can be retrieved with the cap_ioctls_get(2) syscall.

- If the CAP_FCNTL capability right is present we can further reduce the
  fcntls that can be used with the new cap_fcntls_limit(2) syscall and
  retrieve them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavily modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and even though I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNOD to CAP_MKNODAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK is also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK is also required).

	Added convenient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Pawel Jakub Dawidek
89adaea91f Remove redundant check. 2013-02-17 11:57:47 +00:00
Pawel Jakub Dawidek
9270ed9d38 Style. 2013-02-11 22:54:23 +00:00
Pawel Jakub Dawidek
222069f454 Add AUDIT_ARG_SOCKADDR() macro so we can start using the audit_arg_sockaddr()
function, which is currently unused.

Sponsored by:	The FreeBSD Foundation
2013-02-07 00:24:23 +00:00
Christian S.J. Peron
14bc51359c Implement the zonename token for jailed processes. If
a process has an auditid/preselection mask specified and
is jailed, include the zonename (jailname) token as
part of the audit record.

Reviewed by:	pjd
MFC after:	2 weeks
2013-01-17 21:02:53 +00:00
Robert Watson
6f1cbda73d Four .c files from OpenBSM are used, in modified form, by the kernel to
implement the BSM audit trail format.  Rename the kernel versions of the
files to match the userspace filenames so that it's easier to work out
what they correspond to, and therefore ensure they are kept in-sync.

Obtained from:	TrustedBSD Project
2012-12-15 15:21:09 +00:00
Robert Watson
d0c2e5bd23 Merge OpenBSM 1.2-alpha2 changes from contrib/openbsm to
src/sys/{bsm,security/audit}.  There are a few tweaks to help with the
FreeBSD build environment that will be merged back to OpenBSM.  No
significant functional changes appear on the kernel side.

Obtained from:	TrustedBSD Project
Sponsored by:	The FreeBSD Foundation (auditdistd)
2012-12-01 13:46:37 +00:00
Pawel Jakub Dawidek
ceaea52f0c IFp4 @219811:
VFS is now fully MPSAFE, fix compilation.
2012-12-01 08:51:40 +00:00
Pawel Jakub Dawidek
80a044ea46 IFp4 @208452:
Audit handling for missing events:
- AUE_READLINKAT
- AUE_FACCESSAT
- AUE_MKDIRAT
- AUE_MKFIFOAT
- AUE_MKNODAT
- AUE_SYMLINKAT

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 23:21:55 +00:00
Pawel Jakub Dawidek
499f0f4d55 IFp4 @208451:
Fix path handling for *at() syscalls.

Before the change the directory descriptor was completely ignored,
so the relative path argument was appended to the current working
directory path and not to the path provided by the descriptor; thus
wrong paths were stored in audit logs.

Now that we use the directory descriptor in vfs_lookup, move the
AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where
we hold the file descriptor table lock, so we are sure paths will
be resolved according to the same directory in the audit record and
in the actual operation.

Sponsored by:	FreeBSD Foundation (auditdistd)
Reviewed by:	rwatson
MFC after:	2 weeks
2012-11-30 23:18:49 +00:00
Pawel Jakub Dawidek
1d8cd15cf8 IFp4 @208383:
Currently, when we discover that the trail file is larger than the configured
limit, we send the AUDIT_TRIGGER_ROTATE_KERNEL trigger to the auditd daemon
once. If for some reason auditd doesn't rotate the trail file, it will never
be rotated.

Change this by sending the trigger each time the trail file size grows by the
configured limit. For example, if the limit is 1MB, we will send the trigger
at 1MB, 2MB, 3MB, etc.

This is also needed for the auditd change that will be committed soon,
where auditd may ignore the trigger - it might be ignored if the kernel
requests the trail file to be rotated too quickly (more often than once a
second), which would result in overwriting the previous trail file.

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 23:03:51 +00:00
Pawel Jakub Dawidek
6293140411 IFp4 @208382:
Currently on each record write we call VFS_STATFS() to get the available space
on the file system as well as VOP_GETATTR() to get the trail file size.

We can assume that the trail file is only updated by the audit worker, so
instead of asking for the file size on every write, get the file size on trail
switch only (it should be zero, but it's not expensive) and use the global
variable audit_size, protected by the audit worker lock, to keep track of the
trail file's size.

This eliminates the VOP_GETATTR() call for every write. VFS_STATFS() is
satisfied from in-memory data (mount->mnt_stat), so it shouldn't be expensive.

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 22:59:20 +00:00
Pawel Jakub Dawidek
9658c0582e IFp4 @208381:
For VOP_GETATTR() we just need the vnode to be shared-locked.

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 22:52:35 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change, even though
the interface signatures themselves do not change.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Christian Brueffer
e623c22083 Check vplabel for NULL before dereferencing it. Fixes a panic
when running atop with MAC_MLS enabled.

Submitted by:	Richard Kojedzinszky <krichy@tvnetwork.hu>
Reviewed by:	rwatson
MFC after:	1 week
2012-05-03 15:51:34 +00:00
Robert Watson
b4ef8be228 When allocation of labels on files is implicitly disabled due to MAC
policy configuration, avoid leaking resources following failed calls
to get and set MAC labels by file descriptor.

Reported by:	Mateusz Guzik <mjguzik at gmail.com> + clang scan-build
MFC after:	3 days
2012-04-08 11:01:49 +00:00
Alexander V. Chernikov
e4b3229aa5 - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greatly improves performance: in the most common case we need to acquire
1 reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is a temporary solution;
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00
Ed Schouten
7870adb640 Remove direct access to si_name.
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after:	2 weeks
2012-02-10 12:35:57 +00:00
Ed Schouten
dc15eac046 Use strchr() and strrchr().
It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use them.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.
2012-01-02 12:12:10 +00:00
Attilio Rao
77befd1d23 Revert the approach for skipping lockstat_probe_func call when doing
lock_success/lock_failure, introduced in r228424, by directly skipping
in dtrace_probe.

This mainly helps in avoiding namespace pollution and thus lockstat.h
dependency by systm.h.

As an added bonus, this also helps in MFC case.
Reviewed by:	avg
MFC after:	3 months (or never)
X-MFC:		r228424
2011-12-12 23:29:32 +00:00
Andriy Gapon
7a7ce668ef put sys/systm.h at its proper place or add it if missing
Reported by:	lstewart, tinderbox
Pointyhat to:	avg, attilio
MFC after:	1 week
MFC with:	r228430
2011-12-12 10:05:13 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Ed Schouten
a185bd12f3 Get rid of D_PSEUDO.
It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL.
Nowadays we have a different interface for that; make_dev_p(). There's
no need to keep it there.

While there, remove an unneeded D_NEEDMINOR from the gpio driver.

Discussed with:	gonzo@ (gpio)
2011-10-18 08:09:44 +00:00
Christian Brueffer
709ef347b7 Remove two duplicated assignments.
CID:		9870
Found with:	Coverity Prevent(tm)
Confirmed by:	rwatson
MFC after:	1 week
2011-10-08 09:14:18 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional namespace collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Robert Watson
9b6dd12e5d Correct several issues in the integration of POSIX shared memory objects
and the new setmode and setowner fileops in FreeBSD 9.0:

- Add new MAC Framework entry point mac_posixshm_check_create() to allow
  MAC policies to authorise shared memory use.  Provide a stub policy and
  test policy templates.

- Add missing Biba and MLS implementations of mac_posixshm_check_setmode()
  and mac_posixshm_check_setowner().

- Add 'accmode' argument to mac_posixshm_check_open() -- unlike the
  mac_posixsem_check_open() entry point it was modeled on, the access mode
  is required as shared memory access can be read-only as well as writable;
  this isn't true of POSIX semaphores.

- Implement full range of POSIX shared memory entry points for Biba and MLS.

Sponsored by:   Google Inc.
Obtained from:	TrustedBSD Project
Approved by:    re (kib)
2011-09-02 17:40:39 +00:00
Attilio Rao
6aba400a70 Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before destroying the registered selinfo object. This
race is quite rare in practice, because it requires a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine, which should be called
before destroying the selinfo objects (in order to avoid such a case),
and fix the present cases where it might have already been called.
Sometimes the context is safe enough to prevent this type of race,
as happens in device drivers which install selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the file descriptors should already be
closed, thus there cannot be a race.
For this case, the mfi(4) device driver can serve as an example, as it
implements fully correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
Konstantin Belousov
9c00bb9190 Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by:	glebius
Reviewed by:	rwatson
Approved by:	re (bz)
2011-08-16 20:07:47 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Jonathan Anderson
778b0e42a8 Provide ability to audit cap_rights_t arguments.
We wish to be able to audit capability rights arguments; this code
provides the necessary infrastructure.

This commit does not, of itself, turn on such auditing for any
system call; that should follow shortly.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-18 12:58:18 +00:00
Alexander Leidinger
d783bbd2d2 - Add a FEATURE for capsicum (security_capabilities).
- Rename mac FEATURE to security_mac.

Discussed with:	rwatson
2011-03-04 09:03:54 +00:00
Robert Watson
25122f5c5f Add ECAPMODE, "Not permitted in capability mode", a new kernel errno
constant to indicate that a system call (or perhaps an operation requested
via a system call) is not permitted for a capability mode process.

Submitted by:	anderson
Sponsored by:	Google, Inc.
Obtained from:	Capsicum Project
MFC after:	1 week
2011-03-01 13:14:28 +00:00
Alexander Leidinger
de5b19526b Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/
PMC/SYSV/...).

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availability if
needed.

Sponsored by:   Google Summer of Code 2010
Submitted by:   kibab
Reviewed by:    arch@ (parts by rwatson, trasz, jhb)
X-MFC after:    to be determined in last commit with code from this project
2011-02-25 10:11:01 +00:00
Alan Cox
17f3095d1a Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are
incorrectly calling vm_object_page_clean().  They are passing the length of
the range rather than the ending offset of the range.

Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the
callers.

Reviewed by:	kib
MFC after:	3 weeks
2011-02-05 21:21:27 +00:00
Matthew D Fleming
123d2cb7e9 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the security directory.
2011-01-12 19:54:14 +00:00
Rebecca Cran
b1ce21c6ef Fix typos.
PR:	bin/148894
Submitted by:	olgeni
2010-11-09 10:59:09 +00:00
Robert Watson
a959b1f02c Add missing DTrace probe invocation to mac_vnode_check_open; the probe
was declared, but never used.

MFC after:	3 days
Sponsored by:	Google, Inc.
2010-10-23 16:59:39 +00:00
Matthew D Fleming
4d369413e1 Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno than sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
Rui Paulo
79856499bd Add an extra comment to the SDT probes definition. This allows us to
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson [1]
2010-08-22 11:18:57 +00:00
Christian S.J. Peron
24ffe72448 Add a case to make sure that internal audit records get converted
to BSM format for lpathconf(2) events.

MFC after:	2 weeks
2010-05-04 15:29:07 +00:00
Robert Watson
9663e34384 Update device-labeling logic for Biba, LOMAC, and MLS to recognize new-style
pts devices when various policy ptys_equal flags are enabled.

Submitted by:	Estella Mystagic <estella at mystagic.com>
MFC after:	1 week
2010-03-02 15:05:48 +00:00
Christian S.J. Peron
583450efd7 Make sure we convert audit records that were produced as the result of the
closefrom(2) syscall.
2010-01-31 22:31:01 +00:00
Brooks Davis
412f9500e2 Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1.  kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1.  Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after:	1 month
2010-01-12 07:49:34 +00:00
Edward Tomasz Napierala
3a597bfc7b Make mac_lomac(4) able to interpret NFSv4 access bits.
Reviewed by:	rwatson
2010-01-03 17:19:14 +00:00
Poul-Henning Kamp
0d07b627a3 Having thrown the cat out of the house, add a necessary include. 2009-09-08 13:24:36 +00:00
Poul-Henning Kamp
6778431478 Revert previous commit and add myself to the list of people who should
know better than to commit with a cat in the area.
2009-09-08 13:19:05 +00:00
Poul-Henning Kamp
b34421bf9c Add necessary include. 2009-09-08 13:16:55 +00:00
Robert Watson
9eb3e4639a Correctly audit real gids following changes to the audit record argument
interface.

Approved by:	re (kib)
2009-08-12 10:45:45 +00:00
Robert Watson
791b0ad2bf Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead
provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2()
to capture path information for audit records.  This allows us to
move the definitions of ARG_* out of the public audit header file,
as they are an implementation detail of our current kernel-internal
audit record, which may change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-29 07:44:43 +00:00
Robert Watson
b146fc1bf0 Rework vnode argument auditing to follow the same structure, in order
to avoid exposing ARG_ macros/flag values outside of the audit code in
order to name which one of two possible vnodes will be audited for a
system call.

Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-28 21:52:24 +00:00
Robert Watson
e4b4bbb665 Audit file descriptors passed to fooat(2) system calls, which are used
instead of the root/current working directory as the starting point for
lookups.  Up to two such descriptors can be audited.  Add audit record
BSM encoding for fooat(2).

Note: due to an error in the OpenBSM 1.1p1 configuration file, a
further change is required to that file in order to fix openat(2)
auditing.

Approved by:	re (kib)
Reviewed by:	rdivacky (fooat(2) portions)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-28 21:39:58 +00:00
Robert Watson
597df30e62 Import OpenBSM 1.1p1 from vendor branch to 8-CURRENT, populating
contrib/openbsm and a subset also imported into sys/security/audit.
This patch release addresses several minor issues:

- Fixes to AUT_SOCKUNIX token parsing.
- IPv6 support for au_to_me(3).
- Improved robustness in the parsing of audit_control, especially long
  flags/naflags strings and whitespace in all fields.
- Add missing conversion of a number of FreeBSD/Mac OS X errnos to/from BSM
  error number space.

MFC after:	3 weeks
Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
Approved by:	re (kib)
2009-07-17 14:02:20 +00:00
Robert Watson
6196f898bb Create audit records for AUE_POSIX_OPENPT, currently w/o arguments.
Approved by:	re (audit argument blanket)
2009-07-02 16:33:38 +00:00
Robert Watson
deedc899fd Fix comment misthink.
Submitted by:	b. f. <bf1783 at googlemail.com>
Approved by:	re (audit argument blanket)
MFC after:	1 week
2009-07-02 09:50:13 +00:00
Robert Watson
2a5658382a Clean up a number of aspects of token generation from audit arguments to
system calls:

- Centralize generation of argument tokens for VM addresses in a macro,
  ADDR_TOKEN(), and properly encode 64-bit addresses in 64-bit arguments.
- Fix up argument numbers across a large number of syscalls so that they
  match the numeric argument into the system call.
- Don't audit the address argument to ioctl(2) or ptrace(2), but do keep
  generating tokens for mmap(2), minherit(2), since they relate to passing
  object access across execve(2).

Approved by:	re (audit argument blanket)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-07-02 09:15:30 +00:00
Robert Watson
03f7b00438 For access(2) and eaccess(2), audit the requested access mode.
Approved by:	re (audit argument blanket)
MFC after:	3 days
2009-07-01 22:47:45 +00:00
Robert Watson
9e4c1521d5 Define missing audit argument macro AUDIT_ARG_SOCKET(), and
capture the domain, type, and protocol arguments to socket(2)
and socketpair(2).

Approved by:	re (audit argument blanket)
MFC after:	3 days
2009-07-01 18:54:49 +00:00
Robert Watson
6d5a61563a When auditing unmount(2), capture FSID arguments as regular text strings
rather than as paths, which would lead to them being treated as relative
pathnames and hence confusingly converted into absolute pathnames.

Capture flags to unmount(2) via an argument token.

Approved by:	re (audit argument blanket)
MFC after:	3 days
2009-07-01 16:56:56 +00:00
Robert Watson
422d786676 Audit the file descriptor number passed to lseek(2).
Approved by:	re (kib)
MFC after:	3 days
2009-07-01 15:37:23 +00:00
Robert Watson
2ef24dde7c Audit the 'options' argument to wait4(2).
Approved by:	re (kib)
MFC after:	3 days
2009-07-01 12:36:10 +00:00
Stacey Son
86120afae4 Dynamically allocate the gidset field in audit record.
This fixes a problem created by the recent change that allows a large
number of groups per user.  The gidset field in struct kaudit_record
is now dynamically allocated to the size needed rather than statically
(using NGROUPS).

Approved by:	re@ (kensmith, rwatson), gnn (mentor)
2009-06-29 20:19:19 +00:00
Robert Watson
14961ba789 Replace AUDIT_ARG() with variable argument macros with a set of more
specific macros for each audit argument type.  This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by:	brooks
Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-06-27 13:58:44 +00:00
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge the appropriate uid when allocation is performed by
the kernel, e.g. md(4).

Several syscalls, among them fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
Ed Schouten
fbbbf5d135 Chase the removal of PRIV_TTY_PRISON in the mac(9) modules.
Reported by:	kib
Pointy hat to:	me
2009-06-20 15:54:35 +00:00
Konstantin Belousov
d8b0556c6d Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Robert Watson
3ad3d9c5ef Add one further check with mac_policy_count to an mbuf copying case
(limited to netatalk) to avoid MAC label lookup on both mbufs if no
policies are registered.

Obtained from:	TrustedBSD Project
2009-06-03 19:41:12 +00:00
Robert Watson
3de4046939 Continue work to optimize performance of "options MAC" when no MAC policy
modules are loaded by avoiding mbuf label lookups when policies aren't
loaded, pushing further socket locking into MAC policy modules, and
avoiding locking MAC ifnet locks when no policies are loaded:

- Check mac_policy_count before looking for mbuf MAC label m_tags in MAC
  Framework entry points.  We will still pay label lookup costs if MAC
  policies are present but don't require labels (typically a single mbuf
  header field read, but perhaps further indirection if IPSEC or other
  m_tag consumers are in use).

- Further push socket locking for socket-related access control checks and
  events into MAC policies from the MAC Framework, so that sockets are
  only locked if a policy specifically requires a lock to protect a label.
  This resolves lock order issues during sonewconn() and also in local
  domain socket cross-connect where multiple socket locks could not be
  held at once for the purposes of propagating MAC labels across multiple
  sockets.  Eliminate mac_policy_count check in some entry points where it
  no longer avoids locking.

- Add mac_policy_count checking in some entry points relating to network
  interfaces that otherwise lock a global MAC ifnet lock used to protect
  ifnet labels.

Obtained from:	TrustedBSD Project
2009-06-03 18:46:28 +00:00
Robert Watson
15141acc67 By default, label all network interfaces as biba/equal on attach. This
makes it easier for first-time users to configure and work with biba as
remote access is still allowed.  Effectively, this means that, by default,
only local security properties, not distributed ones, are enforced.

Obtained from:	TrustedBSD Project
2009-06-03 08:49:44 +00:00
Robert Watson
5f51fb4871 Mark MAC Framework sx and rm locks as NOWITNESS to suppress warnings that
might arise from WITNESS not understanding its locking protocol, which
should be deadlock-free.  Currently these warnings generally don't occur,
but as object locking is pushed into policies for some object types, they
would otherwise occur more often.

Obtained from:	TrustedBSD Project
2009-06-02 22:22:09 +00:00
Robert Watson
f93bfb23dc Add internal 'mac_policy_count' counter to the MAC Framework, which is a
count of the number of registered policies.

Rather than unconditionally locking sockets before passing them into MAC,
lock them in the MAC entry points only if mac_policy_count is non-zero.

This avoids locking overhead for a number of socket system calls when no
policies are registered, eliminating measurable overhead for the MAC
Framework for the socket subsystem when there are no active policies.

Possibly socket locks should be acquired by policies if they are required
for socket labels, which would further avoid locking overhead when there
are policies but they don't require labeling of sockets, or possibly
don't even implement socket controls.
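
The fast path described above can be sketched in user-space C.  This is a
minimal illustration only, not the MAC Framework code: sketch_lock,
sketch_socket, sketch_policy_count and sketch_socket_check() are hypothetical
stand-ins, and the "lock" merely counts acquisitions so the effect is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket lock; it only counts acquisitions. */
struct sketch_lock {
	int	held;
	int	acquisitions;
};

struct sketch_socket {
	struct sketch_lock	so_lock;
	int			so_label;	/* negative means "deny" in this sketch */
};

/* Models mac_policy_count: the number of registered policies. */
static int sketch_policy_count;

static void
sketch_lock_acquire(struct sketch_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void
sketch_lock_release(struct sketch_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Entry point: take the socket lock only if some policy is registered. */
static int
sketch_socket_check(struct sketch_socket *so)
{
	int error;

	assert(so != NULL);
	if (sketch_policy_count == 0)
		return (0);		/* fast path: no policies, no locking */
	sketch_lock_acquire(&so->so_lock);
	error = (so->so_label < 0) ? EACCES : 0;
	sketch_lock_release(&so->so_lock);
	return (error);
}
```

With the counter at zero the check returns without touching the lock, which
is the overhead the commit sets out to remove.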

Obtained from:	TrustedBSD Project
2009-06-02 18:26:17 +00:00
Robert Watson
1a109c1cb0 Make the rmlock(9) interface a bit more like the rwlock(9) interface:
- Add rm_init_flags() and accept extended options only for that variation.
- Add a flags space specifically for rm_init_flags(), rather than borrowing
  the lock_init() flag space.
- Define flag RM_RECURSE to use instead of LO_RECURSABLE.
- Define flag RM_NOWITNESS to allow an rmlock to be exempt from WITNESS
  checking; this wasn't possible previously as rm_init() always passed
  LO_WITNESS when initializing an rmlock's struct lock.
- Add RM_SYSINIT_FLAGS().
- Rename embedded mutex in rmlocks to make it more obvious what it is.
- Update consumers.
- Update man page.
2009-05-29 10:52:37 +00:00
Jamie Gritton
0304c73163 Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.
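
As an illustration of the dot-separated naming, the following user-space
sketch tests whether one jail name lies below another in the hierarchy.
sketch_jail_is_descendant() is a hypothetical helper written for this note,
not a kernel function, and it assumes names are compared purely as strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if 'name' names a jail strictly below 'ancestor', i.e. it is
 * 'ancestor' followed by '.' and at least one more character.
 */
static bool
sketch_jail_is_descendant(const char *ancestor, const char *name)
{
	size_t len;

	assert(ancestor != NULL && name != NULL);
	len = strlen(ancestor);
	return (len > 0 &&
	    strncmp(ancestor, name, len) == 0 &&
	    name[len] == '.' &&
	    name[len + 1] != '\0');
}
```

Under this sketch "web.db" and "web.db.cache" are below "web", while
"webmail" and "web" itself are not.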

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
Robert Watson
81fee06f9c Convert the MAC Framework from using rwlocks to rmlocks to stabilize
framework registration for non-sleepable entry points.

Obtained from:	TrustedBSD Project
2009-05-27 09:41:58 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled.  Bump __FreeBSD_version in order to signal such a
situation.
2009-05-11 15:33:26 +00:00
Robert Watson
fa76567150 Rename MAC Framework-internal macros used to invoke policy entry points:

  MAC_BOOLEAN           -> MAC_POLICY_BOOLEAN
  MAC_BOOLEAN_NOSLEEP   -> MAC_POLICY_BOOLEAN_NOSLEEP
  MAC_CHECK             -> MAC_POLICY_CHECK
  MAC_CHECK_NOSLEEP     -> MAC_POLICY_CHECK_NOSLEEP
  MAC_EXTERNALIZE       -> MAC_POLICY_EXTERNALIZE
  MAC_GRANT             -> MAC_POLICY_GRANT
  MAC_GRANT_NOSLEEP     -> MAC_POLICY_GRANT_NOSLEEP
  MAC_INTERNALIZE       -> MAC_POLICY_INTERNALIZE
  MAC_PERFORM           -> MAC_POLICY_PERFORM_CHECK
  MAC_PERFORM_NOSLEEP   -> MAC_POLICY_PERFORM_NOSLEEP

This frees up those macro names for use in wrapping calls into the MAC
Framework from the remainder of the kernel.

Obtained from:	TrustedBSD Project
2009-05-01 21:05:40 +00:00
Robert Watson
2a5058a3ed Temporarily relax the constraints on argument size checking for A_GETCOND;
login(1) isn't quite ready for them yet on 64-bit systems as it continues
to use the conventions of the old version of the API.

Reported by:	stas, Jakub Lach <jakub_lach at mailplus.pl>
2009-04-19 23:28:08 +00:00
Robert Watson
4df4e33572 Merge OpenBSM 1.1 changes to the FreeBSD 8.x kernel:
- Add and use mapping of fcntl(2) commands to new BSM constant space.
- Adopt (int) rather than (long) arguments to a number of auditon(2)
  commands, as has happened in Solaris, and add compatibility code to
  handle the old commands.

Note that BSM_PF_IEEE80211 is partially but not fully removed, as the
userspace OpenBSM 1.1alpha5 code still depends on it.  Once userspace
is updated, I'll GC the kernel constant.

MFC after:		2 weeks
Sponsored by:		Apple, Inc.
Obtained from:		TrustedBSD Project
Portions submitted by:	sson
2009-04-19 14:53:17 +00:00
Robert Watson
fe69399069 Merge new kernel files from OpenBSM 1.1: audit_fcntl.h and
audit_bsm_fcntl.c contain utility routines to map local fcntl
commands into BSM constants.  Adaptation to the FreeBSD kernel
environment will follow in a future commit.

Sponsored by:	Apple, Inc.
Obtained from:	TrustedBSD Project
MFC after:	2 weeks
2009-04-16 20:17:32 +00:00
Robert Watson
2f106d5e08 Remove D_NEEDGIANT from audit pipes. I'm actually not sure why this was
here, but it isn't needed.

MFC after:	2 weeks
Sponsored by:	Apple, Inc.
2009-04-16 11:57:16 +00:00
Edward Tomasz Napierala
6180d3185d Get rid of VSTAT and replace it with VSTAT_PERMS, which is somewhat
better defined.

Approved by:	rwatson (mentor)
2009-03-29 17:45:48 +00:00
Pawel Jakub Dawidek
a3ce3b6d35 - Correct logic in if statement - we want to allocate temporary buffer
  when someone is passing new rules, not when he only wants to read them.
  Because of this bug, even if the given rules were incorrect, they
  ended up in rule_string.
- Add missing protection for rule_string when copying it.

Reviewed by:	rwatson
MFC after:	1 week
2009-03-14 20:40:06 +00:00
Robert Watson
4020272933 Rework MAC Framework synchronization in a number of ways in order to
improve performance:

- Eliminate custom reference count and condition variable to monitor
  threads entering the framework, as this had both significant overhead
  and behaved badly in the face of contention.

- Replace reference count with two locks: an rwlock and an sx lock,
  which will be read-acquired by threads entering the framework
  depending on whether a given policy entry point is permitted to sleep
  or not.

- Replace previous mutex locking of the reference count for exclusive
  access with write acquiring of both the policy list sx and rw locks,
  which occurs only when policies are attached or detached.

- Do a lockless read of the dynamic policy list head before acquiring
  any locks in order to reduce overhead when no dynamic policies are
  loaded; this is a race we can afford to lose.

- For every policy entry point invocation, decide whether sleeping is
  permitted, and if not, use a _NOSLEEP() variant of the composition
  macros, which will use the rwlock instead of the sxlock.  In some
  cases, we decide which to use based on allocation flags passed to the
  MAC Framework entry point.

As with the move to rwlocks/rmlocks in pfil, this may trigger witness
warnings, but these should (generally) be false positives as all
acquisition of the locks is for read with two very narrow exceptions
for policy load/unload, and those code blocks should never acquire
other locks.

Sponsored by:	Google, Inc.
Obtained from:	TrustedBSD Project
Discussed with:	csjp (idea, not specific patch)
2009-03-14 16:06:06 +00:00
Christian S.J. Peron
095b4d2689 Mark the bsdextended rules sysctl as being mpsafe.
Discussed with:	rwatson
2009-03-09 17:42:18 +00:00
Robert Watson
b3f468e253 Add a new thread-private flag, TDP_AUDITREC, to indicate whether or
not there is an audit record hung off of td_ar on the current thread.
Test this flag instead of td_ar when auditing syscall arguments or
checking for an audit record to commit on syscall return.  Under
these circumstances, td_pflags is much more likely to be in the cache
(especially if there is no auditing of the current system call), so
this should help reduce cache misses in the system call return path.
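
A minimal user-space sketch of the flag test described above.  The structure
layout, the SKETCH_TDP_AUDITREC bit value and both helper functions are
assumptions made for illustration; only the idea of testing a bit in
td_pflags instead of dereferencing td_ar comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_TDP_AUDITREC	0x01	/* hypothetical bit value */

struct sketch_audit_record {
	int	committed;
};

struct sketch_thread {
	int				 td_pflags;	/* read on most syscalls */
	struct sketch_audit_record	*td_ar;		/* only used when auditing */
};

/* Attach a record to the thread and mark its presence in td_pflags. */
static void
sketch_audit_begin(struct sketch_thread *td, struct sketch_audit_record *ar)
{
	td->td_ar = ar;
	td->td_pflags |= SKETCH_TDP_AUDITREC;
}

/* Syscall return path: look at the flag first, td_ar only if it is set. */
static int
sketch_syscall_exit(struct sketch_thread *td)
{
	if ((td->td_pflags & SKETCH_TDP_AUDITREC) == 0)
		return (0);		/* nothing to commit */
	assert(td->td_ar != NULL);
	td->td_ar->committed = 1;
	td->td_ar = NULL;
	td->td_pflags &= ~SKETCH_TDP_AUDITREC;
	return (1);
}
```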

MFC after:      1 week
Reported by:    kris
Obtained from:  TrustedBSD Project
2009-03-09 10:45:58 +00:00
Robert Watson
fefd0ac8a9 Remove 'uio' argument from MAC Framework and MAC policy entry points for
extended attribute get/set; in the case of get an uninitialized user
buffer was passed before the EA was retrieved, making it of relatively
little use; the latter was simply unused by any policies.

Obtained from:	TrustedBSD Project
Sponsored by:	Google, Inc.
2009-03-08 12:32:06 +00:00
Robert Watson
c14172e3ae Rename 'ucred' argument to mac_socket_check_bind() to 'cred' to match
other use of the same variable type.

Obtained from:	TrustedBSD Project
Sponsored by:	Google, Inc.
2009-03-08 12:22:00 +00:00
Robert Watson
6f6174a762 Improve the consistency of MAC Framework and MAC policy entry point
naming by renaming certain "proc" entry points to "cred" entry points,
reflecting their manipulation of credentials.  For some entry points,
the process was passed into the framework but not into policies; in
these cases, stop passing in the process since we don't need it.

  mac_proc_check_setaudit -> mac_cred_check_setaudit
  mac_proc_check_setaudit_addr -> mac_cred_check_setaudit_addr
  mac_proc_check_setauid -> mac_cred_check_setauid
  mac_proc_check_setegid -> mac_cred_check_setegid
  mac_proc_check_seteuid -> mac_cred_check_seteuid
  mac_proc_check_setgid -> mac_cred_check_setgid
  mac_proc_check_setgroups -> mac_cred_check_setgroups
  mac_proc_check_setregid -> mac_cred_check_setregid
  mac_proc_check_setresgid -> mac_cred_check_setresgid
  mac_proc_check_setresuid -> mac_cred_check_setresuid
  mac_proc_check_setreuid -> mac_cred_check_setreuid
  mac_proc_check_setuid -> mac_cred_check_setuid

Obtained from:	TrustedBSD Project
Sponsored by:	Google, Inc.
2009-03-08 10:58:37 +00:00
Robert Watson
2087a58ca2 Add static DTrace probes for MAC Framework access control checks and
privilege grants so that dtrace can be more easily used to monitor
the security decisions being generated by the MAC Framework following
policy invocation.

Successful access control checks will be reported by:

  mac_framework:kernel:<entrypoint>:mac_check_ok

Failed access control checks will be reported by:

  mac_framework:kernel:<entrypoint>:mac_check_err

Successful privilege grants will be reported by:

  mac_framework:kernel:priv_grant:mac_grant_ok

Failed privilege grants will be reported by:

  mac_framework:kernel:priv_grant:mac_grant_err

In all cases, the return value (always 0 for _ok, otherwise an errno
for _err) will be reported via arg0 on the probe, and subsequent
arguments will hold entrypoint-specific data, in a style similar to
privilege tracing.

Obtained from:	TrustedBSD Project
Sponsored by:	Google, Inc.
2009-03-08 00:50:37 +00:00
Robert Watson
73e416e35d Reduce the verbosity of SDT trace points for DTrace by defining several
wrapper macros that allow trace points and arguments to be declared
using a single macro rather than several.  This means a lot less
repetition and vertical space for each trace point.

Use these macros when defining privilege and MAC Framework trace points.

Reviewed by:	jb
MFC after:	1 week
2009-03-03 17:15:05 +00:00
Robert Watson
06edd2f1e8 Merge OpenBSM 1.1 beta 1 from OpenBSM vendor branch to head, both
contrib/openbsm (svn merge) and src/sys/{bsm,security/audit} (manual
merge).

OpenBSM history for imported revision below for reference.

MFC after:      1 month
Sponsored by:   Apple, Inc.
Obtained from:  TrustedBSD Project

OpenBSM 1.1 beta 1

- The filesz parameter in audit_control(5) now accepts suffixes: 'B' for
  Bytes, 'K' for Kilobytes, 'M' for Megabytes, and 'G' for Gigabytes.
  For legacy support no suffix defaults to bytes.
- Audit trail log expiration support added.  It is configured in
  audit_control(5) with the expire-after parameter.  If there is no
  expire-after parameter in audit_control(5), the default, then the audit
  trail files are not expired and removed.  See audit_control(5) for
  more information.
- Change defaults in audit_control: warn at 5% rather than 20% free for audit
  partitions, rotate automatically at 2mb, and set the default policy to
  cnt,argv rather than cnt so that execve(2) arguments are captured if
  AUE_EXECVE events are audited.  These may provide more usable defaults for
  many users.
- Use au_domain_to_bsm(3) and au_socket_type_to_bsm(3) to convert
  au_to_socket_ex(3) arguments to BSM format.
- Fix error encoding AUT_IPC_PERM tokens.
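
The filesz suffix handling listed above can be sketched as follows.  This is
not the OpenBSM parser: sketch_parse_filesz() is a hypothetical helper, and
the use of binary multipliers (1024 per step) is an assumption, since the
release notes only name the suffixes.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "<digits>[B|K|M|G]" into a byte count.  No suffix means bytes.
 * Returns 0 on success and -1 on malformed input or overflow.
 */
static int
sketch_parse_filesz(const char *s, uint64_t *out)
{
	unsigned long long n;
	uint64_t mult = 1;
	char *end;

	assert(s != NULL && out != NULL);
	if (!isdigit((unsigned char)s[0]))
		return (-1);		/* rejects "", signs and whitespace */
	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno != 0)
		return (-1);
	switch (*end) {
	case '\0':			/* legacy form: plain bytes */
		break;
	case 'B':
		end++;
		break;
	case 'K':
		mult = 1024ULL;
		end++;
		break;
	case 'M':
		mult = 1024ULL * 1024;
		end++;
		break;
	case 'G':
		mult = 1024ULL * 1024 * 1024;
		end++;
		break;
	default:
		return (-1);
	}
	if (*end != '\0' || n > UINT64_MAX / mult)
		return (-1);
	*out = (uint64_t)n * mult;
	return (0);
}
```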
2009-03-02 13:29:18 +00:00
Konstantin Belousov
ad062f5bb8 Use vm_map_entry_t instead of explicit struct vm_map_entry *.
Reviewed by:	alc
2009-02-24 20:27:48 +00:00
Robert Watson
8f8dedbed8 Set the lower bound on queue size for an audit pipe to 1 instead of 0,
as an audit pipe with a queue length of 0 is less useful.

Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
MFC after:	1 week
2009-02-08 15:38:31 +00:00
Robert Watson
f4f93a63f5 Change various routines that are responsible for transforming audit
event IDs based on arguments to return au_event_t rather than int.

Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
MFC after:	1 week
2009-02-08 14:39:35 +00:00
Robert Watson
8b14aeee4b Audit AUE_MAC_EXECVE; currently just the standard AUE_EXECVE arguments
and not the label.

Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
MFC after:	1 week
2009-02-08 14:24:35 +00:00
Robert Watson
4ba1f444c5 Audit the flag argument to the nfssvc(2) system call.
Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
2009-02-08 14:04:08 +00:00
Robert Watson
91e832e0c3 Eliminate the local variable 'ape' in audit_pipe_kqread(), as it's only
used for an assertion that we don't really need anymore.

MFC after:	1 week
Reported by:	Christoph Mallon <christoph dot mallon at gmx dot de>
2009-02-04 19:56:37 +00:00
Robert Watson
c7ed8c0a85 Use __FBSDID() for $FreeBSD$ version strings in .c files.
Obtained from:	TrustedBSD Project
MFC after:	3 days
2009-01-24 13:15:45 +00:00
Robert Watson
91ec000612 Begin to add SDT tracing of the MAC Framework: add policy modevent,
register, and unregister hooks that give access to the mac_policy_conf
for the policy.

Obtained from:	TrustedBSD Project
MFC after:	3 days
2009-01-24 10:57:32 +00:00
Robert Watson
07cd9ab013 Update copyright, P4 version number as audit_bsm_token.c reflects changes
in bsm_token.c through #86 from OpenBSM.

MFC after:	1 month
Sponsored by:	Apple, Inc.
Obtained from:	TrustedBSD Project
2009-01-14 12:16:14 +00:00
Robert Watson
c74c7b73a0 Merge OpenBSM alpha 5 from OpenBSM vendor branch to head, both
contrib/openbsm (svn merge) and src/sys/{bsm,security/audit} (manual
merge).  Hook up bsm_domain.c and bsm_socket_type.c to the libbsm
build along with man pages, add audit_bsm_domain.c and
audit_bsm_socket_type.c to the kernel environment.

OpenBSM history for imported revisions below for reference.

MFC after:      1 month
Sponsored by:   Apple Inc.
Obtained from:  TrustedBSD Project

OpenBSM 1.1 alpha 5

- Stub libauditd(3) man page added.
- All BSM error number constants are now prefixed with BSM_ERRNO_.
- Interfaces to convert between local and BSM socket types and protocol
  families have been added: au_bsm_to_domain(3), au_bsm_to_socket_type(3),
  au_domain_to_bsm(3), and au_socket_type_to_bsm(3), along with definitions
  of constants in audit_domain.h and audit_socket_type.h.  This improves
  interoperability by converting local constant spaces, which vary by OS, to
  and from Solaris constants (where available) or OpenBSM constants for
  protocol domains not present in Solaris (a fair number).  These routines
  should be used when generating and interpreting extended socket tokens.
- Fix build warnings with full gcc warnings enabled on most supported
  platforms.
- Don't compile error strings into bsm_errno.c when building it in the kernel
  environment.
- When started by launchd, use the label com.apple.auditd rather than
  org.trustedbsd.auditd.
2009-01-14 10:44:16 +00:00
Robert Watson
9162f64b58 Rather than having MAC policies explicitly declare what object types
they label, derive that information implicitly from the set of label
initializers in their policy operations set.  This avoids a possible
class of programmer errors, while retaining the structure that
allows us to avoid allocating labels for objects that don't need
them.  As before, we regenerate a global mask of labeled objects
each time a policy is loaded or unloaded, stored in mac_labeled.

Discussed with:   csjp
Suggested by:     Jacques Vidrine <nectar at apple.com>
Obtained from:    TrustedBSD Project
Sponsored by:     Apple, Inc.
2009-01-10 10:58:41 +00:00
Robert Watson
dbdcb99498 Use MPC_OBJECT_IP6Q to indicate labeling of struct ip6q rather than
MPC_OBJECT_IPQ; it was already defined, just not used.

Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
2009-01-10 09:17:16 +00:00
Robert Watson
d423f266c4 Do a lockless read of the audit pipe list before grabbing the audit pipe
lock in order to avoid the lock acquire hit if the pipe list is very
likely empty.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	Apple, Inc.
2009-01-06 14:15:38 +00:00
Robert Watson
cd6bbe656e In AUDIT_SYSCALL_EXIT(), invoke audit_syscall_exit() only if an audit
record is active on the current thread--historically we may always
have wanted to enter the audit code if auditing was enabled, but now
we just commit the audit record so don't need to enter if there isn't
one.

Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
2009-01-06 13:59:59 +00:00
Robert Watson
efcde1e8c7 Fix white space botch: use carriage returns rather than tabs.
2008-12-31 23:22:45 +00:00
Robert Watson
3c36dc8159 Commit two files missed in previous commit: hook up audit_bsm_errno.c
and adapt for kernel build environment.

Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
2008-12-31 13:56:31 +00:00
Robert Watson
fcdb2e9607 Call au_errno_to_bsm() on the errno value passed into au_to_return32()
to convert local FreeBSD error numbers into BSM error numbers.

Obtained from:	TrustedBSD Project
2008-12-31 11:56:35 +00:00
Robert Watson
7a0a89d2cb Merge OpenBSM alpha 4 from OpenBSM vendor branch to head, both
contrib/openbsm (svn merge) and src/sys/{bsm,security/audit} (manual
merge).  Add libauditd build parts and add to auditd's linkage;
force libbsm to build before libauditd.

OpenBSM history for imported revisions below for reference.

MFC after:      1 month
Sponsored by:   Apple Inc.
Obtained from:  TrustedBSD Project

OpenBSM 1.1 alpha 4

- With the addition of BSM error number mapping, we also need to map the
  local error number passed to audit_submit(3) to a BSM error number,
  rather than have the caller perform that conversion.
- Reallocate user audit events to avoid collisions with Solaris; adopt a
  more formal allocation scheme, and add some events allocated in Solaris
  that will be of immediate use on other platforms.
- Add an event for Calife.
- Add au_strerror(3), which allows generating strings for BSM errors
  directly, rather than requiring applications to map to the local error
  space, which might not be able to entirely represent the BSM error
  number space.
- Major auditd rewrite for launchd(8) support.  Add libauditd library
  that is shared between launchd and auditd.
- Add AUDIT_TRIGGER_INITIALIZE trigger (sent via 'audit -i') for
  (re)starting auditing under launchd(8) on Mac OS X.
- Add 'current' symlink to active audit trail.
- Add crash recovery of previous audit trail file when detected on audit
  startup that it has not been properly terminated.
- Add the event AUE_audit_recovery to indicate when an audit trail file
  has been recovered from not being properly terminated.  This event is
  stored in the new audit trail file and includes the path of recovered
  audit trail file.
- Mac OS X and FreeBSD dependent code in auditd.c is separated into
  auditd_darwin.c and auditd_fbsd.c files.
- Add an event for the posix_spawn(2) and fsgetpath(2) Mac OS X system
  calls.
- For Mac OS X, we use ASL(3) instead of syslog(3) for logging.
- Add support for NOTICE level logging.

OpenBSM 1.1 alpha 3

- Add two new functions, au_bsm_to_errno() and au_errno_to_bsm(), to map
  between BSM error numbers (largely the Solaris definitions) and local
  errno(2) values for 32-bit and 64-bit return tokens.  This is required
  as operating systems don't agree on some of the values of more recent
  error numbers.
- Fix a bug in how au_to_exec_args(3) and au_to_exec_env(3) calculate the
  total size for the token.  This buge.
- Deprecated Darwin constants, such as TRAILER_PAD_MAGIC, removed.
2008-12-31 11:12:24 +00:00
Alan Cox
1361cdc644 Make preparations for resurrecting shared/read locks on vm maps:
mac_proc_vm_revoke_recurse() requests a read lock on the vm map at the start
but does not handle failure by vm_map_lock_upgrade() when it seeks to modify
the vm map.  At present, this works because all lock requests on a vm map are
implemented as exclusive locks.  Thus, vm_map_lock_upgrade() is a no-op that
always reports success.  However, that is about to change, and
proc_vm_revoke_recurse() will require substantial modifications to handle
vm_map_lock_upgrade() failures.  For the time being, I am changing
mac_proc_vm_revoke_recurse() to request a write lock on the vm map at the
start.

Approved by:	rwatson
MFC after:	3 months
2008-12-22 17:32:52 +00:00
Robert Watson
52267f7411 Merge OpenBSM 1.1 alpha 2 from the OpenBSM vendor branch to head, both
contrib/openbsm (svn merge) and sys/{bsm,security/audit} (manual merge).

- Add OpenBSM contrib tree to include paths for audit(8) and auditd(8).
- Merge support for new tokens, fixes to existing token generation to
  audit_bsm_token.c.
- Synchronize bsm includes and definitions.

OpenBSM history for imported revisions below for reference.

MFC after:      1 month
Sponsored by:   Apple Inc.
Obtained from:  TrustedBSD Project

--

OpenBSM 1.1 alpha 2

- Include files in OpenBSM are now broken out into two parts: library builds
  required solely for user space, and system includes, which may also be
  required for use in the kernels of systems integrating OpenBSM.  Submitted
  by Stacey Son.
- Configure option --with-native-includes allows forcing the use of native
  includes for system includes, rather than the versions bundled with OpenBSM.
  This is intended specifically for platforms that ship OpenBSM, have adapted
  versions of the system includes in a kernel source tree, and will use the
  OpenBSM build infrastructure with an unmodified OpenBSM distribution,
  allowing the customized system includes to be used with the OpenBSM build.
  Submitted by Stacey Son.
- Various strcpy()'s/strcat()'s have been changed to strlcpy()'s/strlcat()'s
  or asprintf().  Added compat/strlcpy.h for Linux.
- Remove compatibility defines for old Darwin token constant names; now only
  BSM token names are provided and used.
- Add support for extended header tokens, which contain space for information
  on the host generating the record.
- Add support for setting extended host information in the kernel, which is
  used for setting host information in extended header tokens.  The
  audit_control file now supports a "host" parameter which can be used by
  auditd to set the information; if not present, the kernel parameters won't
  be set and auditd uses unextended headers for records that it generates.

OpenBSM 1.1 alpha 1

- Add option to auditreduce(1) which allows users to invert sense of
  matching, such that BSM records that do not match, are selected.
- Fix bug in audit_write() where we commit an incomplete record in the
  event there is an error writing the subject token.  This was submitted
  by Diego Giagio.
- Build support for Mac OS X 10.5.1 submitted by Eric Hall.
- Fix a bug which resulted in host XML attributes not being arguments so
  that const strings can be passed as arguments to tokens.  This patch was
  submitted by Xin LI.
- Modify the -m option so users can select more than one audit event.
- For Mac OS X, added Mach IPC support for audit trigger messages.
- Fixed a bug in getacna() which resulted in a locking problem on Mac OS X.
- Added LOG_PERROR flag to openlog when -d option is used with auditd.
- AUE events added for Mac OS X Leopard system calls.
2008-12-02 23:26:43 +00:00
Christian S.J. Peron
5ac14ef177 Partially roll back a revision which changed the error code being returned
by getaudit(2).  Some applications, such as su and id, will interpret E2BIG as
requiring the use of getaudit_addr(2) to pull extended audit state (ip6)
from the kernel.

This change un-breaks the ABI when auditing has been activated on a system
and the users are logged in via ip6.

This is a RELENG_7_1 candidate.

MFC after:	1 day
Discussed with:	rwatson
2008-11-30 19:58:03 +00:00
Bjoern A. Zeeb
413628a7e3 MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addition to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking, etc.

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the aforementioned and in-kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
  and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
  help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
  suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
  on cluster machines as well as all the testers and people
  who provided feedback the last months on freebsd-jail and
  other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by:	(see above)
MFC after:	3 months (this is just so that I get the mail)
X-MFC Before:   7.2-RELEASE if possible
2008-11-29 14:32:14 +00:00
Robert Watson
a760c0b245 Regularize /* FALLTHROUGH */ comments in the BSM event type switch, and
add one that was missing.

MFC after:	3 weeks
Coverity ID:	3960
2008-11-25 11:25:45 +00:00
Robert Watson
e6870c95e3 When repeatedly accessing a thread credential, cache the credential
pointer in a local variable.  While this is unlikely to significantly
improve performance given modern compiler behavior, it makes the code
more readable and reduces diffs to the Mac OS X version of the same
code (which stores things in creds in the same way, but where the
cred for a thread is reached quite differently).

Discussed with: sson
MFC after:      1 month
Sponsored by:   Apple Inc.
Obtained from:	TrustedBSD Project
2008-11-14 01:24:52 +00:00
Robert Watson
618521b1a2 The audit queue limit variables are size_t, so use size_t for the audit
queue length variables as well, avoiding storing the limit in a larger
type than the length.

Submitted by:	sson
Sponsored by:	Apple Inc.
MFC after:	1 week
2008-11-13 00:21:01 +00:00
Robert Watson
4ebff7e0ca Move audit-internal function definitions for getting and setting audit
kinfo state to audit_private.h.
2008-11-11 23:08:20 +00:00
Robert Watson
91721ee9ba Minor style tweaks and change lock name string to use _'s and not spaces
to improve parseability.
2008-11-11 22:59:40 +00:00
Christian S.J. Peron
ffbcef5a42 Add support for extended header BSM tokens. Currently we use the
regular header tokens.  The extended header tokens contain an IP
or IPv6 address which makes it possible to identify which host an
audit record came from when audit records are centralized.

If the host information has not been specified, the system will
default to the old style headers.  Otherwise, audit records that
are created as a result of system calls will contain host information.

This implementation has been designed to be consistent with the Solaris
implementation.  Host information is set/retrieved using the A_GETKAUDIT
and A_SETKAUDIT auditon(2) commands.  These commands require that a
pointer to an auditinfo_addr_t object is passed.  Currently only IP and
IPv6 address families are supported.

The user space bits associated with this change will follow in an
openbsm import.

Reviewed by:	rwatson, (sson, wsalamon (older version))
MFC after:	1 month
2008-11-11 21:57:03 +00:00
Robert Watson
b713bf6e3a Wrap sx locking of the audit worker sleep lock in macros, update comments.
MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-10 22:06:24 +00:00
John Baldwin
927edcc9ba Use shared vnode locks for auditing vnode arguments as auditing only
does a VOP_GETATTR() which does not require an exclusive lock.

Reviewed by:	csjp, rwatson
2008-11-04 22:31:04 +00:00
John Baldwin
16da60664d Don't lock the vnode around calls to vn_fullpath().
Reviewed by:	csjp, rwatson
2008-11-04 22:30:24 +00:00
Robert Watson
d2f6bb070f Update introductory comment for audit pipes.
MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-02 00:25:48 +00:00
Robert Watson
6e1362b499 Remove stale comment about filtering in audit pipe ioctl routine: we do
support filtering now, although we may want to make it more interesting
in the future.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-02 00:18:19 +00:00
Robert Watson
e4565e2028 Add comment for per-pipe stats.
MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 23:05:49 +00:00
Robert Watson
cff9c52e23 We only allow a partial read of the first record in an audit pipe
record queue, so move the offset field from the per-record
audit_pipe_entry structure to the audit_pipe structure.

Now that we support reading more than one record at a time, add a
new summary field to audit_pipe, ap_qbyteslen, which tracks the
total number of bytes present in a pipe, and return that (minus
the current offset) via FIONREAD and kqueue's data variable for
the pending byte count rather than the number of bytes remaining
in only the first record.

Add a number of asserts to confirm that these counts and offsets
follow the expected rules.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 21:56:45 +00:00
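
The bookkeeping described in the commit above (one read offset per pipe, plus a running byte total reported as the pending count) can be sketched in user space as follows. This is a simplified illustration, not the kernel code: every ex_ name, the fixed-size ring, and the absence of locking are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_MAXRECS 8	/* hypothetical fixed capacity for the sketch */

struct ex_record {
	const char	*r_data;
	size_t		 r_len;
};

struct ex_pipe {
	struct ex_record p_recs[EX_MAXRECS];	/* ring of queued records */
	size_t	 p_first;	/* index of the first (oldest) record */
	size_t	 p_count;	/* number of queued records */
	size_t	 p_offset;	/* bytes already read from the first record */
	size_t	 p_qbyteslen;	/* total bytes in all queued records */
};

/* Append a non-empty record and update the byte total; 0 if rejected. */
int
ex_pipe_append(struct ex_pipe *p, const char *data, size_t len)
{
	struct ex_record *r;

	if (p->p_count == EX_MAXRECS || len == 0)
		return (0);
	r = &p->p_recs[(p->p_first + p->p_count) % EX_MAXRECS];
	r->r_data = data;
	r->r_len = len;
	p->p_count++;
	p->p_qbyteslen += len;
	return (1);
}

/* Pending byte count: total queued bytes minus the current offset. */
size_t
ex_pipe_pending(const struct ex_pipe *p)
{
	assert(p->p_offset <= p->p_qbyteslen);
	return (p->p_qbyteslen - p->p_offset);
}

/*
 * Copy up to buflen bytes, continuing across record boundaries.  Only the
 * first record can be partially read; it is removed once fully consumed.
 */
size_t
ex_pipe_read(struct ex_pipe *p, char *buf, size_t buflen)
{
	size_t done = 0;

	while (done < buflen && p->p_count > 0) {
		struct ex_record *r = &p->p_recs[p->p_first];
		size_t n;

		assert(p->p_offset < r->r_len);
		n = r->r_len - p->p_offset;
		if (n > buflen - done)
			n = buflen - done;
		memcpy(buf + done, r->r_data + p->p_offset, n);
		done += n;
		p->p_offset += n;
		if (p->p_offset == r->r_len) {
			/* First record fully read: remove it, reset offset. */
			p->p_qbyteslen -= r->r_len;
			p->p_first = (p->p_first + 1) % EX_MAXRECS;
			p->p_count--;
			p->p_offset = 0;
		}
	}
	return (done);
}
```

Keeping the offset in the pipe rather than in each record reflects the rule that only the first record can be in a partially read state.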
Robert Watson
a9275e0bd5 Allow a single read(2) system call on an audit pipe to retrieve data from
more than one audit record at a time in order to improve efficiency.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 21:16:09 +00:00
Robert Watson
1a0edb10ca Since there is no longer the opportunity for record truncation, just
return 0 if the truncation counter is queried on an audit pipe.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-31 15:11:01 +00:00
Robert Watson
5a9d15cd4c Historically, /dev/auditpipe has allowed only whole records to be read via
read(2), which meant that records longer than the buffer passed to read(2)
were dropped.  Instead take the approach of allowing partial reads to be
continued across multiple system calls, more in the style of a streaming
character device.

This means retaining a record on the per-pipe queue in a partially read
state, so maintain a current offset into the record.  Keep the record on
the queue during a read, so add a new lock, ap_sx, to serialize removal
of records from the queue by either read(2) or ioctl(2) requesting a pipe
flush.  Modify the kqueue handler to return bytes left in the current
record rather than simply the size of the current record.

It is now possible to use praudit, which uses the standard FILE * buffer
sizes, to track much larger record sizes from /dev/auditpipe, such as
very long command lines to execve(2).

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-31 14:40:21 +00:00
Robert Watson
1daa6feb45 When we drop an audit record going to an audit pipe because the audit
pipe has overflowed, drop the newest, rather than oldest, record.  This
makes overflow drop behavior consistent with memory allocation failure
leading to drop, avoids touching the consumer end of the queue from a
producer, and lowers the CPU overhead of dropping a record by dropping
before memory allocation and copying.

Obtained from:	Apple, Inc.
MFC after:	2 months
2008-10-30 23:09:19 +00:00
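
The drop-newest policy from the commit above can be illustrated with a small bounded queue. This is a sketch under stated assumptions, not the audit pipe implementation: the ex_ names, the integer record ids, and the fixed limit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EX_QLIMIT 4	/* hypothetical queue limit */

struct ex_queue {
	int	 q_items[EX_QLIMIT];	/* ring buffer of record ids */
	size_t	 q_head;		/* consumer index (oldest record) */
	size_t	 q_len;			/* records currently queued */
	size_t	 q_drops;		/* records dropped on overflow */
};

/*
 * Producer side: if the queue is full, drop the incoming (newest) record
 * before doing any other work.  The consumer end (q_head) is not modified
 * here.  Returns 1 if the record was queued, 0 if it was dropped.
 */
int
ex_enqueue(struct ex_queue *q, int record)
{
	assert(q->q_len <= EX_QLIMIT);
	if (q->q_len == EX_QLIMIT) {
		q->q_drops++;
		return (0);
	}
	q->q_items[(q->q_head + q->q_len) % EX_QLIMIT] = record;
	q->q_len++;
	return (1);
}

/* Consumer side: remove the oldest record.  Returns 0 if the queue is empty. */
int
ex_dequeue(struct ex_queue *q, int *record)
{
	if (q->q_len == 0)
		return (0);
	*record = q->q_items[q->q_head];
	q->q_head = (q->q_head + 1) % EX_QLIMIT;
	q->q_len--;
	return (1);
}
```

Because the overflow check comes first, a dropped record costs only a counter increment, and the oldest queued records are still delivered to the reader.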
Robert Watson
846f37f1e7 Break out single audit_pipe_mtx into two types of locks: a global rwlock
protecting the list of audit pipes, and a per-pipe mutex protecting the
queue.

Likewise, replace the single global condition variable used to signal
delivery of a record to one or more pipes, and add a per-pipe condition
variable to avoid spurious wakeups when event subscriptions differ
across multiple pipes.

This slightly increases the cost of delivering to audit pipes, but should
reduce lock contention in the presence of multiple readers as only the
per-pipe lock is required to read from a pipe, as well as avoid
overhead when different pipes are used in different ways.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-30 21:58:39 +00:00
Robert Watson
c211285f25 Protect the event->class lookup database using an rwlock instead of a
mutex, as it's rarely changed but frequently accessed read-only from
multiple threads, so a potentially significant source of contention.

MFC after:	1 month
Sponsored by:	Apple, Inc.
2008-10-30 17:47:57 +00:00
Robert Watson
a1b9471a47 The V* flags passed using an accmode_t to the access() and open()
access control checks in mac_bsdextended are not in the same
namespace as the MBI_ flags used in ugidfw policies, so add an
explicit conversion routine to get from one to the other.

Obtained from:	TrustedBSD Project
2008-10-30 10:13:53 +00:00
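
An explicit conversion between two flag namespaces, as described in the commit above, might look like the sketch below. The EX_ constants and their values are hypothetical stand-ins; they are not the real V* or MBI_ definitions, and ex_accmode_to_mbi() is not the routine added by the commit.

```c
#include <assert.h>

/* Hypothetical access-check flags (stand-ins for the V* namespace). */
#define EX_VEXEC	0x0040
#define EX_VWRITE	0x0080
#define EX_VREAD	0x0100

/* Hypothetical policy flags (stand-ins for the MBI_ namespace). */
#define EX_MBI_EXEC	0x01
#define EX_MBI_WRITE	0x02
#define EX_MBI_READ	0x04

/* The only bits this sketch knows how to translate. */
#define EX_VALLPERM	(EX_VEXEC | EX_VWRITE | EX_VREAD)

/*
 * Translate each access bit explicitly rather than passing the value
 * through unchanged, because the two namespaces use different bit values.
 */
int
ex_accmode_to_mbi(int accmode)
{
	int mbi = 0;

	/* Reject bits the sketch does not handle. */
	assert((accmode & ~EX_VALLPERM) == 0);

	if (accmode & EX_VEXEC)
		mbi |= EX_MBI_EXEC;
	if (accmode & EX_VWRITE)
		mbi |= EX_MBI_WRITE;
	if (accmode & EX_VREAD)
		mbi |= EX_MBI_READ;
	return (mbi);
}
```

Comparing an untranslated value against policy flags would test the wrong bits, which is the class of bug an explicit conversion routine avoids.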
Edward Tomasz Napierala
178da2a90e Commit part of accmode_t changes that I missed in previous commit.
Approved by:	rwatson (mentor)
2008-10-28 21:57:32 +00:00
Robert Watson
564f8f0fee Break out strictly credential-related portions of mac_process.c into a
new file, mac_cred.c.

Obtained from:	TrustedBSD Project
2008-10-28 21:53:10 +00:00
Edward Tomasz Napierala
15bc6b2bd8 Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, and mode_t is only 16 bits wide.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
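
The truncation problem that motivates accmode_t in the commit above can be shown in a few lines. This is a minimal sketch, not the kernel code: ex_mode16_t, ex_accmode_t and EX_VNEWFLAG are hypothetical names, chosen only to show a flag above bit 15 being lost in a 16-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: a 16-bit mode type and a wider access-mode type. */
typedef uint16_t	ex_mode16_t;
typedef int		ex_accmode_t;

/* Hypothetical flag that needs a bit above the low 16. */
#define EX_VNEWFLAG	0x00010000

/* Store flags in the 16-bit type; bits 16 and up are silently discarded. */
ex_mode16_t
ex_store_narrow(int flags)
{
	return ((ex_mode16_t)flags);
}

/* Store flags in the wider type; all bits are preserved. */
ex_accmode_t
ex_store_wide(int flags)
{
	assert(flags >= 0);
	return ((ex_accmode_t)flags);
}
```

Passing EX_VNEWFLAG through ex_store_narrow() yields a value without that bit set, while ex_store_wide() preserves it.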
Robert Watson
9215889d21 Rename mac_cred_mmapped_drop_perms(), which revokes access to virtual
memory mappings when the MAC label on a process changes, to
mac_proc_vm_revoke().

It now also acquires its own credential reference directly from the
affected process rather than accepting one passed by the caller,
simplifying the API and consumer code.

Obtained from:	TrustedBSD Project
2008-10-28 12:49:07 +00:00
Robert Watson
212ab0cfb3 Rename three MAC entry points from _proc_ to _cred_ to reflect the fact
that they operate directly on credentials: mac_proc_create_swapper(),
mac_proc_create_init(), and mac_proc_associate_nfsd().  Update policies.

Obtained from:	TrustedBSD Project
2008-10-28 11:33:06 +00:00
Robert Watson
048e2d5899 Extended comment on why we consider a partition relabel request of "0" to
be a no-op request, and why this might have to change if we want to allow
leaving a partition someday.

Obtained from:	TrustedBSD Project
MFC after:	3 days
2008-10-28 09:16:34 +00:00
Robert Watson
6c6c03be2d Rename label_on_label() to partition_check(), which is far more
suggestive as to its actual function.

Obtained from:	TrustedBSD Project
MFC after:	3 days
2008-10-28 09:12:13 +00:00
Robert Watson
5077415a10 Improve alphabetical sort order of stub entry points. 2008-10-28 08:50:09 +00:00
Robert Watson
168a6ae7a7 When the mac_bsdextended policy is unloaded, free rule memory.
Obtained from:	TrustedBSD Project
MFC after:	3 days
2008-10-27 18:08:12 +00:00
Robert Watson
0ee8da47fb Add TrustedBSD credit to new ugidfw_internal.h file. 2008-10-27 12:12:23 +00:00