Commit Graph

283 Commits

Author SHA1 Message Date
Brooks Davis
12e69f96a2 Add const to input-only char * arguments.
These arguments are mostly paths handled by NAMEI*() macros which already
take const char * arguments.

This change improves the match between syscalls.master and the public
declerations of system calls.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812
2018-11-02 20:50:22 +00:00
Robert Watson
2ddefb6d5d Rework the logic around quick checks for auditing that take place at
system-call entry and whenever audit arguments or return values are
captured:

1. Expose a single global, audit_syscalls_enabled, which controls
   whether the audit framework is entered, rather than exposing
   components of the policy -- e.g., if the trail is enabled,
   suspended, etc.

2. Introduce a new function audit_syscalls_enabled_update(), which is
   called to update audit_syscalls_enabled whenever an aspect of the
   policy changes, so that the value can be updated.

3. Remove a check of trail enablement/suspension from audit_new() --
   at the point where this function has been entered, we believe that
   system-call auditing is already in force, or we wouldn't get here,
   so simply proceed to more expensive policy checks.

4. Use an audit-provided global, audit_dtrace_enabled, rather than a
   dtaudit-provided global, to provide policy indicating whether
   dtaudit would like system calls to be audited.

5. Do some minor cosmetic renaming to clarify what various variables
   are for.

These changes collectively arrange it so that traditional audit
(trail, pipes) or the DTrace audit provider can enable system-call
probes without the other configured.  Otherwise, dtaudit cannot
capture system-call data without auditd(8) started.

Reviewed by:		gnn
Sponsored by:		DARPA, AFRL
Approved by:		re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17348
2018-10-02 15:58:17 +00:00
Robert Watson
deea362c80 The kernel DTrace audit provider (dtaudit) relies on auditd(8) to load
/etc/security/audit_event to provide a list of audit event-number <->
name mappings.  However, this occurs too late for anonymous tracing.
With this change, adding 'audit_event_load="YES"' to /boot/loader.conf
will cause the boot loader to preload the file, and then the kernel
audit code will parse it to register an initial set of audit event-number
<-> name mappings.  Those mappings can later be updated by auditd(8) if
the configuration file changes.

Reviewed by:	gnn, asomers, markj, allanjude
Discussed with:	jhb
Approved by:	re (kib)
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16589
2018-09-03 14:26:43 +00:00
Andriy Gapon
dc8240f0da fix incorrect operator in the AUDITPIPE_SET_QLIMIT bounds check
PR:		229983
Submitted by:	Aniket Pandey <aniketp@iitk.ac.in>
Reported by:	Aniket Pandey <aniketp@iitk.ac.in>
MFC after:	1 week
2018-07-23 16:56:49 +00:00
Alan Somers
12395dc9f6 Fix audit of chflagsat, lgetfh, and setfib
These syscalls were always supposed to have been auditted, but due to
oversights never were.

PR:		228374
Reported by:	aniketp
Reviewed by:	aniketp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16388
2018-07-22 14:11:52 +00:00
Alan Somers
f7c188aeeb auditon(2): fix A_SETPOLICY with 64-bit values
A_SETPOLICY is supposed to work with either 64 or 32-bit values, but due to a
typo the 64-bit version has never worked correctly.

Submitted by:	aniketp
Reviewed by:	asomers, cem
MFC after:	2 weeks
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D16222
2018-07-15 21:10:19 +00:00
Alan Somers
ebc0d5599d audit(4): fix the definition of ARG_TERMID_ADDR
Due to a copy/paste error in r168688, ARG_TERMID_ADDR has the same
definition as ARG_SADDRUNIX.  Fix it.

The header change, while publicly visible, is guarded by #ifdef KERNEL, and
I can't find any kmod ports that use it.  So I'm not bumping
__FreeBSD_version.

PR:		228820
Submitted by:	aniketp
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D15702
2018-06-13 14:55:31 +00:00
Alan Somers
1c97d64332 #include <bsm/audit.h> in security/audit/audit_ioctl.h
security/audit/audit_ioctl.h uses a type from bsm/audit.h, so needs to
include it.  And it needs to know the type's size, so it can't just
forward-declare.

PR:		228470
Submitted by:	aniketp
MFC after:	2 weeks
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D15561
2018-05-30 21:50:23 +00:00
Alan Somers
cb3d7cd839 Fix "Bad tailq" panic when auditing auditon(A_SETCLASS, ...)
Due to an oversight in r195280, auditon(A_SETCLASS, ...) would cause a tailq
element to get added to the tailq twice, resulting in a circular tailq. This
panics when INVARIANTS are on.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15381
2018-05-28 20:47:39 +00:00
Alan Somers
b521cf275c audit(4): fix a typo in a comment
no functional change
2018-03-17 17:56:08 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Mateusz Guzik
574adb65c8 Sprinkle __read_frequently on few obvious places.
Note that some of annotated variables should probably change their types
to something smaller, preferably bit-sized.
2017-09-06 20:33:33 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Robert Watson
709557d903 Break audit_bsm_klib.c into two files: one (audit_bsm_klib.c)
retaining various utility functions used during BSM generation,
and a second (audit_bsm_db.c) that contains the various in-kernel
databases supporting various audit activities (the class and
event-name tables).

(No functional change is intended.)

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-04-03 10:15:58 +00:00
Robert Watson
475e1fc01f Correct macro names and signatures for !AUDIT versions of canonical
path auditing.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-31 14:13:13 +00:00
Robert Watson
15bcf785ba Audit arguments to POSIX message queues, semaphores, and shared memory.
This requires minor changes to the audit framework to allow capturing
paths that are not filesystem paths (i.e., will not be canonicalised
relative to the process current working directory and/or filesystem
root).

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-31 13:43:00 +00:00
Robert Watson
1c2da02938 Audit arguments to System V IPC system calls implementing sempahores,
message queues, and shared memory.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-30 22:26:15 +00:00
Robert Watson
b65ec5e523 Various BSM generation improvements when auditing AUE_ACCEPT,
AUE_PROCCTL, AUE_SENDFILE, AUE_ACL_*, and AUE_POSIX_FALLOCATE.
Audit AUE_SHMUNLINK path in the path token rather than as a
text string, and AUE_SHMOPEN flags as an integer token rather
than a System V IPC address token.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-30 21:39:03 +00:00
Robert Watson
7cad64f796 Don't ifdef KDTRACE_HOOKS struct, variable, and function prototype
definitions for the DTrace audit provider, so that the dtaudit module
can compile in the absence of kernel DTrace support.  This doesn't
really make run-time sense (since the binary dependencies for the
module won't be present), but it allows the dtaudit module to compile
successfully regardless of the kernel configuration.

MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
Reported by:	kib
2017-03-30 12:35:56 +00:00
Robert Watson
b783025921 When handling msgsys(2), semsys(2), and shmsys(2) multiplex system calls,
map the 'which' argument into a suitable audit event identifier for the
specific operation requested.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-29 23:31:35 +00:00
Robert Watson
1811d6bf7f Add an experimental DTrace audit provider, which allows users of DTrace to
instrument security event auditing rather than relying on conventional BSM
trail files or audit pipes:

- Add a set of per-event 'commit' probes, which provide access to
  particular auditable events at the time of commit in system-call return.
  These probes gain access to audit data via the in-kernel audit_record
  data structure, providing convenient access to system-call arguments and
  return values in a single probe.

- Add a set of per-event 'bsm' probes, which provide access to particular
  auditable events at the time of BSM record generation in the audit
  worker thread. These probes have access to the in-kernel audit_record
  data structure and BSM representation as would be written to a trail
  file or audit pipe -- i.e., asynchronously in the audit worker thread.

DTrace probe arguments consist of the name of the audit event (to support
future mechanisms of instrumenting multiple events via a single probe --
e.g., using classes), a pointer to the in-kernel audit record, and an
optional pointer to the BSM data and its length. For human convenience,
upper-case audit event names (AUE_...) are converted to lower case in
DTrace.

DTrace scripts can now cause additional audit-based data to be collected
on system calls, and inspect internal and BSM representations of the data.
They do not affect data captured in the audit trail or audit pipes
configured in the system. auditd(8) must be configured and running in
order to provide a database of event information, as well as other audit
configuration parameters (e.g., to capture command-line arguments or
environmental variables) for the provider to operate.

Reviewed by:	gnn, jonathan, markj
Sponsored by:	DARPA, AFRL
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D10149
2017-03-29 19:58:00 +00:00
Robert Watson
759c8caa5a Introduce an audit event identifier -> audit event name mapping
database in the kernel audit implementation, similar the exist
class mapping database.  This will be used by the DTrace audit
provider to map audit event identifiers originating in the
system-call table back into strings for the purposes of setting
probe names.  The database is initialised and maintained by
auditd(8), which reads values in from the audit_events
configuration file, and then manages them using the A_GETEVENT
and A_SETEVENT auditon(2) operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, AFRL
MFC after:	3 weeks
2017-03-27 10:38:53 +00:00
Robert Watson
d422682f67 Extend comment describing path canonicalisation in audit.
Sponsored by:	DARPA, AFRL
Obtained from:	TrustedBSD Project
MFC after:	3 days
2017-03-27 08:29:17 +00:00
Robert Watson
1279fdafce Audit 'fd' and 'cmd' arguments to fcntl(2), and when generating BSM,
always audit the file-descriptor number and vnode information for all
fnctl(2) commands, not just locking-related ones.  This was likely an
oversight in the original adaptation of this code from XNU.

MFC after:	3 days
Sponsored by:	DARPA, AFRL
2016-11-22 00:41:24 +00:00
John Baldwin
b3db2736b1 Don't check aq64_minfree which is unsigned for negative values.
This fixes a tautological comparison warning.

Reviewed by:	rwatson
Differential Revision:	https://reviews.freebsd.org/D7682
2016-09-08 19:47:57 +00:00
Robert Watson
70a98c110e Audit the accepted (or rejected) username argument to setlogin(2).
(NB: This was likely a mismerge from XNU in audit support, where the
text argument to setlogin(2) is captured -- but as a text token,
whereas this change uses the dedicated login-name field in struct
audit_record.)

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2016-08-20 20:28:08 +00:00
Robert Watson
98daa3e5db Add AUE_WAIT6 handling to the BSM conversion switch statement, reusing
the BSM encoding used for AUE_WAIT4.

MFC after:	3 days
Sponsored by:	DARPA, AFRL
2016-07-11 13:06:17 +00:00
Robert Watson
2aa8c03917 Implement AUE_PREAD and AUE_PWRITE BSM conversion support, eliminating
console warnings when pread(2) and pwrite(2) are used with full
system-call auditing enabled.  We audit the same file-descriptor data
for these calls as we do read(2) and write(2).

Approved by:	re (kib)
MFC after:	3 days
Sponsored by:	DARPA, AFRL
2016-06-13 09:22:20 +00:00
Pedro F. Giffuni
323b076e9c sys: use our nitems() macro when param.h is available.
This should cover all the remaining cases in the kernel.

Discussed in:	freebsd-current
2016-04-21 19:40:10 +00:00
Pedro F. Giffuni
8dfea46460 Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00
Pedro F. Giffuni
2e4eeb703a audit(8): leave unsigned comparison for last.
aq64_minfree is unsigned so comparing to find out if it is less
than zero is a nonsense. Move the comparison to the last position
as we don't want to spend time if any of the others triggers first.

hile it would be tempting to just remove it, it may be important to
keep  it for portability with platforms where may be signed(?) or
in case we may want to change it in the future.
2016-04-08 03:26:21 +00:00
Konstantin Belousov
27725229dc Busy the mount point which is the owner of the audit vnode, around
audit_record_write().  This is important so that VFS_STATFS() is not
done on the NULL or freed mp and the check for free space is
consistent with the vnode used for write.

Add vn_start_write() braces around VOP_FSYNC() calls on the audit vnode.

Move repeated code to fsync vnode and panic to the helper
audit_worker_sync_vp().

Reviewed by:	rwatson
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-16 10:06:33 +00:00
Konstantin Belousov
c77f6350ee Move the funsetown(9) call from audit_pipe_close() to cdevpriv
destructor.  As result, close method becomes trivial and removed.
Final cdevsw close method might be called without file
context (e.g. in vn_open_vnode() if the vnode is reclaimed meantime),
which leaves ap_sigio registered for notification, despite cdevpriv
destructor frees the memory later.

Call destructor instead of doing a cleanup inline, for
devfs_set_cdevpriv() failure in open.  This adds missed funsetown(9)
call and locks ap to satisfy audit_pipe_free() invariants.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-13 14:02:07 +00:00
Christian Brueffer
8a0f5c0b6c Merge from contrib/openbsm to bring the kernel audit bits up to date with OpenBSM 1.2 alpha 4:
- remove $P4$
- fix a comment
2015-12-20 23:22:04 +00:00
Mateusz Guzik
f131759f54 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Mateusz Guzik
4da8456f0a Replace struct filedesc argument in getvnode with struct thread
This is is a step towards removal of spurious arguments.
2015-06-16 13:09:18 +00:00
Mateusz Guzik
9ef8328d52 fd: make rights a mandatory argument to fget_unlocked 2015-06-16 09:52:36 +00:00
Mateusz Guzik
daf63fd2f9 cred: add proc_set_cred helper
The goal here is to provide one place altering process credentials.

This eases debugging and opens up posibilities to do additional work when such
an action is performed.
2015-03-16 00:10:03 +00:00
Mateusz Guzik
eb0b6ba016 audit: fix cred assignment when A_SETPMASK is used
The code used to modify curproc instead of the target process.

Discussed with: rwatson
MFC after:	3 days
2015-03-15 21:43:43 +00:00
Davide Italiano
b40ea294f9 Replace dev_clone with cdevpriv(9) KPI in audit_pipe code.
This is (yet another) step towards the removal of device
cloning from our kernel.

CR:	https://reviews.freebsd.org/D441
Reviewed by:	kib, rwatson
Tested by:	pho
2014-08-20 16:04:30 +00:00
Mateusz Guzik
00b85f6218 audit: plug FILEDESC_LOCK leak in audit_canon_path.
MFC after:	3 days
2014-03-21 01:30:33 +00:00
John Baldwin
44ddb776c6 There is no sysctl with the MIB { CTL_KERN, KERN_MAXID }.
MFC after:	2 weeks
2013-12-05 21:55:10 +00:00
Davide Italiano
d56b4cd4ac - Use make_dev_credf(MAKEDEV_REF) instead of the race-prone make_dev()+
dev_ref() in the clone handlers that still use it.
- Don't set SI_CHEAPCLONE flag, it's not used anywhere neither in devfs
(for anything real)

Reviewed by:	kib
2013-09-07 13:45:44 +00:00
Pawel Jakub Dawidek
ab568de789 Handle cases where capability rights are not provided.
Reported by:	kib
2013-09-05 11:58:12 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Andriy Gapon
a01669ea96 audit_proc_coredump: check return value of audit_new
audit_new may return NULL if audit is disabled or suspended.

Sponsored by:	HybridCluster
MFC after:	7 days
2013-07-09 09:03:01 +00:00
Pawel Jakub Dawidek
7493f24ee6 - Implement two new system calls:
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
	int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

  which allow to bind and connect respectively to a UNIX domain socket with a
  path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
  the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
  in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson, jilles, kib, des
2013-03-02 21:11:30 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Pawel Jakub Dawidek
89adaea91f Remove redundant check. 2013-02-17 11:57:47 +00:00
Pawel Jakub Dawidek
9270ed9d38 Style. 2013-02-11 22:54:23 +00:00