Commit Graph

939 Commits

Author SHA1 Message Date
Mateusz Guzik
e8451da5e8 audi: replace open-coded TDP_AUDITREC checks with the macro
Sponsored by:	The FreeBSD Foundation
2018-12-11 17:14:12 +00:00
Mateusz Guzik
c0282e1e07 audit: predict AUDITING_TD as false
By default it is compiled in and disabled.

Sponsored by:	The FreeBSD Foundation
2018-11-29 09:19:48 +00:00
Mateusz Guzik
159dcc30a5 audit: change audit_syscalls_enabled type to bool
So that it fits better in __read_frequently.

Sponsored by:	The FreeBSD Foundation
2018-11-29 08:37:33 +00:00
Brooks Davis
12e69f96a2 Add const to input-only char * arguments.
These arguments are mostly paths handled by NAMEI*() macros which already
take const char * arguments.

This change improves the match between syscalls.master and the public
declerations of system calls.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812
2018-11-02 20:50:22 +00:00
Robert Watson
2ddefb6d5d Rework the logic around quick checks for auditing that take place at
system-call entry and whenever audit arguments or return values are
captured:

1. Expose a single global, audit_syscalls_enabled, which controls
   whether the audit framework is entered, rather than exposing
   components of the policy -- e.g., if the trail is enabled,
   suspended, etc.

2. Introduce a new function audit_syscalls_enabled_update(), which is
   called to update audit_syscalls_enabled whenever an aspect of the
   policy changes, so that the value can be updated.

3. Remove a check of trail enablement/suspension from audit_new() --
   at the point where this function has been entered, we believe that
   system-call auditing is already in force, or we wouldn't get here,
   so simply proceed to more expensive policy checks.

4. Use an audit-provided global, audit_dtrace_enabled, rather than a
   dtaudit-provided global, to provide policy indicating whether
   dtaudit would like system calls to be audited.

5. Do some minor cosmetic renaming to clarify what various variables
   are for.

These changes collectively arrange it so that traditional audit
(trail, pipes) or the DTrace audit provider can enable system-call
probes without the other configured.  Otherwise, dtaudit cannot
capture system-call data without auditd(8) started.

Reviewed by:		gnn
Sponsored by:		DARPA, AFRL
Approved by:		re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17348
2018-10-02 15:58:17 +00:00
Robert Watson
deea362c80 The kernel DTrace audit provider (dtaudit) relies on auditd(8) to load
/etc/security/audit_event to provide a list of audit event-number <->
name mappings.  However, this occurs too late for anonymous tracing.
With this change, adding 'audit_event_load="YES"' to /boot/loader.conf
will cause the boot loader to preload the file, and then the kernel
audit code will parse it to register an initial set of audit event-number
<-> name mappings.  Those mappings can later be updated by auditd(8) if
the configuration file changes.

Reviewed by:	gnn, asomers, markj, allanjude
Discussed with:	jhb
Approved by:	re (kib)
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16589
2018-09-03 14:26:43 +00:00
Mark Johnston
6324de037c Require that MAC label buffers be able to store a non-empty string.
The buffer size may be used to initialize an sbuf in
MAC_POLICY_EXTERNALIZE, and without this constraint it's possible to
trigger an assertion failure in the sbuf code.  With INVARIANTS
disabled, the first attempt to write to the sbuf will fail.

Reported by:	pho
Reviewed by:	delphij
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D16527
2018-08-01 03:46:07 +00:00
Andriy Gapon
dc8240f0da fix incorrect operator in the AUDITPIPE_SET_QLIMIT bounds check
PR:		229983
Submitted by:	Aniket Pandey <aniketp@iitk.ac.in>
Reported by:	Aniket Pandey <aniketp@iitk.ac.in>
MFC after:	1 week
2018-07-23 16:56:49 +00:00
Alan Somers
12395dc9f6 Fix audit of chflagsat, lgetfh, and setfib
These syscalls were always supposed to have been auditted, but due to
oversights never were.

PR:		228374
Reported by:	aniketp
Reviewed by:	aniketp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16388
2018-07-22 14:11:52 +00:00
Ian Lepore
3496c981ac Make it possible to run ntpd as a non-root user, add ntpd uid and gid.
Code analysis and runtime analysis using truss(8) indicate that the only
privileged operations performed by ntpd are adjusting system time, and
(re-)binding to privileged UDP port 123. These changes add a new mac(4)
policy module, mac_ntpd(4), which grants just those privileges to any
process running with uid 123.

This also adds a new user and group, ntpd:ntpd, (uid:gid 123:123), and makes
them the owner of the /var/db/ntp directory, so that it can be used as a
location where the non-privileged daemon can write files such as the
driftfile, and any optional logfile or stats files.

Because there are so many ways to configure ntpd, the question of how to
configure it to run without root privs can be a bit complex, so that will be
addressed in a separate commit. These changes are just what's required to
grant the limited subset of privs to ntpd, and the small change to ntpd to
prevent it from exiting with an error if running as non-root.

Differential Revision:	https://reviews.freebsd.org/D16281
2018-07-19 23:55:29 +00:00
Alan Somers
f7c188aeeb auditon(2): fix A_SETPOLICY with 64-bit values
A_SETPOLICY is supposed to work with either 64 or 32-bit values, but due to a
typo the 64-bit version has never worked correctly.

Submitted by:	aniketp
Reviewed by:	asomers, cem
MFC after:	2 weeks
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D16222
2018-07-15 21:10:19 +00:00
Stephen J. Kiernan
ade9788665 Add mpo_vnode_check_setmode MAC method to MAC/veriexec.
In the method, disallow changing SUID/SGID on verified files.

Obtained from:	Juniper Networks, Inc.
2018-07-14 17:21:16 +00:00
Stephen J. Kiernan
1db017d006 Fix a typo which could cause a build breakage when building with MAC/veriexec
enabled in the kernel config.

Remove unused mac_veriexec_print_db prototype in internal header file.
2018-07-14 17:15:28 +00:00
Stephen J. Kiernan
38d5d2d53b Remove RIPEMD-160 fingerprint modules for veriexec, since it has very
little practical use and would not be recommended for anyone to use in
a production environment.

Reviewed by:	sjg
2018-07-14 16:59:17 +00:00
Stephen J. Kiernan
593d37bb8e Fix build breakage in veriexec for 32-bit architectures.
fsid_t and ino_t are 64-bit entities, use uintmax_t typecast to ensure we
can print it on 32-bit or 64-bit architectures by using the %ju format for
prints.

Obtained from:	Juniper Networks, Inc.
2018-06-20 06:54:38 +00:00
Stephen J. Kiernan
fb47a3769c MAC/veriexec implements a verified execution environment using the MAC
framework.

The code is organized into a few distinct pieces:

* The meta-data store (in veriexec_metadata.c) which maps a file system
  identifier, file identifier, and generation key tuple to veriexec
  meta-data record.

* Fingerprint management (in veriexec_fingerprint.c) which deals with
  calculating the cryptographic hash for a file and verifying it. It also
  manages the loadable fingerprint modules.

* MAC policy implementation (in mac_veriexec.c) which implements the
  following MAC methods:

mpo_init
  Initializes the veriexec state, meta-data store, fingerprint modules,
  and registers mount and unmount EVENTHANDLERs

mpo_syscall
  Implements the following per-policy system calls:
  MAC_VERIEXEC_CHECK_FD_SYSCALL
    Check a file descriptor to see if the referenced file has a valid
    fingerprint.
  MAC_VERIEXEC_CHECK_PATH_SYSCALL
    Check a path to see if the referenced file has a valid fingerprint.

mpo_kld_check_load
  Check if loading a kld is allowed. This checks if the referenced vnode
  has a valid fingerprint.

mpo_mount_destroy_label
  Clears the veriexec slot data in a mount point label.

mpo_mount_init_label
  Initializes the veriexec slot data in a mount point label.
  The file system identifier is saved in the veriexec slot data.

mpo_priv_check
  Check if a process is allowed to write to /dev/kmem and /dev/mem
  devices.
  If a process is flagged as trusted, it is allowed to write.

mpo_proc_check_debug
  Check if a process is allowed to be debugged. If a process is not
  flagged with VERIEXEC_NOTRACE, then debugging is allowed.

mpo_vnode_check_exec
  Check is an exectuable is allowed to run. If veriexec is not enforcing
  or the executable has a valid fingerprint, then it is allowed to run.
  NOTE: veriexec will complain about mismatched fingerprints if it is
  active, regardless of the state of the enforcement.

mpo_vnode_check_open
  Check is a file is allowed to be opened. If verification was not
  requested, veriexec is not enforcing, or the file has a valid
  fingerprint, then veriexec will allow the file to be opened.

mpo_vnode_copy_label
  Copies the veriexec slot data from one label to another.

mpo_vnode_destroy_label
  Clears the veriexec slot data in a vnode label.

mpo_vnode_init_label
  Initializes the veriexec slot data in a vnode label.
  The fingerprint status for the file is stored in the veriexec slot data.

* Some sysctls, under security.mac.veriexec, for setting debug level,
  fetching the current state in a human-readable form, and dumping the
  fingerprint database are implemented.

* The MAC policy implementation source file also contains some utility
  functions.

* A set of fingerprint modules for the following cryptographic hash
  algorithms:
  RIPEMD-160, SHA1, SHA2-256, SHA2-384, SHA2-512

* Loadable module builds for MAC/veriexec and fingerprint modules.

 WARNING: Using veriexec with NFS (or other network-based) file systems is
          not recommended as one cannot guarantee the integrity of the files
          served, nor the uniqueness of file system identifiers which are
          used as key in the meta-data store.

Reviewed by:	ian, jtl
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D8554
2018-06-20 00:41:30 +00:00
Alan Somers
ebc0d5599d audit(4): fix the definition of ARG_TERMID_ADDR
Due to a copy/paste error in r168688, ARG_TERMID_ADDR has the same
definition as ARG_SADDRUNIX.  Fix it.

The header change, while publicly visible, is guarded by #ifdef KERNEL, and
I can't find any kmod ports that use it.  So I'm not bumping
__FreeBSD_version.

PR:		228820
Submitted by:	aniketp
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D15702
2018-06-13 14:55:31 +00:00
Alan Somers
1c97d64332 #include <bsm/audit.h> in security/audit/audit_ioctl.h
security/audit/audit_ioctl.h uses a type from bsm/audit.h, so needs to
include it.  And it needs to know the type's size, so it can't just
forward-declare.

PR:		228470
Submitted by:	aniketp
MFC after:	2 weeks
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D15561
2018-05-30 21:50:23 +00:00
Alan Somers
cb3d7cd839 Fix "Bad tailq" panic when auditing auditon(A_SETCLASS, ...)
Due to an oversight in r195280, auditon(A_SETCLASS, ...) would cause a tailq
element to get added to the tailq twice, resulting in a circular tailq. This
panics when INVARIANTS are on.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15381
2018-05-28 20:47:39 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Alan Somers
b521cf275c audit(4): fix a typo in a comment
no functional change
2018-03-17 17:56:08 +00:00
Eugene Grosbein
df850530e3 mac_portacl(4): stop panicing INVARIANTS-enabled kernel by loading .ko
when kernel already has options MAC_PORTACL.

PR:		183817
Approved by:	avg (mentor)
MFC after:	1 week
2018-02-25 23:10:13 +00:00
Brooks Davis
d88fe103eb Reduce duplication in __mac_*_(file|link)(2) implementation.
Reviewed by:	rwatson
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14175
2018-02-15 18:57:22 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Mateusz Guzik
574adb65c8 Sprinkle __read_frequently on few obvious places.
Note that some of annotated variables should probably change their types
to something smaller, preferably bit-sized.
2017-09-06 20:33:33 +00:00
Ed Maste
993114dd2a Correct bitwise test in mac_bsdextended ugidfw_rule_valid()
PR:		218039
CID:		1008934
Reported by:	Coverity, PVS-Studio
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10300
2017-06-13 01:17:58 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Robert Watson
709557d903 Break audit_bsm_klib.c into two files: one (audit_bsm_klib.c)
retaining various utility functions used during BSM generation,
and a second (audit_bsm_db.c) that contains the various in-kernel
databases supporting various audit activities (the class and
event-name tables).

(No functional change is intended.)

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-04-03 10:15:58 +00:00
Robert Watson
475e1fc01f Correct macro names and signatures for !AUDIT versions of canonical
path auditing.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-31 14:13:13 +00:00
Robert Watson
15bcf785ba Audit arguments to POSIX message queues, semaphores, and shared memory.
This requires minor changes to the audit framework to allow capturing
paths that are not filesystem paths (i.e., will not be canonicalised
relative to the process current working directory and/or filesystem
root).

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-31 13:43:00 +00:00
Robert Watson
1c2da02938 Audit arguments to System V IPC system calls implementing sempahores,
message queues, and shared memory.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-30 22:26:15 +00:00
Robert Watson
b65ec5e523 Various BSM generation improvements when auditing AUE_ACCEPT,
AUE_PROCCTL, AUE_SENDFILE, AUE_ACL_*, and AUE_POSIX_FALLOCATE.
Audit AUE_SHMUNLINK path in the path token rather than as a
text string, and AUE_SHMOPEN flags as an integer token rather
than a System V IPC address token.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-30 21:39:03 +00:00
Robert Watson
7cad64f796 Don't ifdef KDTRACE_HOOKS struct, variable, and function prototype
definitions for the DTrace audit provider, so that the dtaudit module
can compile in the absence of kernel DTrace support.  This doesn't
really make run-time sense (since the binary dependencies for the
module won't be present), but it allows the dtaudit module to compile
successfully regardless of the kernel configuration.

MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
Reported by:	kib
2017-03-30 12:35:56 +00:00
Robert Watson
b783025921 When handling msgsys(2), semsys(2), and shmsys(2) multiplex system calls,
map the 'which' argument into a suitable audit event identifier for the
specific operation requested.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-29 23:31:35 +00:00
Robert Watson
1811d6bf7f Add an experimental DTrace audit provider, which allows users of DTrace to
instrument security event auditing rather than relying on conventional BSM
trail files or audit pipes:

- Add a set of per-event 'commit' probes, which provide access to
  particular auditable events at the time of commit in system-call return.
  These probes gain access to audit data via the in-kernel audit_record
  data structure, providing convenient access to system-call arguments and
  return values in a single probe.

- Add a set of per-event 'bsm' probes, which provide access to particular
  auditable events at the time of BSM record generation in the audit
  worker thread. These probes have access to the in-kernel audit_record
  data structure and BSM representation as would be written to a trail
  file or audit pipe -- i.e., asynchronously in the audit worker thread.

DTrace probe arguments consist of the name of the audit event (to support
future mechanisms of instrumenting multiple events via a single probe --
e.g., using classes), a pointer to the in-kernel audit record, and an
optional pointer to the BSM data and its length. For human convenience,
upper-case audit event names (AUE_...) are converted to lower case in
DTrace.

DTrace scripts can now cause additional audit-based data to be collected
on system calls, and inspect internal and BSM representations of the data.
They do not affect data captured in the audit trail or audit pipes
configured in the system. auditd(8) must be configured and running in
order to provide a database of event information, as well as other audit
configuration parameters (e.g., to capture command-line arguments or
environmental variables) for the provider to operate.

Reviewed by:	gnn, jonathan, markj
Sponsored by:	DARPA, AFRL
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D10149
2017-03-29 19:58:00 +00:00
Robert Watson
759c8caa5a Introduce an audit event identifier -> audit event name mapping
database in the kernel audit implementation, similar the exist
class mapping database.  This will be used by the DTrace audit
provider to map audit event identifiers originating in the
system-call table back into strings for the purposes of setting
probe names.  The database is initialised and maintained by
auditd(8), which reads values in from the audit_events
configuration file, and then manages them using the A_GETEVENT
and A_SETEVENT auditon(2) operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, AFRL
MFC after:	3 weeks
2017-03-27 10:38:53 +00:00
Robert Watson
d422682f67 Extend comment describing path canonicalisation in audit.
Sponsored by:	DARPA, AFRL
Obtained from:	TrustedBSD Project
MFC after:	3 days
2017-03-27 08:29:17 +00:00
Robert Watson
1279fdafce Audit 'fd' and 'cmd' arguments to fcntl(2), and when generating BSM,
always audit the file-descriptor number and vnode information for all
fnctl(2) commands, not just locking-related ones.  This was likely an
oversight in the original adaptation of this code from XNU.

MFC after:	3 days
Sponsored by:	DARPA, AFRL
2016-11-22 00:41:24 +00:00
Bryan Drewery
28323add09 Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
John Baldwin
b3db2736b1 Don't check aq64_minfree which is unsigned for negative values.
This fixes a tautological comparison warning.

Reviewed by:	rwatson
Differential Revision:	https://reviews.freebsd.org/D7682
2016-09-08 19:47:57 +00:00
Robert Watson
70a98c110e Audit the accepted (or rejected) username argument to setlogin(2).
(NB: This was likely a mismerge from XNU in audit support, where the
text argument to setlogin(2) is captured -- but as a text token,
whereas this change uses the dedicated login-name field in struct
audit_record.)

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2016-08-20 20:28:08 +00:00
Robert Watson
98daa3e5db Add AUE_WAIT6 handling to the BSM conversion switch statement, reusing
the BSM encoding used for AUE_WAIT4.

MFC after:	3 days
Sponsored by:	DARPA, AFRL
2016-07-11 13:06:17 +00:00
Robert Watson
2aa8c03917 Implement AUE_PREAD and AUE_PWRITE BSM conversion support, eliminating
console warnings when pread(2) and pwrite(2) are used with full
system-call auditing enabled.  We audit the same file-descriptor data
for these calls as we do read(2) and write(2).

Approved by:	re (kib)
MFC after:	3 days
Sponsored by:	DARPA, AFRL
2016-06-13 09:22:20 +00:00
Pedro F. Giffuni
bc5ade0d10 sys/security: minor spelling fixes.
No functional change.
2016-05-06 16:59:04 +00:00
Pedro F. Giffuni
323b076e9c sys: use our nitems() macro when param.h is available.
This should cover all the remaining cases in the kernel.

Discussed in:	freebsd-current
2016-04-21 19:40:10 +00:00
Pedro F. Giffuni
8dfea46460 Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00
Pedro F. Giffuni
2e4eeb703a audit(8): leave unsigned comparison for last.
aq64_minfree is unsigned so comparing to find out if it is less
than zero is a nonsense. Move the comparison to the last position
as we don't want to spend time if any of the others triggers first.

hile it would be tempting to just remove it, it may be important to
keep  it for portability with platforms where may be signed(?) or
in case we may want to change it in the future.
2016-04-08 03:26:21 +00:00
Konstantin Belousov
27725229dc Busy the mount point which is the owner of the audit vnode, around
audit_record_write().  This is important so that VFS_STATFS() is not
done on the NULL or freed mp and the check for free space is
consistent with the vnode used for write.

Add vn_start_write() braces around VOP_FSYNC() calls on the audit vnode.

Move repeated code to fsync vnode and panic to the helper
audit_worker_sync_vp().

Reviewed by:	rwatson
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-16 10:06:33 +00:00
Konstantin Belousov
c77f6350ee Move the funsetown(9) call from audit_pipe_close() to cdevpriv
destructor.  As result, close method becomes trivial and removed.
Final cdevsw close method might be called without file
context (e.g. in vn_open_vnode() if the vnode is reclaimed meantime),
which leaves ap_sigio registered for notification, despite cdevpriv
destructor frees the memory later.

Call destructor instead of doing a cleanup inline, for
devfs_set_cdevpriv() failure in open.  This adds missed funsetown(9)
call and locks ap to satisfy audit_pipe_free() invariants.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-13 14:02:07 +00:00
Christian Brueffer
8a0f5c0b6c Merge from contrib/openbsm to bring the kernel audit bits up to date with OpenBSM 1.2 alpha 4:
- remove $P4$
- fix a comment
2015-12-20 23:22:04 +00:00
Mark Johnston
3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Mateusz Guzik
f131759f54 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Mateusz Guzik
4da8456f0a Replace struct filedesc argument in getvnode with struct thread
This is is a step towards removal of spurious arguments.
2015-06-16 13:09:18 +00:00
Mateusz Guzik
9ef8328d52 fd: make rights a mandatory argument to fget_unlocked 2015-06-16 09:52:36 +00:00
Mateusz Guzik
daf63fd2f9 cred: add proc_set_cred helper
The goal here is to provide one place altering process credentials.

This eases debugging and opens up posibilities to do additional work when such
an action is performed.
2015-03-16 00:10:03 +00:00
Mateusz Guzik
eb0b6ba016 audit: fix cred assignment when A_SETPMASK is used
The code used to modify curproc instead of the target process.

Discussed with: rwatson
MFC after:	3 days
2015-03-15 21:43:43 +00:00
Gleb Kurtsou
dde58752db Adjust printf format specifiers for dev_t and ino_t in kernel.
ino_t and dev_t are about to become uint64_t.

Reviewed by:	kib, mckusick
2014-12-17 07:27:19 +00:00
Davide Italiano
b40ea294f9 Replace dev_clone with cdevpriv(9) KPI in audit_pipe code.
This is (yet another) step towards the removal of device
cloning from our kernel.

CR:	https://reviews.freebsd.org/D441
Reviewed by:	kib, rwatson
Tested by:	pho
2014-08-20 16:04:30 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Mateusz Guzik
00b85f6218 audit: plug FILEDESC_LOCK leak in audit_canon_path.
MFC after:	3 days
2014-03-21 01:30:33 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Gleb Smirnoff
45c203fce2 Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 06:29:43 +00:00
Gleb Smirnoff
2c284d9395 Remove IPX support.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.

Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 02:58:48 +00:00
Bjoern A. Zeeb
523b2279b8 As constantly reported during kernel compilation, m_buflen is unsigned so
can never be < 0.  Remove the expression, which can never be true.

MFC after:	1 week
2013-12-25 20:10:17 +00:00
John Baldwin
44ddb776c6 There is no sysctl with the MIB { CTL_KERN, KERN_MAXID }.
MFC after:	2 weeks
2013-12-05 21:55:10 +00:00
Andriy Gapon
d9fae5ab88 dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Mark Johnston
92c6196caa Fix some typos that were causing probe argument types to show up as unknown.
Reviewed by:	rwatson (mac provider)
Approved by:	re (glebius)
MFC after:	1 week
2013-10-01 15:40:27 +00:00
Konstantin Belousov
f7fadf1f22 Make the mac_policy_rm lock recursable, which allows reentrance into
the mac framework.  It is needed when priv_check_cred(9) is called from
the mac callback, e.g. in the mac_portacl(4).

Reported by:	az
Reviewed by:	rwatson
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (gjb)
2013-09-29 20:21:34 +00:00
Davide Italiano
d56b4cd4ac - Use make_dev_credf(MAKEDEV_REF) instead of the race-prone make_dev()+
dev_ref() in the clone handlers that still use it.
- Don't set SI_CHEAPCLONE flag, it's not used anywhere neither in devfs
(for anything real)

Reviewed by:	kib
2013-09-07 13:45:44 +00:00
Pawel Jakub Dawidek
ab568de789 Handle cases where capability rights are not provided.
Reported by:	kib
2013-09-05 11:58:12 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Konstantin Belousov
940cb0e2bb Implement read(2)/write(2) and neccessary lseek(2) for posix shmfd.
Add MAC framework entries for posix shm read and write.

Do not allow implicit extension of the underlying memory segment past
the limit set by ftruncate(2) by either of the syscalls.  Read and
write returns short i/o, lseek(2) fails with EINVAL when resulting
offset does not fit into the limit.

Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-21 17:45:00 +00:00
Andriy Gapon
a01669ea96 audit_proc_coredump: check return value of audit_new
audit_new may return NULL if audit is disabled or suspended.

Sponsored by:	HybridCluster
MFC after:	7 days
2013-07-09 09:03:01 +00:00
Alan Cox
a42159f0ee Relax the vm object locking in mac_proc_vm_revoke_recurse(). A read lock
suffices in one place.

Sponsored by:	EMC / Isilon Storage Division
2013-06-04 17:23:09 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between FreeBSD and solaris
  versions must be avoided.
  At this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit.  Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Pawel Jakub Dawidek
7493f24ee6 - Implement two new system calls:
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
	int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

  which allow to bind and connect respectively to a UNIX domain socket with a
  path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
  the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
  in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson, jilles, kib, des
2013-03-02 21:11:30 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Pawel Jakub Dawidek
89adaea91f Remove redundant check. 2013-02-17 11:57:47 +00:00
Pawel Jakub Dawidek
9270ed9d38 Style. 2013-02-11 22:54:23 +00:00
Pawel Jakub Dawidek
222069f454 Add AUDIT_ARG_SOCKADDR() macro so we can start using the audit_arg_sockaddr()
function, which is currently unused.

Sponsored by:	The FreeBSD Foundation
2013-02-07 00:24:23 +00:00
Christian S.J. Peron
14bc51359c Implement the zonename token for jailed processes. If
a process has an auditid/preselection masks specified, and
is jailed, include the zonename (jailname) token as a
part of the audit record.

Reviewed by:	pjd
MFC after:	2 weeks
2013-01-17 21:02:53 +00:00
Robert Watson
6f1cbda73d Four .c files from OpenBSM are used, in modified form, by the kernel to
implement the BSM audit trail format.  Rename the kernel versions of the
files to match the userspace filenames so that it's easier to work out
what they correspond to, and therefore ensure they are kept in-sync.

Obtained from:	TrustedBSD Project
2012-12-15 15:21:09 +00:00
Robert Watson
d0c2e5bd23 Merge OpenBSM 1.2-alpha2 changes from contrib/openbsm to
src/sys/{bsm,security/audit}.  There are a few tweaks to help with the
FreeBSD build environment that will be merged back to OpenBSM.  No
significant functional changes appear on the kernel side.

Obtained from:	TrustedBSD Project
Sponsored by:	The FreeBSD Foundation (auditdistd)
2012-12-01 13:46:37 +00:00
Pawel Jakub Dawidek
ceaea52f0c IFp4 @219811:
VFS is now fully MPSAFE, fix compilation.
2012-12-01 08:51:40 +00:00
Pawel Jakub Dawidek
80a044ea46 IFp4 @208452:
Audit handling for missing events:
- AUE_READLINKAT
- AUE_FACCESSAT
- AUE_MKDIRAT
- AUE_MKFIFOAT
- AUE_MKNODAT
- AUE_SYMLINKAT

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 23:21:55 +00:00
Pawel Jakub Dawidek
499f0f4d55 IFp4 @208451:
Fix path handling for *at() syscalls.

Before the change directory descriptor was totally ignored,
so the relative path argument was appended to current working
directory path and not to the path provided by descriptor, thus
wrong paths were stored in audit logs.

Now that we use directory descriptor in vfs_lookup, move
AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where
we hold file descriptors table lock, so we are sure paths will
be resolved according to the same directory in audit record and
in actual operation.

Sponsored by:	FreeBSD Foundation (auditdistd)
Reviewed by:	rwatson
MFC after:	2 weeks
2012-11-30 23:18:49 +00:00
Pawel Jakub Dawidek
1d8cd15cf8 IFp4 @208383:
Currently when we discover that trail file is greater than configured
limit we send AUDIT_TRIGGER_ROTATE_KERNEL trigger to the auditd daemon
once. If for some reason auditd didn't rotate trail file it will never
be rotated.

Change it by sending the trigger when trail file size grows by the
configured limit. For example if the limit is 1MB, we will send trigger
on 1MB, 2MB, 3MB, etc.

This is also needed for the auditd change that will be committed soon
where auditd may ignore the trigger - it might be ignored if kernel
requests the trail file to be rotated too quickly (often than once a second)
which would result in overwriting previous trail file.

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 23:03:51 +00:00
Pawel Jakub Dawidek
6293140411 IFp4 @208382:
Currently on each record write we call VFS_STATFS() to get available space
on the file system as well as VOP_GETATTR() to get trail file size.

We can assume that trail file is only updated by the audit worker, so instead
of asking for file size on every write, get file size on trail switch only
(it should be zero, but it's not expensive) and use global variable audit_size
protected by the audit worker lock to keep track of trail file's size.

This eliminates VOP_GETATTR() call for every write. VFS_STATFS() is satisfied
from in-memory data (mount->mnt_stat), so shouldn't be expensive.

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 22:59:20 +00:00
Pawel Jakub Dawidek
9658c0582e IFp4 @208381:
For VOP_GETATTR() we just need vnode to be shared-locked.

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 22:52:35 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Christian Brueffer
e623c22083 Check vplabel for NULL before dereferencing it. Fixes a panic
when running atop with MAC_MLS enabled.

Submitted by:	Richard Kojedzinszky <krichy@tvnetwork.hu>
Reviewed by:	rwatson
MFC after:	1 week
2012-05-03 15:51:34 +00:00
Robert Watson
b4ef8be228 When allocation of labels on files is implicitly disabled due to MAC
policy configuration, avoid leaking resources following failed calls
to get and set MAC labels by file descriptor.

Reported by:	Mateusz Guzik <mjguzik at gmail.com> + clang scan-build
MFC after:	3 days
2012-04-08 11:01:49 +00:00
Alexander V. Chernikov
e4b3229aa5 - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00
Ed Schouten
7870adb640 Remove direct access to si_name.
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after:	2 weeks
2012-02-10 12:35:57 +00:00
Ed Schouten
dc15eac046 Use strchr() and strrchr().
It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use it.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.
2012-01-02 12:12:10 +00:00
Attilio Rao
77befd1d23 Revert the approach for skipping lockstat_probe_func call when doing
lock_success/lock_failure, introduced in r228424, by directly skipping
in dtrace_probe.

This mainly helps in avoiding namespace pollution and thus lockstat.h
dependency by systm.h.

As an added bonus, this also helps in MFC case.
Reviewed by:	avg
MFC after:	3 months (or never)
X-MFC:		r228424
2011-12-12 23:29:32 +00:00
Andriy Gapon
7a7ce668ef put sys/systm.h at its proper place or add it if missing
Reported by:	lstewart, tinderbox
Pointyhat to:	avg, attilio
MFC after:	1 week
MFC with:	r228430
2011-12-12 10:05:13 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Ed Schouten
a185bd12f3 Get rid of D_PSEUDO.
It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL.
Nowadays we have a different interface for that; make_dev_p(). There's
no need to keep it there.

While there, remove an unneeded D_NEEDMINOR from the gpio driver.

Discussed with:	gonzo@ (gpio)
2011-10-18 08:09:44 +00:00
Christian Brueffer
709ef347b7 Remove two dublicated assignments.
CID:		9870
Found with:	Coverity Prevent(tm)
Confirmed by:	rwatson
MFC after:	1 week
2011-10-08 09:14:18 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Robert Watson
9b6dd12e5d Correct several issues in the integration of POSIX shared memory objects
and the new setmode and setowner fileops in FreeBSD 9.0:

- Add new MAC Framework entry point mac_posixshm_check_create() to allow
  MAC policies to authorise shared memory use.  Provide a stub policy and
  test policy templates.

- Add missing Biba and MLS implementations of mac_posixshm_check_setmode()
  and mac_posixshm_check_setowner().

- Add 'accmode' argument to mac_posixshm_check_open() -- unlike the
  mac_posixsem_check_open() entry point it was modeled on, the access mode
  is required as shared memory access can be read-only as well as writable;
  this isn't true of POSIX semaphores.

- Implement full range of POSIX shared memory entry points for Biba and MLS.

Sponsored by:   Google Inc.
Obtained from:	TrustedBSD Project
Approved by:    re (kib)
2011-09-02 17:40:39 +00:00
Attilio Rao
6aba400a70 Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
Konstantin Belousov
9c00bb9190 Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by:	glebius
Reviewed by:	rwatson
Approved by:	re (bz)
2011-08-16 20:07:47 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Jonathan Anderson
778b0e42a8 Provide ability to audit cap_rights_t arguments.
We wish to be able to audit capability rights arguments; this code
provides the necessary infrastructure.

This commit does not, of itself, turn on such auditing for any
system call; that should follow shortly.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-18 12:58:18 +00:00
Alexander Leidinger
d783bbd2d2 - Add a FEATURE for capsicum (security_capabilities).
- Rename mac FEATURE to security_mac.

Discussed with:	rwatson
2011-03-04 09:03:54 +00:00
Robert Watson
25122f5c5f Add ECAPMODE, "Not permitted in capability mode", a new kernel errno
constant to indicate that a system call (or perhaps an operation requested
via a system call) is not permitted for a capability mode process.

Submitted by:	anderson
Sponsored by:	Google, Inc.
Obtained from:	Capsicum Project
MFC after:	1 week
2011-03-01 13:14:28 +00:00
Alexander Leidinger
de5b19526b Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/
PMC/SYSV/...).

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.

Sponsored by:   Google Summer of Code 2010
Submitted by:   kibab
Reviewed by:    arch@ (parts by rwatson, trasz, jhb)
X-MFC after:    to be determined in last commit with code from this project
2011-02-25 10:11:01 +00:00
Alan Cox
17f3095d1a Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are
incorrectly calling vm_object_page_clean().  They are passing the length of
the range rather than the ending offset of the range.

Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the
callers.

Reviewed by:	kib
MFC after:	3 weeks
2011-02-05 21:21:27 +00:00
Matthew D Fleming
123d2cb7e9 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the security directory.
2011-01-12 19:54:14 +00:00
Rebecca Cran
b1ce21c6ef Fix typos.
PR:	bin/148894
Submitted by:	olgeni
2010-11-09 10:59:09 +00:00
Robert Watson
a959b1f02c Add missing DTrace probe invocation to mac_vnode_check_open; the probe
was declared, but never used.

MFC after:	3 days
Sponsored by:	Google, Inc.
2010-10-23 16:59:39 +00:00
Matthew D Fleming
4d369413e1 Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
Rui Paulo
79856499bd Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwaston [1]
2010-08-22 11:18:57 +00:00
Christian S.J. Peron
24ffe72448 Add a case to make sure that internal audit records get converted
to BSM format for lpathconf(2) events.

MFC after:	2 weeks
2010-05-04 15:29:07 +00:00
Robert Watson
9663e34384 Update device-labeling logic for Biba, LOMAC, and MLS to recognize new-style
pts devices when various policy ptys_equal flags are enabled.

Submitted by:	Estella Mystagic <estella at mystagic.com>
MFC after:	1 week
2010-03-02 15:05:48 +00:00
Christian S.J. Peron
583450efd7 Make sure we convert audit records that were produced as the result of the
closefrom(2) syscall.
2010-01-31 22:31:01 +00:00
Brooks Davis
412f9500e2 Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1.  kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1.  Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after:	1 month
2010-01-12 07:49:34 +00:00
Edward Tomasz Napierala
3a597bfc7b Make mac_lomac(4) able to interpret NFSv4 access bits.
Reviewed by:	rwatson
2010-01-03 17:19:14 +00:00
Poul-Henning Kamp
0d07b627a3 Having thrown the cat out of the house, add a necessary include. 2009-09-08 13:24:36 +00:00
Poul-Henning Kamp
6778431478 Revert previous commit and add myself to the list of people who should
know better than to commit with a cat in the area.
2009-09-08 13:19:05 +00:00
Poul-Henning Kamp
b34421bf9c Add necessary include. 2009-09-08 13:16:55 +00:00
Robert Watson
9eb3e4639a Correctly audit real gids following changes to the audit record argument
interface.

Approved by:	re (kib)
2009-08-12 10:45:45 +00:00
Robert Watson
791b0ad2bf Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead
provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2()
to capture path information for audit records.  This allows us to
move the definitions of ARG_* out of the public audit header file,
as they are an implementation detail of our current kernel-internal
audit record, which may change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-29 07:44:43 +00:00
Robert Watson
b146fc1bf0 Rework vnode argument auditing to follow the same structure, in order
to avoid exposing ARG_ macros/flag values outside of the audit code in
order to name which one of two possible vnodes will be audited for a
system call.

Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-28 21:52:24 +00:00
Robert Watson
e4b4bbb665 Audit file descriptors passed to fooat(2) system calls, which are used
instead of the root/current working directory as the starting point for
lookups.  Up to two such descriptors can be audited.  Add audit record
BSM encoding for fooat(2).

Note: due to an error in the OpenBSM 1.1p1 configuration file, a
further change is required to that file in order to fix openat(2)
auditing.

Approved by:	re (kib)
Reviewed by:	rdivacky (fooat(2) portions)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-28 21:39:58 +00:00
Robert Watson
597df30e62 Import OpenBSM 1.1p1 from vendor branch to 8-CURRENT, populating
contrib/openbsm and a subset also imported into sys/security/audit.
This patch release addresses several minor issues:

- Fixes to AUT_SOCKUNIX token parsing.
- IPv6 support for au_to_me(3).
- Improved robustness in the parsing of audit_control, especially long
  flags/naflags strings and whitespace in all fields.
- Add missing conversion of a number of FreeBSD/Mac OS X errnos to/from BSM
  error number space.

MFC after:	3 weeks
Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
Approved by:	re (kib)
2009-07-17 14:02:20 +00:00
Robert Watson
6196f898bb Create audit records for AUE_POSIX_OPENPT, currently w/o arguments.
Approved by:	re (audit argument blanket)
2009-07-02 16:33:38 +00:00
Robert Watson
deedc899fd Fix comment misthink.
Submitted by:	b. f. <bf1783 at googlemail.com>
Approved by:	re (audit argument blanket)
MFC after:	1 week
2009-07-02 09:50:13 +00:00
Robert Watson
2a5658382a Clean up a number of aspects of token generation from audit arguments to
system calls:

- Centralize generation of argument tokens for VM addresses in a macro,
  ADDR_TOKEN(), and properly encode 64-bit addresses in 64-bit arguments.
- Fix up argument numbers across a large number of syscalls so that they
  match the numeric argument into the system call.
- Don't audit the address argument to ioctl(2) or ptrace(2), but do keep
  generating tokens for mmap(2), minherit(2), since they relate to passing
  object access across execve(2).

Approved by:	re (audit argument blanket)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-07-02 09:15:30 +00:00
Robert Watson
03f7b00438 For access(2) and eaccess(2), audit the requested access mode.
Approved by:	re (audit argument blanket)
MFC after:	3 days
2009-07-01 22:47:45 +00:00
Robert Watson
9e4c1521d5 Define missing audit argument macro AUDIT_ARG_SOCKET(), and
capture the domain, type, and protocol arguments to socket(2)
and socketpair(2).

Approved by:	re (audit argument blanket)
MFC after:	3 days
2009-07-01 18:54:49 +00:00
Robert Watson
6d5a61563a When auditing unmount(2), capture FSID arguments as regular text strings
rather than as paths, which would lead to them being treated as relative
pathnames and hence confusingly converted into absolute pathnames.

Capture flags to unmount(2) via an argument token.

Approved by:	re (audit argument blanket)
MFC after:	3 days
2009-07-01 16:56:56 +00:00
Robert Watson
422d786676 Audit the file descriptor number passed to lseek(2).
Approved by:	re (kib)
MFC after:	3 days
2009-07-01 15:37:23 +00:00
Robert Watson
2ef24dde7c udit the 'options' argument to wait4(2).
Approved by:	re (kib)
MFC after:	3 days
2009-07-01 12:36:10 +00:00
Stacey Son
86120afae4 Dynamically allocate the gidset field in audit record.
This fixes a problem created by the recent change that allows a large
number of groups per user.  The gidset field in struct kaudit_record
is now dynamically allocated to the size needed rather than statically
(using NGROUPS).

Approved by:	re@ (kensmith, rwatson), gnn (mentor)
2009-06-29 20:19:19 +00:00
Robert Watson
14961ba789 Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type.  This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by:	brooks
Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-06-27 13:58:44 +00:00
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
Ed Schouten
fbbbf5d135 Chase the removal of PRIV_TTY_PRISON in the mac(9) modules.
Reported by:	kib
Pointy hat to:	me
2009-06-20 15:54:35 +00:00
Konstantin Belousov
d8b0556c6d Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Robert Watson
3ad3d9c5ef Add one further check with mac_policy_count to an mbuf copying case
(limited to netatalk) to avoid MAC label lookup on both mbufs if no
policies are registered.

Obtained from:	TrustedBSD Project
2009-06-03 19:41:12 +00:00
Robert Watson
3de4046939 Continue work to optimize performance of "options MAC" when no MAC policy
modules are loaded by avoiding mbuf label lookups when policies aren't
loaded, pushing further socket locking into MAC policy modules, and
avoiding locking MAC ifnet locks when no policies are loaded:

- Check mac_policies_count before looking for mbuf MAC label m_tags in MAC
  Framework entry points.  We will still pay label lookup costs if MAC
  policies are present but don't require labels (typically a single mbuf
  header field read, but perhaps further indirection if IPSEC or other
  m_tag consumers are in use).

- Further push socket locking for socket-related access control checks and
  events into MAC policies from the MAC Framework, so that sockets are
  only locked if a policy specifically requires a lock to protect a label.
  This resolves lock order issues during sonewconn() and also in local
  domain socket cross-connect where multiple socket locks could not be
  held at once for the purposes of propagatig MAC labels across multiple
  sockets.  Eliminate mac_policy_count check in some entry points where it
  no longer avoids locking.

- Add mac_policy_count checking in some entry points relating to network
  interfaces that otherwise lock a global MAC ifnet lock used to protect
  ifnet labels.

Obtained from:	TrustedBSD Project
2009-06-03 18:46:28 +00:00