Fix path handling for *at() syscalls.
Before the change directory descriptor was totally ignored,
so the relative path argument was appended to current working
directory path and not to the path provided by descriptor, thus
wrong paths were stored in audit logs.
Now that we use directory descriptor in vfs_lookup, move
AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where
we hold file descriptors table lock, so we are sure paths will
be resolved according to the same directory in audit record and
in actual operation.
Sponsored by: FreeBSD Foundation (auditdistd)
Reviewed by: rwatson
MFC after: 2 weeks
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
instead of the root/current working directory as the starting point for
lookups. Up to two such descriptors can be audited. Add audit record
BSM encoding for fooat(2).
Note: due to an error in the OpenBSM 1.1p1 configuration file, a
further change is required to that file in order to fix openat(2)
auditing.
Approved by: re (kib)
Reviewed by: rdivacky (fooat(2) portions)
Obtained from: TrustedBSD Project
MFC after: 1 month
mutex, as it's rarely changed but frequently accessed read-only from
multiple threads, so a potentially significant source of contention.
MFC after: 1 month
Sponsored by: Apple, Inc.
processes are not producing absolute pathname tokens. It is required
that audited pathnames are generated relative to the global root mount
point. This modification changes our implementation of audit_canon_path(9)
and introduces a new function: vn_fullpath_global(9) which performs a
vnode -> pathname translation relative to the global mount point based
on the contents of the name cache. Much like vn_fullpath,
vn_fullpath_global is a wrapper function which called vn_fullpath1.
Further, the string parsing routines have been converted to use the
sbuf(9) framework. This change also removes the conditional acquisition
of Giant, since the vn_fullpath1 method will not dip into file system
dependent code.
The vnode locking was modified to use vhold()/vdrop() instead the vref()
and vrele(). This will modify the hold count instead of modifying the
user count. This makes more sense since it's the kernel that requires
the reference to the vnode. This also makes sure that the vnode does not
get recycled we hold the reference to it. [1]
Discussed with: rwatson
Reviewed by: kib [1]
MFC after: 2 weeks
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
supports the removal of hard-coded audit class constants in OpenBSM
1.0. All audit classes are now dynamically configured via the
audit_class database.
Obtained from: TrustedBSD Project
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular
acquisisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).
- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).
Tested by: kris
Discussed with: jhb, kris, attilio, jeff
global audit trail configuration. This allows applications consuming
audit trails to specify parameters for which audit records are of
interest, including selecting records not required by the global trail.
Allowing application interest specification without changing the global
configuration allows intrusion detection systems to run without
interfering with global auditing or each other (if multiple are
present). To implement this:
- Kernel audit records now carry a flag to indicate whether they have
been selected by the global trail or by the audit pipe subsystem,
set during record commit, so that this information is available
after BSM conversion when delivering the BSM to the trail and audit
pipes in the audit worker thread asynchronously. Preselection by
either record target will cause the record to be kept.
- Similar changes to preselection when the audit record is created
when the system call is entering: consult both the global trail and
pipes.
- au_preselect() now accepts the class in order to avoid repeatedly
looking up the mask for each preselection test.
- Define a series of ioctls that allow applications to specify whether
they want to track the global trail, or program their own
preselection parameters: they may specify their own flags and naflags
masks, similar to the global masks of the same name, as well as a set
of per-auid masks. They also set a per-pipe mode specifying whether
they track the global trail, or user their own -- the door is left
open for future additional modes. A new ioctl is defined to allow a
user process to flush the current audit pipe queue, which can be used
after reprogramming pre-selection to make sure that only records of
interest are received in future reads.
- Audit pipe data structures are extended to hold the additional fields
necessary to support preselection. By default, audit pipes track the
global trail, so "praudit /dev/auditpipe" will track the global audit
trail even though praudit doesn't program the audit pipe selection
model.
- Comment about the complexities of potentially adding partial read
support to audit pipes.
By using a set of ioctls, applications can select which records are of
interest, and toggle the preselection mode.
Obtained from: TrustedBSD Project
- Management of audit state on processes.
- Audit system calls to configure process and system audit state.
- Reliable audit record queue implementation, audit_worker kernel
thread to asynchronously store records on disk.
- Audit event argument.
- Internal audit data structure -> BSM audit trail conversion library.
- Audit event pre-selection.
- Audit pseudo-device permitting kernel->user upcalls to notify auditd
of kernel audit events.
Much work by: wsalamon
Obtained from: TrustedBSD Project, Apple Computer, Inc.