Commit Graph

38 Commits

Author SHA1 Message Date
Konstantin Belousov
c77f6350ee Move the funsetown(9) call from audit_pipe_close() to cdevpriv
destructor.  As result, close method becomes trivial and removed.
Final cdevsw close method might be called without file
context (e.g. in vn_open_vnode() if the vnode is reclaimed meantime),
which leaves ap_sigio registered for notification, despite cdevpriv
destructor frees the memory later.

Call destructor instead of doing a cleanup inline, for
devfs_set_cdevpriv() failure in open.  This adds missed funsetown(9)
call and locks ap to satisfy audit_pipe_free() invariants.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-13 14:02:07 +00:00
Davide Italiano
b40ea294f9 Replace dev_clone with cdevpriv(9) KPI in audit_pipe code.
This is (yet another) step towards the removal of device
cloning from our kernel.

CR:	https://reviews.freebsd.org/D441
Reviewed by:	kib, rwatson
Tested by:	pho
2014-08-20 16:04:30 +00:00
Davide Italiano
d56b4cd4ac - Use make_dev_credf(MAKEDEV_REF) instead of the race-prone make_dev()+
dev_ref() in the clone handlers that still use it.
- Don't set SI_CHEAPCLONE flag, it's not used anywhere neither in devfs
(for anything real)

Reviewed by:	kib
2013-09-07 13:45:44 +00:00
Ed Schouten
a185bd12f3 Get rid of D_PSEUDO.
It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL.
Nowadays we have a different interface for that; make_dev_p(). There's
no need to keep it there.

While there, remove an unneeded D_NEEDMINOR from the gpio driver.

Discussed with:	gonzo@ (gpio)
2011-10-18 08:09:44 +00:00
Attilio Rao
6aba400a70 Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
Konstantin Belousov
d8b0556c6d Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
Robert Watson
2f106d5e08 Remove D_NEEDGIANT from audit pipes. I'm actually not sure why this was
here, but isn't needed.

MFC after:	2 weeks
Sponsored by:	Apple, Inc.
2009-04-16 11:57:16 +00:00
Robert Watson
8f8dedbed8 Set the lower bound on queue size for an audit pipe to 1 instead of 0,
as an audit pipe with a queue length of 0 is less useful.

Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
MFC after:	1 week
2009-02-08 15:38:31 +00:00
Robert Watson
91e832e0c3 Eliminate the local variable 'ape' in audit_pipe_kqread(), as it's only
used for an assertion that we don't really need anymore.

MFC after:	1 week
Reported by:	Christoph Mallon <christoph dot mallon at gmx dot de>
2009-02-04 19:56:37 +00:00
Robert Watson
d423f266c4 Do a lockless read of the audit pipe list before grabbing the audit pipe
lock in order to avoid the lock acquire hit if the pipe list is very
likely empty.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	Apple, Inc.
2009-01-06 14:15:38 +00:00
Robert Watson
efcde1e8c7 Fix white space botch: use carriage returns rather than tabs. 2008-12-31 23:22:45 +00:00
Robert Watson
d2f6bb070f Update introductory comment for audit pipes.
MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-02 00:25:48 +00:00
Robert Watson
6e1362b499 Remove stale comment about filtering in audit pipe ioctl routine: we do
support filtering now, although we may want to make it more interesting
in the future.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-02 00:18:19 +00:00
Robert Watson
e4565e2028 Add comment for per-pipe stats.
MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 23:05:49 +00:00
Robert Watson
cff9c52e23 We only allow a partial read of the first record in an audit pipe
record queue, so move the offset field from the per-record
audit_pipe_entry structure to the audit_pipe structure.

Now that we support reading more than one record at a time, add a
new summary field to audit_pipe, ap_qbyteslen, which tracks the
total number of bytes present in a pipe, and return that (minus
the current offset) via FIONREAD and kqueue's data variable for
the pending byte count rather than the number of bytes remaining
in only the first record.

Add a number of asserts to confirm that these counts and offsets
following the expected rules.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 21:56:45 +00:00
Robert Watson
a9275e0bd5 Allow a single read(2) system call on an audit pipe to retrieve data from
more than one audit record at a time in order to improve efficiency.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 21:16:09 +00:00
Robert Watson
1a0edb10ca Since there is no longer the opportunity for record truncation, just
return 0 if the truncation counter is queried on an audit pipe.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-31 15:11:01 +00:00
Robert Watson
5a9d15cd4c Historically, /dev/auditpipe has allows only whole records to be read via
read(2), which meant that records longer than the buffer passed to read(2)
were dropped.  Instead take the approach of allowing partial reads to be
continued across multiple system calls more in the style of streaming
character device.

This means retaining a record on the per-pipe queue in a partially read
state, so maintain a current offset into the record.  Keep the record on
the queue during a read, so add a new lock, ap_sx, to serialize removal
of records from the queue by either read(2) or ioctl(2) requesting a pipe
flush.  Modify the kqueue handler to return bytes left in the current
record rather than simply the size of the current record.

It is now possible to use praudit, which used the standard FILE * buffer
sizes, to track much larger record sizes from /dev/auditpipe, such as
very long command lines to execve(2).

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-31 14:40:21 +00:00
Robert Watson
1daa6feb45 When we drop an audit record going to and audit pipe because the audit
pipe has overflowed, drop the newest, rather than oldest, record.  This
makes overflow drop behavior consistent with memory allocation failure
leading to drop, avoids touching the consumer end of the queue from a
producer, and lowers the CPU overhead of dropping a record by dropping
before memory allocation and copying.

Obtained from:	Apple, Inc.
MFC after:	2 months
2008-10-30 23:09:19 +00:00
Robert Watson
846f37f1e7 Break out single audit_pipe_mtx into two types of locks: a global rwlock
protecting the list of audit pipes, and a per-pipe mutex protecting the
queue.

Likewise, replace the single global condition variable used to signal
delivery of a record to one or more pipes, and add a per-pipe condition
variable to avoid spurious wakeups when event subscriptions differ
across multiple pipes.

This slightly increases the cost of delivering to audit pipes, but should
reduce lock contention in the presence of multiple readers as only the
per-pipe lock is required to read from a pipe, as well as avoid
overheading when different pipes are used in different ways.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-30 21:58:39 +00:00
Ed Schouten
d3ce832719 Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
Robert Watson
f6d4a8a77b Further synchronization of copyrights, licenses, white space, etc from
Apple and from the OpenBSM vendor tree.

Obtained from:	Apple Inc., TrustedBSD Project
MFC after:	3 days
2008-07-31 09:54:35 +00:00
Ed Schouten
29d4cb241b Don't enforce unique device minor number policy anymore.
Except for the case where we use the cloner library (clone_create() and
friends), there is no reason to enforce a unique device minor number
policy. There are various drivers in the source tree that allocate unr
pools and such to provide minor numbers, without using them themselves.

Because we still need to support unique device minor numbers for the
cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's
that are used in combination with the cloner library should be marked
with this flag to make the cloning work.

This means drivers can now freely use si_drv0 to store their own flags
and state, making it effectively the same as si_drv1 and si_drv2. We
still keep the minor() and dev2unit() routines around to make drivers
happy.

The NTFS code also used the minor number in its hash table. We should
not do this anymore. If the si_drv0 field would be changed, it would no
longer end up in the same list.

Approved by:	philip (mentor)
2008-06-11 18:55:19 +00:00
Robert Watson
dda409d4ec Use __FBSDID() for $FreeBSD$ IDs in the audit code.
MFC after:	3 days
2008-04-13 22:06:56 +00:00
Wojciech A. Koszek
7a9d5a45e7 Change "audit_pipe_preselect" to "audit_pipe_presel" to make it print
with proper alignment in ddb(4) and vmstat(8).

Reviewed by:	rwatson@
2007-12-25 13:23:19 +00:00
Christian S.J. Peron
4777d3f98a Make sure we are incrementing the read count for each audit pipe read.
MFC after:	1 week
2007-10-27 22:28:01 +00:00
Christian S.J. Peron
24f4142c18 - Change the wakeup logic associated with having multiple sleepers
on multiple different audit pipes.  The old method used cv_signal()
  which would result in only one thread being woken up after we
  appended a record to it's queue.  This resulted in un-timely wake-ups
  when processing audit records real-time.

- Assign PSOCK priority to threads that have been sleeping on a read(2).
  This is the same priority threads are woken up with when they select(2)
  or poll(2).  This yields fairness between various forms of sleep on
  the audit pipes.

Obtained from:	TrustedBSD Project
Discussed with:	rwatson
MFC after:	1 week
2007-10-12 15:09:02 +00:00
Robert Watson
d8c0f4dc21 Clean up audit comments--formatting, spelling, etc. 2007-06-01 21:58:59 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Robert Watson
9fe741b895 Allow the user process to query the kernel's notion of a maximum
audit record size at run-time, which can be used by the user
process to size the user space buffer it reads into from the audit
pipe.

Perforce change:	105098
Obtained from:		TrustedBSD Project
2006-08-26 17:59:31 +00:00
Robert Watson
0fff4cde9d Add kqueue support to audit pipe pseudo-devices.
Obtained from:	TrustedBSD Project
2006-08-24 17:42:38 +00:00
Robert Watson
e257c20ec1 Introduce support for per-audit pipe preselection independent from the
global audit trail configuration.  This allows applications consuming
audit trails to specify parameters for which audit records are of
interest, including selecting records not required by the global trail.
Allowing application interest specification without changing the global
configuration allows intrusion detection systems to run without
interfering with global auditing or each other (if multiple are
present).  To implement this:

- Kernel audit records now carry a flag to indicate whether they have
  been selected by the global trail or by the audit pipe subsystem,
  set during record commit, so that this information is available
  after BSM conversion when delivering the BSM to the trail and audit
  pipes in the audit worker thread asynchronously.  Preselection by
  either record target will cause the record to be kept.

- Similar changes to preselection when the audit record is created
  when the system call is entering: consult both the global trail and
  pipes.

- au_preselect() now accepts the class in order to avoid repeatedly
  looking up the mask for each preselection test.

- Define a series of ioctls that allow applications to specify whether
  they want to track the global trail, or program their own
  preselection parameters: they may specify their own flags and naflags
  masks, similar to the global masks of the same name, as well as a set
  of per-auid masks.  They also set a per-pipe mode specifying whether
  they track the global trail, or user their own -- the door is left
  open for future additional modes.  A new ioctl is defined to allow a
  user process to flush the current audit pipe queue, which can be used
  after reprogramming pre-selection to make sure that only records of
  interest are received in future reads.

- Audit pipe data structures are extended to hold the additional fields
  necessary to support preselection.  By default, audit pipes track the
  global trail, so "praudit /dev/auditpipe" will track the global audit
  trail even though praudit doesn't program the audit pipe selection
  model.

- Comment about the complexities of potentially adding partial read
  support to audit pipes.

By using a set of ioctls, applications can select which records are of
interest, and toggle the preselection mode.

Obtained from:	TrustedBSD Project
2006-06-05 14:48:17 +00:00
Robert Watson
059c649508 Merge Perforce change 93570 from TrustedBSD audit3 branch:
Add audit pipe ioctls to query minimum and maximum audit queue
  lengths.

Obtained from:	TrustedBSD Project
2006-03-19 15:39:03 +00:00
Robert Watson
6a4bde1b76 Merge Perforce change 93567 from TrustedBSD audit3 branch:
Bump default queue limit for audit pipes from 32 to 128, since 32 is
  pretty small.

Obtained from:	TrustedBSD Project
2006-03-19 15:38:03 +00:00
Robert Watson
ed708e1f7f Merge Perforce change 93506 from TrustedBSD audit3 branch:
Add ioctls to audit pipes in order to allow querying of the current
  record queue state, setting of the queue limit, and querying of pipe
  statistics.

Obtained from:	TrustedBSD Project
2006-03-19 15:36:10 +00:00
Robert Watson
69c89e437b Count drops when the first of two pipe mallocs fails.
Obtained from:	TrustedBSD Project
2006-03-04 17:09:17 +00:00
Robert Watson
860ae58e3f Fix queue drop logic when the queue overflows: decrement queue length.
Obtained from:	TrustedBSD Project
2006-02-07 14:46:26 +00:00
Robert Watson
09daf1c828 Add support for audit pipe special devices, which allow user space
applications to insert a "tee" in the live audit event stream.  Records
are inserted into a per-clone queue so that user processes can pull
discreet records out of the queue.  Unlike delivery to disk, audit pipes
are "lossy", dropping records in low memory conditions or when the
process falls behind real-time events.  This mechanism is appropriate
for use by live monitoring systems, host-based intrusion detection, etc,
and avoids applications having to dig through active on-disk trails that
are owned by the audit daemon.

Obtained from:	TrustedBSD Project
2006-02-06 22:50:39 +00:00