38 Commits

Author SHA1 Message Date
kib
e884cfc968 Move the funsetown(9) call from audit_pipe_close() to cdevpriv
destructor.  As a result, the close method becomes trivial and is removed.
The final cdevsw close method might be called without file
context (e.g. in vn_open_vnode() if the vnode is reclaimed meantime),
which leaves ap_sigio registered for notification, even though the cdevpriv
destructor frees the memory later.

Call the destructor instead of doing a cleanup inline, for
devfs_set_cdevpriv() failure in open.  This adds the missed funsetown(9)
call and locks ap to satisfy audit_pipe_free() invariants.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-13 14:02:07 +00:00
davide
c71af9be63 Replace dev_clone with cdevpriv(9) KPI in audit_pipe code.
This is (yet another) step towards the removal of device
cloning from our kernel.

CR:	https://reviews.freebsd.org/D441
Reviewed by:	kib, rwatson
Tested by:	pho
2014-08-20 16:04:30 +00:00
davide
ec6382d0c2 - Use make_dev_credf(MAKEDEV_REF) instead of the race-prone make_dev()+
dev_ref() in the clone handlers that still use it.
- Don't set SI_CHEAPCLONE flag, it's not used anywhere, not even in devfs
(for anything real)

Reviewed by:	kib
2013-09-07 13:45:44 +00:00
ed
832b15d289 Get rid of D_PSEUDO.
It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL.
Nowadays we have a different interface for that; make_dev_p(). There's
no need to keep it there.

While there, remove an unneeded D_NEEDMINOR from the gpio driver.

Discussed with:	gonzo@ (gpio)
2011-10-18 08:09:44 +00:00
attilio
683d7a54ce Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before destroying the registered selinfo object. Also, this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before destroying the selinfo objects (in order to avoid such a case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
as happens in device drivers which install selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the file descriptors should be already
closed, thus there cannot be a race.
For this case, the mfi(4) device driver can be taken as an example, as it
implements fully correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
kib
e1cb2941d4 Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
rwatson
8dbf62efb2 Remove D_NEEDGIANT from audit pipes. I'm actually not sure why this was
here, but it isn't needed.

MFC after:	2 weeks
Sponsored by:	Apple, Inc.
2009-04-16 11:57:16 +00:00
rwatson
1d82f9d188 Set the lower bound on queue size for an audit pipe to 1 instead of 0,
as an audit pipe with a queue length of 0 is less useful.

Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
MFC after:	1 week
2009-02-08 15:38:31 +00:00
rwatson
8ee4f3581d Eliminate the local variable 'ape' in audit_pipe_kqread(), as it's only
used for an assertion that we don't really need anymore.

MFC after:	1 week
Reported by:	Christoph Mallon <christoph dot mallon at gmx dot de>
2009-02-04 19:56:37 +00:00
rwatson
5d645da259 Do a lockless read of the audit pipe list before grabbing the audit pipe
lock in order to avoid the lock acquire hit if the pipe list is very
likely empty.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	Apple, Inc.
2009-01-06 14:15:38 +00:00
rwatson
a5f7e7ad63 Fix white space botch: use carriage returns rather than tabs.
2008-12-31 23:22:45 +00:00
rwatson
20831b1f86 Update introductory comment for audit pipes.
MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-02 00:25:48 +00:00
rwatson
368cc5044a Remove stale comment about filtering in audit pipe ioctl routine: we do
support filtering now, although we may want to make it more interesting
in the future.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-02 00:18:19 +00:00
rwatson
3f0f3e5028 Add comment for per-pipe stats.
MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 23:05:49 +00:00
rwatson
64f6525f93 We only allow a partial read of the first record in an audit pipe
record queue, so move the offset field from the per-record
audit_pipe_entry structure to the audit_pipe structure.

Now that we support reading more than one record at a time, add a
new summary field to audit_pipe, ap_qbyteslen, which tracks the
total number of bytes present in a pipe, and return that (minus
the current offset) via FIONREAD and kqueue's data variable for
the pending byte count rather than the number of bytes remaining
in only the first record.

Add a number of asserts to confirm that these counts and offsets
follow the expected rules.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 21:56:45 +00:00
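
The bookkeeping described in 64f6525f93 (together with the partial and multi-record reads from 6f79887fc5 and f8873b326d) can be illustrated with a small userspace sketch. This is not the kernel code: the sketch_* names, the fixed-size ring, and the absence of locking are all assumptions made for illustration. It shows one per-pipe offset into the first record, a running total of queued bytes, and a pending-byte count computed as the total minus the offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_RECORDS 8	/* hypothetical queue capacity */

struct sketch_record {
	const char *data;
	size_t len;
};

/* Hypothetical stand-in for the pipe; field names only mirror the idea. */
struct sketch_pipe {
	struct sketch_record q[SKETCH_MAX_RECORDS];
	size_t head;		/* index of the first queued record */
	size_t count;		/* number of queued records */
	size_t qoffset;		/* bytes already read from the first record */
	size_t qbyteslen;	/* total bytes in all queued records */
};

/* Queue a record; returns -1 if the queue is full or the record is empty. */
static int
sketch_enqueue(struct sketch_pipe *p, const char *data, size_t len)
{
	size_t tail;

	if (len == 0 || p->count == SKETCH_MAX_RECORDS)
		return (-1);
	tail = (p->head + p->count) % SKETCH_MAX_RECORDS;
	p->q[tail].data = data;
	p->q[tail].len = len;
	p->count++;
	p->qbyteslen += len;
	return (0);
}

/* Pending byte count: total queued bytes minus the current offset. */
static size_t
sketch_pending(const struct sketch_pipe *p)
{
	assert(p->qoffset <= p->qbyteslen);
	return (p->qbyteslen - p->qoffset);
}

/*
 * Copy up to buflen bytes, continuing a partially read first record and
 * moving on to later records while space remains in the buffer.
 */
static size_t
sketch_read(struct sketch_pipe *p, char *buf, size_t buflen)
{
	size_t copied = 0;

	while (copied < buflen && p->count > 0) {
		struct sketch_record *r = &p->q[p->head];
		size_t n;

		/* Only the first record may be partially consumed. */
		assert(p->qoffset < r->len);
		n = r->len - p->qoffset;
		if (n > buflen - copied)
			n = buflen - copied;
		memcpy(buf + copied, r->data + p->qoffset, n);
		copied += n;
		p->qoffset += n;
		if (p->qoffset == r->len) {
			/* Record fully consumed: drop it and reset the offset. */
			p->qbyteslen -= r->len;
			p->qoffset = 0;
			p->head = (p->head + 1) % SKETCH_MAX_RECORDS;
			p->count--;
		}
	}
	return (copied);
}
```

With records "abcde" and "fgh" queued, a 3-byte read leaves 5 bytes pending, and a following 4-byte read finishes the first record and takes 2 bytes of the second.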
rwatson
f8873b326d Allow a single read(2) system call on an audit pipe to retrieve data from
more than one audit record at a time in order to improve efficiency.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-11-01 21:16:09 +00:00
rwatson
efc5b661a1 Since there is no longer the opportunity for record truncation, just
return 0 if the truncation counter is queried on an audit pipe.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-31 15:11:01 +00:00
rwatson
6f79887fc5 Historically, /dev/auditpipe has allowed only whole records to be read via
read(2), which meant that records longer than the buffer passed to read(2)
were dropped.  Instead take the approach of allowing partial reads to be
continued across multiple system calls more in the style of a streaming
character device.

This means retaining a record on the per-pipe queue in a partially read
state, so maintain a current offset into the record.  Keep the record on
the queue during a read, so add a new lock, ap_sx, to serialize removal
of records from the queue by either read(2) or ioctl(2) requesting a pipe
flush.  Modify the kqueue handler to return bytes left in the current
record rather than simply the size of the current record.

It is now possible to use praudit, which used the standard FILE * buffer
sizes, to track much larger record sizes from /dev/auditpipe, such as
very long command lines to execve(2).

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-31 14:40:21 +00:00
rwatson
81bbfda754 When we drop an audit record going to an audit pipe because the audit
pipe has overflowed, drop the newest, rather than oldest, record.  This
makes overflow drop behavior consistent with memory allocation failure
leading to drop, avoids touching the consumer end of the queue from a
producer, and lowers the CPU overhead of dropping a record by dropping
before memory allocation and copying.

Obtained from:	Apple, Inc.
MFC after:	2 months
2008-10-30 23:09:19 +00:00
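
The drop-newest policy from 81bbfda754 can be sketched in userspace C as follows. The dq_* names and the linked-list layout are hypothetical, and the real code runs under kernel locks that are omitted here. The point illustrated is ordering: the queue limit is checked before any allocation or copy, so an overflow drop never touches the consumer end of the queue and costs almost nothing.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue entry holding a private copy of one record. */
struct dq_entry {
	struct dq_entry *next;
	size_t len;
	unsigned char data[];
};

/* Hypothetical per-pipe queue state. */
struct dq_pipe {
	struct dq_entry *head;	/* consumer end (oldest record) */
	struct dq_entry *tail;	/* producer end (newest record) */
	size_t qlen;		/* records currently queued */
	size_t qlimit;		/* maximum records allowed */
	unsigned long inserts;	/* records accepted */
	unsigned long drops;	/* records dropped */
};

/*
 * Append a copy of rec to the pipe. Returns 0 on success, -1 if the
 * record was dropped because the queue is full or memory is short.
 */
static int
dq_append(struct dq_pipe *p, const void *rec, size_t len)
{
	struct dq_entry *e;

	/* Overflow: drop the new record before allocating or copying. */
	if (p->qlen >= p->qlimit) {
		p->drops++;
		return (-1);
	}
	e = malloc(sizeof(*e) + len);
	if (e == NULL) {
		/* Allocation failure is counted as a drop in the same way. */
		p->drops++;
		return (-1);
	}
	memcpy(e->data, rec, len);
	e->len = len;
	e->next = NULL;
	if (p->tail != NULL)
		p->tail->next = e;
	else
		p->head = e;
	p->tail = e;
	p->qlen++;
	p->inserts++;
	return (0);
}
```

With a limit of two, appending "a", "b" and "c" keeps "a" and "b" queued and counts one drop for "c".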
rwatson
7e2b08356c Break out single audit_pipe_mtx into two types of locks: a global rwlock
protecting the list of audit pipes, and a per-pipe mutex protecting the
queue.

Likewise, replace the single global condition variable used to signal
delivery of a record to one or more pipes, and add a per-pipe condition
variable to avoid spurious wakeups when event subscriptions differ
across multiple pipes.

This slightly increases the cost of delivering to audit pipes, but should
reduce lock contention in the presence of multiple readers as only the
per-pipe lock is required to read from a pipe, as well as avoid
overhead when different pipes are used in different ways.

MFC after:	2 months
Sponsored by:	Apple, Inc.
2008-10-30 21:58:39 +00:00
ed
4212d51a7d Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macros.
The unit2minor() and minor2unit() macros were no-ops.

We'd better not remove these four macros from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
rwatson
b8596e4794 Further synchronization of copyrights, licenses, white space, etc from
Apple and from the OpenBSM vendor tree.

Obtained from:	Apple Inc., TrustedBSD Project
MFC after:	3 days
2008-07-31 09:54:35 +00:00
ed
1bfc292986 Don't enforce unique device minor number policy anymore.
Except for the case where we use the cloner library (clone_create() and
friends), there is no reason to enforce a unique device minor number
policy. There are various drivers in the source tree that allocate unr
pools and such to provide minor numbers, without using them themselves.

Because we still need to support unique device minor numbers for the
cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's
that are used in combination with the cloner library should be marked
with this flag to make the cloning work.

This means drivers can now freely use si_drv0 to store their own flags
and state, making it effectively the same as si_drv1 and si_drv2. We
still keep the minor() and dev2unit() routines around to make drivers
happy.

The NTFS code also used the minor number in its hash table. We should
not do this anymore. If the si_drv0 field would be changed, it would no
longer end up in the same list.

Approved by:	philip (mentor)
2008-06-11 18:55:19 +00:00
rwatson
780b65a710 Use __FBSDID() for $FreeBSD$ IDs in the audit code.
MFC after:	3 days
2008-04-13 22:06:56 +00:00
wkoszek
a8e6c33502 Change "audit_pipe_preselect" to "audit_pipe_presel" to make it print
with proper alignment in ddb(4) and vmstat(8).

Reviewed by:	rwatson@
2007-12-25 13:23:19 +00:00
csjp
eaecf9354f Make sure we are incrementing the read count for each audit pipe read.
MFC after:	1 week
2007-10-27 22:28:01 +00:00
csjp
d250020a68 - Change the wakeup logic associated with having multiple sleepers
on multiple different audit pipes.  The old method used cv_signal()
  which would result in only one thread being woken up after we
  appended a record to its queue.  This resulted in un-timely wake-ups
  when processing audit records real-time.

- Assign PSOCK priority to threads that have been sleeping on a read(2).
  This is the same priority threads are woken up with when they select(2)
  or poll(2).  This yields fairness between various forms of sleep on
  the audit pipes.

Obtained from:	TrustedBSD Project
Discussed with:	rwatson
MFC after:	1 week
2007-10-12 15:09:02 +00:00
rwatson
0d42b093e7 Clean up audit comments--formatting, spelling, etc.
2007-06-01 21:58:59 +00:00
rwatson
10d0d9cf47 Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
rwatson
c9215ad31e Allow the user process to query the kernel's notion of a maximum
audit record size at run-time, which can be used by the user
process to size the user space buffer it reads into from the audit
pipe.

Perforce change:	105098
Obtained from:		TrustedBSD Project
2006-08-26 17:59:31 +00:00
rwatson
1e4f4abfce Add kqueue support to audit pipe pseudo-devices.
Obtained from:	TrustedBSD Project
2006-08-24 17:42:38 +00:00
rwatson
4f317e1576 Introduce support for per-audit pipe preselection independent from the
global audit trail configuration.  This allows applications consuming
audit trails to specify parameters for which audit records are of
interest, including selecting records not required by the global trail.
Allowing application interest specification without changing the global
configuration allows intrusion detection systems to run without
interfering with global auditing or each other (if multiple are
present).  To implement this:

- Kernel audit records now carry a flag to indicate whether they have
  been selected by the global trail or by the audit pipe subsystem,
  set during record commit, so that this information is available
  after BSM conversion when delivering the BSM to the trail and audit
  pipes in the audit worker thread asynchronously.  Preselection by
  either record target will cause the record to be kept.

- Similar changes to preselection when the audit record is created
  when the system call is entering: consult both the global trail and
  pipes.

- au_preselect() now accepts the class in order to avoid repeatedly
  looking up the mask for each preselection test.

- Define a series of ioctls that allow applications to specify whether
  they want to track the global trail, or program their own
  preselection parameters: they may specify their own flags and naflags
  masks, similar to the global masks of the same name, as well as a set
  of per-auid masks.  They also set a per-pipe mode specifying whether
  they track the global trail, or use their own -- the door is left
  open for future additional modes.  A new ioctl is defined to allow a
  user process to flush the current audit pipe queue, which can be used
  after reprogramming pre-selection to make sure that only records of
  interest are received in future reads.

- Audit pipe data structures are extended to hold the additional fields
  necessary to support preselection.  By default, audit pipes track the
  global trail, so "praudit /dev/auditpipe" will track the global audit
  trail even though praudit doesn't program the audit pipe selection
  model.

- Comment about the complexities of potentially adding partial read
  support to audit pipes.

By using a set of ioctls, applications can select which records are of
interest, and toggle the preselection mode.

Obtained from:	TrustedBSD Project
2006-06-05 14:48:17 +00:00
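
The per-pipe preselection decision described in 4f317e1576 can be sketched as below. This is an illustrative model, not the kernel implementation: the ps_* names, the fixed-size per-auid table, the PS_NO_AUID marker, and the use of plain bitmasks without success/failure distinctions are all assumptions. It shows a pipe in "trail" mode following the global decision, a pipe in "local" mode consulting a per-auid mask first and then its own flags or naflags, and a record being kept if either target selects it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pipe modes: follow the global trail or local settings. */
enum ps_mode { PS_MODE_TRAIL, PS_MODE_LOCAL };

#define PS_NO_AUID	((uint32_t)-1)	/* assumed marker: non-attributable */
#define PS_MAX_AUIDS	4		/* assumed table size */

struct ps_auid_mask {
	uint32_t auid;
	uint32_t mask;
};

struct ps_pipe {
	enum ps_mode mode;
	uint32_t flags;		/* classes of interest for attributable events */
	uint32_t naflags;	/* classes of interest for non-attributable events */
	struct ps_auid_mask per_auid[PS_MAX_AUIDS];
	size_t n_per_auid;
};

/* Does this pipe want an event of class event_class from auid? */
static int
ps_pipe_wants(const struct ps_pipe *p, uint32_t auid, uint32_t event_class,
    int trail_selected)
{
	size_t i;

	/* Trail mode: simply mirror the global trail's decision. */
	if (p->mode == PS_MODE_TRAIL)
		return (trail_selected);

	/* Local mode: a per-auid mask, if present, takes precedence. */
	for (i = 0; i < p->n_per_auid; i++) {
		if (p->per_auid[i].auid == auid)
			return ((p->per_auid[i].mask & event_class) != 0);
	}
	if (auid == PS_NO_AUID)
		return ((p->naflags & event_class) != 0);
	return ((p->flags & event_class) != 0);
}

/* Selection by either the trail or any pipe causes the record to be kept. */
static int
ps_keep_record(int trail_selected, int pipe_selected)
{
	return (trail_selected || pipe_selected);
}
```

In this model a pipe can receive records that the global trail did not select, which is what lets monitoring applications run without changing the global configuration.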
rwatson
bae874c2cb Merge Perforce change 93570 from TrustedBSD audit3 branch:
Add audit pipe ioctls to query minimum and maximum audit queue
  lengths.

Obtained from:	TrustedBSD Project
2006-03-19 15:39:03 +00:00
rwatson
2b1a7974d7 Merge Perforce change 93567 from TrustedBSD audit3 branch:
Bump default queue limit for audit pipes from 32 to 128, since 32 is
  pretty small.

Obtained from:	TrustedBSD Project
2006-03-19 15:38:03 +00:00
rwatson
a74ff4762f Merge Perforce change 93506 from TrustedBSD audit3 branch:
Add ioctls to audit pipes in order to allow querying of the current
  record queue state, setting of the queue limit, and querying of pipe
  statistics.

Obtained from:	TrustedBSD Project
2006-03-19 15:36:10 +00:00
rwatson
fb6445828e Count drops when the first of two pipe mallocs fails.
Obtained from:	TrustedBSD Project
2006-03-04 17:09:17 +00:00
rwatson
bc3d3926ef Fix queue drop logic when the queue overflows: decrement queue length.
Obtained from:	TrustedBSD Project
2006-02-07 14:46:26 +00:00
rwatson
a1af4bcfbd Add support for audit pipe special devices, which allow user space
applications to insert a "tee" in the live audit event stream.  Records
are inserted into a per-clone queue so that user processes can pull
discrete records out of the queue.  Unlike delivery to disk, audit pipes
are "lossy", dropping records in low memory conditions or when the
process falls behind real-time events.  This mechanism is appropriate
for use by live monitoring systems, host-based intrusion detection, etc,
and avoids applications having to dig through active on-disk trails that
are owned by the audit daemon.

Obtained from:	TrustedBSD Project
2006-02-06 22:50:39 +00:00