Commit Graph

575 Commits

Author SHA1 Message Date
John Baldwin
6bc1e9cd84 Rework the lifetime management of the kernel implementation of POSIX
semaphores.  Specifically, semaphores are now represented as new file
descriptor type that is set to close on exec.  This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely.  It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.

Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
  the sem_unlink() operation.  Prior to this patch, if a semaphore's name
  was removed, valid handles from sem_open() would get EINVAL errors from
  sem_getvalue(), sem_post(), etc.  This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
  process exited or exec'd.  They were only cleaned up if the process
  did an explicit sem_destroy().  This could result in a leak of semaphore
  objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
  'struct ksem' of an unnamed semaphore (created via sem_init)) and had
  write access to the semaphore based on UID/GID checks, then that other
  process could manipulate the semaphore via sem_destroy(), sem_post(),
  sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the proces
  creating the semaphore was not honored.  Thus if your umask denied group
  read/write access but the explicit mode in the sem_init() call allowed
  it, the semaphore would be readable/writable by other users in the
  same group, for example.  This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
  then it might have deregistered one or more of the semaphore system
  calls before it noticed that there was a problem.  I'm not sure if
  this actually happened as the order that modules are discovered by the
  kernel linker depends on how the actual .ko file is linked.  One can
  make the order deterministic by using a single module with a mod_event
  handler that explicitly registers syscalls (and deregisters during
  unload after any checks).  This also fixes a race where even if the
  sem_module unloaded first it would have destroyed locks that the
  syscalls might be trying to access if they are still executing when
  they are unloaded.

  XXX: By the way, deregistering system calls doesn't do any blocking
  to drain any threads from the calls.
- Some minor fixes to errno values on error.  For example, sem_init()
  isn't documented to return ENFILE or EMFILE if we run out of semaphores
  the way that sem_open() can.  Instead, it should return ENOSPC in that
  case.

Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
  named semaphores nearly in a similar fashion to the POSIX shared memory
  object file descriptors.  Kernel semaphores can now also have names
  longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
  in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
  done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
  MAC checks for POSIX semaphores accept both a file credential and an
  active credential.  There is also a new posixsem_check_stat() since it
  is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
  in src/tools/regression/posixsem.

Reported by:	kris (1)
Tested by:	kris
Reviewed by:	rwatson (lightly)
MFC after:	1 month
2008-06-27 05:39:04 +00:00
John Baldwin
127cc7673d Add missing counter increments for posix shm checks. 2008-06-26 13:49:32 +00:00
John Baldwin
c4f3a35a54 Remove the posixsem_check_destroy() MAC check. It is semantically identical
to doing a MAC check for close(), but no other types of close() (including
close(2) and ksem_close(2)) have MAC checks.

Discussed with:	rwatson
2008-06-23 21:37:53 +00:00
Robert Watson
37f44cb428 The TrustedBSD MAC Framework named struct ipq instances 'ipq', which is the
same as the global variable defined in ip_input.c.  Instead, adopt the name
'q' as found in about 1/2 of uses in ip_input.c, preventing a collision on
the name.  This is non-harmful, but means that search and replace on the
global works less well (as in the virtualization work), as well as indexing
tools.

MFC after:	1 week
Reported by:	julian
2008-06-13 22:14:15 +00:00
Ed Schouten
29d4cb241b Don't enforce unique device minor number policy anymore.
Except for the case where we use the cloner library (clone_create() and
friends), there is no reason to enforce a unique device minor number
policy. There are various drivers in the source tree that allocate unr
pools and such to provide minor numbers, without using them themselves.

Because we still need to support unique device minor numbers for the
cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's
that are used in combination with the cloner library should be marked
with this flag to make the cloning work.

This means drivers can now freely use si_drv0 to store their own flags
and state, making it effectively the same as si_drv1 and si_drv2. We
still keep the minor() and dev2unit() routines around to make drivers
happy.

The NTFS code also used the minor number in its hash table. We should
not do this anymore. If the si_drv0 field would be changed, it would no
longer end up in the same list.

Approved by:	philip (mentor)
2008-06-11 18:55:19 +00:00
Simon L. B. Nielsen
3bff0167b9 When the file-system containing the audit log file is running low on
disk space a warning is printed.  Make this warning a bit more
informative.

Approved by:	rwatson
2008-06-10 20:05:32 +00:00
Robert Watson
4e95375678 Add an XXX comment regarding a bug I introduced when modifying the behavior
of audit log vnode rotation: on shutdown, we may not properly drain all
pending records, which could lead to lost records during system shutdown.
2008-06-03 11:06:34 +00:00
Christian S.J. Peron
1f84ab0f2a Plug a memory leak which can occur when multiple MAC policies are loaded
which label mbufs.  This leak can occur if one policy successfully allocates
label storage and subsequent allocations from other policies fail.

Spotted by:	rwatson
MFC after:	1 week
2008-05-27 14:18:02 +00:00
Robert Watson
bcbd871a3f Don't use LK_DRAIN before calling VOP_FSYNC() in the two further
panic cases for audit trail failure -- this doesn't contribute
anything, and might arguably be wrong.

MFC after:	1 week
Requested by:	attilio
2008-05-21 13:59:05 +00:00
Robert Watson
bf7baa9eca Don't use LK_DRAIN before calling VOP_FSYNC() in the panic case for
audit trail failure -- this doesn't contribute anything, and might
arguably be wrong.

MFC after:	1 week
Requested by:	attilio
2008-05-21 13:05:06 +00:00
Robert Watson
7d8ab8bafb When testing whether to enter the audit argument gathering code, rather
than checking whether audit is enabled globally, instead check whether
the current thread has an audit record.  This avoids entering the audit
code to collect argument data if auditing is enabled but the current
system call is not of interest to audit.

MFC after:	1 week
Sponsored by:	Apple, Inc.
2008-05-06 00:32:23 +00:00
Robert Watson
fa9e0a18af Fix include guard spelling.
MFC after:	3 days
Submitted by:	diego
2008-04-27 15:51:49 +00:00
Robert Watson
81efe39deb Use logic or, not binary or, when deciding whether or not a system call
exit requires entering the audit code.  The result is much the same,
but they mean different things.

MFC afer:	3 days
Submitted by:	Diego Giagio <dgiagio at gmail dot com>
2008-04-24 12:23:31 +00:00
Robert Watson
1a46aa801e When auditing state from an IPv4 or IPv6 socket, use read locks on the
inpcb rather than write locks.

MFC after:	3 months
2008-04-19 18:37:08 +00:00
Robert Watson
211b72ad2f When propagating a MAC label from an inpcb to an mbuf, allow read and
write locks on the inpcb, not just write locks.

MFC after:	3 months
2008-04-19 18:35:27 +00:00
Robert Watson
8501a69cc9 Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock main exclusive, and all instances of inpcb lock acquisition
are exclusive.

This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.

MFC after:	3 months
Tested by:	kris (superset of committered patch)
2008-04-17 21:38:18 +00:00
Robert Watson
dda409d4ec Use __FBSDID() for $FreeBSD$ IDs in the audit code.
MFC after:	3 days
2008-04-13 22:06:56 +00:00
Robert Watson
646a9f8029 Make naming of include guards for MAC Framework include files more
consistent with other kernel include guards (don't start with _SYS).

MFC after:	3 days
2008-04-13 21:45:52 +00:00
Konstantin Belousov
57b4252e45 Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Robert Watson
d4cafc74ae Remove XXX to remind me to check the free space calculation, which to my
eyes appears right following a check.

MFC after:	3 days
2008-03-10 18:15:02 +00:00
Christian S.J. Peron
e5ad5f4d70 Change auditon(2) so that if somebody supplies an invalid command, it
returns EINVAL. Right now we return 0 or success for invalid commands,
which could be quite problematic in certain conditions.

MFC after:	1 week
Discussed with:	rwatson
2008-03-06 22:57:03 +00:00
Robert Watson
8805ca53e7 Rather than copying out the full audit trigger record, which includes
a queue entry field, just copy out the unsigned int that is the trigger
message.  In practice, auditd always requested sizeof(unsigned int), so
the extra bytes were ignored, but copying them out was not the intent.

MFC after:	1 month
2008-03-02 21:34:17 +00:00
Robert Watson
6cc189913c Add audit_prefixes to two more globally visible functions in the Audit
implementation.

MFC after:	1 month
2008-03-01 11:40:49 +00:00
Robert Watson
fb4ed8c9bf Rename globally exposed symbol send_trigger() to audit_send_trigger().
MFC after:	1 month
2008-03-01 11:04:04 +00:00
Robert Watson
ae87be447c Replace somewhat awkward audit trail rotation scheme, which involved the
global audit mutex and condition variables, with an sx lock which protects
the trail vnode and credential while in use, and is acquired by the system
call code when rotating the trail.  Previously, a "message" would be sent
to the kernel audit worker, which did the rotation, but the new code is
simpler and (hopefully) less error-prone.

Obtained from:	TrustedBSD Project
MFC after:	1 month
2008-02-27 17:12:22 +00:00
Robert Watson
303d3f35fb Rename several audit functions in the global kernel symbol namespace to
have audit_ on the front:

- canon_path -> audit_canon_path
- msgctl_to_event -> audit_msgctl_to_event
- semctl_to_event -> audit_semctl_to_event

MFC after:	1 month
2008-02-25 20:28:00 +00:00
Christian S.J. Peron
c52a508838 Make sure that the termid type is initialized to AU_IPv4 by default.
This makes sure that process tokens credentials with un-initialized
audit contexts are handled correctly.  Currently, when invariants are
enabled, this change fixes a panic by ensuring that we have a valid
termid family.  Also, this fixes token generation for process tokens
making sure that userspace is always getting a valid token.

This is consistent with what Solaris does when an audit context is
un-initialized.

Obtained from:	TrustedBSD Project
MFC after:	1 week
2008-01-28 17:33:46 +00:00
Robert Watson
5ac3b03500 Properly return the error from mls_subject_privileged() in the ifnet
relabel check for MLS rather than returning 0 directly.

This problem didn't result in a vulnerability currently as the central
implementation of ifnet relabeling also checks for UNIX privilege, and
we currently don't guarantee containment for the root user in mac_mls,
but we should be using the MLS definition of privilege as well as the
UNIX definition in anticipation of supporting root containment at some
point.

MFC after:	3 days
Submitted by:	Zhouyi Zhou <zhouzhouyi at gmail dot com>
Sponsored by:	Google SoC 2007
2008-01-28 10:20:18 +00:00
Christian S.J. Peron
0f7e334a95 Fix gratuitous whitespace bug
MFC after:	1 week
Obtained from:	TrustedBSD Project
2008-01-18 19:57:21 +00:00
Christian S.J. Peron
cd109a68ae Add a case for AUE_LISTEN. This removes the following console error message:
"BSM conversion requested for unknown event 43140"

It should be noted that we need to audit the fd argument for this system
call.

Obtained from:	TrustedBSD Project
MFC after:	1 week
2008-01-18 19:50:34 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
John Baldwin
8e38aeff17 Add a new file descriptor type for IPC shared memory objects and use it to
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
  object which provides the backing store.  Each descriptor starts off with
  a size of zero, but the size can be altered via ftruncate(2).  The shared
  memory file descriptors also support fstat(2).  read(2), write(2),
  ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
  memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
  manage shared memory file descriptors.  The virtual namespace that maps
  pathnames to shared memory file descriptors is implemented as a hash
  table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
  of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
  path argument to shm_open(2).  In this case, an unnamed shared memory
  file descriptor will be created similar to the IPC_PRIVATE key for
  shmget(2).  Note that the shared memory object can still be shared among
  processes by sharing the file descriptor via fork(2) or sendmsg(2), but
  it is unnamed.  This effectively serves to implement the getmemfd() idea
  bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
  collected when they are not referenced by any open file descriptors or
  the shm_open(2) virtual namespace.

Submitted by:	dillon, peter (previous versions)
Submitted by:	rwatson (I based this on his version)
Reviewed by:	alc (suggested converting getmemfd() to shm_open())
2008-01-08 21:58:16 +00:00
Robert Watson
3de213cc00 Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
Wojciech A. Koszek
7a9d5a45e7 Change "audit_pipe_preselect" to "audit_pipe_presel" to make it print
with proper alignment in ddb(4) and vmstat(8).

Reviewed by:	rwatson@
2007-12-25 13:23:19 +00:00
Robert Watson
b5f992b93d Fix a MAC label leak for POSIX semaphores, in which per-policy labels
would be properly disposed of, but the global label structure for the
semaphore wouldn't be freed.

MFC after:	3 days
Reported by:	tanyong <tanyong at ercist dot iscas dot ac dot cn>,
		zhouzhouyi
2007-12-17 17:26:32 +00:00
Wojciech A. Koszek
4ce05f7e44 Explicitly initialize 'ret' to 0'. It lets one to build tmpfs from the
latest source tree with older compiler--gcc3.

Approved by:	cognet (mentor)
2007-12-04 20:20:59 +00:00
Robert Watson
1876fb2118 Implement per-object type consistency checks for labels passed to
'internalize' operations rather than using a single common check.

Obtained from:	TrustedBSD Project
2007-10-30 00:01:28 +00:00
Robert Watson
323f4cc31d Replace use of AU_NULL with 0 when no audit classes are in use; this
supports the removal of hard-coded audit class constants in OpenBSM
1.0.  All audit classes are now dynamically configured via the
audit_class database.

Obtained from:	TrustedBSD Project
2007-10-29 18:07:48 +00:00
Robert Watson
f03368334e Canonicalize names of local variables.
Add some missing label checks in mac_test.

Obtained from:	TrustedBSD Project
2007-10-29 15:30:47 +00:00
Robert Watson
eb320b0ee7 Resort TrustedBSD MAC Framework policy entry point implementations and
declarations to match the object, operation sort order in the framework
itself.

Obtained from:	TrustedBSD Project
2007-10-29 13:33:06 +00:00
Robert Watson
f10b1ebc78 Add missing mac_test labeling and sleep checks for the syncache.
Discussed with:	csjp
Obtained from:	TrustedBSD Project
2007-10-28 18:33:31 +00:00
Robert Watson
2a9e17ce8e Garbage collect mac_mbuf_create_multicast_encap TrustedBSD MAC Framework
entry point, which is no longer required now that we don't support
old-style multicast tunnels.  This removes the last mbuf object class
entry point that isn't init/copy/destroy.

Obtained from:	TrustedBSD Project
2007-10-28 17:55:57 +00:00
Robert Watson
a13e21f7bc Continue to move from generic network entry points in the TrustedBSD MAC
Framework by moving from mac_mbuf_create_netlayer() to more specific
entry points for specific network services:

- mac_netinet_firewall_reply() to be used when replying to in-bound TCP
  segments in pf and ipfw (etc).

- Rename mac_netinet_icmp_reply() to mac_netinet_icmp_replyinplace() and
  add mac_netinet_icmp_reply(), reflecting that in some cases we overwrite
  a label in place, but in others we apply the label to a new mbuf.

Obtained from:	TrustedBSD Project
2007-10-28 17:12:48 +00:00
Robert Watson
b9b0dac33b Move towards more explicit support for various network protocol stacks
in the TrustedBSD MAC Framework:

- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send()
  for AARP packet labeling, rather than using a generic link layer
  entry point.

- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send()
  for ND6 packet labeling, rather than using a generic link layer entry
  point.

- Add expliict entry point mac_netinet_arp_send() for ARP packet
  labeling, and mac_netinet_igmp_send() for IGMP packet labeling,
  rather than using a generic link layer entry point.

- Remove previous genering link layer entry point,
  mac_mbuf_create_linklayer() as it is no longer used.

- Add implementations of new entry points to various policies, largely
  by replicating the existing link layer entry point for them; remove
  old link layer entry point implementation.

- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global
  to the MAC Framework rather than static to mac_net.c as it is now
  needed outside of mac_net.c.

Obtained from:	TrustedBSD Project
2007-10-28 15:55:23 +00:00
Robert Watson
b0f4c777e4 Perform explicit label type checks for externalize entry points, rather than
a generic initialized test.

Obtained from:	TrustedBSD Project
2007-10-28 14:28:33 +00:00
Christian S.J. Peron
4777d3f98a Make sure we are incrementing the read count for each audit pipe read.
MFC after:	1 week
2007-10-27 22:28:01 +00:00
Robert Watson
438aeadf27 Give each posixsem MAC Framework entry point its own counter and test case
in the mac_test policy, rather than sharing a single function for all of
the access control checks.

Obtained from:	TrustedBSD Project
2007-10-27 10:38:57 +00:00
Robert Watson
6683b28d78 Update comment following MAC Framework entry point renaming and
reorganization.

Obtained from:	TrustedBSD Project
2007-10-26 21:16:34 +00:00