Commit Graph

98 Commits

Author SHA1 Message Date
Jamie Gritton
b38ff370e4 Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2).  Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
  This replaces jail(2).
* jail_get, to read the parameters of existing jails.  This replaces the
  security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes.  The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by:	bz (mentor)
2009-04-29 21:14:15 +00:00
Jamie Gritton
af7bd9a4f4 Some non-functional changes: whitespace, KASSERT strings, declaration order.
Approved by:	bz (mentor)
2009-04-29 18:41:08 +00:00
Jamie Gritton
8571af59e5 Whitespace/spelling fixes in advance of upcoming functional changes.
Approved by:	bz (mentor)
2009-03-27 13:13:59 +00:00
Jamie Gritton
ca04ba6430 Don't allow creating a socket with a protocol family that the current
jail doesn't support.  This involves a new function prison_check_af,
like prison_check_ip[46] but that checks only the family.

With this change, most of the errors generated by jailed sockets
shouldn't ever occur, at least until jails are changeable.

Approved by:	bz (mentor)
2009-02-05 14:15:18 +00:00
Jamie Gritton
b89e82dd87 Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise.  The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family.  For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes).  Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by:	bz (mentor)
2009-02-05 14:06:09 +00:00
Ed Schouten
f3b86a5fd7 Mark most often used sysctl's as MPSAFE.
After running a `make buildkernel', I noticed most of the Giant locks in
sysctl are only caused by a very small amount of sysctl's:

- sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like
  sysctl.oidfmt.

- kern.ident, kern.osrelease, kern.version, etc. These are just constant
  strings.

- kern.arandom, used by the stack protector. It is already protected by
  arc4_mtx.

I also saw the following sysctl's show up. Not as often as the ones
above, but still quite often:

- security.jail.jailed. Also mark security.jail.list as MPSAFE. They
  don't need locking or already use allprison_lock.

- kern.devname, used by devname(3), ttyname(3), etc.

This seems to reduce Giant locking inside sysctl by ~75% in my primitive
test setup.
2009-01-28 19:58:05 +00:00
Bjoern A. Zeeb
1cecba0fcd For consistency with prison_{local,remote,check}_ipN rename
prison_getipN to prison_get_ipN.

Submitted by:	jamie (as part of a larger patch)
MFC after:	1 week
2009-01-25 10:11:58 +00:00
Bjoern A. Zeeb
eb76156548 Back out r186615; the sanitizing of the pointers in the error case
is not needed and seems that it will not be needed either.

Pointy hat:	mine, mine, mine and not pho's
2009-01-04 12:18:18 +00:00
Peter Holm
1d347f061f Added missing second part of cleaning j->ip[46] as requested by bz
Approved by:	kib (mentor)
Pointy hat:	pho
2008-12-30 20:39:47 +00:00
Peter Holm
bc971b2c82 Make sure that unused j->ip[46] are cleared
Reviewed by:	bz
Approved by:	kib (mentor)
2008-12-30 17:54:25 +00:00
Bjoern A. Zeeb
0f1fe22db5 Correctly check the number of prison states to not access anything
outside the prison_states array.
When checking if there is a name configured for the prison, check the
first character to not be '\0' instead of checking if the char array
is present, which it always is. Note, that this is different for the
*jailname in the syscall.

Found with:	Coverity Prevent(tm)
CID:		4156, 4155
MFC after:	4 weeks (just that I get the mail)
2008-12-11 01:04:25 +00:00
Bjoern A. Zeeb
d465a41d3b Unbreak the no-networks (no INET/6) build that I broke with
the commit in r185435.

Pointyhat:	no, but I could need a ski cap for the winter
2008-11-29 16:17:39 +00:00
Bjoern A. Zeeb
413628a7e3 MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
  and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
  help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
  suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
  on cluster machines as well as all the testers and people
  who provided feedback the last months on freebsd-jail and
  other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by:	(see above)
MFC after:	3 months (this is just so that I get the mail)
X-MFC Before:   7.2-RELEASE if possible
2008-11-29 14:32:14 +00:00
Bjoern A. Zeeb
b9f0b66c75 With the permissions of phk@ change the license on kern_jail.c
to a 2 clause BSD license.
2008-11-28 19:23:46 +00:00
Pawel Jakub Dawidek
1ba4a712dd Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This bring huge amount of changes, I'll enumerate only user-visible changes:

- Delegated Administration

	Allows regular users to perform ZFS operations, like file system
	creation, snapshot creation, etc.

- L2ARC

	Level 2 cache for ZFS - allows to use additional disks for cache.
	Huge performance improvements mostly for random read of mostly
	static content.

- slog

	Allow to use additional disks for ZFS Intent Log to speed up
	operations like fsync(2).

- vfs.zfs.super_owner

	Allows regular users to perform privileged operations on files stored
	on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

	Not all the flags are supported. This still needs work.

- ZFSBoot

	Support to boot off of ZFS pool. Not finished, AFAIK.

	Submitted by:	dfr

- Snapshot properties

- New failure modes

	Before if write requested failed, system paniced. Now one
	can select from one of three failure modes:
	- panic - panic on write error
	- wait - wait for disk to reappear
	- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

	Just quota and reservation properties, but don't count space consumed
	by children file systems, clones and snapshots.

- Sparse volumes

	ZVOLs that don't reserve space in the pool.

- External attributes

	Compatible with extattr(2).

- NFSv4-ACLs

	Not sure about the status, might not be complete yet.

	Submitted by:	trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from:	OpenSolaris
2008-11-17 20:49:29 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Bjoern A. Zeeb
45e48455cd MFp4 144659:
Plug a memory leak with jail services.

PR:		125257
Submitted by:	Mateusz Guzik <mjguzik gmail.com>
MFC after:	6 days
2008-07-07 20:53:49 +00:00
Robert Watson
4f7d1876d5 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
Xin LI
2110d913c0 Revert rev. 178124 as requested by kris@. Having jail id not being
reused too frequently is useful for script controlled environment.
2008-06-19 21:41:57 +00:00
Xin LI
31c50f53da Instead of rolling our own jail number allocation procedure, use
alloc_unr() to do it.

Submitted by:	Ed Schouten <ed 80386 nl>
PR:		kern/122270
MFC after:	1 month
2008-04-11 21:31:15 +00:00
Konstantin Belousov
57b4252e45 Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
Bjoern A. Zeeb
79ba395267 Replace the last susers calls in netinet6/ with privilege checks.
Introduce a new privilege allowing to set certain IP header options
(hop-by-hop, routing headers).

Leave a few comments to be addressed later.

Reviewed by:	rwatson (older version, before addressing his comments)
2008-01-24 08:25:59 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Robert Watson
e41966dc35 Add PRIV_VFS_STAT privilege, which will allow overriding policy limits on
the right to stat() a file, such as in mac_bsdextended.

Obtained from:	TrustedBSD Project
MFC after:	3 months
2007-10-21 22:50:11 +00:00
Pawel Jakub Dawidek
24b0502ee0 Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
  for jailed root from different jails, etc.

OK'ed by:	rwatson
2007-04-13 23:54:22 +00:00
Robert Watson
4b08405682 Allow PRIV_NETINET_REUSEPORT in jail. 2007-04-10 15:59:49 +00:00
Pawel Jakub Dawidek
c2cda60911 prison_free() can be called with a mutex held. This wasn't a problem until
I converted allprison_mtx mutex to allprison_lock sx lock. To fix this LOR,
move prison removal to prison_complete() entirely. To ensure that noone
will reference this prison before it's beeing removed from the list skip
prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to
greater than 0 in prison_hold().

Reported by:	kris
OK'ed by:	rwatson
2007-04-08 10:46:23 +00:00
Pawel Jakub Dawidek
b63b0c6529 Only use prison mutex to protect the fields that need to be protected by it. 2007-04-08 10:21:38 +00:00
Pawel Jakub Dawidek
264de85e73 pr_list is protected by the allprison_lock. 2007-04-08 02:13:32 +00:00
Pawel Jakub Dawidek
dc68a63332 Implement functionality I called 'jail services'.
It may be used for external modules to attach some data to jail's in-kernel
structure.

- Change allprison_mtx mutex to allprison_sx sx(9) lock.
  We will need to call external functions while holding this lock, which may
  want to allocate memory.
  Make use of the fact that this is shared-exclusive lock and use shared
  version when possible.
- Implement the following functions:
  prison_service_register() - registers a service that wants to be noticed
	when a jail is created and destroyed
  prison_service_deregister() - deregisters service
  prison_service_data_add() - adds service-specific data to the jail structure
  prison_service_data_get() - takes service-specific data from the jail
	structure
  prison_service_data_del() - removes service-specific data from the jail
	structure

Reviewed by:	rwatson
2007-04-05 23:19:13 +00:00
Pawel Jakub Dawidek
54b369c1ae Make prison_find() globally accessible. 2007-04-05 21:34:54 +00:00
Pawel Jakub Dawidek
f3a8d2f93c Add security.jail.mount_allowed sysctl, which allows to mount and
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system which driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There currently no jail-friendly file systems, ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by:	rwatson
2007-04-05 21:03:05 +00:00
Pawel Jakub Dawidek
2709e8904f Minor simplification. 2007-03-09 05:22:10 +00:00
Pawel Jakub Dawidek
9e5dcf7b21 White space nits. 2007-03-07 21:24:51 +00:00
Robert Watson
0c14ff0eb5 Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
Pawel Jakub Dawidek
bb531912ff Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better
describe the privilege.

OK'ed by:	rwatson
2007-03-01 20:47:42 +00:00
Robert Watson
95420afea4 Remove unused PRIV_IPC_EXEC. Renumbers System V IPC privilege. 2007-02-20 00:12:52 +00:00
Robert Watson
95b091d2f2 Rename three quota privileges from the UFS privilege namespace to the
VFS privilege namespace: exceedquota, getquota, and setquota.  Leave
UFS-specific quota configuration privileges in the UFS name space.

This renumbers VFS and UFS privileges, so requires rebuilding modules
if you are using security policies aware of privilege identifiers.
This is likely no one at this point since none of the committed MAC
policies use the privilege checks.
2007-02-19 13:33:10 +00:00
Robert Watson
e82d0201bd Limit quota privileges in jail to PRIV_UFS_GETQUOTA and
PRIV_UFS_SETQUOTA.
2007-02-19 13:26:39 +00:00
Robert Watson
c3c1b5e62a For now, reflect practical reality that Audit system calls aren't
allowed in Jail: return a privilege error.
2007-02-19 13:10:29 +00:00
Robert Watson
800c940832 Add a new priv(9) kernel interface for checking the availability of
privilege for threads and credentials.  Unlike the existing suser(9)
interface, priv(9) exposes a named privilege identifier to the privilege
checking code, allowing more complex policies regarding the granting of
privilege to be expressed.  Two interfaces are provided, replacing the
existing suser(9) interface:

suser(td)                 ->   priv_check(td, priv)
suser_cred(cred, flags)   ->   priv_check_cred(cred, priv, flags)

A comprehensive list of currently available kernel privileges may be
found in priv.h.  New privileges are easily added as required, but the
comments on adding privileges found in priv.h and priv(9) should be read
before doing so.

The new privilege interface exposed sufficient information to the
privilege checking routine that it will now be possible for jail to
determine whether a particular privilege is granted in the check routine,
rather than relying on hints from the calling context via the
SUSER_ALLOWJAIL flag.  For now, the flag is maintained, but a new jail
check function, prison_priv_check(), is exposed from kern_jail.c and used
by the privilege check routine to determine if the privilege is permitted
in jail.  As a result, a centralized list of privileges permitted in jail
is now present in kern_jail.c.

The MAC Framework is now also able to instrument privilege checks, both
to deny privileges otherwise granted (mac_priv_check()), and to grant
privileges otherwise denied (mac_priv_grant()), permitting MAC Policy
modules to implement privilege models, as well as control a much broader
range of system behavior in order to constrain processes running with
root privilege.

The suser() and suser_cred() functions remain implemented, now in terms
of priv_check() and the PRIV_ROOT privilege, for use during the transition
and possibly continuing use by third party kernel modules that have not
been updated.  The PRIV_DRIVER privilege exists to allow device drivers to
check privilege without adopting a more specific privilege identifier.

This change does not modify the actual security policy, rather, it
modifies the interface for privilege checks so changes to the security
policy become more feasible.

Sponsored by:		nCircle Network Security, Inc.
Obtained from:		TrustedBSD Project
Discussed on:		arch@
Reviewed (at least in part) by:	mlaier, jmg, pjd, bde, ceri,
			Alex Lyashkov <umka at sevcity dot net>,
			Skip Ford <skip dot ford at verizon dot net>,
			Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:37:19 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Robert Watson
5702e0965e Declare security and security.bsd sysctl hierarchies in sysctl.h along
with other commonly used sysctl name spaces, rather than declaring them
all over the place.

MFC after:	1 month
Sponsored by:	nCircle Network Security, Inc.
2006-09-17 20:00:36 +00:00
Christian S.J. Peron
453f7d5369 Push Giant down in jails. Pass the MPSAFE flag to NDINIT, and keep track
of whether or not Giant was picked up by the filesystem. Add VFS_LOCK_GIANT
macros around vrele as it's possible that this can call in the VOP_INACTIVE
filesystem specific code. Also while we are here, remove the Giant assertion.
from the sysctl handler,  we do not actually require Giant here so we
shouldn't assert it. Doing so will just complicate things when Giant is removed
from the sysctl framework.
2005-09-28 00:30:56 +00:00
Pawel Jakub Dawidek
06a137780b Actually only protect mount-point if security.jail.enforce_statfs is set to 2.
If we don't return statistics about requested file systems, system tools
may not work correctly or at all.

Approved by:	re (scottl)
2005-06-23 22:13:29 +00:00
Pawel Jakub Dawidek
820a0de9a9 Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value	policy
0	show all mount-points without any restrictions
1	show only mount-points below jail's chroot and show only part of the
	mount-point's path (if jail's chroot directory is /jails/foo and
	mount-point is /jails/foo/usr/home only /usr/home will be shown)
2	show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with:	rwatson
2005-06-09 18:49:19 +00:00