Commit Graph

142 Commits

Author SHA1 Message Date
jeff
2d7d8c05e7 - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
trasz
62f6a13e39 Add two new system calls, setloginclass(2) and getloginclass(2). This makes
it possible for the kernel to track login class the process is assigned to,
which is required for RCTL.  This change also make setusercontext(3) call
setloginclass(2) and makes it possible to retrieve current login class using
id(1).

Reviewed by:	kib (as part of a larger patch)
2011-03-05 12:40:35 +00:00
dchagin
1e124ec538 Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after:	1 month
2011-01-26 20:03:58 +00:00
bz
246fcf4d6f Mfp4 CH177924:
Add and export constants of array sizes of jail parameters as compiled into
the kernel.
This is the least intrusive way to allow kvm to read the (sparse) arrays
independent of the options the kernel was compiled with.

Reviewed by:	jhb (originally)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
2010-12-31 22:49:13 +00:00
jamie
5a233127aa Don't exit kern_jail_set without freeing options when enforce_statfs
has an illegal value.

MFC after:	3 days
2010-09-10 21:45:42 +00:00
jamie
4e0690ba81 Back out r210974. Any convenience of not typing "persist" is outweighed
by the possibility of unintended partially-formed jails.
2010-08-08 23:22:55 +00:00
jamie
37e8c8fb79 Implicitly make a new jail persistent if it's set not to attach.
MFC after:	3 days
2010-08-06 22:04:18 +00:00
cperciva
4adc6d09d8 Declare ip6 as (struct in6_addr *) instead of (struct in_addr *). This is
a harmless bug since we never actually use ip6 as anything other than an
opaque pointer.

Found with:	Coverty Prevent(tm)
CID:		4319
MFC after:	1 month
2010-06-04 14:38:24 +00:00
nwhitehorn
142a4d2993 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
delphij
d9a0cd0982 Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2010-01-27 00:30:07 +00:00
bz
d80ba03e3c Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primry jail IP they had been used for
years, as well as for people prefering to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by:	jamie, hrs (ipv6 part)
Pointed out by:	hrs [1]
MFC After:	2 weeks
Asked for by:	Jase Thew (bazerka beardz.net)
2010-01-17 12:57:11 +00:00
bz
bdc1b5009d Change DDB show prison:
- name some columns more closely to the user space variables,
  as we do for host.* or allow.* (in the listing) already.
- print pr_childmax (children.max).
- prefix hex values with 0x.

MFC after:	3 weeks
2010-01-11 22:34:25 +00:00
bz
4ba08a5642 Adjust a comment to reflect reality, as we have proper source
address selection, even for IPv4, since r183571.

Pointed out by:	Jase Thew (bazerka beardz.net)
MFC after:	3 days
2010-01-11 21:21:30 +00:00
antoine
bfd388c026 (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
bz
932cbdbe4d Throughout the network stack we have a few places of
if (jailed(cred))
left.  If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that,  as it is used
outside the network stack as well.

Discussed with:	julian, zec, jamie, rwatson (back in Sept)
MFC after:	5 days
2009-12-13 13:57:32 +00:00
delphij
8fed657163 Revert revision 199201 for now as it has introduced a kernel vulnerability
and requires more polishing.
2009-11-12 19:02:10 +00:00
delphij
13a19ef806 Add interface description capability as inspired by OpenBSD.
MFC after:	3 months
2009-11-11 21:30:58 +00:00
phk
5297261651 Revert previous commit and add myself to the list of people who should
know better than to commit with a cat in the area.
2009-09-08 13:19:05 +00:00
phk
2314521104 Add necessary include. 2009-09-08 13:16:55 +00:00
jamie
319f7fbbc6 Allow a jail's name to be the same as its jid (which is the default if no
name is specified), but still disallow other numeric names.

Reviewed by:	zec
Approved by:	bz (mentor)
MFC after:	3 days
2009-09-04 19:00:48 +00:00
jamie
c78efa488e Fix a LOR between allprison_lock and vnode locks by releasing
allprison_lock before releasing a prison's root vnode.

PR:		kern/138004
Reviewed by:	kib
Approved by:	bz (mentor)
MFC after:	3 days
2009-08-27 16:15:51 +00:00
zec
927d43d574 When "jail -c vnet" request fails, the current code actually creates and
leaves behind an orphaned vnet.  This change ensures that such vnets get
released.

This change affects only options VIMAGE builds.

Submitted by:	jamie
Discussed with:	bz
Approved by:	re (rwatson), julian (mentor)
MFC after:	3 days
2009-08-24 10:16:19 +00:00
bz
5307a46b8b Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.

Reviewed by:	rwatson, zec
Approved by:	re (kib)
2009-08-13 10:26:34 +00:00
bz
b45bae484a Make the kernel compile without IP networking by moving
a variable under a proper #ifdef.

Approved by:	re (rwatson)
2009-08-12 12:12:23 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
jamie
e87d51a605 Make the "enforce_statfs" default 2 (most restrictive) in jail_set(2),
instead of whatever the parent/system has (which is generally 0).  This
mirrors the old-style default used for jail(2) in conjunction with the
security.jail.enforce_statfs sysctl.

Approved by:	re (kib), bz (mentor)
2009-07-31 16:00:41 +00:00
jamie
0a7374675b Remove a LOR, where the the sleepable allprison_lock was being obtained
in prison_equal_ip4/6 while an inp mutex was held.  Locking allprison_lock
can be avoided by making a restriction on the IP addresses associated with
jails:

Don't allow the "ip4" and "ip6" parameters to be changed after a jail is
created.  Setting the "ip4.addr" and "ip6.addr" parameters is allowed,
but only if the jail was already created with either ip4/6=new or
ip4/6=disable.  With this restriction, the prison flags in question
(PR_IP4_USER and PR_IP6_USER) become read-only and can be checked
without locking.

This also allows the simplification of a messy code path that was needed
to handle an existing prison gaining an IP address list.

PR:		kern/136899
Reported by:	Dirk Meyer
Approved by:	re (kib), bz (mentor)
2009-07-30 14:28:56 +00:00
jamie
2fc68fe1d7 Don't allow mixing the "vnet" and "ip4/6" jail parameters, since vnet
jails have their own IP stack and don't have access to the parent IP
addresses anyway.  Note that a virtual network stack forms a break
between prisons with regard to the list of allowed IP addresses.

Approved by:	re (kib), bz (mentor)
2009-07-29 16:46:59 +00:00
jamie
4bceb596d2 Change the default value of the "ip4" and "ip6" jail parameters to
"disable", which only allows access to the parent/physical system's
IP addresses when specifically directed.  Change the default value of
"host" to "new", and don't copy the parent host values, to insulate
jails from the parent hostname et al.

Approved by:	re (kib), bz (mentor)
2009-07-29 16:41:02 +00:00
jamie
274ea197bb Some jail parameters (in particular, "ip4" and "ip6" for IP address
restrictions) were found to be inadequately described by a boolean.
Define a new parameter type with three values (disable, new, inherit)
to handle these and future cases.

Approved by:	re (kib), bz (mentor)
Discussed with:	rwatson
2009-07-25 14:48:57 +00:00
jamie
9f81cbd9ec Remove the interim vimage containers, struct vimage and struct procg,
and the ioctl-based interface that supported them.

Approved by:	re (kib), bz (mentor)
2009-07-17 14:48:21 +00:00
jamie
3a4b2dc4d4 Wrap a PR_VNET inside "#ifdef VIMAGE" since that the only place it applies.
bz wants the blame for this.

Noticed by:	rwatson
Approved by:	bz (mentor)
2009-06-24 22:06:56 +00:00
jamie
e53e57277b In case of prisons with their own network stack, permit
additional privileges as well as not restricting the type of
sockets a user can open.

Note: the VIMAGE/vnet fetaure of of jails is still considered
      experimental and cannot guarantee that privileged users
      can be kept imprisoned if enabled.

Reviewed by:	rwatson
Approved by:	bz (mentor)
2009-06-24 21:39:50 +00:00
jamie
eeafb36508 Add a limit for child jails via the "children.cur" and "children.max"
parameters.  This replaces the simple "allow.jails" permission.

Approved by:	bz (mentor)
2009-06-23 20:35:51 +00:00
jamie
5675a54fb1 Manage vnets via the jail system. If a jail is given the boolean
parameter "vnet" when it is created, a new vnet instance will be created
along with the jail.  Networks interfaces can be moved between prisons
with an ioctl similar to the one that moves them between vimages.
For now vnets will co-exist under both jails and vimages, but soon
struct vimage will be going away.

Reviewed by:	zec, julian
Approved by:	bz (mentor)
2009-06-15 18:59:29 +00:00
jamie
f419891544 Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
jamie
e9da16507b Add counterparts to getcredhostname:
getcreddomainname, getcredhostuuid, getcredhostid

Suggested by:	rmacklem
Approved by:	bz
2009-06-13 00:12:02 +00:00
jamie
a353b40c7e Fix some overflow errors: a signed allocation and an insufficiant
array size.

Reported by:	pho
Tested by:	pho
Approved by:	bz (mentor)
2009-06-09 22:09:29 +00:00
rwatson
f4934662e5 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
jamie
572db1408a Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
jamie
a013e0afcb Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
jamie
e446b3e48f Delay an error message until the variable it uses gets initialized.
Found with:	Coverity Prevent(tm)
CID:		4316
Reported by:	trasz
Approved by:	bz (mentor)
2009-05-23 16:13:26 +00:00
zec
639797b2e6 Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
jamie
267ea54b44 Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
jamie
453b86f943 Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2).  Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
  This replaces jail(2).
* jail_get, to read the parameters of existing jails.  This replaces the
  security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes.  The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by:	bz (mentor)
2009-04-29 21:14:15 +00:00
jamie
51a4d1c4a3 Some non-functional changes: whitespace, KASSERT strings, declaration order.
Approved by:	bz (mentor)
2009-04-29 18:41:08 +00:00
jamie
5a5a677581 Whitespace/spelling fixes in advance of upcoming functional changes.
Approved by:	bz (mentor)
2009-03-27 13:13:59 +00:00
jamie
8f639d4b9a Don't allow creating a socket with a protocol family that the current
jail doesn't support.  This involves a new function prison_check_af,
like prison_check_ip[46] but that checks only the family.

With this change, most of the errors generated by jailed sockets
shouldn't ever occur, at least until jails are changeable.

Approved by:	bz (mentor)
2009-02-05 14:15:18 +00:00
jamie
12bbe1869f Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise.  The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family.  For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes).  Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by:	bz (mentor)
2009-02-05 14:06:09 +00:00
ed
50efccc9f0 Mark most often used sysctl's as MPSAFE.
After running a `make buildkernel', I noticed most of the Giant locks in
sysctl are only caused by a very small amount of sysctl's:

- sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like
  sysctl.oidfmt.

- kern.ident, kern.osrelease, kern.version, etc. These are just constant
  strings.

- kern.arandom, used by the stack protector. It is already protected by
  arc4_mtx.

I also saw the following sysctl's show up. Not as often as the ones
above, but still quite often:

- security.jail.jailed. Also mark security.jail.list as MPSAFE. They
  don't need locking or already use allprison_lock.

- kern.devname, used by devname(3), ttyname(3), etc.

This seems to reduce Giant locking inside sysctl by ~75% in my primitive
test setup.
2009-01-28 19:58:05 +00:00