The Jail paper, written jointly by rwatson & me.
parent
60fc5205f3
commit
d15d3eac7b
10
share/doc/papers/jail/Makefile
Normal file
@ -0,0 +1,10 @@
|
||||
# $FreeBSD$
|
||||
PRINTERDEVICE=ps
|
||||
NODOCCOMPRESS=1
|
||||
VOLUME= papers
|
||||
DOC= jail
|
||||
SRCS= paper.ms
|
||||
MACROS= -ms -U
|
||||
OBJS= implementation.ms mgt.ms future.ms
|
||||
|
||||
.include <bsd.doc.mk>
|
104
share/doc/papers/jail/future.ms
Normal file
@ -0,0 +1,104 @@
|
||||
.\"
|
||||
.\" $FreeBSD$
|
||||
.\"
|
||||
.NH
|
||||
Future Directions
|
||||
.PP
|
||||
The jail facility has already been deployed in numerous capacities and
|
||||
a few opportunities for improvement have manifested themselves.
|
||||
.NH 2
|
||||
Improved Virtualisation
|
||||
.PP
|
||||
As it stands, the jail code provides a strict subset of system resources
|
||||
to the jail environment, based on access to processes, files, network
|
||||
resources, and privileged services.
|
||||
Virtualisation, or making the jail environments appear to be fully
|
||||
functional FreeBSD systems, allows maximum application support and the
|
||||
ability to offer a wide range of services within a jail environment.
|
||||
However, there are a number of limitations on the degree of virtualisation
|
||||
in the current code, and removing these limitations will enhance the
|
||||
ability to offer services in a jail environment.
|
||||
Two areas that deserve greater attention are the virtualisation of
|
||||
network resources, and management of scheduling resources.
|
||||
.PP
|
||||
Currently, a single IP address may be allocated to each jail, and all
|
||||
communication from the jail is limited to that IP address.
|
||||
In particular, these addresses are IPv4 addresses.
|
||||
There has been substantial interest in improving interface virtualisation,
|
||||
allowing one or more addresses to be assigned to an interface, and
|
||||
removing the requirement that the address be an IPv4 address, allowing
|
||||
the use of IPv6.
|
||||
Also, access to raw sockets is currently prohibited, as the current
|
||||
implementation of raw sockets allows access to raw IP packets associated
|
||||
with all interfaces.
|
||||
Limiting the scope of the raw socket would allow its safe use within
|
||||
a jail, re-enabling support for ping, and other network debugging and
|
||||
evaluation tools.
|
||||
.PP
|
||||
Another area of great interest to the current consumers of the jail code
|
||||
is the ability to limit the impact of one jail on the CPU resources
|
||||
available for other jails.
|
||||
Specifically, this would require that the jail of a process play a role in
|
||||
its scheduling parameters.
|
||||
Prior work in the area of lottery scheduling, currently available as
|
||||
patches on FreeBSD 2.2.x, might be leveraged to allow some degree of
|
||||
partitioning between jail environments \s-2[LOTTERY1] [LOTTERY2]\s+2.
|
||||
However, as the current scheduling mechanism is targeted at time
|
||||
sharing, and FreeBSD does not currently support real time preemption
|
||||
of processes in kernel, complete partitioning is not possible within the
|
||||
current framework.
|
||||
.NH 2
|
||||
Improved Management
|
||||
.PP
|
||||
Management of jail environments is currently somewhat ad hoc--creating
|
||||
and starting jails is a well-documented procedure, but day-to-day
|
||||
management of jails, as well as special case procedures such as shutdown,
|
||||
are not well analysed and documented.
|
||||
The current kernel process management infrastructure does not have the
|
||||
ability to manage pools of processes in a jail-centric way.
|
||||
For example, it is possible, from within a jail, to deliver a signal to all
|
||||
processes in a jail, but it is not possible to atomically target all
|
||||
processes within a jail from outside of the jail.
|
||||
If the jail code is to effectively limit the behaviour of a jail, the
|
||||
ability to shut it down cleanly is paramount.
|
||||
Similarly, shutting down a jail cleanly from within is also not well
|
||||
defined, the traditional shutdown utilities having been written with
|
||||
a host environment in mind.
|
||||
This suggests a number of improvements, both in the kernel and in the
|
||||
user-land utility set.
|
||||
.PP
|
||||
First, the ability to address kernel-centric management mechanisms at
|
||||
jails is important.
|
||||
One way in which this might be done is to assign a unique jail id, not
|
||||
unlike a process id or process group id, at jail creation time.
|
||||
A new jailkill() syscall would permit the direction of signals to
|
||||
specific jailids, allowing for the effective termination of all processes
|
||||
in the jail.
|
||||
A unique jailid could also supplant the hostname as the unique
|
||||
identifier for a jail, allowing the hostname to be changed by the
|
||||
processes in the jail without interfering with jail management.
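.PP
As a rough illustration of the proposal above, the following user-space C
sketch models what a jailkill()-style operation might do: walk a process
table and post a signal to every process carrying a given jail id.
Everything here is hypothetical: the structure, the field names, the
convention that jail id 0 means ``not jailed'', and the function itself
are assumptions made for illustration, since no such syscall exists in
the code described by this paper.

```c
#include <stddef.h>

/* Hypothetical, simplified process table entry (not the kernel's struct proc). */
struct proc_sketch {
	int p_pid;		/* process id */
	int p_jailid;		/* assumed convention: 0 means "not jailed" */
	int p_pending_sig;	/* last signal posted to the process, 0 if none */
};

/*
 * Post signal `sig' to every process whose jail id equals `jailid'.
 * Returns the number of processes signalled.  A real kernel would have
 * to hold the process table lock for the whole walk to make the
 * operation atomic; this sketch ignores locking entirely.
 */
static int
jailkill_sketch(struct proc_sketch *procs, size_t nprocs, int jailid, int sig)
{
	size_t i;
	int hits = 0;

	for (i = 0; i < nprocs; i++) {
		if (procs[i].p_jailid == jailid) {
			procs[i].p_pending_sig = sig;
			hits++;
		}
	}
	return (hits);
}
```

Keying the walk on a numeric jail id rather than the hostname is what
would let processes inside the jail change the hostname freely.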
|
||||
.PP
|
||||
More carefully defining the user-land semantics of a jail during startup
|
||||
and shutdown is also important.
|
||||
The traditional FreeBSD environment makes use of an init process to
|
||||
bring the system up during the boot process, and to assist in shutdown.
|
||||
A similar technique might be used for jail, in effect a jailinit,
|
||||
formulated to handle the clean startup and shutdown, including calling
|
||||
out to jail-local /etc/rc.shutdown, and other useful shutdown functions.
|
||||
A jailinit would also present a central location for delivering
|
||||
management requests to within a jail from the host environment, allowing
|
||||
the host environment to request the shutdown of the jail cleanly, before
|
||||
resorting to terminating processes, in the same style as the host
|
||||
environment shutting down before killing all processes and halting the
|
||||
kernel.
|
||||
.PP
|
||||
Improvements in the host environment would also assist in improving
|
||||
jail management, possibly including automated runtime jail management tools,
|
||||
tools to more easily construct the per-jail file system area, and
|
||||
the inclusion of jail shutdown as part of normal system shutdown.
|
||||
.PP
|
||||
These improvements in the jail framework would improve both raw
|
||||
functionality and usability from a management perspective.
|
||||
The jail code has raised significant interest in the FreeBSD community,
|
||||
and it is hoped that this type of improved functionality will be
|
||||
available in upcoming releases of FreeBSD.
|
126
share/doc/papers/jail/implementation.ms
Normal file
@ -0,0 +1,126 @@
|
||||
.\"
|
||||
.\" $FreeBSD$
|
||||
.\"
|
||||
.NH
|
||||
Implementing jail in the FreeBSD kernel.
|
||||
.NH 2
|
||||
The jail(2) system call, allocation, refcounting and deallocation of
|
||||
\fCstruct prison\fP.
|
||||
.PP
|
||||
The jail(2) system call is implemented as a non-optional system call
|
||||
in FreeBSD. Other system calls are controlled by compile time options
|
||||
in the kernel configuration file, but due to the minute footprint of
|
||||
the jail implementation, it was decided to make it a standard
|
||||
facility in FreeBSD.
|
||||
.PP
|
||||
The implementation of the system call is straightforward: a data structure
|
||||
is allocated and populated with the arguments provided. The data structure
|
||||
is attached to the current process' \fCstruct proc\fP, its reference count
|
||||
set to one and a call to the
|
||||
chroot(2) syscall implementation completes the task.
|
||||
.PP
|
||||
Hooks in the code implementing process creation and destruction maintain
|
||||
the reference count on the data structure and free it when the last reference
|
||||
is lost.
|
||||
Any new process created by a process in a jail will inherit a reference
|
||||
to the jail, which effectively puts the new process in the same jail.
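.PP
The lifecycle described above can be sketched in user-space C.
This is a minimal model under stated assumptions, not the kernel code:
the structures are simplified stand-ins, the field and function names
are chosen for illustration, and the chroot(2) step of jail creation is
left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct prison. */
struct prison_sketch {
	int	 pr_ref;	/* number of processes referencing the jail */
	char	 pr_host[64];	/* hostname supplied at jail creation */
	uint32_t pr_ip;		/* the jail's single IPv4 address */
};

/* Hypothetical, simplified stand-in for struct proc. */
struct proc_sketch {
	struct prison_sketch *p_prison;	/* NULL when not jailed */
};

/* Jail creation: allocate, populate, attach with a reference count of one. */
static int
jail_attach_new(struct proc_sketch *p, const char *host, uint32_t ip)
{
	struct prison_sketch *pr = calloc(1, sizeof(*pr));

	if (pr == NULL)
		return (-1);
	/* calloc() zeroed the buffer, so the copy stays NUL-terminated. */
	strncpy(pr->pr_host, host, sizeof(pr->pr_host) - 1);
	pr->pr_ip = ip;
	pr->pr_ref = 1;
	p->p_prison = pr;
	return (0);
}

/* Process creation hook: the child inherits the parent's jail reference. */
static void
proc_fork_hook(const struct proc_sketch *parent, struct proc_sketch *child)
{
	child->p_prison = parent->p_prison;
	if (child->p_prison != NULL)
		child->p_prison->pr_ref++;
}

/* Process destruction hook: returns 1 if the last reference was dropped. */
static int
proc_exit_hook(struct proc_sketch *p)
{
	struct prison_sketch *pr = p->p_prison;

	p->p_prison = NULL;
	if (pr == NULL)
		return (0);
	assert(pr->pr_ref > 0);
	if (--pr->pr_ref == 0) {
		free(pr);
		return (1);
	}
	return (0);
}
```

Because the only way to obtain a reference is to inherit it at process
creation, a process outside the jail has no path into it.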
|
||||
.PP
|
||||
There is no way to modify the contents of the data structure describing
|
||||
the jail after its creation, and no way to attach a process to an existing
|
||||
jail if it was not created from inside that jail.
|
||||
.NH 2
|
||||
Fortification of the chroot(2) facility for filesystem name scoping.
|
||||
.PP
|
||||
A number of ways to escape the confines of a chroot(2)-created subscope
|
||||
of the filesystem view have been identified over the years.
|
||||
chroot(2) was never intended to be a security mechanism as such, but even
|
||||
then the ftp daemon largely depended on the security provided by
|
||||
chroot(2) to provide the ``anonymous ftp'' access method.
|
||||
.PP
|
||||
Three classes of escape routes existed: recursive chroot(2) escapes,
|
||||
``..'' based escapes and fchdir(2) based escapes.
|
||||
All of these exploited the fact that chroot(2) didn't try sufficiently
|
||||
hard to enforce the new root directory.
|
||||
.PP
|
||||
New code was added to detect and thwart these escapes, amongst
|
||||
other things by tracking the directory of the first level of chroot(2)
|
||||
experienced by a process and refusing backwards traversal across
|
||||
this directory, as well as additional code to refuse chroot(2) if
|
||||
file-descriptors were open referencing directories.
|
||||
.NH 2
|
||||
Restriction of process visibility and interaction.
|
||||
.PP
|
||||
A macro was already available in the kernel to determine if one process
|
||||
could affect another process. This macro did the rather complex checking
|
||||
of uid and gid values. It was felt that the complexity of the macro was
|
||||
approaching the lower edge of IOCCC entrance criteria, and it was therefore
|
||||
converted to a proper function named \fCp_trespass(p1, p2)\fP which does
|
||||
all the previous checks and additionally checks the jail aspect of the access.
|
||||
The check is implemented such that access fails if the origin process is jailed
|
||||
but the target process is not in the same jail.
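.PP
The jail part of the check can be illustrated with a short C sketch.
The structures and the function name below are hypothetical stand-ins,
and the credential test is reduced to a single uid comparison purely to
keep the example small; the real function performs the full uid and gid
checks mentioned above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical placeholder for the per-jail data structure. */
struct prison_sketch {
	int pr_dummy;
};

/* Hypothetical, simplified stand-in for struct proc. */
struct proc_sketch {
	struct prison_sketch *p_prison;	/* NULL when not jailed */
	unsigned int	      p_uid;	/* effective uid */
};

/*
 * Return 0 if p1 may affect p2, EPERM otherwise.
 */
static int
p_trespass_sketch(const struct proc_sketch *p1, const struct proc_sketch *p2)
{
	/* Jail check: a jailed origin may only reach into its own jail. */
	if (p1->p_prison != NULL && p1->p_prison != p2->p_prison)
		return (EPERM);

	/* Grossly simplified credential check: root, or same uid. */
	if (p1->p_uid == 0 || p1->p_uid == p2->p_uid)
		return (0);
	return (EPERM);
}
```

Note the asymmetry: an unjailed origin is not restricted by the jail
test, so the host environment can still manage processes inside a jail.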
|
||||
.PP
|
||||
Process visibility is provided through two mechanisms in FreeBSD,
|
||||
the \fCprocfs\fP file system and a sub-tree of the \fCsysctl\fP tree.
|
||||
Both of these were modified to report only the processes in the same
|
||||
jail to a jailed process.
|
||||
.NH 2
|
||||
Restriction to one IP number.
|
||||
.PP
|
||||
Restricting TCP and UDP access to just one IP number was done almost
|
||||
entirely in the code which manages ``protocol control blocks''.
|
||||
When a jailed process binds to a socket, the IP number provided by
|
||||
the process will not be used, instead the pre-configured IP number of
|
||||
the jail is used.
|
||||
.PP
|
||||
BSD based TCP/IP network stacks sport a special interface, the loop-back
|
||||
interface, which has the ``magic'' IP number 127.0.0.1.
|
||||
This is often used by processes to contact servers on the local machine,
|
||||
and consequently special handling for jails was needed.
|
||||
To handle this case it was necessary to also intercept and modify the
|
||||
behaviour of connection establishment, and when the 127.0.0.1 address
|
||||
was seen from a jailed process, substitute the jail's configured IP number.
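.PP
The two address rewrites, one at bind time and one at connection
establishment, can be sketched in C as follows.
This is an illustration under assumptions: addresses are kept in host
byte order for readability, and the structure and function names are
invented for the example rather than taken from the protocol control
block code.

```c
#include <stddef.h>
#include <stdint.h>

#define	LOOPBACK_ADDR	0x7f000001u	/* 127.0.0.1, host byte order */

/* Hypothetical stand-in holding only the jail's configured address. */
struct prison_sketch {
	uint32_t pr_ip;
};

/*
 * Bind: a jailed process always gets the jail's address, whatever it
 * asked for.  An unjailed process (pr == NULL) keeps its request.
 */
static uint32_t
jail_bind_addr(const struct prison_sketch *pr, uint32_t requested)
{
	return (pr != NULL ? pr->pr_ip : requested);
}

/*
 * Connect: a jailed process contacting 127.0.0.1 is redirected to the
 * jail's own address; all other destinations pass through unchanged.
 */
static uint32_t
jail_connect_addr(const struct prison_sketch *pr, uint32_t dest)
{
	if (pr != NULL && dest == LOOPBACK_ADDR)
		return (pr->pr_ip);
	return (dest);
}
```

With both rewrites in place, a client and server inside the same jail
that talk over ``localhost'' still meet each other, but on the jail's
address rather than on the host's loop-back interface.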
|
||||
.PP
|
||||
Finally the APIs through which the network configuration and connection
|
||||
state may be queried were modified to report only information relevant
|
||||
to the configured IP number of a jailed process.
|
||||
.NH 2
|
||||
Adding jail awareness to selected device drivers.
|
||||
.PP
|
||||
A couple of device drivers needed to be taught about jails, the ``pty''
|
||||
driver is one of them. The pty driver provides ``virtual terminals'' to
|
||||
services like telnet, ssh, rlogin and X11 terminal window programs.
|
||||
Therefore jails need access to the pty driver, and code had to be added
|
||||
to enforce that a particular virtual terminal was not accessed from more
|
||||
than one jail at the same time.
|
||||
.NH 2
|
||||
General restriction of super-user powers for jailed super-users.
|
||||
.PP
|
||||
This item proved to be the simplest but most tedious to implement.
|
||||
Tedious because a manual review of all places where the kernel allowed
|
||||
the super user special powers was called for,
|
||||
simple because very few places were required to let a jailed root through.
|
||||
Of the approximately 260 checks in the FreeBSD 4.0 kernel, only
|
||||
about 35 will let a jailed root through.
|
||||
.PP
|
||||
Since the default is for jailed roots to not receive privilege, new
|
||||
code or drivers in the FreeBSD kernel are automatically jail-aware: they
|
||||
will refuse jailed roots privilege.
|
||||
The other part of this protection comes from the fact that a jailed
|
||||
root cannot create new device nodes with the mknod(2) system call, so
|
||||
unless the machine administrator creates device nodes for a particular
|
||||
device inside the jail's filesystem tree, the driver in effect does
|
||||
not exist in the jail.
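.PP
The default-deny behaviour can be illustrated with a small C sketch.
The credential structure, the function name and the flag name below are
assumptions made for the example; the point is only that a caller must
explicitly opt in before a jailed root is granted privilege, so code
that passes no flag denies by default.

```c
#include <errno.h>

/* Hypothetical flag: the call site explicitly allows a jailed root. */
#define	ALLOW_JAILED_ROOT	0x1

/* Hypothetical, simplified credential. */
struct cred_sketch {
	unsigned int cr_uid;	/* effective uid */
	int	     cr_jailed;	/* nonzero if the process is in a jail */
};

/*
 * Return 0 if the credential holds super-user privilege for this
 * call site, EPERM otherwise.
 */
static int
suser_sketch(const struct cred_sketch *cr, int flags)
{
	if (cr->cr_uid != 0)
		return (EPERM);
	/* Jailed root is refused unless the call site opted in. */
	if (cr->cr_jailed && (flags & ALLOW_JAILED_ROOT) == 0)
		return (EPERM);
	return (0);
}
```

A driver written without any knowledge of jails would call the check
with no flags set and therefore refuse a jailed root automatically.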
|
||||
.PP
|
||||
As a side-effect of this work the suser(9) API was cleaned up and
|
||||
extended to cater for not only the jail facility, but also to make room
|
||||
for future partitioning facilities.
|
||||
.NH 2
|
||||
Implementation statistics
|
||||
.PP
|
||||
The change of the suser(9) API modified approx. 350 source lines
|
||||
distributed over approx. 100 source files. The vast majority of
|
||||
these changes were generated automatically with a script.
|
||||
.PP
|
||||
The implementation of the jail facility added approx. 200 lines of
|
||||
code in total, distributed over approx. 50 files, and about 200 lines
|
||||
in two new kernel files.
|
234
share/doc/papers/jail/jail01.eps
Normal file
@ -0,0 +1,234 @@
|
||||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: jail01.eps
|
||||
%%Creator: fig2dev Version 3.2 Patchlevel 1
|
||||
%%CreationDate: Fri Mar 24 20:37:59 2000
|
||||
%%For: $FreeBSD$
|
||||
%%Orientation: Portrait
|
||||
%%BoundingBox: 0 0 425 250
|
||||
%%Pages: 0
|
||||
%%BeginSetup
|
||||
%%EndSetup
|
||||
%%Magnification: 1.0000
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
-117.0 298.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
%%EndProlog
|
||||
|
||||
$F2psBegin
|
||||
10 setmiterlimit
|
||||
n -1000 5962 m -1000 -1000 l 10022 -1000 l 10022 5962 l cp clip
|
||||
0.06000 0.06000 sc
|
||||
/Courier-BoldOblique ff 180.00 scf sf
|
||||
7725 3600 m
|
||||
gs 1 -1 sc (10.0.0.2) dup sw pop neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
15.000 slw
|
||||
n 9000 3300 m 9000 4275 l gs col0 s gr
|
||||
% Polyline
|
||||
2 slc
|
||||
n 7875 3225 m 7800 3225 l gs col0 s gr
|
||||
% Polyline
|
||||
0 slc
|
||||
n 7875 4125 m 7800 4125 l gs col0 s gr
|
||||
% Polyline
|
||||
n 7875 3225 m 7875 4425 l gs col0 s gr
|
||||
% Polyline
|
||||
n 7875 3825 m 7800 3825 l gs col0 s gr
|
||||
% Polyline
|
||||
n 7875 3525 m 7800 3525 l gs col0 s gr
|
||||
% Polyline
|
||||
n 8175 3825 m 7875 3825 l gs col0 s gr
|
||||
% Polyline
|
||||
2 slc
|
||||
n 7875 4425 m 7800 4425 l gs col0 s gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
8700 3900 m
|
||||
gs 1 -1 sc (fxp0) dup sw pop neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
0 slc
|
||||
7.500 slw
|
||||
n 2925 1425 m 3075 1425 l gs col0 s gr
|
||||
% Polyline
|
||||
15.000 slw
|
||||
n 2475 1350 m 2472 1347 l 2465 1342 l 2453 1334 l 2438 1323 l 2420 1311 l
|
||||
2401 1299 l 2383 1289 l 2366 1281 l 2351 1275 l 2338 1274 l
|
||||
2325 1275 l 2314 1279 l 2303 1285 l 2291 1293 l 2278 1303 l
|
||||
2264 1314 l 2250 1326 l 2236 1339 l 2222 1353 l 2209 1366 l
|
||||
2198 1379 l 2188 1391 l 2181 1403 l 2177 1414 l 2175 1425 l
|
||||
2177 1436 l 2181 1447 l 2188 1459 l 2198 1471 l 2209 1484 l
|
||||
2222 1497 l 2236 1511 l 2250 1524 l 2264 1536 l 2278 1547 l
|
||||
2291 1557 l 2303 1565 l 2314 1571 l 2325 1575 l 2338 1576 l
|
||||
2351 1575 l 2366 1569 l 2383 1561 l 2401 1551 l 2420 1539 l
|
||||
2438 1527 l 2453 1516 l 2465 1508 l 2472 1503 l 2475 1500 l gs col0 s gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 1500 m
|
||||
gs 1 -1 sc (lo0) col0 sh gr
|
||||
/Courier-BoldOblique ff 180.00 scf sf
|
||||
3075 1500 m
|
||||
gs 1 -1 sc (127.0.0.1) col0 sh gr
|
||||
% Polyline
|
||||
7.500 slw
|
||||
n 2100 3525 m 2250 3525 l gs col0 s gr
|
||||
% Polyline
|
||||
n 2550 2100 m 2250 2400 l 2250 4500 l 2550 4800 l gs col0 s gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
1950 3600 m
|
||||
gs 1 -1 sc (/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 3900 m
|
||||
gs 1 -1 sc (jail_1/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 4200 m
|
||||
gs 1 -1 sc (jail_2/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 4500 m
|
||||
gs 1 -1 sc (jail_3/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 2400 m
|
||||
gs 1 -1 sc (dev/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 2700 m
|
||||
gs 1 -1 sc (etc/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 3000 m
|
||||
gs 1 -1 sc (usr/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 3300 m
|
||||
gs 1 -1 sc (var/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
2550 3600 m
|
||||
gs 1 -1 sc (home/) col0 sh gr
|
||||
% Polyline
|
||||
n 3375 3825 m 3900 3825 l 4950 1800 l 5100 1800 l gs col0 s gr
|
||||
% Polyline
|
||||
n 3375 4125 m 3900 4125 l 4950 3900 l 5100 3900 l gs col0 s gr
|
||||
% Polyline
|
||||
n 5400 900 m 5100 1200 l 5100 2400 l 5400 2700 l gs col0 s gr
|
||||
% Polyline
|
||||
n 5400 3000 m 5100 3300 l 5100 4500 l 5400 4800 l gs col0 s gr
|
||||
% Polyline
|
||||
n 4650 825 m 4650 2775 l 6675 2775 l 6675 3375 l 7950 3375 l 7950 825 l
|
||||
cp gs col0 s gr
|
||||
% Polyline
|
||||
n 4650 2775 m 4650 4950 l 6300 4950 l 6300 3675 l 7950 3675 l 7950 3375 l
|
||||
6675 3375 l 6675 2775 l cp gs col0 s gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 1200 m
|
||||
gs 1 -1 sc (dev/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 1500 m
|
||||
gs 1 -1 sc (etc/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 1800 m
|
||||
gs 1 -1 sc (usr/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 2100 m
|
||||
gs 1 -1 sc (var/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 2400 m
|
||||
gs 1 -1 sc (home/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 3300 m
|
||||
gs 1 -1 sc (dev/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 3600 m
|
||||
gs 1 -1 sc (etc/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 3900 m
|
||||
gs 1 -1 sc (usr/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 4200 m
|
||||
gs 1 -1 sc (var/) col0 sh gr
|
||||
/Courier-Bold ff 180.00 scf sf
|
||||
5400 4500 m
|
||||
gs 1 -1 sc (home/) col0 sh gr
|
||||
/Courier-BoldOblique ff 180.00 scf sf
|
||||
7725 3300 m
|
||||
gs 1 -1 sc (10.0.0.1) dup sw pop neg 0 rm col0 sh gr
|
||||
/Courier-BoldOblique ff 180.00 scf sf
|
||||
7725 4500 m
|
||||
gs 1 -1 sc (10.0.0.5) dup sw pop neg 0 rm col0 sh gr
|
||||
/Courier-BoldOblique ff 180.00 scf sf
|
||||
7725 4200 m
|
||||
gs 1 -1 sc (10.0.0.4) dup sw pop neg 0 rm col0 sh gr
|
||||
/Courier-BoldOblique ff 180.00 scf sf
|
||||
7725 3900 m
|
||||
gs 1 -1 sc (10.0.0.3) dup sw pop neg 0 rm col0 sh gr
|
||||
% Polyline
|
||||
15.000 slw
|
||||
n 9000 3825 m 8775 3825 l gs col0 s gr
|
||||
$F2psEnd
|
||||
rs
|
86
share/doc/papers/jail/jail01.fig
Normal file
@ -0,0 +1,86 @@
|
||||
#FIG 3.2
|
||||
# $FreeBSD$
|
||||
Landscape
|
||||
Center
|
||||
Inches
|
||||
A4
|
||||
100.00
|
||||
Single
|
||||
-2
|
||||
1200 2
|
||||
6 7725 3150 9075 4500
|
||||
6 8700 3225 9075 4350
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
9000 3825 8775 3825
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
9000 3300 9000 4275
|
||||
-6
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 2 -1 0 0 2
|
||||
7875 3225 7800 3225
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
7875 4125 7800 4125
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
7875 3225 7875 4425
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
7875 3825 7800 3825
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
7875 3525 7800 3525
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
8175 3825 7875 3825
|
||||
2 1 0 2 0 7 100 0 -1 0.000 0 2 -1 0 0 2
|
||||
7875 4425 7800 4425
|
||||
4 2 0 100 0 14 12 0.0000 4 180 420 8700 3900 fxp0\001
|
||||
-6
|
||||
6 2100 1200 4050 1650
|
||||
2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
2925 1425 3075 1425
|
||||
3 2 0 2 0 7 100 0 -1 0.000 0 0 0 5
|
||||
2475 1350 2325 1275 2175 1425 2325 1575 2475 1500
|
||||
0.000 -1.000 -1.000 -1.000 0.000
|
||||
4 0 0 100 0 14 12 0.0000 4 135 315 2550 1500 lo0\001
|
||||
4 0 0 100 0 15 12 0.0000 4 135 945 3075 1500 127.0.0.1\001
|
||||
-6
|
||||
6 1950 2100 3300 4800
|
||||
2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 2
|
||||
2100 3525 2250 3525
|
||||
2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4
|
||||
2550 2100 2250 2400 2250 4500 2550 4800
|
||||
4 0 0 100 0 14 12 0.0000 4 150 105 1950 3600 /\001
|
||||
4 0 0 100 0 14 12 0.0000 4 180 735 2550 3900 jail_1/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 180 735 2550 4200 jail_2/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 180 735 2550 4500 jail_3/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 165 420 2550 2400 dev/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 2550 2700 etc/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 2550 3000 usr/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 2550 3300 var/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 165 525 2550 3600 home/\001
|
||||
-6
|
||||
2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4
|
||||
3375 3825 3900 3825 4950 1800 5100 1800
|
||||
2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4
|
||||
3375 4125 3900 4125 4950 3900 5100 3900
|
||||
2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4
|
||||
5400 900 5100 1200 5100 2400 5400 2700
|
||||
2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4
|
||||
5400 3000 5100 3300 5100 4500 5400 4800
|
||||
2 3 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 7
|
||||
4650 825 4650 2775 6675 2775 6675 3375 7950 3375 7950 825
|
||||
4650 825
|
||||
2 3 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 9
|
||||
4650 2775 4650 4950 6300 4950 6300 3675 7950 3675 7950 3375
|
||||
6675 3375 6675 2775 4650 2775
|
||||
4 0 0 100 0 14 12 0.0000 4 165 420 5400 1200 dev/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 5400 1500 etc/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 5400 1800 usr/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 5400 2100 var/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 165 525 5400 2400 home/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 165 420 5400 3300 dev/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 5400 3600 etc/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 5400 3900 usr/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 150 420 5400 4200 var/\001
|
||||
4 0 0 100 0 14 12 0.0000 4 165 525 5400 4500 home/\001
|
||||
4 2 0 100 0 15 12 0.0000 4 135 840 7725 3300 10.0.0.1\001
|
||||
4 2 0 100 0 15 12 0.0000 4 135 840 7725 4500 10.0.0.5\001
|
||||
4 2 0 100 0 15 12 0.0000 4 135 840 7725 4200 10.0.0.4\001
|
||||
4 2 0 100 0 15 12 0.0000 4 135 840 7725 3900 10.0.0.3\001
|
||||
4 2 0 100 0 15 12 0.0000 4 135 840 7725 3600 10.0.0.2\001
|
218
share/doc/papers/jail/mgt.ms
Normal file
@ -0,0 +1,218 @@
|
||||
.\"
|
||||
.\" $FreeBSD$
|
||||
.\"
|
||||
.NH
|
||||
Managing Jails and the Jail File System Environment
|
||||
.NH 2
|
||||
Creating a Jail Environment
|
||||
.PP
|
||||
While the jail(2) call could be used in a number of ways, the expected
|
||||
configuration creates a complete FreeBSD installation for each jail.
|
||||
This includes copies of all relevant system binaries, data files, and its
|
||||
own \fC/etc\fP directory.
|
||||
Such a configuration maximises the independence of various jails,
|
||||
and reduces the chance of interference between jails,
|
||||
especially when it is desirable to provide root access within a jail to
|
||||
a less trusted user.
|
||||
.PP
|
||||
On a box making use of the jail facility, we refer to two types of
|
||||
environment: the host environment, and the jail environment.
|
||||
The host environment is the real operating system environment, which is
|
||||
used to configure interfaces, and start up the jails.
|
||||
There are then one or more jail environments, effectively virtual
|
||||
FreeBSD machines.
|
||||
When configuring Jail for use, it is necessary to configure both the
|
||||
host and jail environments to prevent overlap.
|
||||
.PP
|
||||
As jailed virtual machines are generally bound to an IP address configured
|
||||
using the normal IP alias mechanism, those jail IP addresses are also
|
||||
accessible to host environment applications to use.
|
||||
If the accessibility of some host applications in the jail environment is
|
||||
not desirable, it is necessary to configure those applications to only
|
||||
listen on appropriate addresses.
|
||||
.PP
|
||||
In most of the production environments where jail is currently in use,
|
||||
one IP address is allocated to the host environment, and then a number
|
||||
are allocated to jail boxes, with each jail box receiving a unique IP.
|
||||
In this situation, it is sufficient to configure the networking applications
|
||||
on the host to listen only on the host IP.
|
||||
Generally, this consists of specifying the appropriate IP address to be
|
||||
used by inetd and SSH, and disabling applications that are not capable
|
||||
of limiting their address scope, such as sendmail, the port mapper, and
|
||||
syslogd.
|
||||
Other third party applications that have been installed on the host must also be
|
||||
configured in this manner, or users connecting to the jailbox will
|
||||
discover the host environment service, unless the jailbox has
|
||||
specifically bound a service to that port.
|
||||
In some situations, this can actually be the desirable behaviour.
|
||||
.PP
|
||||
The jail environments must also be custom-configured.
|
||||
This consists of building and installing a miniature version of the
|
||||
FreeBSD file system tree off of a subdirectory in the host environment,
|
||||
usually \fC/usr/jail\fP, or \fC/data/jail\fP, with a subdirectory per jail.
|
||||
Appropriate instructions for generating this tree are included in the
|
||||
jail(8) man page, but generally this process may be automated using the
|
||||
FreeBSD build environment.
|
||||
.PP
|
||||
One notable difference from the default FreeBSD install is that only
|
||||
a limited set of device nodes should be created.
|
||||
MAKEDEV(8) has been modified to accept a ``jail'' argument that creates
|
||||
the correct set of nodes.
|
||||
.PP
|
||||
To improve storage efficiency, a fair number of the binaries in the system tree
|
||||
may be deleted, as they are not relevant in a jail environment.
|
||||
This includes the kernel, boot loader, and related files, as well as
|
||||
hardware and network configuration tools.
|
||||
.PP
|
||||
After the creation of the jail tree, the easiest way to configure it is
|
||||
to start up the jail in single-user mode.
|
||||
The sysinstall admin tool may be used to help with the task, although
|
||||
it is not installed by default as part of the system tree.
|
||||
These tools should be run in the jail environment, or they will affect
|
||||
the host environment's configuration.
|
||||
.DS
|
||||
.ft C
|
||||
.ps -2
|
||||
# mkdir /data/jail/192.168.11.100/stand
|
||||
# cp /stand/sysinstall /data/jail/192.168.11.100/stand
|
||||
# jail /data/jail/192.168.11.100 testhostname 192.168.11.100 \e
|
||||
/bin/sh
|
||||
.ps +2
|
||||
.R
|
||||
.DE
|
||||
.PP
|
||||
After running the jail command, the shell is now within the jail environment,
|
||||
and all further commands
|
||||
will be limited to the scope of the jail until the shell exits.
|
||||
If the network alias has not yet been configured, then the jail will be
|
||||
unable to access the network.
|
||||
.PP
|
||||
The startup configuration of the jail environment may be configured so
|
||||
as to quell warnings from services that cannot run in the jail.
|
||||
Also, any per-system configuration required for a normal FreeBSD system
|
||||
is also required for each jailbox.
|
||||
Typically, this includes:
|
||||
.IP "" 5n
|
||||
\(bu Create empty /etc/fstab
|
||||
.IP
|
||||
\(bu Disable portmapper
|
||||
.IP
|
||||
\(bu Run newaliases
|
||||
.IP
|
||||
\(bu Disable interface configuration
|
||||
.IP
|
||||
\(bu Configure the resolver
|
||||
.IP
|
||||
\(bu Set root password
|
||||
.IP
|
||||
\(bu Set timezone
|
||||
.IP
|
||||
\(bu Add any local accounts
|
||||
.IP
|
||||
\(bu Install any packages
|
||||
.NH 2
|
||||
Starting Jails
|
||||
.PP
|
||||
Jails are typically started by executing their /etc/rc script in much
|
||||
the same manner as a shell was started in the previous section.
|
||||
Before starting the jail, any relevant networking configuration
|
||||
should also be performed.
|
||||
Typically, this involves adding an additional IP address to the
|
||||
appropriate network interface, setting network properties for the
|
||||
IP address using IP filtering, forwarding, and bandwidth shaping,
|
||||
and mounting a process file system for the jail, if the ability to
|
||||
debug processes from within the jail is desired.
|
||||
.DS
|
||||
.ft C
|
||||
.ps -2
|
||||
# ifconfig ed0 inet add 192.168.11.100 netmask 255.255.255.255
|
||||
# mount -t procfs proc /data/jail/192.168.11.100/proc
|
||||
# jail /data/jail/192.168.11.100 testhostname 192.168.11.100 \e
|
||||
/bin/sh /etc/rc
|
||||
.ps +2
|
||||
.ft P
|
||||
.DE
|
||||
.PP
|
||||
A few warnings are generated for sysctls that are not permitted
|
||||
to be set within the jail, but the end result is a set of processes
|
||||
in an isolated process environment, bound to a single IP address.
|
||||
Normal procedures for accessing a FreeBSD machine apply: telnetting in
|
||||
through the network reveals a telnet prompt, login, and shell.
|
||||
.DS
|
||||
.ft C
|
||||
.ps -2
|
||||
% ps ax
|
||||
PID TT STAT TIME COMMAND
|
||||
228 ?? SsJ 0:18.73 syslogd
|
||||
247 ?? IsJ 0:00.05 inetd -wW
|
||||
249 ?? IsJ 0:28.43 cron
|
||||
252 ?? SsJ 0:30.46 sendmail: accepting connections on port 25
|
||||
291 ?? IsJ 0:38.53 /usr/local/sbin/sshd
|
||||
93694 ?? SJ 0:01.01 sshd: rwatson@ttyp0 (sshd)
|
||||
93695 p0 SsJ 0:00.06 -csh (csh)
|
||||
93700 p0 R+J 0:00.00 ps ax
|
||||
.ps +2
|
||||
.ft P
|
||||
.DE
|
||||
.PP
|
||||
It is immediately obvious that the environment is within a jailbox: there
|
||||
is no init process, no kernel daemons, and a J flag is present beside all
|
||||
processes indicating the presence of a jail.
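.PP
The J flag also gives the host administrator a quick way to pick jailed
processes out of a full process listing.
A minimal sketch, assuming the STAT column is the third field as in the
listing above; jailed_only is a hypothetical helper name, not a FreeBSD
utility:

```shell
#!/bin/sh
# Sketch: keep only the ps(1) lines whose STAT column carries the J flag.
# Reads "ps ax" style output on stdin; the STAT column is assumed to be
# the third whitespace-separated field.

jailed_only() {
    awk 'NR == 1 { print; next }   # pass the header line through
         $3 ~ /J/ { print }        # STAT contains J, so the process is jailed
        '
}

# Example (on the host): ps ax | jailed_only
```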
|
||||
.PP
|
||||
As with any FreeBSD system, accounts may be created and deleted,
|
||||
mail is delivered, logs are generated, packages may be added, and the
|
||||
system may be hacked into if configured incorrectly, or if it is running a buggy
|
||||
version of a piece of software.
|
||||
However, all of this happens strictly within the scope of the jail.
|
||||
.NH 2
|
||||
Jail Management
|
||||
.PP
|
||||
Jail management is an interesting prospect, as there are two perspectives
|
||||
from which a jail environment may be administered: from within the jail,
|
||||
and from the host environment.
|
||||
From within the jail, as described above, the process is remarkably similar
|
||||
to any regular FreeBSD install, although certain actions are prohibited,
|
||||
such as mounting file systems, modifying system kernel properties, etc.
|
||||
The only area that really differs is that of shutting
|
||||
the system down: the processes within the jail may deliver signals
|
||||
between them, allowing all processes to be killed, but bringing the
|
||||
system back up requires intervention from outside of the jailbox.
|
||||
.PP
|
||||
From outside of the jail, there is a range of capabilities, as well
|
||||
as limitations.
|
||||
The jail environment is, in effect, a subset of the host environment:
|
||||
the jail file system appears as part of the host file system, and may
|
||||
be directly modified by processes in the host environment.
|
||||
Processes within the jail appear in the process listing of the host,
|
||||
and may likewise be signalled or debugged.
|
||||
The host process file system makes the hostname of the jail environment
|
||||
accessible in /proc/procnum/status, allowing utilities in the host
|
||||
environment to manage processes based on jailname.
|
||||
However, the default configuration allows privileged processes within
|
||||
jails to set the hostname of the jail, which makes the status file less
|
||||
useful from a management perspective if the contents of the jail are
|
||||
malicious.
|
||||
To prevent a jail from changing its hostname, the
|
||||
"jail.set_hostname_allowed" sysctl may be set to 0 prior to starting
|
||||
any jails.
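.PP
With hostname changes disabled, the status files can be used from the
host to find the processes belonging to one jail.
The following is a sketch under stated assumptions: it takes the jail
hostname to be the last whitespace-separated field of the status line
and the pid to be the second, and pids_in_jail is a hypothetical helper
name.

```shell
#!/bin/sh
# Sketch: print the pids of processes whose procfs status file names a
# given jail hostname. Assumes the hostname is the last field of the
# status line and the pid the second. The procfs root is a parameter so
# the function can be pointed at /proc or at a test directory.

pids_in_jail() {
    procroot=$1
    jailname=$2
    for status in "$procroot"/*/status; do
        [ -r "$status" ] || continue
        awk -v name="$jailname" '$NF == name { print $2 }' "$status"
    done
}

# Example (on the host): pids_in_jail /proc testhostname
```

The resulting list could be handed to kill(1) to shut a jail down from
the host, which is the one operation that cannot be completed from
inside the jail.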
|
||||
.PP
|
||||
One aspect immediately observable in an environment with multiple jails
|
||||
is that uids and gids are local to each jail environment: the uid associated
|
||||
with a process in one jail may be for a different user than in another
|
||||
jail.
|
||||
This collision of identifiers is only visible in the host environment,
|
||||
as normally processes from one jail are never visible in an environment
|
||||
with another scope for user/uid and group/gid mapping.
|
||||
Managers in the host environment should understand these scoping issues,
|
||||
or confusion and unintended consequences may result.
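.PP
One way to avoid this confusion on the host is to resolve a uid against
the password file of the jail in question rather than that of the host.
A minimal sketch; jail_user_for_uid is a hypothetical helper name and
the jail root path is supplied by the caller:

```shell
#!/bin/sh
# Sketch: translate a numeric uid into a user name using the passwd file
# of a particular jail rather than that of the host.

jail_user_for_uid() {
    jailroot=$1
    uid=$2
    awk -F: -v uid="$uid" '$3 == uid { print $1; exit }' \
        "$jailroot/etc/passwd"
}

# Example: jail_user_for_uid /data/jail/192.168.11.100 1001
```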
|
||||
.PP
|
||||
Jailed processes are subject to the normal restrictions present for
|
||||
any processes, including resource limits, and limits placed by the network
|
||||
code, including firewall rules.
|
||||
By specifying firewall rules for the IP address bound to a jail, it is
|
||||
possible to place connectivity and bandwidth limitations on individual
|
||||
jails, restricting services that may be consumed or offered.
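.PP
As an illustration, the function below prints a small ipfw(8) rule set
that lets a jail offer only HTTP and shapes its replies.
It is a sketch, not a recommended policy: the rule numbers, the port,
the pipe number and the bandwidth figure are arbitrary assumptions, the
pipe rules assume dummynet(4) is available in the host kernel, and the
rule order assumes that a packet leaving a pipe is accepted without
further rule matching.

```shell
#!/bin/sh
# Sketch: print ipfw(8) commands that restrict a jail's IP address to
# serving HTTP and cap the bandwidth of its replies. All numbers are
# illustrative. The commands are only printed, so they can be reviewed
# before being piped to sh(1) on the host.

jail_fw_rules() {
    ip=$1
    cat <<EOF
ipfw pipe 1 config bw 1Mbit/s
ipfw add 1000 allow tcp from any to $ip 80
ipfw add 1010 deny ip from any to $ip
ipfw add 1020 pipe 1 tcp from $ip 80 to any
ipfw add 1030 deny ip from $ip to any
EOF
}

# Example: jail_fw_rules 192.168.11.100
```

Because the rules are keyed on the jail's address alone, the same
function can be reused for each jail on the host.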
|
||||
.PP
|
||||
Management of jails is an area that will see further improvement in
|
||||
future versions of FreeBSD. Some of these potential improvements are
|
||||
discussed later in this paper.
|
437
share/doc/papers/jail/paper.ms
Normal file
@ -0,0 +1,437 @@
|
||||
.\"
|
||||
.\" $FreeBSD$
|
||||
.\"
|
||||
.ds CH "
|
||||
.nr PI 2n
|
||||
.nr PS 12
|
||||
.nr LL 15c
|
||||
.nr PO 3c
|
||||
.nr FM 3.5c
|
||||
.po 3c
|
||||
.TL
|
||||
Jails: Confining the omnipotent root.
|
||||
.FS
|
||||
This paper was presented at the 2nd International System Administration and Networking Conference "SANE 2000", May 22-25, 2000 in Maastricht, The Netherlands, and is published in the proceedings.
|
||||
.FE
|
||||
.AU
|
||||
Poul-Henning Kamp <phk@FreeBSD.org>
|
||||
.AU
|
||||
Robert N. M. Watson <rwatson@FreeBSD.org>
|
||||
.AI
|
||||
The FreeBSD Project
|
||||
.FS
|
||||
This work was sponsored by \fChttp://www.servetheweb.com/\fP and
|
||||
donated to the FreeBSD Project for inclusion in the FreeBSD
|
||||
OS. FreeBSD 4.0-RELEASE was the first release including this
|
||||
code.
|
||||
Follow-on work was sponsored by Safeport Network Services,
|
||||
\fChttp://www.safeport.com/\fP
|
||||
.FE
|
||||
.AB
|
||||
The traditional UNIX security model is simple but inexpressive.
|
||||
Adding fine-grained access control improves the expressiveness,
|
||||
but often dramatically increases both the cost of system management
|
||||
and implementation complexity.
|
||||
In environments with a more complex management model, with delegation
|
||||
of some management functions to parties under varying degrees of trust,
|
||||
the base UNIX model and most natural
|
||||
extensions are inappropriate at best.
|
||||
Where multiple mutually un-trusting parties are introduced,
|
||||
``inappropriate'' rapidly transitions to ``nightmarish'', especially
|
||||
with regards to data integrity and privacy protection.
|
||||
.PP
|
||||
The FreeBSD ``Jail'' facility provides the ability to partition
|
||||
the operating system environment, while maintaining the simplicity
|
||||
of the UNIX ``root'' model.
|
||||
In Jail, users with privilege find that the scope of their requests
|
||||
is limited to the jail, allowing system administrators to delegate
|
||||
management capabilities for each virtual machine
|
||||
environment.
|
||||
Creating virtual machines in this manner has many potential uses; the
|
||||
most popular thus far has been for providing virtual machine services
|
||||
in Internet Service Provider environments.
|
||||
.AE
|
||||
.NH
|
||||
Introduction
|
||||
.PP
|
||||
The UNIX access control mechanism is designed for an environment with two
|
||||
types of users: those with, and without administrative privilege.
|
||||
Within this framework, every attempt is made to provide an open
|
||||
system, allowing easy sharing of files and inter-process communication.
|
||||
As a member of the UNIX family, FreeBSD inherits these
|
||||
security properties.
|
||||
Users of FreeBSD in non-traditional UNIX environments must balance
|
||||
their need for strong application support, high network performance
|
||||
and functionality, and low total cost of ownership with the need
|
||||
for alternative security models that are difficult or impossible to
|
||||
implement with the UNIX security mechanisms.
|
||||
.PP
|
||||
One such consideration is the desire to delegate some (but not all)
|
||||
administrative functions to untrusted or less trusted parties, and
|
||||
simultaneously impose system-wide mandatory policies on process
|
||||
interaction and sharing.
|
||||
Attempting to create such an environment in the current-day FreeBSD
|
||||
security environment is both difficult and costly: in many cases,
|
||||
the burden of implementing these policies falls on user
|
||||
applications, which means an increase in the size and complexity
|
||||
of the code base, in turn translating to higher development
|
||||
and maintenance cost, as well as less overall flexibility.
|
||||
.PP
|
||||
This abstract risk becomes more clear when applied to a practical,
|
||||
real-world example:
|
||||
many web service providers turn to the FreeBSD
|
||||
operating system to host customer web sites, as it provides a
|
||||
high-performance, network-centric server environment.
|
||||
However, these providers have a number of concerns on their plate, both in
|
||||
terms of protecting the integrity and confidentiality of their own
|
||||
files and services from their customers, as well as protecting the files
|
||||
and services of one customer from (accidental or
|
||||
intentional) access by any other customer.
|
||||
At the same time, a provider would like to provide
|
||||
substantial autonomy to customers, allowing them to install and
|
||||
maintain their own software, and to manage their own services,
|
||||
such as web servers and other content-related daemon programs.
|
||||
.PP
|
||||
This problem space points strongly in the direction of a partitioning
|
||||
solution, in which customer processes and storage are isolated from those of
|
||||
other customers, both in terms of accidental disclosure of data or process
|
||||
information, and in terms of the ability to modify files or processes
|
||||
outside of a compartment.
|
||||
Delegation of management functions within the system must
|
||||
be possible, but not at the cost of system-wide requirements, including
|
||||
integrity and privacy protection between partitions.
|
||||
.PP
|
||||
However, UNIX-style access control makes it notoriously difficult to
|
||||
compartmentalise functionality.
|
||||
While mechanisms such as chroot(2) provide a modest
|
||||
level of compartmentalisation, it is well known
|
||||
that these mechanisms have serious shortcomings, both in terms of the
|
||||
scope of their functionality, and effectiveness at what they provide \s-2[CHROOT]\s+2.
|
||||
.PP
|
||||
In the case of the chroot(2) call, a process's visibility of
|
||||
the file system name-space is limited to a single subtree.
|
||||
However, the compartmentalisation does not extend to the process
|
||||
or networking spaces and therefore both observation of and interference
|
||||
with processes outside their compartment is possible.
|
||||
.PP
|
||||
To this end, we describe the new FreeBSD ``Jail'' facility, which
|
||||
provides a strong partitioning solution, leveraging existing
|
||||
mechanisms, such as chroot(2), to provide what effectively amounts to a
|
||||
virtual machine environment. Processes in a jail are provided
|
||||
full access to the files that they may manipulate, processes they
|
||||
may influence, and network services they can make use of, but neither
|
||||
access to nor visibility of files, processes or network services outside
|
||||
their partition.
|
||||
.PP
|
||||
Unlike other fine-grained security solutions, Jail does not
|
||||
substantially increase the policy management requirements for the
|
||||
system administrator, as each Jail is a virtual FreeBSD environment
|
||||
permitting local policy to be independently managed, with much the
|
||||
same properties as the main system itself, making Jail easy to use
|
||||
for the administrator, and far more compatible with applications.
|
||||
.NH
|
||||
Traditional UNIX Security, or, ``God, root, what difference?'' \s-2[UF]\s+2.
|
||||
.PP
|
||||
The traditional UNIX access model assigns numeric uids to each user of the
|
||||
system. In turn, each process ``owned'' by a user will be tagged with that
|
||||
user's uid in an unforgeable manner. The uids serve two purposes: first,
|
||||
they determine how discretionary access control mechanisms will be applied, and
|
||||
second, they are used to determine whether special privileges are accorded.
|
||||
.PP
|
||||
In the case of discretionary access controls, the primary object protected is
|
||||
a file. The uid (and related gids indicating group membership) are mapped to
|
||||
a set of rights for each object, courtesy of the UNIX file mode, in effect acting
|
||||
as a limited form of access control list. Jail is, in general, not concerned
|
||||
with modifying the semantics of discretionary access control mechanisms,
|
||||
although there are important implications from a management perspective.
|
||||
.PP
|
||||
For the purposes of determining whether special privileges are accorded to a
|
||||
process, the check is simple: ``is the numeric uid equal to 0?''.
|
||||
If so, the
|
||||
process is acting with ``super-user privileges'', and all access checks are
|
||||
granted, in effect allowing the process the ability to do whatever it wants
|
||||
to \**.
|
||||
.FS
|
||||
\&... no matter how patently stupid it may be.
|
||||
.FE
|
||||
.PP
|
||||
For the purposes of human convenience, uid 0 is canonically allocated
|
||||
to the ``root'' user \s-2[ROOT]\s+2.
|
||||
For the purposes of jail, this behaviour is extremely relevant: many of
|
||||
these privileged operations can be used to manage system hardware and
|
||||
configuration, file system name-space, and special network operations.
|
||||
.PP
|
||||
Many limitations to this model are immediately clear: the root user is a
|
||||
single, concentrated source of privilege that is exposed to many pieces of
|
||||
software, and as such is an immediate target for attacks. In the event of a
|
||||
compromise of the root capability set, the attacker has complete control over
|
||||
the system. Even without an attacker, the risks of a single administrative
|
||||
account are serious: delegating a narrow scope of capability to an
|
||||
inexperienced administrator is difficult, as the granularity of delegation is
|
||||
that of all system management abilities. These features make the omnipotent
|
||||
root account a sharp, efficient and extremely dangerous tool.
|
||||
.PP
|
||||
The BSD family of operating systems has implemented the ``securelevel''
|
||||
mechanism which allows the administrator to block certain configuration
|
||||
and management functions from being performed by root,
|
||||
until the system is restarted and brought up into single-user mode.
|
||||
While this does provide some amount of protection in the case of a root
|
||||
compromise of the machine, it does nothing to address the need for
|
||||
delegation of certain root abilities.
|
||||
.NH
|
||||
Other Solutions to the Root Problem
|
||||
.PP
|
||||
Many operating systems attempt to address these limitations by providing
|
||||
fine-grained access controls for system resources \s-2[BIBA]\s+2.
|
||||
These efforts vary in
|
||||
degrees of success, but almost all suffer from at least three serious
|
||||
limitations:
|
||||
.PP
|
||||
First, increasing the granularity of security controls increases the
|
||||
complexity of the administration process, in turn increasing both the
|
||||
opportunity for incorrect configuration and the demand on
|
||||
administrator time and resources. In many cases, the increased complexity
|
||||
results in significant frustration for the administrator, which may result
|
||||
in two
|
||||
disastrous types of policy: ``all doors open as it's too much trouble'', and
|
||||
``trust that the system is secure, when in fact it isn't''.
|
||||
.PP
|
||||
The extent of the trouble is best illustrated by the fact that an entire
|
||||
niche industry has emerged providing tools to manage fine-grained security
|
||||
controls \s-2[UAS]\s+2.
|
||||
.PP
|
||||
Second, usefully segregating capabilities and assigning them to running code
|
||||
and users is very difficult. Many privileged operations in UNIX seem
|
||||
independent, but are in fact closely related, and the handing out of one
|
||||
privilege may, in effect, be transitive to many others. For example, in
|
||||
some trusted operating systems, a system capability may be assigned to a
|
||||
running process to allow it to read any file, for the purposes of backup.
|
||||
However, this capability is, in effect, equivalent to the ability to switch to
|
||||
any other account, as the ability to access any file provides access to system
|
||||
keying material, which in turn provides the ability to authenticate as any
|
||||
user. Similarly, many operating systems attempt to segregate management
|
||||
capabilities from auditing capabilities. In a number of these operating
|
||||
systems, however, ``management capabilities'' permit the administrator to
|
||||
assign ``auditing capabilities'' to itself, or another account, circumventing
|
||||
the segregation of capability.
|
||||
.PP
|
||||
Finally, introducing new security features often involves introducing new
|
||||
security management APIs. When fine-grained capabilities are introduced to
|
||||
replace the setuid mechanism in UNIX-like operating systems, applications that
|
||||
previously did an ``appropriateness check'' to see if they were running as
|
||||
root before executing must now be changed to know that they need not run as
|
||||
root. In the case of applications running with privilege and executing other
|
||||
programs, there is now a new set of privileges that must be voluntarily given
|
||||
up before executing another program. These changes can introduce significant
|
||||
incompatibility for existing applications, and make life more difficult for
|
||||
application developers who may not be aware of differing security semantics on
|
||||
different systems \s-2[POSIX1e]\s+2.
|
||||
.NH
|
||||
The Jail Partitioning Solution
|
||||
.PP
|
||||
Jail neatly side-steps the majority of these problems through partitioning.
|
||||
Rather
|
||||
than introduce additional fine-grained access control mechanisms, we partition
|
||||
a FreeBSD environment (processes, file system, network resources) into a
|
||||
management environment, and optionally subset Jail environments. In doing so,
|
||||
we simultaneously maintain the existing UNIX security model, allowing
|
||||
multiple users and a privileged root user in each jail, while
|
||||
limiting the scope of root's activities to his jail.
|
||||
Consequently the administrator of a
|
||||
FreeBSD machine can partition the machine into separate jails, and provide
|
||||
access to the super-user account in each of these without losing control of
|
||||
the overall environment.
|
||||
.PP
|
||||
A process in a partition is referred to as ``in jail''. When a FreeBSD
|
||||
system is booted up after a fresh install, no processes will be in jail.
|
||||
When
|
||||
a process is placed in a jail, it, and any descendants of the process created
|
||||
after the jail creation, will be in that jail. A process may be in only one
|
||||
jail, and after creation, it cannot leave the jail.
|
||||
Jails are created when a
|
||||
privileged process calls the jail(2) syscall, with a description of the jail as an
|
||||
argument to the call. Each call to jail(2) creates a new jail; the only way
|
||||
for a new process to enter the jail is by inheriting access to the jail from
|
||||
another process already in that jail.
|
||||
Processes may never
|
||||
leave the jail they created, or were created in.
|
||||
.KF
|
||||
.PSPIC jail01.eps 4i
|
||||
.ce 1
|
||||
Fig. 1 \(em Schematic diagram of machine with two configured jails
|
||||
.sp
|
||||
.KE
|
||||
.PP
|
||||
Membership in a jail involves a number of restrictions: access to the file
|
||||
name-space is restricted in the style of chroot(2), the ability to bind network
|
||||
resources is limited to a specific IP address, the ability to manipulate
|
||||
system resources and perform privileged operations is sharply curtailed, and
|
||||
the ability to interact with other processes is limited to only processes
|
||||
inside the same jail.
|
||||
.PP
|
||||
Jail takes advantage of the existing chroot(2) behaviour to limit access to the
|
||||
file system name-space for jailed processes. When a jail is created, it is
|
||||
bound to a particular file system root.
|
||||
Processes are unable to manipulate files that they cannot address,
|
||||
and as such the integrity and confidentiality of files outside of the jail
|
||||
file system root are protected. Traditional mechanisms for breaking out of
|
||||
chroot(2) have been blocked.
|
||||
In the expected and documented configuration, each jail is provided
|
||||
with its own exclusive file system root, and standard FreeBSD directory layout,
|
||||
but this is not mandated by the implementation.
|
||||
.PP
|
||||
Each jail is bound to a single IP address: processes within the jail may not
|
||||
make use of any other IP address for outgoing or incoming connections; this
|
||||
includes the ability to restrict what network services a particular jail may
|
||||
offer. As FreeBSD distinguishes attempts to bind all IP addresses from
|
||||
attempts to bind a particular address, bind requests for all IP addresses are
|
||||
redirected to the individual Jail address. Some network functionality
|
||||
associated with privileged calls is disabled wholesale due to the nature of the
|
||||
functionality offered, in particular facilities which would allow ``spoofing''
|
||||
of IP numbers or disruptive traffic to be generated have been disabled.
|
||||
.PP
|
||||
Processes running without root privileges will notice few, if any, differences
|
||||
between a jailed environment and an un-jailed environment. Processes running with
|
||||
root privileges will find that many restrictions apply to the privileged calls
|
||||
they may make. Some calls will now return an access error \(em for example, an
|
||||
attempt to create a device node will now fail. Others will have a more
|
||||
limited scope than normal \(em attempts to bind a reserved port number on all
|
||||
available addresses will result in binding only the address associated with
|
||||
the jail. Other calls will succeed as normal: root may read a file owned by
|
||||
any uid, as long as it is accessible through the jail file system name-space.
|
||||
.PP
|
||||
Processes within the jail will find that they are unable to interact with or
|
||||
even verify the existence of
|
||||
processes outside the jail \(em processes within the jail are
|
||||
prevented from delivering signals to processes outside the jail, as well as
|
||||
connecting to those processes with debuggers, or even seeing them in the
|
||||
sysctl or process file system monitoring mechanisms. Jail does not prevent,
|
||||
nor is it intended to prevent, the use of covert channels or communications
|
||||
mechanisms via accepted interfaces \(em for example, two processes may communicate
|
||||
via sockets over the IP network interface. Nor does it attempt to provide
|
||||
scheduling services based on the partition; however, it does prevent calls
|
||||
that interfere with normal process operation.
|
||||
.PP
|
||||
As a result of these attempts to retain the standard FreeBSD API and
|
||||
framework, almost all applications will run unaffected. Standard system
|
||||
services such as Telnet, FTP, and SSH all behave normally, as do most third
|
||||
party applications, including the popular Apache web server.
|
||||
.NH
|
||||
Jail Implementation
|
||||
.PP
|
||||
Processes running with root privileges in the jail find that there are serious
|
||||
restrictions on what they are capable of doing \(em in particular, activities that
|
||||
would extend outside of the jail:
|
||||
.IP "" 5n
|
||||
\(bu Modifying the running kernel by direct access and loading kernel
|
||||
modules is prohibited.
|
||||
.IP
|
||||
\(bu Modifying any of the network configuration, interfaces, addresses, and
|
||||
routing table is prohibited.
|
||||
.IP
|
||||
\(bu Mounting and unmounting file systems is prohibited.
|
||||
.IP
|
||||
\(bu Creating device nodes is prohibited.
|
||||
.IP
|
||||
\(bu Accessing raw, divert, or routing sockets is prohibited.
|
||||
.IP
|
||||
\(bu Modifying kernel runtime parameters, such as most sysctl settings, is
|
||||
prohibited.
|
||||
.IP
|
||||
\(bu Changing securelevel-related file flags is prohibited.
|
||||
.IP
|
||||
\(bu Accessing network resources not associated with the jail is prohibited.
|
||||
.bp
|
||||
.PP
|
||||
Other privileged activities are permitted as long as they are limited to the
|
||||
scope of the jail:
|
||||
.IP "" 5n
|
||||
\(bu Signalling any process within the jail is permitted.
|
||||
.IP
|
||||
\(bu Changing the ownership and mode of any file within the jail is permitted, as
|
||||
long as the file flags permit this.
|
||||
.IP
|
||||
\(bu Deleting any file within the jail is permitted, as long as the file flags
|
||||
permit this.
|
||||
.IP
|
||||
\(bu Binding reserved TCP and UDP port numbers on the jail's IP address is
|
||||
permitted. (Attempts to bind TCP and UDP ports using INADDR_ANY will be
|
||||
redirected to the jail's IP address.)
|
||||
.IP
|
||||
\(bu Functions which operate on the uid/gid space are all permitted since they
|
||||
act as labels for file system objects or processes
|
||||
which are partitioned off by other mechanisms.
|
||||
.PP
|
||||
These restrictions on root access limit the scope of root processes, enabling
|
||||
most applications to run un-hindered, but preventing calls that might allow an
|
||||
application to reach beyond the jail and influence other processes or
|
||||
system-wide configuration.
|
||||
.PP
|
||||
.so implementation.ms
|
||||
.so mgt.ms
|
||||
.so future.ms
|
||||
.NH
|
||||
Conclusion
|
||||
.PP
|
||||
The jail facility provides FreeBSD with a conceptually simple security
|
||||
partitioning mechanism, allowing the delegation of administrative rights
|
||||
within virtual machine partitions.
|
||||
.PP
|
||||
The implementation relies on
|
||||
restricting access within the jail environment to a well-defined subset
|
||||
of the overall host environment. This includes limiting interaction
|
||||
between processes, and to files, network resources, and privileged
|
||||
operations. Administrative overhead is reduced through avoiding
|
||||
fine-grained access control mechanisms, and maintaining a consistent
|
||||
administrative interface across partitions and the host environment.
|
||||
.PP
|
||||
The jail facility has already seen widespread deployment, in particular as
|
||||
a vehicle for delivering "virtual private server" services.
|
||||
.PP
|
||||
The jail code is included in the base system as part of FreeBSD 4.0-RELEASE,
|
||||
and fully documented in the jail(2) and jail(8) man-pages.
|
||||
.bp
|
||||
.SH
|
||||
Notes & References
|
||||
.IP \s-2[BIBA]\s+2 .5i
|
||||
K. J. Biba, Integrity Considerations for Secure
|
||||
Computer Systems, USAF Electronic Systems Division, 1977
|
||||
.IP \s-2[CHROOT]\s+2 .5i
|
||||
Dr. Marshall Kirk McKusick, private communication:
|
||||
``According to the SCCS logs, the chroot call was added by Bill Joy
|
||||
on March 18, 1982 approximately 1.5 years before 4.2BSD was released.
|
||||
That was well before we had ftp servers of any sort (ftp did not
|
||||
show up in the source tree until January 1983). My best guess as
|
||||
to its purpose was to allow Bill to chroot into the /4.2BSD build
|
||||
directory and build a system using only the files, include files,
|
||||
etc contained in that tree. That was the only use of chroot that
|
||||
I remember from the early days.''
|
||||
.IP \s-2[LOTTERY1]\s+2 .5i
|
||||
David Petrou and John Milford. Proportional-Share Scheduling:
|
||||
Implementation and Evaluation in a Widely-Deployed Operating System,
|
||||
December 1997.
|
||||
.nf
|
||||
\s-2\fChttp://www.cs.cmu.edu/~dpetrou/papers/freebsd_lottery_writeup98.ps\fP\s+2
|
||||
\s-2\fChttp://www.cs.cmu.edu/~dpetrou/code/freebsd_lottery_code.tar.gz\fP\s+2
|
||||
.IP \s-2[LOTTERY2]\s+2 .5i
|
||||
Carl A. Waldspurger and William E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management, Proceedings of the First Symposium on Operating Systems Design and Implementation (OSDI '94), pages 1-11, Monterey, California, November 1994.
|
||||
.nf
|
||||
\s-2\fChttp://www.research.digital.com/SRC/personal/caw/papers.html\fP\s+2
|
||||
.IP \s-2[POSIX1e]\s+2 .5i
|
||||
Draft Standard for Information Technology \(em
|
||||
Portable Operating System Interface (POSIX) \(em
|
||||
Part 1: System Application Program Interface (API) \(em Amendment:
|
||||
Protection, Audit and Control Interfaces [C Language]
|
||||
IEEE Std 1003.1e Draft 17 Editor Casey Schaufler
|
||||
.IP \s-2[ROOT]\s+2 .5i
|
||||
Historically other names have been used at times, Zilog for instance
|
||||
called the super-user account ``zeus''.
|
||||
.IP \s-2[UAS]\s+2 .5i
|
||||
One such niche product is the ``UAS'' system to maintain and audit
|
||||
RACF configurations on MVS systems.
|
||||
.nf
|
||||
\s-2\fChttp://www.entactinfo.com/products/uas/\fP\s+2
|
||||
.IP \s-2[UF]\s+2 .5i
|
||||
Quote from the User-Friendly cartoon by Illiad.
|
||||
.nf
|
||||
\s-2\fChttp://www.userfriendly.org/cartoons/archives/98nov/19981111.html\fP\s+2
|