Document the potential for jail escape.
From r224615:
Always disable mount and unmount for jails with enforce_statfs==2.
From r231267:
A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.
From r232059:
To improve control over the use of mount(8) inside a jail(8), introduce
a new jail parameter node with the following parameters:
allow.mount.devfs:
allow mounting the devfs filesystem inside a jail
allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail
From r232186:
allow.mount.zfs:
allow mounting the zfs filesystem inside a jail
allow.mount.zfs:
allow mounting the zfs filesystem inside a jail
This way the permssions for mounting all current VFCF_JAIL filesystems
inside a jail are controlled wia allow.mount.* jail parameters.
Update sysctl descriptions.
Update jail(8) and zfs(8) manpages.
TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel
developers
MFC after: 10 days
a new jail parameter node with the following parameters:
allow.mount.devfs:
allow mounting the devfs filesystem inside a jail
allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail
Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.
Reviewed by: jamie
Suggested by: pjd
MFC after: 2 weeks
A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.
Utilizes new functions introduced in r231265.
Reviewed by: jamie
MFC after: 1 month
Some errors printed the jail name for unnamed (command line) jails.
Attempting to create an already-existing jail from the command line
returned with no error (even for non-root) due to bad logic in
start_state.
Ignore kvm_proc errors, which are typically caused by permission
problems. Instead, stop ignoring permission errors when removing
a jail (but continue to silently ignore other errors, i.e. the
jail no longer existing). This makes non-root attempts at removing
a jail give a clearer error message.
jail(8) does a chdir(2) to the given path argument. Kernel evaluates the
jail path from the new cwd and not from the original cwd, which leads to
undesired behavior if given a relative path.
Reviewed by: jamie
MFC after: 2 weeks
recommended to allow root users in the jail to access the host system.
PR: docs/156853
Submitted by: crees
Patch by: crees
Approved by: re (kib) for BETA1
as part of jail removal (IP_STOP_TIMEOUT).
Note a jail as "removed" even if it wasn't jail_remove() that did
the deed, e.g. if it already went away because all its processes
were killed.
finish_command can be processed properly.
Call failed() once in next_command() instead of multiple times in
run_command().
Continue processing commands when a no-wait operation (IP__OP or background
command) succeeds.
Check for IPv4 or IPv6 to be available by the kernel to not
provoke errors trying to query options not available.
Make it possible to compile out INET or INET6 only parts.
a single command string to run, and an inner function (run_command) that
runs that single string.
Move the list of start/stop commands to run from a switch statement into
an array, with a new placeholder parameter IP__OP for actually creating
or removing the jail.
When jail creation fails, revert all non-exec commands in reverse order.
provoke errors trying to query options not available.
Make it possible to compile out INET or INET6 only parts.
Reviewed by: jamie
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 10 days
an attacker with root access to the jail can create a setuid binary for
their own use in the host environment (if they also have this access),
thus breaking root in the host.
This exploit is impossible if the jail's files are not world-readable.
Add instructions to the man page on how to create a jail with the
correct permissions set.
PR: docs/156853
Submitted by: Chris Rees (utisoft at gmail dot com)
Reviewed by: cperciva (security parts)
MFC after: 9 days
Make the parallelism limit a global instead of always passing it
to run_command and finish_command.
In the case of an empty command string, try to run any other strings
the command may have.
Replace JF_BACKGROUND with its sort-of opposite JF_SLEEPQ.
Change j->comstring earlier to render JF_RUNQ unncessary.
Change the if-else series to a more readable switch statement.
Treat IP_STOP_TIMEOUT like a command, calling run_command which then
calls term_procs.
When the IP_STOP_TIMEOUT "command" finishes, it shouldn't mess with
the parallelism limit.
Make sufficient checks in finish_command and run_command so that
the nonintuitive j->comstring null check isn't necessary to run them.
Rename the "waiting" queue to "depend", because the "sleeping" and
"runnable" queues are also used to wait for something.
path must be absolute.
mount paths must exist and have no symlinks beyond the jail's path itself.
consolelog must exist (apart from the final component) and have no
symlinks beyond the jail's path itself.
the jail(8) command. [10:04]
Fix a one-NUL-byte buffer overflow in libopie. [10:05]
Correctly sanity-check a buffer length in nfs mount. [10:06]
Approved by: so (cperciva)
Approved by: re (kensmith)
Security: FreeBSD-SA-10:04.jail
Security: FreeBSD-SA-10:05.opie
Security: FreeBSD-SA-10:06.nfsclient
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.
This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primry jail IP they had been used for
years, as well as for people prefering to implement similar policies.
Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]
Reviewed by: jamie, hrs (ipv6 part)
Pointed out by: hrs [1]
MFC After: 2 weeks
Asked for by: Jase Thew (bazerka beardz.net)
parameter unless a (numeric) IPv6 address is given. Even the default
binaries built with -DINET6 will work with IPv6-less kernels. With an
eye to the future, similarly handle the possibility of an IPv4-less kernel.
Approved by: re (kib), bz (mentor)
restrictions) were found to be inadequately described by a boolean.
Define a new parameter type with three values (disable, new, inherit)
to handle these and future cases.
Approved by: re (kib), bz (mentor)
Discussed with: rwatson
system callers of getgroups(), getgrouplist(), and setgroups() to
allocate buffers dynamically. Specifically, allocate a buffer of size
sysconf(_SC_NGROUPS_MAX)+1 (+2 in a few cases to allow for overflow).
This (or similar gymnastics) is required for the code to actually follow
the POSIX.1-2008 specification where {NGROUPS_MAX} may differ at runtime
and where getgroups may return {NGROUPS_MAX}+1 results on systems like
FreeBSD which include the primary group.
In id(1), don't pointlessly add the primary group to the list of all
groups, it is always the first result from getgroups(). In principle
the old code was more portable, but this was only done in one of the two
places where getgroups() was called to the overall effect was pointless.
Document the actual POSIX requirements in the getgroups(2) and
setgroups(2) manpages. We do not yet support a dynamic NGROUPS, but we
may in the future.
MFC after: 2 weeks
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.
The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.
Approved by: bz (mentor)
and jail_get(2). Jail(8) can now create jails using a "name=value"
format instead of just specifying a limited set of fixed parameters; it
can also modify parameters of existing jails. Jls(8) can display all
parameters of jails, or a specified set of parameters. The available
parameters are gathered from the kernel, and not hard-coded into these
programs.
Small patches on killall(1) and jexec(8) to support jail names with
jail_get(2).
Approved by: bz (mentor)
Bring in updated jail support from bz_jail branch.
This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..
SCTP support was updated and supports IPv6 in jails as well.
Cpuset support permits jails to be bound to specific processor
sets after creation.
Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.
DDB 'show jails' command was added to aid debugging.
Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.
Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.
Bump __FreeBSD_version for the afore mentioned and in kernel changes.
Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.
Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.
A jail-friendly file system is a file system which driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.
There currently no jail-friendly file systems, ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.
Reviewed by: rwatson
containing the jailid, path, hostname, ip and the command used to start
the jail.
PR: misc/89883
Submitted by: L. Jason Godsey <lannygodsey -at- yahoo.com>
Reviewed by: phk
MFC after: 1 week
behaviour of chflags within a jail. If set to 0 (the default), then a
jailed root user is treated as an unprivileged user; if set to 1, then
a jailed root user is treated the same as an unjailed root user.
This is necessary to allow "make installworld" to work inside a jail,
since it attempts to manipulate the system immutable flag on certain
files.
Discussed with: csjp, rwatson
MFC after: 2 weeks
hence bump it to 6.
Note that the last commit message was not quite accurate. While the
assumption exists in the code, it's not possible to have an
uninitialized p there because if lflag is set when username is NULL
then execution would be terminated earlier.
seeing status of mounted file system for jailed processes.
Pass full path of jail's root directory to the kernel. mount(8) utility is
doing the same thing already.
about the risks of enabling raw sockets in prisons.
Because raw sockets can be used to configure and interact
with various network subsystems, extra caution should be
used where privileged access to jails is given out to
untrusted parties. As such, by default this option is disabled.
A few others and I are currently auditing the kernel
source code to ensure that the use of raw sockets by
privledged prison users is safe.
Approved by: bmilekic (mentor)
o getpwnam(3) returns NULL and does not set errno when the user does
not exist. Bail out with "no such user" instead of "Unknown error: 0".
PR: bin/67262
Submitted by: demon (-U flag)
MFC after: 3 weeks
(1) Document the notion of using jail(8) to run "virtual servers" or
just to constrain specific applications. If only running specific
applications, some configuration steps are unnecessary (such as
editing rc.conf).
(2) Add some more subsection headers to break up the bigger chunks of
text.
(3) Clarify the problems associated with applications binding all IP
addresses in the host, and attempt to be more specific about
potential application problems. Document how to force sshd to
bind the the right socket.
(4) Suggest that in a jailed application scenario, you might want to
have the host syslogd listen on the socket in the jail, rather
than running syslogd in the jail.
(5) Catch another reference to /stand/sysinstall.
Approved by: re (bmah implicitly)
tell them that they also need to use devfs rules to prevent
inappropriate devices from appearing in the jail; add an Xref. In
earlier versions of this man page, the user was instructed to use
sh MAKEDEV jail, which only created a minimal set of device nodes.
o Add jexec(8) to execute a command in an existing jail.
o Add -j option for killall(1) to kill all processes in a specified
jail.
o Add -i option to jail(8) to output jail ID of newly created jail.