224 Commits

Author SHA1 Message Date
mjg
10f6c42acc Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:11:45 +00:00
jamie
33597fd138 In hardened systems, where the security.bsd.unprivileged_proc_debug sysctl
node is set, allow setting security.bsd.unprivileged_proc_debug per-jail.
In part, this is needed to create jails in which the Address Sanitizer
(ASAN) fully works as ASAN utilizes libkvm to inspect the virtual address
space. Instead of having to allow unprivileged process debugging for the
entire system, allow setting it on a per-jail basis.

The sysctl node is still security.bsd.unprivileged_proc_debug and the
jail(8) param is allow.unprivileged_proc_debug. The sysctl code is now a
sysctl proc rather than a sysctl int. This allows us to determine setting
the flag for the corresponding jail (or prison0).

As part of the change, the dynamic allow.* API needed to be modified to
take into account pr_allow flags which may now be disabled in prison0.
This prevents conflicts with new pr_allow flags (like that of vmm(4)) that
are added (and removed) dynamically.

Also teach the jail creation KPI to allow differences for certain pr_allow
flags between the parent and child jail. This can happen when unprivileged
process debugging is disabled in the parent prison, but enabled in the
child.

Submitted by:	Shawn Webb <lattera at gmail.com>
Obtained from:	HardenedBSD (45b3625edba0f73b3e3890b1ec3d0d1e95fd47e1, deba0b5078cef0faae43cbdafed3035b16587afc, ab21eeb3b4c72f2500987c96ff603ccf3b6e7de8)
Relnotes:	yes
Sponsored by:	HardenedBSD and G2, Inc
Differential Revision:	https://reviews.freebsd.org/D18319
2018-11-27 17:51:50 +00:00
kib
a79b3edff0 Allow set ether/vlan PCP operation from the VNET jails.
The vlan interfaces can be created from vnet jails, it seems, so it
sounds logical to allow pcp configuration as well.

Reviewed by:	bz, hselasky (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17777
2018-11-12 15:59:32 +00:00
jamie
47dc3d2edc Fix typos from r339409.
Reported by:	maxim
Approved by:	re (gjb)
2018-10-18 15:02:57 +00:00
jamie
ae3e1ed6d1 Add a new jail permission, allow.read_msgbuf. When true, jailed processes
can see the dmesg buffer (this is the current behavior).  When false (the
new default), dmesg will be unavailable to jailed users, whether root or
not.

The security.bsd.unprivileged_read_msgbuf sysctl still works as before,
controlling system-wide whether non-root users can see the buffer.

PR:		211580
Submitted by:	bz
Approved by:	re@ (kib@)
MFC after:	3 days
2018-10-17 16:11:43 +00:00
jamie
3755eafea0 Fix the test prohibiting jails from sharing IP addresses.
It's not supposed to be legal for two jails to contain the same IP address,
unless both jails contain only that one address.  This is the behavior
documented in jail(8), and is there to prevent confusion when multiple
jails are listening on IADDR_ANY.

VIMAGE jails (now the default for GENERIC kernels) test this correctly,
but non-VIMAGE jails have been performing an incomplete test when nested
jails are used.

Approved by:	re@ (kib@)
MFC after:	5 days
2018-10-06 02:10:32 +00:00
jamie
6b9aac38ce Revert r337922, except for some documention-only bits. This needs to wait
until user is changed to stop using jail(2).

Differential Revision:	D14791
2018-08-16 19:09:43 +00:00
jamie
94a36bb7c1 Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating
jails since FreeBSD 7.

Along with the system call, put the various security.jail.allow_foo and
security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or
BURN_BRIDGES).  These sysctls had two disparate uses: on the system side,
they were global permissions for jails created via jail(2) which lacked
fine-grained permission controls; inside a jail, they're read-only
descriptions of what the current jail is allowed to do.  The first use
is obsolete along with jail(2), but keep them for the second-read-only use.

Differential Revision:	D14791
2018-08-16 18:40:16 +00:00
antoine
8930512f7c Add allow.mlock to jail parameters
It allows locking or unlocking physical pages in memory within a jail

This allows running elasticsearch with "bootstrap.memory_lock" inside a jail

Reviewed by:	jamie@
Differential Revision:	https://reviews.freebsd.org/D16342
2018-07-29 12:41:56 +00:00
jamie
bc86284c7c Change prison_add_vfs() to the more generic prison_add_allow(), which
can add any dynamic allow.* or allow.*.* parameter.  Also keep
prison_add_vfs() as a wrapper.

Differential Revision:	D16146
2018-07-06 18:50:22 +00:00
kib
bf21f0b257 Silence warnings about unused variables when RACCT is defined but RCTL
is not.

Reported by:	Dries Michiels <driesm.michiels@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-07-05 13:37:31 +00:00
bz
299899c6fd Instead of using hand-rolled loops where not needed switch them
to FOREACH_PROC_IN_SYSTEM() to have a single pattern to look for.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D15916
2018-06-20 11:42:06 +00:00
bz
d5cee881d6 Try to be consistent and spell "vnet" lower case like all the
other options (and as we do on command line).

Sponsored by:	iXsystems, Inc.
2018-05-24 15:31:05 +00:00
bz
89d7a27c82 Improve the KASSERT to also have the prison pointer.
Helpful when debugging from ddb.

Sponsored by:		iXsystems, Inc.
2018-05-24 15:28:21 +00:00
jamie
1c11f552d6 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
avg
3cd5284f93 call racct_proc_ucred_changed() under the proc lock
The lock is required to ensure that the switch to the new credentials
and the transfer of the process's accounting data from the old
credentials to the new ones is done atomically.  Otherwise, some updates
may be applied to the new credentials and then additionally transferred
from the old credentials if the updates happen after proc_set_cred() and
before racct_proc_ucred_changed().

The problem is especially pronounced for RACCT_RSS because
- there is a strict accounting for this resource (it's reclaimable)
- it's updated asynchronously by the vm daemon
- it's updated by setting an absolute value instead of applying a delta

I had to remove a call to rctl_proc_ucred_changed() from
racct_proc_ucred_changed() and make all callers of latter call the
former as well.  The reason is that rctl_proc_ucred_changed, as it is
implemented now, cannot be called while holding the proc lock, so the
lock is dropped after calling racct_proc_ucred_changed.  Additionally,
I've added calls to crhold / crfree around the rctl call, because
without the proc lock there is no gurantee that the new credentials,
owned by the process, will stay stable.  That does not eliminate a
possibility that the credentials passed to the rctl will get stale.
Ideally, rctl_proc_ucred_changed should be able to work under the proc
lock.

Many thanks to kib for pointing out the above problems.

PR:		222027
Discussed with:	kib
No comment:	trasz
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D15048
2018-04-20 13:08:04 +00:00
brooks
9d79658aab Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
jamie
783e904fc9 Represent boolean jail options as an array of structures containing the
flag and both the regular and "no" names, instead of two different string
arrays whose indices need to match the flag's bit position.  This makes
them similar to the say "jailsys" options are represented.

Loop through either kind of option array with a structure pointer rather
then an integer index.
2018-03-20 23:08:42 +00:00
pfg
cc22a86800 sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
allanjude
86ca7d2af5 Jails: Optionally prevent jailed root from binding to privileged ports
You may now optionally specify allow.noreserved_ports to prevent root
inside a jail from using privileged ports (less than 1024)

PR:		217728
Submitted by:	Matt Miller <mattm916@pulsar.neomailbox.ch>
Reviewed by:	jamie, cem, smh
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D10202
2017-06-06 02:15:00 +00:00
vangyzen
c7c348accc Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

Suggested by:	glebius, emaste
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9625
2017-02-16 20:47:41 +00:00
stevek
12aa5d6087 Move IPv4-specific jail functions to new file netinet/in_jail.c
_prison_check_ip4 renamed to prison_check_ip4_locked

Move IPv6-specific jail functions to new file netinet6/in6_jail.c
_prison_check_ip6 renamed to prison_check_ip6_locked

Add appropriate prototypes to sys/sys/jail.h

Adjust kern_jail.c to call prison_check_ip4_locked and
prison_check_ip6_locked accordingly.

Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that
need to be built when INET and INET6, respectively, are configured in the
kernel configuration file.

Reviewed by:	jtl
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D6799
2016-08-09 02:16:21 +00:00
jamie
aad828cb49 Fix a vnode leak when giving a child jail a too-long path when
debug.disablefullpath=1.
2016-06-09 21:59:11 +00:00
jamie
786e18e298 Re-order some jail parameter reading to prevent a vnode leak. 2016-06-09 20:43:14 +00:00
jamie
91dc125e2d Clean up some logic in jail error messages, replacing a missing test and
a redundant test with a single correct test.
2016-06-09 20:39:57 +00:00
jamie
debe558799 Make sure the OSD methods for jail set and remove can't run concurrently,
by holding allprison_lock exclusively (even if only for a moment before
downgrading) on all paths that call PR_METHOD_REMOVE.  Since they may run
on a downgraded lock, it's still possible for them to run concurrently
with PR_METHOD_GET, which will need to use the prison lock.
2016-06-09 16:41:41 +00:00
jamie
f9e8138137 Mark jail(2), and the sysctls that it (and only it) uses as deprecated.
jail(8) has long used jail_set(2), and those sysctl only cause confusion.
2016-05-30 05:21:24 +00:00
pfg
28823d0656 sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
jamie
c39e007779 Delay revmoing the last jail reference in prison_proc_free, and instead
put it off into the pr_task.  This is similar to prison_free, and in fact
uses the same task even though they do something slightly different.

This resolves a LOR between the process lock and allprison_lock, which
came about in r298565.

PR:		48471
2016-04-27 02:25:21 +00:00
jamie
6043e5f56b Use crcopysafe in jail_attach. 2016-04-26 21:19:12 +00:00
jamie
f29978c971 Pass the current/new jail to PR_METHOD_CHECK, which pushes the call
until after the jail is found or created.  This requires unlocking the
jail for the call and re-locking it afterward, but that works because
nothing in the jail has been changed yet, and other processes won't
change the important fields as long as allprison_lock remains held.

Keep better track of name vs namelc in kern_jail_set.  Name should
always be the hierarchical name (relative to the caller), and namelc
the last component.

PR:		48471
MFC after:	5 days
2016-04-25 04:27:58 +00:00
jamie
2003e1acf2 Add a new jail OSD method, PR_METHOD_REMOVE. It's called when a jail is
removed from the user perspective, i.e. when the last pr_uref goes away,
even though the jail mail still exist in the dying state.  It will also
be called if either PR_METHOD_CREATE or PR_METHOD_SET fail.

PR:		48471
MFC after:	 5 days
2016-04-25 04:24:00 +00:00
jamie
b67e183366 Remove the PR_REMOVE flag, which was meant as a temporary marker for
a jail that might be seen mid-removal.  It hasn't been doing the right
thing since at least the ability to resurrect dying jails, and such
resurrection also makes it unnecessary.
2016-04-25 03:58:08 +00:00
pfg
a7d40a88c9 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
jamie
b78d6a91e2 Fix jail name checking that disallowed anything that starts with '0'.
The intention was to just limit leading zeroes on numeric names.  That
check is now improved to also catch the leading spaces and '+' that
strtoul can pass through.

PR:		204897
MFC after:	3 days
2015-12-15 17:25:00 +00:00
trasz
ae1d62cec8 Speed up rctl operation with large rulesets, by holding the lock
during iteration instead of relocking it for each traversed rule.

Reviewed by:	mjg@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D4110
2015-11-15 12:10:51 +00:00
araujo
61ced0e48d Add support to the jail framework to be able to mount linsysfs(5) and
linprocfs(5).

Differential Revision:	D2846
Submitted by:		Nikolai Lifanov <lifanov@mail.lifanov.com>
Reviewed by:		jamie
2015-07-19 08:52:35 +00:00
mjg
c71e9ab863 Move chdir/chroot-related fdp manipulation to kern_descrip.c
Prefix exported functions with pwd_.

Deduplicate some code by adding a helper for setting fd_cdir.

Reviewed by:	kib
2015-07-11 16:19:11 +00:00
bz
ab1516b90e Initialise pr_enforce_statfs from the "default" sysctl value and
not from the compile time constant.  The sysctl value is seeded
from the compile time constant.

MFC after:	2 weeks
2015-06-17 13:15:54 +00:00
trasz
802017a04b Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision:	https://reviews.freebsd.org/D2369
Reviewed by:	kib@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-04-29 10:23:02 +00:00
glebius
baf4ea8ca8 Do not include if_var.h and in6_var.h into kern_jail.c. It is now possible
after r280444.

Sponsored by:	Nginx, Inc.
2015-03-24 16:46:40 +00:00
mjg
054f9cab59 cred: add proc_set_cred helper
The goal here is to provide one place altering process credentials.

This eases debugging and opens up posibilities to do additional work when such
an action is performed.
2015-03-16 00:10:03 +00:00
ian
0b0c9cce2f Format the line properly (wrap before column 80). 2015-02-28 17:44:31 +00:00
ian
57c4ae8cb3 Export the new osreldate and osrelease jail parms in jail_get(2). 2015-02-28 17:32:31 +00:00
ian
1df855e5be Allow the kern.osrelease and kern.osreldate sysctl values to be set in a
jail's creation parameters.  This allows the kernel version to be reliably
spoofed within the jail whether examined directly with sysctl or
indirectly with the uname -r and -K options.

The values can only be set at jail creation time, to eliminate the need
for any locking when accessing the values via sysctl.

The overridden values are inherited by nested jails (unless the config for
the nested jails also overrides the values).

There is no sanity or range checking, other than disallowing an empty
release string or a zero release date, by design.  The system
administrator is trusted to set sane values.  Setting values that are
newer than the actual running kernel will likely cause compatibility
problems.

Differential Revision:	https://reviews.freebsd.org/D1948
Relnotes:	yes
2015-02-27 16:28:55 +00:00
jamie
c7d0935d11 Add allow.mount.fdescfs jail flag.
PR:		192951
Submitted by:	ruben@verweg.com
MFC after:	3 days
2015-01-28 21:08:09 +00:00
jamie
15f3ae0c52 Remove the prison flags PR_IP4_DISABLE and PR_IP6_DISABLE, which have been
write-only for as long as they've existed.
2015-01-14 04:50:28 +00:00
jamie
20db074137 Don't set prison's pr_ip4s or pr_ip6s to -1.
PR:		196474
MFC after:	3 days
2015-01-14 03:52:41 +00:00
trasz
76b4ee1f43 Avoid unlocking unlocked mutex in RCTL jail code. Specific test case
is attached to PR.

PR:		193457
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2014-09-09 16:05:33 +00:00
glebius
80e85e32a5 Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 06:29:43 +00:00