Commit Graph

240 Commits

Author SHA1 Message Date
Mariusz Zaborski
05e1e482c7 jail: introduce per jail suser_enabled setting
The suser_enable sysctl allows to remove a privileged rights from uid 0.
This change introduce per jail setting which allow to make root a
normal user.

Reviewed by:	jamie
Previous version reviewed by:	kevans, emaste, markj, me_igalic.co
Discussed with:	pjd
Differential Revision:	https://reviews.freebsd.org/D27128
2020-11-18 21:07:08 +00:00
Mariusz Zaborski
21fe9441e1 Fix style nits. 2020-11-18 20:59:58 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Mateusz Guzik
a459a6cfe7 vfs: respect PRIV_VFS_LOOKUP in vaccess_smr
Reported by:	novel
2020-08-25 14:18:50 +00:00
Adrian Chadd
f7d38a13a8 [net80211] Add new privileges; restrict what can be done in a jail.
Split the MANAGE privilege into MANAGE, SETMAC and CREATE_VAP.

+ VAP_MANAGE is everything but setting the MAC and creating a VAP.
+ VAP_SETMAC is setting the MAC address of the VAP.
  Typically you wouldn't want the jail to be able to modify this.
+ CREATE_VAP is to create a new VAP. Again, you don't want to be doing
  this in a jail, but this DOES stop being able to run some corner
  cases like Dynamic WDS (DWDS) AP in a jail/vnet. We can figure this
  bit out later.

This allows me to run wpa_supplicant in a jail after transferring
a STA VAP into it. I unfortunately can't currently set the wlan
debugging inside the jail; that would be super useful!

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D25630
2020-07-19 15:16:27 +00:00
Kyle Evans
63619b6dba vfs: add restrictions to read(2) of a directory [2/2]
This commit adds the priv(9) that waters down the sysctl to make it only
allow read(2) of a dirfd by the system root. Jailed root is not allowed, but
jail policy and superuser policy will abstain from allowing/denying it so
that a MAC module can fully control the policy.

Such a MAC module has been written, and can be found at:
https://people.freebsd.org/~kevans/mac_read_dir-0.1.0.tar.gz

It is expected that the MAC module won't be needed by many, as most only
need to do such diagnostics that require this behavior as system root
anyways. Interested parties are welcome to grab the MAC module above and
create a port or locally integrate it, and with enough support it could see
introduction to base. As noted in mac_read_dir.c, it is released under the
BSD 2 clause license and allows the restrictions to be lifted for only
jailed root or for all unprivileged users.

PR:		246412
Reviewed by:	mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by:	rgrimes (latest version)
Differential Revision:	https://reviews.freebsd.org/D24596
2020-06-04 18:17:25 +00:00
Kristof Provost
3f8bc99c4c ethersubr: Make the mac address generation more robust
If we create two (vnet) jails and create a bridge interface in each we end up
with the same mac address on both bridge interfaces.
These very often conflicts, resulting in same mac address in both jails.

Mitigate this problem by including the jail name in the mac address.

Reviewed by:	kevans, melifaro
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24383
2020-04-18 07:50:30 +00:00
Kyle Evans
c318828929 Preload hostuuid for early-boot use
prison0's hostuuid will get set by the hostid rc script, either after
generating it and saving it to /etc/hostid or by simply reading /etc/hostid.

Some things (e.g. arbitrary MAC address generation) may use the hostuuid as
a factor in early boot, so providing a way to read /etc/hostid (if it's
available) and using it before userland starts up is desirable. The code is
written such that the preload doesn't *have* to be /etc/hostid, thus not
assuming that there will be newline at the end of the buffer or even the
exact shape of the newline. White trailing whitespace/non-printables
trimmed, the result will be validated as a valid uuid before it's used for
early boot purposes.

The preload can be turned off with hostuuid_load="NO" in /boot/loader.conf,
just as other preloads; it's worth noting that this is a 37-byte file, the
overhead is believed to be generally minimal.

It doesn't seem necessary at this time to be concerned with kern.hostid.

One does wonder if we should consider validating hostuuids coming in
via jail_set(2); some bits seem to care about uuid form and we bother
validating format of smbios-provided uuid and in-fact whatever uuid comes
from /etc/hostid.

Reviewed by:	karels, delphij, jamie
MFC after:	1 week (don't preload by default, probably)
Differential Revision:	https://reviews.freebsd.org/D24288
2020-04-16 00:54:06 +00:00
Bjoern A. Zeeb
1b786d0191 kern_jail: missing \0 termination check on osrelease parameter
If a user spplies a non-\0 terminated osrelease parameter reading it back
may disclose kernel memory.
This is a problem in case of nested jails (children.max > 0, which is not
the default).  Otherwise root outside the jail has access to kernel memory
by other means and root inside a jail cannot create a child jail.

Add the proper \0 check at the end of a supplied osrelease parameter and
make sure any copies of the field will be \0-terminated.

Submitted by:	Hans Christian Woithe (chwoithe yahoo.com)
MFC after:	3 days
2020-03-14 14:04:55 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
7b2ff0dcb2 Partially decompose priv_check by adding priv_check_cred_vfs_generation
During buildkernel there are very frequent calls to priv_check and they
all are for PRIV_VFS_GENERATION (coming from stat/fstat).

This results in branching on several potential privileges checking if
perhaps that's the one which has to be evaluated.

Instead of the kitchen-sink approach provide a way to have commonly used
privs directly evaluated.
2020-02-13 22:22:15 +00:00
Mateusz Guzik
e6081fe899 Inline jailed().
It is constantly called from priv_check.
2020-02-13 22:16:30 +00:00
Mateusz Guzik
3eb6b656c2 vfs: remove now useless ENODEV handling from vn_fullpath consumers
Noted by:	ngie
2020-02-08 15:51:08 +00:00
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Alexander V. Chernikov
c83dda362e Split gigantic rtsock route_output() into smaller functions.
Amount of changes to the original code has been intentionally minimised
to ease diffing.
The changes are mostly mechanical, with the following exceptions:

* lltable handler is now called directly based of RTF_LLINFO flag presense.
* "report" logic for updating rtm in RTM_GET/RTM_DELETE has been simplified,
  fixing several potential use-after-free cases in rt_addrinfo.
* llable asserts has been replaced with error-returning, preventing kernel
  crashes when lltable gw af family is invalid (root required).

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22864
2019-12-31 17:26:53 +00:00
Mateusz Guzik
6ff4688b09 Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:11:45 +00:00
Jamie Gritton
b307954481 In hardened systems, where the security.bsd.unprivileged_proc_debug sysctl
node is set, allow setting security.bsd.unprivileged_proc_debug per-jail.
In part, this is needed to create jails in which the Address Sanitizer
(ASAN) fully works as ASAN utilizes libkvm to inspect the virtual address
space. Instead of having to allow unprivileged process debugging for the
entire system, allow setting it on a per-jail basis.

The sysctl node is still security.bsd.unprivileged_proc_debug and the
jail(8) param is allow.unprivileged_proc_debug. The sysctl code is now a
sysctl proc rather than a sysctl int. This allows us to determine setting
the flag for the corresponding jail (or prison0).

As part of the change, the dynamic allow.* API needed to be modified to
take into account pr_allow flags which may now be disabled in prison0.
This prevents conflicts with new pr_allow flags (like that of vmm(4)) that
are added (and removed) dynamically.

Also teach the jail creation KPI to allow differences for certain pr_allow
flags between the parent and child jail. This can happen when unprivileged
process debugging is disabled in the parent prison, but enabled in the
child.

Submitted by:	Shawn Webb <lattera at gmail.com>
Obtained from:	HardenedBSD (45b3625edba0f73b3e3890b1ec3d0d1e95fd47e1, deba0b5078cef0faae43cbdafed3035b16587afc, ab21eeb3b4c72f2500987c96ff603ccf3b6e7de8)
Relnotes:	yes
Sponsored by:	HardenedBSD and G2, Inc
Differential Revision:	https://reviews.freebsd.org/D18319
2018-11-27 17:51:50 +00:00
Konstantin Belousov
389474c122 Allow set ether/vlan PCP operation from the VNET jails.
The vlan interfaces can be created from vnet jails, it seems, so it
sounds logical to allow pcp configuration as well.

Reviewed by:	bz, hselasky (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17777
2018-11-12 15:59:32 +00:00
Jamie Gritton
4520f617c9 Fix typos from r339409.
Reported by:	maxim
Approved by:	re (gjb)
2018-10-18 15:02:57 +00:00
Jamie Gritton
b19d66fd5a Add a new jail permission, allow.read_msgbuf. When true, jailed processes
can see the dmesg buffer (this is the current behavior).  When false (the
new default), dmesg will be unavailable to jailed users, whether root or
not.

The security.bsd.unprivileged_read_msgbuf sysctl still works as before,
controlling system-wide whether non-root users can see the buffer.

PR:		211580
Submitted by:	bz
Approved by:	re@ (kib@)
MFC after:	3 days
2018-10-17 16:11:43 +00:00
Jamie Gritton
08b4333399 Fix the test prohibiting jails from sharing IP addresses.
It's not supposed to be legal for two jails to contain the same IP address,
unless both jails contain only that one address.  This is the behavior
documented in jail(8), and is there to prevent confusion when multiple
jails are listening on IADDR_ANY.

VIMAGE jails (now the default for GENERIC kernels) test this correctly,
but non-VIMAGE jails have been performing an incomplete test when nested
jails are used.

Approved by:	re@ (kib@)
MFC after:	5 days
2018-10-06 02:10:32 +00:00
Jamie Gritton
c542c43ef1 Revert r337922, except for some documention-only bits. This needs to wait
until user is changed to stop using jail(2).

Differential Revision:	D14791
2018-08-16 19:09:43 +00:00
Jamie Gritton
284001a222 Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating
jails since FreeBSD 7.

Along with the system call, put the various security.jail.allow_foo and
security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or
BURN_BRIDGES).  These sysctls had two disparate uses: on the system side,
they were global permissions for jails created via jail(2) which lacked
fine-grained permission controls; inside a jail, they're read-only
descriptions of what the current jail is allowed to do.  The first use
is obsolete along with jail(2), but keep them for the second-read-only use.

Differential Revision:	D14791
2018-08-16 18:40:16 +00:00
Antoine Brodin
ccd6ac9f6e Add allow.mlock to jail parameters
It allows locking or unlocking physical pages in memory within a jail

This allows running elasticsearch with "bootstrap.memory_lock" inside a jail

Reviewed by:	jamie@
Differential Revision:	https://reviews.freebsd.org/D16342
2018-07-29 12:41:56 +00:00
Jamie Gritton
0a1724045e Change prison_add_vfs() to the more generic prison_add_allow(), which
can add any dynamic allow.* or allow.*.* parameter.  Also keep
prison_add_vfs() as a wrapper.

Differential Revision:	D16146
2018-07-06 18:50:22 +00:00
Konstantin Belousov
dbadb01591 Silence warnings about unused variables when RACCT is defined but RCTL
is not.

Reported by:	Dries Michiels <driesm.michiels@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-07-05 13:37:31 +00:00
Bjoern A. Zeeb
7938a4425a Instead of using hand-rolled loops where not needed switch them
to FOREACH_PROC_IN_SYSTEM() to have a single pattern to look for.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D15916
2018-06-20 11:42:06 +00:00
Bjoern A. Zeeb
bb8f162363 Try to be consistent and spell "vnet" lower case like all the
other options (and as we do on command line).

Sponsored by:	iXsystems, Inc.
2018-05-24 15:31:05 +00:00
Bjoern A. Zeeb
36b41cc336 Improve the KASSERT to also have the prison pointer.
Helpful when debugging from ddb.

Sponsored by:		iXsystems, Inc.
2018-05-24 15:28:21 +00:00
Jamie Gritton
0e5c6bd436 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
Andriy Gapon
f87beb93e8 call racct_proc_ucred_changed() under the proc lock
The lock is required to ensure that the switch to the new credentials
and the transfer of the process's accounting data from the old
credentials to the new ones is done atomically.  Otherwise, some updates
may be applied to the new credentials and then additionally transferred
from the old credentials if the updates happen after proc_set_cred() and
before racct_proc_ucred_changed().

The problem is especially pronounced for RACCT_RSS because
- there is a strict accounting for this resource (it's reclaimable)
- it's updated asynchronously by the vm daemon
- it's updated by setting an absolute value instead of applying a delta

I had to remove a call to rctl_proc_ucred_changed() from
racct_proc_ucred_changed() and make all callers of latter call the
former as well.  The reason is that rctl_proc_ucred_changed, as it is
implemented now, cannot be called while holding the proc lock, so the
lock is dropped after calling racct_proc_ucred_changed.  Additionally,
I've added calls to crhold / crfree around the rctl call, because
without the proc lock there is no gurantee that the new credentials,
owned by the process, will stay stable.  That does not eliminate a
possibility that the credentials passed to the rctl will get stale.
Ideally, rctl_proc_ucred_changed should be able to work under the proc
lock.

Many thanks to kib for pointing out the above problems.

PR:		222027
Discussed with:	kib
No comment:	trasz
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D15048
2018-04-20 13:08:04 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Jamie Gritton
672756aa9f Represent boolean jail options as an array of structures containing the
flag and both the regular and "no" names, instead of two different string
arrays whose indices need to match the flag's bit position.  This makes
them similar to the say "jailsys" options are represented.

Loop through either kind of option array with a structure pointer rather
then an integer index.
2018-03-20 23:08:42 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Allan Jude
e28f9b7d03 Jails: Optionally prevent jailed root from binding to privileged ports
You may now optionally specify allow.noreserved_ports to prevent root
inside a jail from using privileged ports (less than 1024)

PR:		217728
Submitted by:	Matt Miller <mattm916@pulsar.neomailbox.ch>
Reviewed by:	jamie, cem, smh
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D10202
2017-06-06 02:15:00 +00:00
Eric van Gyzen
8144690af4 Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

Suggested by:	glebius, emaste
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9625
2017-02-16 20:47:41 +00:00
Stephen J. Kiernan
0ce1624d0e Move IPv4-specific jail functions to new file netinet/in_jail.c
_prison_check_ip4 renamed to prison_check_ip4_locked

Move IPv6-specific jail functions to new file netinet6/in6_jail.c
_prison_check_ip6 renamed to prison_check_ip6_locked

Add appropriate prototypes to sys/sys/jail.h

Adjust kern_jail.c to call prison_check_ip4_locked and
prison_check_ip6_locked accordingly.

Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that
need to be built when INET and INET6, respectively, are configured in the
kernel configuration file.

Reviewed by:	jtl
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D6799
2016-08-09 02:16:21 +00:00
Jamie Gritton
932a6e432d Fix a vnode leak when giving a child jail a too-long path when
debug.disablefullpath=1.
2016-06-09 21:59:11 +00:00
Jamie Gritton
cf0313c679 Re-order some jail parameter reading to prevent a vnode leak. 2016-06-09 20:43:14 +00:00
Jamie Gritton
176ff3a066 Clean up some logic in jail error messages, replacing a missing test and
a redundant test with a single correct test.
2016-06-09 20:39:57 +00:00
Jamie Gritton
ef0ddea316 Make sure the OSD methods for jail set and remove can't run concurrently,
by holding allprison_lock exclusively (even if only for a moment before
downgrading) on all paths that call PR_METHOD_REMOVE.  Since they may run
on a downgraded lock, it's still possible for them to run concurrently
with PR_METHOD_GET, which will need to use the prison lock.
2016-06-09 16:41:41 +00:00
Jamie Gritton
ee8d6bd352 Mark jail(2), and the sysctls that it (and only it) uses as deprecated.
jail(8) has long used jail_set(2), and those sysctl only cause confusion.
2016-05-30 05:21:24 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
Jamie Gritton
73d9e52d2f Delay revmoing the last jail reference in prison_proc_free, and instead
put it off into the pr_task.  This is similar to prison_free, and in fact
uses the same task even though they do something slightly different.

This resolves a LOR between the process lock and allprison_lock, which
came about in r298565.

PR:		48471
2016-04-27 02:25:21 +00:00
Jamie Gritton
1fb6767d27 Use crcopysafe in jail_attach. 2016-04-26 21:19:12 +00:00
Jamie Gritton
b6f47c231f Pass the current/new jail to PR_METHOD_CHECK, which pushes the call
until after the jail is found or created.  This requires unlocking the
jail for the call and re-locking it afterward, but that works because
nothing in the jail has been changed yet, and other processes won't
change the important fields as long as allprison_lock remains held.

Keep better track of name vs namelc in kern_jail_set.  Name should
always be the hierarchical name (relative to the caller), and namelc
the last component.

PR:		48471
MFC after:	5 days
2016-04-25 04:27:58 +00:00
Jamie Gritton
cc5fd8c748 Add a new jail OSD method, PR_METHOD_REMOVE. It's called when a jail is
removed from the user perspective, i.e. when the last pr_uref goes away,
even though the jail mail still exist in the dying state.  It will also
be called if either PR_METHOD_CREATE or PR_METHOD_SET fail.

PR:		48471
MFC after:	 5 days
2016-04-25 04:24:00 +00:00
Jamie Gritton
2a54950713 Remove the PR_REMOVE flag, which was meant as a temporary marker for
a jail that might be seen mid-removal.  It hasn't been doing the right
thing since at least the ability to resurrect dying jails, and such
resurrection also makes it unnecessary.
2016-04-25 03:58:08 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00