Commit Graph

922 Commits

Author SHA1 Message Date
Gleb Smirnoff
e7d02be19d protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach.  Now this structure is
  only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw.  This was suggested
  in 1996 by wollman@ (see 7b187005d1), and later reiterated
  in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
  For most protocols these pointers are initialized statically.
  Those domains that may have loadable protocols have spacers. IPv4
  and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
  entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36232
2022-08-17 11:50:32 -07:00
Mateusz Guzik
60dae3b83b mac: cheaper check for mac_pipe_check_read
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D36082
2022-08-17 14:21:25 +00:00
John Baldwin
3a3af6b2a1 mac_ddb: Fix the show rman validator.
The validator always returned true due to an incorrect check.

Reviewed by:	mhorne, imp
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36125
2022-08-12 10:20:05 -07:00
Mateusz Guzik
92b5b97cb0 mac: s/0/false/ in macros denoting probe enablement
No functional changes.
2022-08-11 22:11:24 +00:00
Konstantin Belousov
c6d31b8306 AST: rework
Make most AST handlers dynamically registered.  This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it.  For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.

Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit.  For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.

The dynamic registration also allows third-party modules to register AST
handlers if needed.  There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it.  In fact, this
is already present behavior for hwpmc.ko and ufs.ko.  I do not think it
is worth the efforts and the runtime overhead to try to fix it.

Reviewed by:	markj
Tested by:	emaste (arm64), pho
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35888
2022-08-02 21:11:09 +03:00
Allan Jude
e89841f893 Revert "mac_ddb: Make db_show_vnet_valid() handle !VIMAGE"
jhb@ already fixed this in a different way

Reported by: andrew

This reverts commit 3810b37903.
2022-07-21 14:26:54 +00:00
Allan Jude
3810b37903 mac_ddb: Make db_show_vnet_valid() handle !VIMAGE
Reported by:	kib
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
2022-07-21 14:01:01 +00:00
John Baldwin
b43b8f8157 mac_ddb: Only include the vnet validator in VIMAGE kernels.
This fixes the build of at least i386 MINIMAL which was failing with
the error:

sys/security/mac_ddb/mac_ddb.c:200:15: error: use of undeclared identifier 'vnet'; did you mean 'int'?
                if ((void *)vnet == (void *)addr)
                            ^~~~
                            int

Sponsored by:	DARPA
2022-07-20 16:02:56 -07:00
Mitchell Horne
4e2121c10a mac_ddb: add some validation functions
These global objects are easy to validate, so provide the helper
functions to do so and include these commands in the allow lists.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35372
2022-07-18 22:06:22 +00:00
Mitchell Horne
287d467c5d mac: add new mac_ddb(4) policy
Generally, access to the kernel debugger is considered to be unsafe from
a security perspective since it presents an unrestricted interface to
inspect or modify the system state, including sensitive data such as
signing keys.

However, having some access to debugger functionality on production
systems may be useful in determining the cause of a panic or hang.
Therefore, it is desirable to have an optional policy which allows
limited use of ddb(4) while disabling the functionality which could
reveal system secrets.

This loadable MAC module allows for the use of some ddb(4) commands
while preventing the execution of others. The commands have been broadly
grouped into three categories:
 - Those which are 'safe' and will not emit sensitive data (e.g. trace).
   Generally, these commands are deterministic and don't accept
   arguments.
 - Those which are definitively unsafe (e.g. examine <addr>, search
   <addr> <value>)
 - Commands which may be safe to execute depending on the arguments
   provided (e.g. show thread <addr>).

Safe commands have been flagged as such with the DB_CMD_MEMSAFE flag.

Commands requiring extra validation can provide a function to do so.
For example, 'show thread <addr>' can be used as long as addr can be
checked against the system's list of process structures.

The policy also prevents debugger backends other than ddb(4) from
executing, for example gdb(4).

Reviewed by:	markj, pauamma_gundo.com (manpages)
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35371
2022-07-18 22:06:15 +00:00
Mitchell Horne
2449b9e5fe mac: kdb/ddb framework hooks
Add three simple hooks to the debugger allowing for a loaded MAC policy
to intervene if desired:
 1. Before invoking the kdb backend
 2. Before ddb command registration
 3. Before ddb command execution

We extend struct db_command with a private pointer and two flag bits
reserved for policy use.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35370
2022-07-18 22:06:13 +00:00
Wojciech Macek
15c362aeb7 mac_veriexec: Authorize reads of secured sysctls
Writes to sysctls flagged with CTLFLAG_SECURE are blocked if the appropriate secure level is set. mac_veriexec does not behave this way, it blocks such sysctls in read-only mode as well.

This change aims to make mac_veriexec behave like secure levels, as it was meant by the original commit ed377cf41.

Reviewed by:		sjg
Differential revision:	https://reviews.freebsd.org/D34327
Obtained from:		Stormshield
2022-06-29 10:48:01 +02:00
Dmitry Chagin
31d1b816fe sysent: Get rid of bogus sys/sysent.h include.
Where appropriate hide sysent.h under proper condition.

MFC after:	2 weeks
2022-05-28 20:52:17 +03:00
Wojciech Macek
14b7706264 mac_pimd: Support for privilege drop in pimd
Create new kernel module for privilege check in case
the user wants to run pimd daemon.

Sponsored by:		Stormshield
Obtained from:		Semihalf
2022-04-20 08:07:37 +02:00
Mark Johnston
ecd764b0ea audit: Initialize vattr fields before calling VOP_GETATTR
Some filesystems do not fill out certain optional vattr fields.  To
ensure that they do not get copied out to userspace uninitialized, use
VATTR_NULL to provide default values.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-03-28 11:23:45 -04:00
Gordon Bergling
1920133d8f mac_veriexec: Fix a typo in a source code comment
- s/seach/search/

MFC after:	3 days
2022-03-27 19:56:15 +02:00
Mateusz Guzik
bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) 2022-03-24 10:20:51 +00:00
Florian Walpen
e28767f0e1 Thread creation privilege for realtime group
With the mac_priority(4) realtime policy active, users and processes in
the realtime group may promote existing threads and processes to
realtime scheduling priority. Extend the privileges granted to
PRIV_SCHED_SETPOLICY which allows explicit creation of new realtime
threads.

One use case of this is when the pthread scheduling policy is set to
SCHED_RR or SCHED_FIFO via pthread_attr_setschedpolicy(...) before
calling pthread_create(...). I ran into this when testing audio software
with realtime threads, particularly audio/ardour6.

MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33393
2021-12-15 00:01:58 +02:00
Florian Walpen
a9545eede4 Add idle priority scheduling privilege group to MAC/priority
Add an idletime user group that allows non-root users to run processes
with idle scheduling priority. Privileges are granted by a MAC policy in
the mac_priority module. For this purpose, the kernel privilege
PRIV_SCHED_IDPRIO was added to sys/priv.h (kernel module ABI change).

Deprecate the system wide sysctl(8) knob
security.bsd.unprivileged_idprio which lets any user run idle priority
processes, regardless of context. While the knob is still working, it is
marked as deprecated in the description and in the man pages.

MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D33338
2021-12-10 04:54:48 +02:00
Florian Walpen
a20a2450cd Add PRIV_SCHED_IDPRIO
The privilege allows the holder to assign idle priority type to thread
or process.

MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D33338
2021-12-10 04:54:48 +02:00
Florian Walpen
bf2fa8d9d1 MAC/priority module for realtime privilege group
This is a MAC policy module that grants scheduling privileges based on
group membership.  Users or processes in the group realtime (gid 47) are
allowed to run threads and processes with realtime scheduling priority.
For timing-sensitive, low-latency software like audio/jack, running with
realtime priority helps to avoid stutter and gaps.

PR:	239125
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D33191
2021-12-04 20:19:25 +02:00
Mateusz Guzik
7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf64 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Ka Ho Ng
0dc332bff2 Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28347
2021-08-05 23:20:42 +08:00
Wojciech Macek
fe8ce390b8 Fix mac_veriexec version mismatch
mac_veriexec sets its version to 1, but the mac_veriexec_shaX modules which depend on it expect MAC_VERIEXEC_VERSION = 2.
Be consistent and use MAC_VERIEXEC_VERSION everywhere.
This unbreaks loading of mac_veriexec modules at boot time.

Authored by: 		Kornel Duleba <mindal@semihalf.com>
Obtained from: 		Semihalf
Sponsored by: 		Stormshield
Differential Revision: 	https://reviews.freebsd.org/D31268
2021-07-29 11:05:13 +02:00
Mateusz Guzik
f77697dd9f mac: cheaper check for ifnet_create_mbuf and ifnet_check_transmit
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-29 15:06:45 +02:00
Gleb Smirnoff
08d9c92027 tcp_input/syncache: acquire only read lock on PCB for SYN,!ACK packets
When packet is a SYN packet, we don't need to modify any existing PCB.
Normally SYN arrives on a listening socket, we either create a syncache
entry or generate syncookie, but we don't modify anything with the
listening socket or associated PCB. Thus create a new PCB lookup
mode - rlock if listening. This removes the primary contention point
under SYN flood - the listening socket PCB.

Sidenote: when SYN arrives on a synchronized connection, we still
don't need write access to PCB to send a challenge ACK or just to
drop. There is only one exclusion - tcptw recycling. However,
existing entanglement of tcp_input + stacks doesn't allow to make
this change small. Consider this patch as first approach to the problem.

Reviewed by:	rrs
Differential revision:	https://reviews.freebsd.org/D29576
2021-04-12 08:25:31 -07:00
Robert Watson
a92c6b24c0 Add a comment on why the call to mac_vnode_relabel() might be in the wrong
place -- in the VOP rather than vn_setexttr() -- and that it is for historic
reasons.  We might wish to relocate it in due course, but this way at least
we document the asymmetry.
2021-02-27 16:25:26 +00:00
Alex Richardson
fa32350347 close_range: add audit support
This fixes the closefrom test in sys/audit.

Includes cherry-picks of the following commits from openbsm:

4dfc628aaf
99ff6fe32a
da48a0399e

Reviewed By:	kevans
Differential Revision: https://reviews.freebsd.org/D28388
2021-02-23 17:47:07 +00:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Mateusz Guzik
77589de8aa mac: cheaper check for mac_vnode_check_readlink 2021-01-08 13:57:10 +00:00
Mateusz Guzik
33f3e81df5 cache: combine fast path enabled status into one flag
Tested by:	pho
2021-01-06 07:28:06 +00:00
Mateusz Guzik
08a5615cfe audit: rework AUDIT_SYSCLOSE
This in particular avoids spurious lookups on close.
2020-12-17 18:52:04 +00:00
Mateusz Guzik
89744405e6 pipe: allow for lockless pipe_stat
pipes get stated all thet time and this avoidably contributed to contention.
The pipe lock is only held to accomodate MAC and to check the type.

Since normally there is no probe for pipe stat depessimize this by having the
flag.

The pipe_state field gets modified with locks held all the time and it's not
feasible to convert them to use atomic store. Move the type flag away to a
separate variable as a simple cleanup and to provide stable field to read.
Use short for both fields to avoid growing the struct.

While here short-circuit MAC for pipe_poll as well.
2020-11-19 06:30:25 +00:00
Andriy Gapon
137d26e8a3 mac_framework.h: fix build with DEBUG_VFS_LOCKS and !MAC
I have such a custom kernel configuration and its build failed with:
linking kernel.full
ld: error: undefined symbol: mac_vnode_assert_locked
>>> referenced by mac_framework.h:556 (/usr/devel/git/apu2c4/sys/security/mac/mac_framework.h:556)
>>>               tmpfs_vnops.o:(mac_vnode_check_stat)
>>> referenced by mac_framework.h:556 (/usr/devel/git/apu2c4/sys/security/mac/mac_framework.h:556)
>>>               vfs_default.o:(mac_vnode_check_stat)
>>> referenced by mac_framework.h:556 (/usr/devel/git/apu2c4/sys/security/mac/mac_framework.h:556)
>>>               ufs_vnops.o:(mac_vnode_check_stat)
2020-09-03 20:30:52 +00:00
Mateusz Guzik
e5ecee7440 security: clean up empty lines in .c and .h files 2020-09-01 21:26:00 +00:00
Mateusz Guzik
feabaaf995 cache: drop the always curthread argument from reverse lookup routines
Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs.

Tested by:	pho
2020-08-24 08:57:02 +00:00
Mateusz Guzik
51ea7bea91 vfs: add VOP_STAT
The current scheme of calling VOP_GETATTR adds avoidable overhead.

An example with tmpfs doing fstat (ops/s):
before: 7488958
after:  7913833

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D25910
2020-08-07 23:06:40 +00:00
Mateusz Guzik
4ec34a908b mac: even up all entry points to the same scheme
- use a macro for checking whether the site is enabled
- expand it to 0 if mac is not compiled in to begin with
2020-08-06 00:23:06 +00:00
Mateusz Guzik
18f67bc413 vfs: add a cheaper entry for mac_vnode_check_access 2020-08-05 07:34:45 +00:00
Mateusz Guzik
5b0acaf75f Fix tinderbox build after r363714 2020-07-30 22:56:57 +00:00
Mateusz Guzik
fad6dd772d vfs: elide MAC-induced locking on rename if there are no relevant hoooks 2020-07-29 17:05:31 +00:00
Mateusz Guzik
07d2145a17 vfs: add the infrastructure for lockless lookup
Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25577
2020-07-25 10:32:45 +00:00
Mateusz Guzik
3ea3fbe685 vfs: fix vn_poll performance with either MAC or AUDIT
The code would unconditionally lock the vnode to audit or call the
mac hoook, even if neither want to do anything. Pre-check the state
to avoid locking in the common case of nothing to do.

Note this code should not be normally executed anyway as vnodes are
always return ready. However, poll1/2 from will-it-scale use regular
files for benchmarking, presumably to focus on the interface itself
as the vnode handler is not supposed to do almost anything.

This in particular fixes poll2 which passes 128 fds.

$ ./poll2_processes -s 10
before: 134411
after:  271572
2020-07-16 14:09:18 +00:00
Mateusz Guzik
ab06a30517 vfs: fix MAC/AUDIT mismatch in vn_poll
Auditing would not be performed without MAC compiled in.
2020-07-16 14:04:28 +00:00
Mateusz Guzik
8e5679aa10 audit: provide AUDITING_TD for !AUDIT case 2020-07-04 06:21:20 +00:00
Simon J. Gerraty
66d8bce379 mac_veriexec_fingerprint_check_vnode: v_writecount > 0 means active writers
v_writecount can actually be < 0 for text,
so check for v_writecount > 0

Reviewed by:	stevek
MFC after:	1 week
2020-06-12 21:51:20 +00:00
Ryan Moeller
245bfd34da Deduplicate fsid comparisons
Comparing fsid_t objects requires internal knowledge of the fsid structure
and yet this is duplicated across a number of places in the code.

Simplify by creating a fsidcmp function (macro).

Reviewed by:	mjg, rmacklem
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D24749
2020-05-21 01:55:35 +00:00
Christian S.J. Peron
757a564248 Add BSM record conversion for a number of syscalls:
- thr_kill(2) and thr_exit(2) generally (no argument auditing here.
- A set of syscalls for the process descriptor family, specifically:
  pdfork(2), pdgetpid(2) and pdkill(2)

  For these syscalls, audit the file descriptor. In the case of pdfork(2)
  a pointer to an integer (file descriptor) is passed in as an argument.
  We audit the post initialized file descriptor (not the random garbage
  that would have been passed in). We will also audit the child process
  which was created from the fork operation (similar to what is done for
  the fork(2) syscall).

  pdkill(2) we audit the signal value and fd, and finally pdgetpid(2)
  just the file descriptor:

- Following is a sample of the produced audit trails:

  header,111,11,pdfork(2),0,Sat May 16 03:07:50 2020, + 394 msec
  argument,0,0x39d,child PID
  argument,2,0x2,flags
  argument,1,0x8,fd
  subject,root,root,0,root,0,924,0,0,0.0.0.0
  return,success,925

  header,79,11,pdgetpid(2),0,Sat May 16 03:07:50 2020, + 394 msec
  argument,1,0x8,fd
  subject,root,root,0,root,0,924,0,0,0.0.0.0
  return,success,0
  trailer,79

  header,135,11,pdkill(2),0,Sat May 16 03:07:50 2020, + 395 msec
  argument,1,0x8,fd
  argument,2,0xf,signal
  process_ex,root,root,0,root,0,925,0,0,0.0.0.0
  subject,root,root,0,root,0,924,0,0,0.0.0.0
  return,success,0
  trailer,135

MFC after:      1 week
2020-05-16 03:45:15 +00:00
Kyle Evans
cc62118e65 audit_canon_path_vp: don't panic if cdir == NULL
cdir may have simply failed to resolve (e.g. fget_cap failure in namei
leading to NULL dp passed to AUDIT_ARG_UPATH*_VP); restore the pre-rS358191
behavior of setting cpath[0] = '\0' and bailing out instead of panicking.

This was found by inadvertently running the libc/c063 tests with auditing
enabled, resulting in a panic.

Reviewed by:	mjg (committed version actually his)
Differential Revision:	https://reviews.freebsd.org/D24445
2020-04-17 02:09:31 +00:00
Jason A. Harmening
407a5b7953 mac_policy: Remove mac_policy_sx
This lock was made unnecessary by the addition of mac_policy_rms in r356120.

Reviewed by:	mjg, kib
Differential Revision:	https://reviews.freebsd.org/D24283
2020-04-04 04:03:10 +00:00