realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.
This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.
See the review for sample syscall counts.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23574
This in particular significantly shortens amd64_syscall, which otherwise
keeps jumping forward over 2KB of code in total.
Note some of these branches should be either eliminated altogether or
coalesced.
All checking routines walk a linked list of all modules in order to determine
if given hook is installed. This became a significant problem after mac_ntpd
started being loaded by default.
Implement a way perform checks for select hooks by testing a boolean.
Use it for priv_check and priv_grant, which are constantly called from priv_check.
The real fix would use hotpatching, but the above provides a way to know when
to do it.
There was only one consumer and it was using it incorrectly.
It is given an equivalent hack.
Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D23037
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
If any non-static modules are loaded (and mac_ntpd tends to be), the lock is
taken all the time al over the kernel. On platforms like arm64 this results in
an avoidable significant performance degradation. Since write-locking is almost
never needed, use a primitive optimized towards read-locking.
Sample result of building the kernel on tmpfs 11 times:
stock 11142.80s user 6704.44s system 4924% cpu 6:02.42 total
patched 11118.95s user 2374.94s system 4547% cpu 4:56.71 total
entry, when that entry has been seen already, keep the
already-looked-up value in a variable and use that instead of looking
it up again.
Approved by: alc, markj (earlier version), kib (earlier version)
Differential Revision: https://reviews.freebsd.org/D22348
Co-mingling two things here:
* Addressing some feedback from Konstantin and Kyle re: jail,
capability mode, and a few other things
* Adding audit support as promised.
The audit support change includes a partial refresh of OpenBSM from
upstream, where the change to add shm_rename has already been
accepted. Matthew doesn't plan to work on refreshing anything else to
support audit for those new event types.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: kib
Relnotes: Yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22083
around entry->{next,prev} when those are used for ordered list
traversal, and use those wrapper functions everywhere. Where the next
field is used for maintaining a stack of deferred operations, #define
defer_next to make that different usage clearer, and then use the
'right' pointer instead of 'next' for that purpose.
Approved by: markj
Tested by: pho (as part of a larger patch)
Differential Revision: https://reviews.freebsd.org/D22347
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882
neighbors, and is used in a way so that if entries a and b cannot be
merged, we consider them twice, first not-merging a with its successor
b, and then not-merging b with its predecessor a. This change replaces
vm_map_simplify_entry with vm_map_try_merge_entries, which compares
two adjacent entries only, and uses it to avoid duplicated
merge-checks.
Tested by: pho
Reviewed by: alc
Approved by: markj (implicit)
Differential Revision: https://reviews.freebsd.org/D20814
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
We need to make the find_veriexec_file() function available publicly, so
rename it to mac_veriexec_metadata_find_file_info() and make it non-static.
Bump the version of the veriexec device interface so user space will know
the labelized version of fingerprint loading is available.
Approved by: sjg
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D20295
MAC_VERIEXEC_CHECK_PATH_SYSCALL per-MAC policy system call.
When we are checking the status of the fingerprint on a vnode using the
per-MAC-policy syscall, we do not need an exclusive lock on the vnode.
Even if there is more than one thread requesting the status at the same time,
the worst we can end up doing is processing the file more than once.
This can potentially be improved in the future with offloading the fingerprint
evaluation to a separate thread and blocking until the update completes. But
for now the race is acceptable.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
be restricted when veriexec is enforced.
Add mpo_system_check_sysctl method to mac_veriexec which does this.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
are different types across architectures by using %ju and typecasting to
uintmax_t, where appropriate.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
mac_veriexec_get_executable_flags(). Only try locking/unlocking if the caller
has not already acquired the process lock.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
lock mac_ifnet_mtx, which protects labels on struct ifnet, unless at least
one policy is actively using labels on ifnets. This avoids a global mutex
acquire in certain fast paths -- most noticeably ifnet transmit. This was
previously invisible by default, as no MAC policies were loaded by default,
but recently became visible due to mac_ntpd being enabled by default.
gallatin@ reports a reduction in PPS overhead from 300% to 2.2% with this
change. We will want to explore further MAC Framework optimisation to
reduce overhead further, but this brings things more back into the world
of the sane.
MFC after: 3 days
The current approach of injecting manifest into mac_veriexec is to
verify the integrity of it in userspace (veriexec (8)) and pass its
entries into kernel using a char device (/dev/veriexec).
This requires verifying root partition integrity in loader,
for example by using memory disk and checking its hash.
Otherwise if rootfs is compromised an attacker could inject their own data.
This patch introduces an option to parse manifest in kernel based on envs.
The loader sets manifest path and digest.
EVENTHANDLER is used to launch the module right after the rootfs is mounted.
It has to be done this way, since one might want to verify integrity of the init file.
This means that manifest is required to be present on the root partition.
Note that the envs have to be set right before boot to make sure that no one can spoof them.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: sjg
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19281
An integrity check such as a check-hash or a cross-correlation failed.
The integrity error falls between EINVAL that identifies errors in
parameters to a system call and EIO that identifies errors with the
underlying storage media. EINTEGRITY is typically raised by intermediate
kernel layers such as a filesystem or an in-kernel GEOM subsystem when
they detect inconsistencies. Uses include allowing the mount(8) command
to return a different exit value to automate the running of fsck(8)
during a system boot.
These changes make no use of the new error, they just add it. Later
commits will be made for the use of the new error number and it will
be added to additional manual pages as appropriate.
Reviewed by: gnn, dim, brueffer, imp
Discussed with: kib, cem, emaste, ed, jilles
Differential Revision: https://reviews.freebsd.org/D18765
Prior to the change the code would branch on return value and then check
if probes are enabled. Since vast majority of the time they are not, this
is clearly wasteful. Check probes first.
Sponsored by: The FreeBSD Foundation
These arguments are mostly paths handled by NAMEI*() macros which already
take const char * arguments.
This change improves the match between syscalls.master and the public
declerations of system calls.
Reviewed by: kib (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17812
system-call entry and whenever audit arguments or return values are
captured:
1. Expose a single global, audit_syscalls_enabled, which controls
whether the audit framework is entered, rather than exposing
components of the policy -- e.g., if the trail is enabled,
suspended, etc.
2. Introduce a new function audit_syscalls_enabled_update(), which is
called to update audit_syscalls_enabled whenever an aspect of the
policy changes, so that the value can be updated.
3. Remove a check of trail enablement/suspension from audit_new() --
at the point where this function has been entered, we believe that
system-call auditing is already in force, or we wouldn't get here,
so simply proceed to more expensive policy checks.
4. Use an audit-provided global, audit_dtrace_enabled, rather than a
dtaudit-provided global, to provide policy indicating whether
dtaudit would like system calls to be audited.
5. Do some minor cosmetic renaming to clarify what various variables
are for.
These changes collectively arrange it so that traditional audit
(trail, pipes) or the DTrace audit provider can enable system-call
probes without the other configured. Otherwise, dtaudit cannot
capture system-call data without auditd(8) started.
Reviewed by: gnn
Sponsored by: DARPA, AFRL
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D17348
/etc/security/audit_event to provide a list of audit event-number <->
name mappings. However, this occurs too late for anonymous tracing.
With this change, adding 'audit_event_load="YES"' to /boot/loader.conf
will cause the boot loader to preload the file, and then the kernel
audit code will parse it to register an initial set of audit event-number
<-> name mappings. Those mappings can later be updated by auditd(8) if
the configuration file changes.
Reviewed by: gnn, asomers, markj, allanjude
Discussed with: jhb
Approved by: re (kib)
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16589
The buffer size may be used to initialize an sbuf in
MAC_POLICY_EXTERNALIZE, and without this constraint it's possible to
trigger an assertion failure in the sbuf code. With INVARIANTS
disabled, the first attempt to write to the sbuf will fail.
Reported by: pho
Reviewed by: delphij
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D16527
These syscalls were always supposed to have been auditted, but due to
oversights never were.
PR: 228374
Reported by: aniketp
Reviewed by: aniketp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16388
Code analysis and runtime analysis using truss(8) indicate that the only
privileged operations performed by ntpd are adjusting system time, and
(re-)binding to privileged UDP port 123. These changes add a new mac(4)
policy module, mac_ntpd(4), which grants just those privileges to any
process running with uid 123.
This also adds a new user and group, ntpd:ntpd, (uid:gid 123:123), and makes
them the owner of the /var/db/ntp directory, so that it can be used as a
location where the non-privileged daemon can write files such as the
driftfile, and any optional logfile or stats files.
Because there are so many ways to configure ntpd, the question of how to
configure it to run without root privs can be a bit complex, so that will be
addressed in a separate commit. These changes are just what's required to
grant the limited subset of privs to ntpd, and the small change to ntpd to
prevent it from exiting with an error if running as non-root.
Differential Revision: https://reviews.freebsd.org/D16281
A_SETPOLICY is supposed to work with either 64 or 32-bit values, but due to a
typo the 64-bit version has never worked correctly.
Submitted by: aniketp
Reviewed by: asomers, cem
MFC after: 2 weeks
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D16222
fsid_t and ino_t are 64-bit entities, use uintmax_t typecast to ensure we
can print it on 32-bit or 64-bit architectures by using the %ju format for
prints.
Obtained from: Juniper Networks, Inc.
framework.
The code is organized into a few distinct pieces:
* The meta-data store (in veriexec_metadata.c) which maps a file system
identifier, file identifier, and generation key tuple to veriexec
meta-data record.
* Fingerprint management (in veriexec_fingerprint.c) which deals with
calculating the cryptographic hash for a file and verifying it. It also
manages the loadable fingerprint modules.
* MAC policy implementation (in mac_veriexec.c) which implements the
following MAC methods:
mpo_init
Initializes the veriexec state, meta-data store, fingerprint modules,
and registers mount and unmount EVENTHANDLERs
mpo_syscall
Implements the following per-policy system calls:
MAC_VERIEXEC_CHECK_FD_SYSCALL
Check a file descriptor to see if the referenced file has a valid
fingerprint.
MAC_VERIEXEC_CHECK_PATH_SYSCALL
Check a path to see if the referenced file has a valid fingerprint.
mpo_kld_check_load
Check if loading a kld is allowed. This checks if the referenced vnode
has a valid fingerprint.
mpo_mount_destroy_label
Clears the veriexec slot data in a mount point label.
mpo_mount_init_label
Initializes the veriexec slot data in a mount point label.
The file system identifier is saved in the veriexec slot data.
mpo_priv_check
Check if a process is allowed to write to /dev/kmem and /dev/mem
devices.
If a process is flagged as trusted, it is allowed to write.
mpo_proc_check_debug
Check if a process is allowed to be debugged. If a process is not
flagged with VERIEXEC_NOTRACE, then debugging is allowed.
mpo_vnode_check_exec
Check is an exectuable is allowed to run. If veriexec is not enforcing
or the executable has a valid fingerprint, then it is allowed to run.
NOTE: veriexec will complain about mismatched fingerprints if it is
active, regardless of the state of the enforcement.
mpo_vnode_check_open
Check is a file is allowed to be opened. If verification was not
requested, veriexec is not enforcing, or the file has a valid
fingerprint, then veriexec will allow the file to be opened.
mpo_vnode_copy_label
Copies the veriexec slot data from one label to another.
mpo_vnode_destroy_label
Clears the veriexec slot data in a vnode label.
mpo_vnode_init_label
Initializes the veriexec slot data in a vnode label.
The fingerprint status for the file is stored in the veriexec slot data.
* Some sysctls, under security.mac.veriexec, for setting debug level,
fetching the current state in a human-readable form, and dumping the
fingerprint database are implemented.
* The MAC policy implementation source file also contains some utility
functions.
* A set of fingerprint modules for the following cryptographic hash
algorithms:
RIPEMD-160, SHA1, SHA2-256, SHA2-384, SHA2-512
* Loadable module builds for MAC/veriexec and fingerprint modules.
WARNING: Using veriexec with NFS (or other network-based) file systems is
not recommended as one cannot guarantee the integrity of the files
served, nor the uniqueness of file system identifiers which are
used as key in the meta-data store.
Reviewed by: ian, jtl
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D8554
Due to a copy/paste error in r168688, ARG_TERMID_ADDR has the same
definition as ARG_SADDRUNIX. Fix it.
The header change, while publicly visible, is guarded by #ifdef KERNEL, and
I can't find any kmod ports that use it. So I'm not bumping
__FreeBSD_version.
PR: 228820
Submitted by: aniketp
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15702
security/audit/audit_ioctl.h uses a type from bsm/audit.h, so needs to
include it. And it needs to know the type's size, so it can't just
forward-declare.
PR: 228470
Submitted by: aniketp
MFC after: 2 weeks
Sponsored by: Google, Inc. (GSoC 2018)
Differential Revision: https://reviews.freebsd.org/D15561
Due to an oversight in r195280, auditon(A_SETCLASS, ...) would cause a tailq
element to get added to the tailq twice, resulting in a circular tailq. This
panics when INVARIANTS are on.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D15381
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.
Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.