Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33052
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33051
Address Space Layout Randomization (ASLR) is an exploit mitigation
technique implemented in the majority of modern operating systems.
It involves randomly positioning the base address of an executable
and the position of libraries, heap, and stack, in a process's address
space. Although over the years ASLR proved to not guarantee full OS
security on its own, this mechanism can make exploitation more difficult.
Tests on the tier 1 64-bit architectures demonstrated that the ASLR is
stable and does not result in noticeable performance degradation,
therefore it should be safe to enable this mechanism by default.
Moreover its effectiveness is increased for PIE (Position Independent
Executable) binaries. Thanks to commit 9a227a2fd642 ("Enable PIE by
default on 64-bit architectures"), building from src is not necessary
to have PIE binaries. It is enough to control usage of ASLR in the
OS solely by setting the appropriate sysctls.
This patch toggles the kernel settings to use address map randomization
for PIE & non-PIE 64-bit binaries. It also disables SBRK, in order
to allow utilization of the bss grow region for mappings. The latter
has no effect if ASLR is disabled, so apply it to all architectures.
As for the drawbacks, a consequence of using the ASLR is more
significant VM fragmentation, hence the issues may be encountered
in the systems with a limited address space in high memory consumption
cases, such as buildworld. As a result, although the tests on 32-bit
architectures with ASLR enabled were mostly on par with what was
observed on 64-bit ones, the defaults for the former are not changed
at this time. Also, for the sake of safety keep the feature disabled
for 32-bit executables on 64-bit machines, too.
The committed change affects the overall OS operation, so the
following should be taken into consideration:
* Address space fragmentation.
* A changed ABI due to modified layout of address space.
* More complicated debugging due to:
* Non-reproducible address space layout between runs.
* Some debuggers automatically disable ASLR for spawned processes,
making target's environment different between debug and
non-debug runs.
In order to confirm/rule-out the dependency of any encountered issue
on ASLR it is strongly advised to re-run the test with the feature
disabled - it can be done by setting the following sysctls
in the /etc/sysctl.conf file:
kern.elf64.aslr.enable=0
kern.elf64.aslr.pie_enable=0
Co-developed by: Dawid Gorecki <dgr@semihalf.com>
Reviewed by: emaste, kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D27666
Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.
Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.
PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516
It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.
Reviewed by: brooks, emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31779
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
Instead of p_elf_machine use machine member of the Elf_Brandinfo which is now
cached in the struct proc at p_elf_brandinfo member.
Note to MFC: D30918, KBI
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D30926
MFC after: 2 weeks
To allow the ABI to make a dicision based on the Brandinfo add a link
to the Elf_Brandinfo into the struct proc. Add a note that the high 8 bits
of Elf_Brandinfo flags is private to the ABI.
Note to MFC: it breaks KBI.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D30918
MFC after: 2 weeks
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921
The new buffer is somewhat larger, but there should be no functional
changes.
Reviewed By: kib, imp
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30821
This makes it possible to call __elfN(size_segments) and __elfN(puthdr)
from Linux coredump code.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30455
This makes it possible to use core_write(), core_output(),
and sbuf_drain_core_output(), in Linux coredump code. Moving
them out of imgact_elf.c is necessary because of the weird way
it's being built.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30369
- SVC_ALL request dumping all map entries, including those marked as
non-dumpable
- SVC_NOCOMPRESS disallows compressing the dump regardless of the sysctl
policy
- SVC_PC_COREDUMP is provided for future use by userspace core dump
request
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
parse_notes relies on the caller-supplied callback to initialize "res".
Two callbacks are used in practice, brandnote_cb and note_fctl_cb, and
the latter fails to initialize res. Fix it.
In the worst case, the bug would cause the inner loop of check_note to
examine more program headers than necessary, and the note header usually
comes last anyway.
Reviewed by: kib
Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29986
When locating the anonymous memory region for a vm_map with ASLR
enabled, we try to keep the slid base address aligned on a superpage
boundary to minimize pagetable fragmentation and maximize the potential
usage of superpage mappings. We can't (portably) do this if superpages
have been disabled by loader tunable and pagesizes[1] is 0, and it
would be less beneficial in that case anyway.
PR: 253511
Reported by: johannes@jo-t.de
MFC after: 1 week
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28678
This prevents a situation where other thread modifies map entries
permissions between setting max_prot, then relocking, then setting prot,
confusing the operation outcome. E.g. you can get an error that is not
possible if operation is performed atomic.
Also enable setting rwx for max_prot even if map does not allow to set
effective rwx protection.
Reviewed by: brooks, markj (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28117
It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE |
PROT_EXEC are never specified together, if vm_map has MAP_WX flag set.
FreeBSD control flag allows specific binary to request WX exempt, and
there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/
disable globally.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28050
Proctree lock is needed for correct calculation and collection of the
job-control related data in kinfo_proc. There was even an XXX comment
about it.
Satisfy locking and lock ordering requirements by taking proctree lock
around pass over each bucket in proc_iterate(), and in sysctl_kern_proc()
and note_procstat_proc() for individual process reporting.
Reviewed by: jilles
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871
Also centralize and unify checks to enable ASLR stack gap in a new
helper exec_stackgap().
PR: 239873
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
Push the root seed version to userspace through the VDSO page, if
the RANDOM_FENESTRASX algorithm is enabled. Otherwise, there is no
functional change. The mechanism can be disabled with
debug.fxrng_vdso_enable=0.
arc4random(3) obtains a pointer to the root seed version published by
the kernel in the shared page at allocation time. Like arc4random(9),
it maintains its own per-process copy of the seed version corresponding
to the root seed version at the time it last rekeyed. On read requests,
the process seed version is compared with the version published in the
shared page; if they do not match, arc4random(3) reseeds from the
kernel before providing generated output.
This change does not implement the FenestrasX concept of PCPU userspace
generators seeded from a per-process base generator. That change is
left for future discussion/work.
Reviewed by: kib (previous version)
Approved by: csprng (me -- only touching FXRNG here)
Differential Revision: https://reviews.freebsd.org/D22839
Currently we allocate and map zero-filled anonymous pages when dumping
core. This can result in lots of needless disk I/O and page
allocations. This change tries to make the core dumper more clever and
represent unbacked ranges of virtual memory by holes in the core dump
file.
Add a new page fault type, VM_FAULT_NOFILL, which causes vm_fault() to
clean up and return an error when it would otherwise map a zero-filled
page. Then, in the core dumper code, prefault all user pages and handle
errors by simply extending the size of the core file. This also fixes a
bug related to the fact that vn_io_fault1() does not attempt partial I/O
in the face of errors from vm_fault_quick_hold_pages(): if a truncated
file is mapped into a user process, an attempt to dump beyond the end of
the file results in an error, but this means that valid pages
immediately preceding the end of the file might not have been dumped
either.
The change reduces the core dump size of trivial programs by a factor of
ten simply by excluding unaccessed libc.so pages.
PR: 249067
Reviewed by: kib
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26590
OBJT_DEFAULT, _SWAP, _VNODE and _PHYS is exactly the set of
non-fictitious object types, so just check for OBJ_FICTITIOUS. The
check no longer excludes dead objects, but such objects have to be
handled regardless.
No functional change intended.
Reviewed by: alc, dougm, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26589
Similar to the userspace rtld check.
Reviewed by: dim, emaste (previous versions)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26339
Currently, we parse notes for the values of ELF FreeBSD feature flags
and osrel. Knowing these values, or knowing that image does not carry
the note if pointers are NULL, is useful to decide which ABI variant
(brand) we want to activate for the image.
Right now this is only a plumbing change
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25273
Use AUXARGS_ENTRY_PTR to export these pointers. This is a followup to
r359987 and r359988.
Reviewed by: jhb
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24446
This simplifies discovery of these values, potentially with reducing the
number of syscalls we need to make at runtime. Longer term, we wish to
convert the startup process to pass an auxargs pointer to _start() and
use that rather than walking off the end of envv. This is cleaner,
more C-friendly, and for systems with strong bounds (e.g. CHERI)
necessary.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24407
Modern debuggers and process tracers use ptrace() rather than procfs
for debugging. ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code. This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.
PR: 244939 (exp-run)
Reviewed by: kib, mjg (earlier version)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D23837
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
The intent is to provide bsd-specific flags relevant to interpreter
and C runtime. I did not want to reuse AT_FLAGS which is common ELF
auxv entry.
Use bsdflags to report kernel support for sigfastblock(2). This
allows rtld and libthr to safely infer the syscall presence without
SIGSYS. The tunable kern.elf{32,64}.sigfastblock blocks reporting.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
Summary:
r356113 used an older patch, which predated the
freebsd_copyout_auxargs() addition. Fix this by using a private
powerpc_copyout_auxargs() instead, and keep it private to powerpc, not in MI
files.
Reviewed by: kib, bdragon
Differential Revision: https://reviews.freebsd.org/D22935
Summary:
As a transition aide, implement an alternative elfN_freebsd_fixup which
is called for old powerpc binaries. Similarly, add a translation to rtld to
convert old values to new ones (as expected by a new rtld).
Translation of old<->new values is incomplete, but sufficient to allow an
installworld of a new userspace from an old one when a new kernel is running.
Test Plan:
Someone needs to see how a new kernel/rtld/libc works with an old
binary. If if works we can probalby ship this. If not we probalby need
some more compat bits.
Submitted by: brooks
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20799
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector. Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack. It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.
This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).
Reviewed by: kib
Tested on: amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22695
- Use ustringp for the location of the argv and environment strings
and allow destp to travel further down the stack for the stackgap
and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs. This
used to hold translated system call arguments, but hasn't been used
since r159992.
Reviewed by: kib
Tested on: md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22501
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook. In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland. This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.
Reviewed by: brooks, emaste
Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22355
bzero the entire thrmisc struct, not just the padding. Other core dump
notes are already done this way.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882