357 Commits

Author SHA1 Message Date
jhb
81f62ee15e Check for errors from copyout() and suword*() in sv_copyout_args/strings.
Reviewed by:	brooks, kib
Tested on:	amd64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22401
2019-11-18 20:07:43 +00:00
jhb
3f50cb7491 Add a sv_copyout_auxargs() hook in sysentvec.
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook.  In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland.  This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.

Reviewed by:	brooks, emaste
Tested on:	amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22355
2019-11-15 18:42:13 +00:00
emaste
685a165c4c avoid kernel stack data leak in core dump thrmisc note
bzero the entire thrmisc struct, not just the padding.  Other core dump
notes are already done this way.

Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:	markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-10-31 20:42:36 +00:00
dougm
918670a5ed Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.

Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D21882
2019-10-08 07:14:21 +00:00
kib
5fcd6fab09 kern.elf{32,64}.pie_base sysctl: enforce page alignment.
Requested by:	rstone
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-21 20:03:17 +00:00
kib
ac8689b233 Make non-ASLR pie base tunable.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-21 18:00:23 +00:00
kib
8d669d15e5 When loading ELF interpreter, initialize whole nested image_params with zero.
Otherwise we could mishandle imgp->textset.

Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21560
2019-09-07 16:03:26 +00:00
kib
c5c0c01aeb Fix mis-merge.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-05 19:19:25 +00:00
jhb
bed7b437a8 Set ISOPEN in namei flags when opening executable interpreters.
These vnodes are explicitly opened via VOP_OPEN via
exec_check_permissions identical to the main exectuable image.
Setting ISOPEN allows filesystems to perform suitable checks in
VOP_LOOKUP (e.g. close-to-open consistency in the NFS client).

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D21129
2019-08-03 01:02:52 +00:00
kib
3df08381ed Make randomized stack gap between strings and pointers to argv/envs.
This effectively makes the stack base on the csu _start entry
randomized.

The gap is enabled if ASLR is for the ABI is enabled, and then
kern.elf{64,32}.aslr.stack_gap specify the max percentage of the
initial stack size that can be wasted for gap.  Setting it to zero
disables the gap, and max is capped at 50%.

Only amd64 for now.

Reviewed by:	cem, markj
Discussed with:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21081
2019-07-31 20:23:10 +00:00
markj
66e6bd0562 Remove a redundant offset computation in elf_load_section().
With r344705 the offset is always zero.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
2019-07-24 15:18:05 +00:00
mav
a4a0ae9441 Optimize kern.geom.conf* sysctls.
On large systems those sysctls may generate megabytes of output.  Before
this change sbuf(9) code was resizing buffer by 4KB each time many times,
generating tons of TLB shootdowns.  Unfortunately in this case existing
sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not
applicable, since all the sbuf writes are done in different kernel thread.

This change improves situation in two ways:
 - on first sysctl call, not providing any output buffer, it sets special
sbuf drain function, just counting the data and so not needing big buffer;
 - on second sysctl call it uses as initial buffer size value saved on
previous call, so that in most cases there will be no reallocation, unless
GEOM topology changed significantly.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-18 21:05:10 +00:00
kib
a2f4dfb45b Grammar fixes for r347690.
Submitted by:	alc
MFC after:	3 days
2019-05-17 21:18:11 +00:00
kib
27769d9344 imgact_elf.c: Add comment explaining the malloc/VOP_UNLOCK() dance
from r347148.

Requested by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-05-16 13:03:54 +00:00
kib
2dc0d9edaa Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
kib
e3b87f7a32 imgact_elf: do not relock the text vnode if possible.
We unlock the vnode around malloc(M_WAITOK), to make it possible for
pagedaemon to flush vnode pages for us.  Instead of doing it
unconditionally, first try M_NOWAIT allocation, which typically
succeed.  Only on failure, unlock the vnode and retry with M_WAITOK.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:04:01 +00:00
trasz
9e141477c1 Use shared vnode locks for the ELF interpreter.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19874
2019-04-11 11:21:45 +00:00
trasz
8fedfd26a7 Improve vnode lock assertions.
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-04-10 10:21:14 +00:00
trasz
ca6bb3d6ec Factor out section loading into a separate function.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19846
2019-04-09 15:24:38 +00:00
trasz
de8cb8b511 Refactor ELF interpreter loading into a separate function.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19741
2019-04-08 14:31:07 +00:00
kib
e644a7809f Fix branding after r345661.
In particular, elf32 FreeBSD binaries were not executed on LP64 hosts.
The interp_name_len value should account for the nul terminator.  This
is needed for strncmp()s in brand checking code to work.

Reported by:	andreast
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days (together with r345661)
2019-03-30 16:58:51 +00:00
trasz
de48cc4214 Factor out retrieving the interpreter path from the main ELF
loader routine.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19715
2019-03-28 21:43:01 +00:00
trasz
390bc6cf0d Factor out resource limit enforcement code in the ELF loader.
It makes the code slightly easier to follow, and might make
it easier to fix the resouce accounting to also account for
the interpreter.

The PROC_UNLOCK() is moved earlier - I don't see anything
it should protect; the lim_max() is a wrapper around lim_rlimit(),
and that, differently from lim_rlimit_proc(), doesn't require
the proc lock to be held.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19689
2019-03-26 15:35:49 +00:00
trasz
0e6bf0c478 Remove trunc_page_ps() and round_page_ps() macros. This completes
the undoing of r100384.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19680
2019-03-23 13:41:14 +00:00
trasz
1a536e4f24 Remove sv_pagesize, originally introduced with r100384.
In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.

Reviewed by:	imp (who also contributed the commit message)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19280
2019-03-01 16:16:38 +00:00
kib
08849e56ba Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings.  It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory.  This
implementation is relatively small and happens at the correct
architectural level.  Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection.  It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation.  The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in.  The initial location for the anon group range
is, of course, randomized.  This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386.  This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs.  Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary.  (A tool to edit the feature
  control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.

PR:	208580 (exp runs)
Exp-runs done by:	antoine
Reviewed by:	markj (previous version)
Discussed with:	emaste
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
kib
219b0b3431 Port sysctl kern.elf32.read_exec from amd64 to i386.
Make it more comprehensive on i386, by not setting nx bit for any
mapping, not just adding PF_X to all kernel-loaded ELF segments.  This
is needed for the compatibility with older i386 programs that assume
that read access implies exec, e.g. old X servers with hand-rolled
module loader.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-07 02:17:34 +00:00
mjg
bae6f9dc2d Remove proctree acquire from note_procstat_proc
It is not needed since r340482 ("proc: always store parent pid in p_oppid")

Sponsored by:	The FreeBSD Foundation
2018-12-08 11:38:39 +00:00
kib
ec0b91f01d Parse FreeBSD Feature Control note on the ELF image activation.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-11-23 23:33:55 +00:00
kib
3f0632c424 Generalize ELF parse_notes().
Remove the knowledge of the ABI note type and brandnote from it,
instead provide it with a callback to do note-specific matching and
data fetching.  Implement callback to match against ELF brand.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-11-23 23:29:14 +00:00
kib
412c849e74 Trivial reduction of the code duplication, reuse the return FALSE code.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-11-23 23:16:01 +00:00
jhb
db3b3f63c4 Enable non-executable stacks by default on RISC-V.
Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17878
2018-11-07 18:32:02 +00:00
gordon
8f59931436 Correct ELF header parsing code to prevent invalid ELF sections from
disclosing memory.

Submitted by:	markj
Reported by:	Thomas Barabosch, Fraunhofer FKIE
Approved by:	re (implicit)
Approved by:	so
Security:	FreeBSD-SA-18:12.elf
Security:	CVE-2018-6924
Sponsored by:	The FreeBSD Foundation
2018-09-12 04:57:34 +00:00
obrien
48e7325363 Correct copyright dates. 2018-07-30 07:01:00 +00:00
kib
16aa04d663 When reporting an error, print the errno value.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-07-19 19:03:18 +00:00
brooks
474f6cc8eb Correct pointer subtraction in KASSERT().
The assertion would never fire without truly spectacular future
programming errors.

Reported by:	Coverity
CID:		1391367, 1391368
Sponsored by:	DARPA, AFRL
2018-05-29 17:49:03 +00:00
brooks
a36ed4ef19 Avoid two suword() calls per auxarg entry.
Instead, construct an auxargs array and copy it out all at once.

Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to represent
the array. This is the correct type where pairs of words just happend
to work. To reduce the size of the diff, AUXARGS_ENTRY is altered to act
on this array rather than introducing a new macro.

Return errors on copyout() and suword() failures and handle them in the
caller.

Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64
which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32
(now removed due to the use of proper types).

Reviewed by:	kib
Comments from:	emaste, jhb
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15485
2018-05-24 16:25:18 +00:00
brooks
9d79658aab Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
emaste
6851f84d1f Use C99 boolean type for translate_osrel
Migrate to modern types before creating MD Linuxolator bits for new
architectures.

Reviewed by:	cem
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14676
2018-03-13 16:40:29 +00:00
emaste
742eaeb102 Use C99 designated initializers for struct execsw
It it makes use slightly more clear and facilitates grepping.
2018-03-13 13:09:10 +00:00
emaste
1f17982a75 ANSIfy sys/kern/imgact_* 2018-03-12 15:45:50 +00:00
jhb
3f3863ea7c Always give ELF brands a chance to veto a match.
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D13945
2018-02-05 23:27:42 +00:00
markj
8b210be68e Generalize the gzio API.
We currently use a set of subroutines in kern_gzio.c to perform
compression of user and kernel core dumps. In the interest of adding
support for other compression algorithms (zstd) in this role without
complicating the API consumers, add a simple compressor API which can be
used to select an algorithm.

Also change the (non-default) GZIO kernel option to not enable
compressed user cores by default. It's not clear that such a default
would be desirable with support for multiple algorithms implemented,
and it's inconsistent in that it isn't applied to kernel dumps.

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D13632
2018-01-08 21:27:41 +00:00
pfg
cc22a86800 sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
mmel
be76b77ce7 Add AT_HWCAP2 ELF auxiliary vector.
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
 - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
   same way as for AT_HWCAP.

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D12699
2017-10-21 12:05:01 +00:00
jhb
e5ea82a50d Add AT_HWCAP and AT_EHDRFLAGS on all platforms.
A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'.  A
process ABI can set this field to point to a value holding a mask of
architecture-specific CPU feature flags.  If an ABI does not wish to
supply AT_HWCAP to processes the field can be left as NULL.

The support code for AT_EHDRFLAGS was already present on all systems,
just the #define was not present.  This is a step towards unifying the
AT_* constants across platforms.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12290
2017-09-14 14:26:55 +00:00
jhb
131ea5a7f4 Store a 32-bit PT_LWPINFO struct for 32-bit process core dumps.
Process core notes for a 32-bit process running on a 64-bit host need to
use 32-bit structures so that the note layout matches the layout of notes
of a core dump of a 32-bit process under a 32-bit kernel.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D11407
2017-06-29 21:31:13 +00:00
tychon
2a8ac69f46 Add support for capturing 'struct ptrace_lwpinfo' for signals
resulting in a process dumping core in the corefile.

Also extend procstat to view select members of 'struct ptrace_lwpinfo'
from the contents of the note.

Sponsored by:	Dell EMC Isilon
2017-03-30 18:21:36 +00:00
kib
e617648699 A followup to r315749, two more places where brand->interp_path was
accessed unconditionally.

Reported by:	se
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-30 04:21:02 +00:00
ed
a9f48eafc0 Don't require the presence of the compat_3_brand.
The existing ELF image activator requires the brandinfo to provide such
a string unconditionally, even if the executable format in question
doesn't use this type of branding. Skip matching when it's a null
pointer.

Reviewed by:	kib
MFC after:	2 weeks
2017-03-23 14:09:45 +00:00