Commit Graph

399 Commits

Author SHA1 Message Date
Konstantin Belousov
a1549acbaf Fix mis-merge.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-05 19:19:25 +00:00
John Baldwin
f422bc3092 Set ISOPEN in namei flags when opening executable interpreters.
These vnodes are explicitly opened via VOP_OPEN via
exec_check_permissions identical to the main exectuable image.
Setting ISOPEN allows filesystems to perform suitable checks in
VOP_LOOKUP (e.g. close-to-open consistency in the NFS client).

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D21129
2019-08-03 01:02:52 +00:00
Konstantin Belousov
fc83c5a7d0 Make randomized stack gap between strings and pointers to argv/envs.
This effectively makes the stack base on the csu _start entry
randomized.

The gap is enabled if ASLR is for the ABI is enabled, and then
kern.elf{64,32}.aslr.stack_gap specify the max percentage of the
initial stack size that can be wasted for gap.  Setting it to zero
disables the gap, and max is capped at 50%.

Only amd64 for now.

Reviewed by:	cem, markj
Discussed with:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21081
2019-07-31 20:23:10 +00:00
Mark Johnston
e020a35f5b Remove a redundant offset computation in elf_load_section().
With r344705 the offset is always zero.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
2019-07-24 15:18:05 +00:00
Alexander Motin
5c32e9fcb2 Optimize kern.geom.conf* sysctls.
On large systems those sysctls may generate megabytes of output.  Before
this change sbuf(9) code was resizing buffer by 4KB each time many times,
generating tons of TLB shootdowns.  Unfortunately in this case existing
sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not
applicable, since all the sbuf writes are done in different kernel thread.

This change improves situation in two ways:
 - on first sysctl call, not providing any output buffer, it sets special
sbuf drain function, just counting the data and so not needing big buffer;
 - on second sysctl call it uses as initial buffer size value saved on
previous call, so that in most cases there will be no reallocation, unless
GEOM topology changed significantly.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-18 21:05:10 +00:00
Konstantin Belousov
f1f81d3b96 Grammar fixes for r347690.
Submitted by:	alc
MFC after:	3 days
2019-05-17 21:18:11 +00:00
Konstantin Belousov
0ddfdc60f8 imgact_elf.c: Add comment explaining the malloc/VOP_UNLOCK() dance
from r347148.

Requested by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-05-16 13:03:54 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Konstantin Belousov
2d6b8546b7 imgact_elf: do not relock the text vnode if possible.
We unlock the vnode around malloc(M_WAITOK), to make it possible for
pagedaemon to flush vnode pages for us.  Instead of doing it
unconditionally, first try M_NOWAIT allocation, which typically
succeed.  Only on failure, unlock the vnode and retry with M_WAITOK.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:04:01 +00:00
Edward Tomasz Napierala
4033ecc915 Use shared vnode locks for the ELF interpreter.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19874
2019-04-11 11:21:45 +00:00
Edward Tomasz Napierala
b65ca345ef Improve vnode lock assertions.
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-04-10 10:21:14 +00:00
Edward Tomasz Napierala
9bcd7482b2 Factor out section loading into a separate function.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19846
2019-04-09 15:24:38 +00:00
Edward Tomasz Napierala
9274fb3599 Refactor ELF interpreter loading into a separate function.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19741
2019-04-08 14:31:07 +00:00
Konstantin Belousov
be7808dca3 Fix branding after r345661.
In particular, elf32 FreeBSD binaries were not executed on LP64 hosts.
The interp_name_len value should account for the nul terminator.  This
is needed for strncmp()s in brand checking code to work.

Reported by:	andreast
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days (together with r345661)
2019-03-30 16:58:51 +00:00
Edward Tomasz Napierala
09c78d53bf Factor out retrieving the interpreter path from the main ELF
loader routine.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19715
2019-03-28 21:43:01 +00:00
Edward Tomasz Napierala
20e1174a00 Factor out resource limit enforcement code in the ELF loader.
It makes the code slightly easier to follow, and might make
it easier to fix the resouce accounting to also account for
the interpreter.

The PROC_UNLOCK() is moved earlier - I don't see anything
it should protect; the lim_max() is a wrapper around lim_rlimit(),
and that, differently from lim_rlimit_proc(), doesn't require
the proc lock to be held.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19689
2019-03-26 15:35:49 +00:00
Edward Tomasz Napierala
545517f198 Remove trunc_page_ps() and round_page_ps() macros. This completes
the undoing of r100384.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19680
2019-03-23 13:41:14 +00:00
Edward Tomasz Napierala
1699546def Remove sv_pagesize, originally introduced with r100384.
In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.

Reviewed by:	imp (who also contributed the commit message)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19280
2019-03-01 16:16:38 +00:00
Konstantin Belousov
fa50a3552d Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings.  It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory.  This
implementation is relatively small and happens at the correct
architectural level.  Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection.  It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation.  The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in.  The initial location for the anon group range
is, of course, randomized.  This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386.  This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs.  Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary.  (A tool to edit the feature
  control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.

PR:	208580 (exp runs)
Exp-runs done by:	antoine
Reviewed by:	markj (previous version)
Discussed with:	emaste
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
Konstantin Belousov
eb785fab3b Port sysctl kern.elf32.read_exec from amd64 to i386.
Make it more comprehensive on i386, by not setting nx bit for any
mapping, not just adding PF_X to all kernel-loaded ELF segments.  This
is needed for the compatibility with older i386 programs that assume
that read access implies exec, e.g. old X servers with hand-rolled
module loader.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-07 02:17:34 +00:00
Mateusz Guzik
b0b246b0ba Remove proctree acquire from note_procstat_proc
It is not needed since r340482 ("proc: always store parent pid in p_oppid")

Sponsored by:	The FreeBSD Foundation
2018-12-08 11:38:39 +00:00
Konstantin Belousov
cefb93f253 Parse FreeBSD Feature Control note on the ELF image activation.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-11-23 23:33:55 +00:00
Konstantin Belousov
92328a3251 Generalize ELF parse_notes().
Remove the knowledge of the ABI note type and brandnote from it,
instead provide it with a callback to do note-specific matching and
data fetching.  Implement callback to match against ELF brand.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-11-23 23:29:14 +00:00
Konstantin Belousov
eda8fe63c9 Trivial reduction of the code duplication, reuse the return FALSE code.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-11-23 23:16:01 +00:00
John Baldwin
4bf4b0f139 Enable non-executable stacks by default on RISC-V.
Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17878
2018-11-07 18:32:02 +00:00
Gordon Tetlow
c9e562b188 Correct ELF header parsing code to prevent invalid ELF sections from
disclosing memory.

Submitted by:	markj
Reported by:	Thomas Barabosch, Fraunhofer FKIE
Approved by:	re (implicit)
Approved by:	so
Security:	FreeBSD-SA-18:12.elf
Security:	CVE-2018-6924
Sponsored by:	The FreeBSD Foundation
2018-09-12 04:57:34 +00:00
David E. O'Brien
455d358977 Correct copyright dates. 2018-07-30 07:01:00 +00:00
Konstantin Belousov
53e20b2702 When reporting an error, print the errno value.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-07-19 19:03:18 +00:00
Brooks Davis
d8b2f0790b Correct pointer subtraction in KASSERT().
The assertion would never fire without truly spectacular future
programming errors.

Reported by:	Coverity
CID:		1391367, 1391368
Sponsored by:	DARPA, AFRL
2018-05-29 17:49:03 +00:00
Brooks Davis
5f77b8a88b Avoid two suword() calls per auxarg entry.
Instead, construct an auxargs array and copy it out all at once.

Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to represent
the array. This is the correct type where pairs of words just happend
to work. To reduce the size of the diff, AUXARGS_ENTRY is altered to act
on this array rather than introducing a new macro.

Return errors on copyout() and suword() failures and handle them in the
caller.

Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64
which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32
(now removed due to the use of proper types).

Reviewed by:	kib
Comments from:	emaste, jhb
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15485
2018-05-24 16:25:18 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Ed Maste
a95659f75f Use C99 boolean type for translate_osrel
Migrate to modern types before creating MD Linuxolator bits for new
architectures.

Reviewed by:	cem
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14676
2018-03-13 16:40:29 +00:00
Ed Maste
b7feabf906 Use C99 designated initializers for struct execsw
It it makes use slightly more clear and facilitates grepping.
2018-03-13 13:09:10 +00:00
Ed Maste
5cc6d253f4 ANSIfy sys/kern/imgact_* 2018-03-12 15:45:50 +00:00
John Baldwin
d722231bca Always give ELF brands a chance to veto a match.
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D13945
2018-02-05 23:27:42 +00:00
Mark Johnston
78f57a9cde Generalize the gzio API.
We currently use a set of subroutines in kern_gzio.c to perform
compression of user and kernel core dumps. In the interest of adding
support for other compression algorithms (zstd) in this role without
complicating the API consumers, add a simple compressor API which can be
used to select an algorithm.

Also change the (non-default) GZIO kernel option to not enable
compressed user cores by default. It's not clear that such a default
would be desirable with support for multiple algorithms implemented,
and it's inconsistent in that it isn't applied to kernel dumps.

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D13632
2018-01-08 21:27:41 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Michal Meloun
904d8c492f Add AT_HWCAP2 ELF auxiliary vector.
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
 - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
   same way as for AT_HWCAP.

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D12699
2017-10-21 12:05:01 +00:00
John Baldwin
c2f37b9245 Add AT_HWCAP and AT_EHDRFLAGS on all platforms.
A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'.  A
process ABI can set this field to point to a value holding a mask of
architecture-specific CPU feature flags.  If an ABI does not wish to
supply AT_HWCAP to processes the field can be left as NULL.

The support code for AT_EHDRFLAGS was already present on all systems,
just the #define was not present.  This is a step towards unifying the
AT_* constants across platforms.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12290
2017-09-14 14:26:55 +00:00
John Baldwin
51645e836d Store a 32-bit PT_LWPINFO struct for 32-bit process core dumps.
Process core notes for a 32-bit process running on a 64-bit host need to
use 32-bit structures so that the note layout matches the layout of notes
of a core dump of a 32-bit process under a 32-bit kernel.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D11407
2017-06-29 21:31:13 +00:00
Tycho Nightingale
86be94fca3 Add support for capturing 'struct ptrace_lwpinfo' for signals
resulting in a process dumping core in the corefile.

Also extend procstat to view select members of 'struct ptrace_lwpinfo'
from the contents of the note.

Sponsored by:	Dell EMC Isilon
2017-03-30 18:21:36 +00:00
Konstantin Belousov
3aeacc55a5 A followup to r315749, two more places where brand->interp_path was
accessed unconditionally.

Reported by:	se
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-30 04:21:02 +00:00
Ed Schouten
0fe9832013 Don't require the presence of the compat_3_brand.
The existing ELF image activator requires the brandinfo to provide such
a string unconditionally, even if the executable format in question
doesn't use this type of branding. Skip matching when it's a null
pointer.

Reviewed by:	kib
MFC after:	2 weeks
2017-03-23 14:09:45 +00:00
Konstantin Belousov
2274ab3d7b Update r315753 with the proper flag name.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-22 22:28:13 +00:00
Konstantin Belousov
1438fe3cf2 Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only
matches static binaries.

Interpretation of the 'static' there is that the binary must not
specify an interpreter.  In particular, shared objects are matched by
the brand if BI_CAN_EXEC_DYN is also set.

This improves precision of the brand matching, which should eliminate
surprises due to brand ordering.

Revert r315701.

Discussed with and tested by:	ed (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-22 22:23:01 +00:00
Konstantin Belousov
7aab7a80e2 Adjust r314851 to not require every brand to specify interpreter path.
Reported and tested by:	ed
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-22 22:06:48 +00:00
Alan Cox
c547cbb49c Avoid unnecessary calls to vm_map_protect() in elf_load_section().
Typically, when elf_load_section() unconditionally passed VM_PROT_ALL to
elf_map_insert(), it was needlessly enabling execute access on the
mapping, and it would later have to call vm_map_protect() to correct the
mapping's access rights.  Now, instead, elf_load_section() always passes
its parameter "prot" to elf_map_insert().  So, elf_load_section() must
only call vm_map_protect() if it needs to remove the write access that
was temporarily granted to perform a copyout().

Reviewed by:	kib
MFC after:	1 week
2017-03-18 23:37:00 +00:00
Konstantin Belousov
9bcf2f2da0 Accept linkers representation for ELF segments with zero on-disk length.
For such segments, GNU bfd linker writes knowingly incorrect value
into the the file offset field of the program header entry, with the
motivation that file should not be mapped for creation of this segment
at all.

Relax checks for the ELF structure validity when on-disk segment
length is zero, and explicitely set mapping length to zero for such
segments to avoid validating rounding arithmetic.

PR:	217610
Reported by:	Robert Clausecker <fuz@fuz.su>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-12 13:51:13 +00:00
Konstantin Belousov
973d67c407 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-12 13:49:42 +00:00
Alan Cox
e383e820d3 Simplify the control flow and tidy up a comment in map_insert.
In collaboration with:	kib
MFC after:	1 week
2017-03-11 18:57:13 +00:00
Konstantin Belousov
15a9aedfa1 When selecting brand based on old Elf branding, prefer the brand which
interpreter exactly matches the one requested by the activated image.

This change applies r295277, which did the same for note branding, to
the old brand selection, with the same reasoning of fixing compat32
interpreter substitution.

PR:	211837
Reported by:	kenji@kens.fm
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-07 13:38:25 +00:00
Konstantin Belousov
3d560b4be2 Require whole brand string matching for old Elf branding.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-07 13:37:35 +00:00
Konstantin Belousov
0bbee4cd3f Consistently use vm_ooffset_t type for the vm object offset in
elf_load_section.

The values passed currently as vm_offset_t are phdr.p_offset, which
have the native Elf word size.  Since elf_load_section interprets them
as the file offset, use vm object offset type.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-07 13:36:43 +00:00
Konstantin Belousov
aaadc41f6c Instead of direct use of vm_map_insert(), call vm_map_fixed(MAP_CHECK_EXCL).
This KPI explicitely indicates the intent of creating the mapping at
the fixed address, and incorporates the map locking into the callee.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-06 14:09:54 +00:00
Alan Cox
28e8da6517 Style and punctuation fixes.
Reviewed by:	kib
MFC after:	3 days
2017-03-05 23:59:04 +00:00
Konstantin Belousov
fe0a8a3994 Style.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-03-02 17:35:13 +00:00
Konstantin Belousov
55b985b43b Use vm_map_insert() instead of vm_map_find() in elf_map_insert().
Elf_map_insert() needs to create mapping at the known fixed address.
Usage of vm_map_find() assumes, on the other hand, that any suitable
address space range above or equal the specified hint, is acceptable.
Due to operating on the fresh or cleared address space, vm_map_find()
usually creates mapping starting exactly at hint.

Switch to vm_map_insert() use to clearly request fixed mapping from
the VM.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-03-01 10:28:15 +00:00
Konstantin Belousov
e3d8f8fed4 When deallocating the vm object in elf_map_insert() due to
vm_map_insert() failure, drop the vnode lock around the call to
vm_object_deallocate().

Since the deallocated object is the vm object of the vnode, we might
get the vnode lock recursion there.  In fact, it is almost impossible
to make vm_map_insert() failing there on stock kernel.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-03-01 10:22:07 +00:00
John Baldwin
885f13dc96 Copy the e_machine and e_flags fields from the binary into an ELF core dump.
In the kernel, cache the machine and flags fields from ELF header to use in
the ELF header of a core dump. For gcore, the copy these fields over from
the ELF header in the binary.

This matters for platforms which encode ABI information in the flags field
(such as o32 vs n32 on MIPS).

Reviewed by:	kib
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D9392
2017-02-07 20:34:03 +00:00
Ed Maste
77ebe276ba imgact_elf: refactor et_dyn_addr calculation
This simplifies the logic somewhat. It is extracted from the change in
review in D5603.

Differential Revision:	https://reviews.freebsd.org/D9321
2017-01-24 22:46:43 +00:00
Andriy Gapon
c468ff880a don't abort writing of a core dump after EFAULT
It's possible to get EFAULT when writing a segment backed by a file
if the segment extends beyond the file.
The core dump could still be useful if we skip the rest of the segment
and proceed to other segements.
The skipped segment (or a portion of it) will be zero-filled.

While there, use 'const' to signify that core_write() only reads the
buffer and use __DECONST before calling vn_rdwr_inchunks() because it
can be used for both reading and writing.

Before the change:
kernel: Failed to write core file for process mmap_trunc_core (error 14)
kernel: pid 77718 (mmap_trunc_core), uid 1001: exited on signal 6

After the change:
kernel: Failed to fully fault in a core file segment at VA 0x800645000 with size 0x4000 to be written at offset 0x29000 for process mmap_trunc_core
kernel: pid 4901 (mmap_trunc_core), uid 1001: exited on signal 6 (core dumped)

Reviewed by:	julian, kib
Obtained from:	Panzura (older version of the change)
MFC after:	5 days
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D9233
2017-01-20 13:39:07 +00:00
Konstantin Belousov
5420f76b59 Style.
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-10-04 15:23:03 +00:00
Nathan Whitehorn
09c697016b Back out misfired extra file in r305108. 2016-08-31 04:03:55 +00:00
Nathan Whitehorn
c9a124dc9a Refix operation on sparse CPU mappings as in r302372, temporarily broken
by r304716.

PR:		kern/210106
MFC after:	2 days
2016-08-31 04:02:52 +00:00
Conrad Meyer
1005d8aff4 imgact_elf: Rename the segment iterator to match reality
The each_writable_segment routine evaluates segments on a slightly little more
nuanced metric than simply "writable" or not.  Rename the function to more
closely match its behavior (each_dumpable_segment).

Suggested by:	jhb
Sponsored by:	EMC / Isilon Storage Division
2016-07-20 22:51:33 +00:00
Conrad Meyer
f3325003d9 ANSI-fy imgact_elf.c
Sponsored by:	EMC / Isilon Storage Division
2016-07-20 22:46:56 +00:00
Conrad Meyer
07f825e871 Fix DEBUG build on 64-bit arch after r303099
Reported by:	Larry Rosenman <ler at lerctr.org>
2016-07-20 18:11:22 +00:00
Conrad Meyer
c17b0bd2a6 Extend ELF coredump to support more than 65535 segments
The ELF e_phnum field is only 16 bits wide. To support more than 65535 segments
(program headers), Sun's "Linker and Libraries Guide" table 7-7 (or 12-7,
depending on document version) prescribes a special first section header where
sh_info represents the real number of program headers.

Test code to follow, when it is ready.

Reference:	http://docs.oracle.com/cd/E18752_01/pdf/817-1984.pdf

Reviewed by:	emaste, markj
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7255
2016-07-20 16:59:36 +00:00
John Baldwin
ccb83afd81 Include process IDs in core dumps.
When threads were added to the kernel, the pr_pid member of the
NT_PRSTATUS note was repurposed to store LWP IDs instead of process
IDs.  However, the process ID was no longer recorded in core dumps.
This change adds a pr_pid field to prpsinfo (NT_PRSINFO).  Rather than
bumping the prpsinfo version number, note parsers can use the note's
payload size to determine if pr_pid is present.

Reviewed by:	kib, emaste (older version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D7117
2016-07-18 15:14:23 +00:00
John Baldwin
c77547d2f9 Include command line arguments in core dump process info.
Fill in pr_psargs in the NT_PRSINFO ELF core dump note with command
line arguments.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D7116
2016-07-14 23:20:05 +00:00
Ed Maste
1cbb879df8 add description for debug.elf{32,64}_legacy_coredump sysctl
Approved by:	re (kib)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2016-07-05 14:46:06 +00:00
Ian Lepore
a66dc0c52b Include machine/acle-compat.h in cdefs.h on arm if the compiler doesn't
have ACLE support built in.  The ACLE (ARM C Language Extensions) defines
a set of standardized symbols which indicate the architecture version and
features available.  ACLE support is built in to modern compilers (both
clang and gcc), but absent from gcc prior to 4.4.

ARM (the company) provides the acle-compat.h header file to define the
right symbols for older versions of gcc.  Basically, acle-compat.h does
for arm about the same thing cdefs.h does for freebsd: defines
standardized macros that work no matter which compiler you use.  If ARM
hadn't provided this file we would have ended up with a big #ifdef __arm__
section in cdefs.h with our own compatibility shims.

Remove #include <machine/acle-compat.h> from the zillion other places (an
ever-growing list) that it appears.  Since style(9) requires sys/types.h
or sys/param.h early in the include list, and both of those lead to
including cdefs.h, only a couple special cases still need to include
acle-compat.h directly.

Loves it:     imp
2016-05-25 19:44:26 +00:00
Pedro F. Giffuni
d9c9c81c08 sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
Edward Tomasz Napierala
35030a5dd4 Remove some NULL checks for M_WAITOK allocations.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-03-29 13:56:59 +00:00
Konstantin Belousov
af582aaed1 When matching brand to the ELF binary by notes, try to find a brand
with interpreter name exactly matching one wanted by the binary.  If
no such brand exists, return first brand which accepted the binary by
note.

The change fixes a regression after r292749, where e.g. our two ia32
compat brands, ia32_brand_info and ia32_brand_oinfo, only differ by
the interpeter path and binary matches to a brand by linkage order.
Then old binaries which require /usr/libexec/ld-elf.so.1 but matched
against ia32_brand_info with interp_path /libexec/ld-elf.so.1, were
considered requiring non-standard interpreter name, and magic to force
ld-elf32.so.1 did not happen.

Note that it might make sense to apply the same selection of brands
for other matching criteria, SCO EI_OSABI and 3.x string.

Reported and tested by:	dwmalone
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-02-04 20:55:49 +00:00
Konstantin Belousov
1899507706 Do not substitute interpeter if the brand interpreter path is
different from the interpreter path requested by the binary.

Before this change, it is impossible to activate non-default
interpreter for 32bit image on amd64, when /libexec/ld-elf32.so.1 file
exists.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-12-26 15:40:12 +00:00
Jonathan T. Looney
d3ee0a15a4 Only allow one PT_INTERP ELF program header. This also fixes a potential
memory leak for interp_buf.

Differential Revision:	https://reviews.freebsd.org/D4692
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Juniper Networks
2015-12-24 00:58:11 +00:00
Konstantin Belousov
d943fa35ad If we annoy user with the terminal output due to failed load of
interpreter, also show the actual error code instead of some
interpretation.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-22 20:12:52 +00:00
Ed Maste
4c22b4686b Replace magic value ELF note type with NT_FREEBSD_ABI_TAG
As of r291909 elf_common.h provides a definition.

Suggested by:	kib
Sponsored by:	The FreeBSD Foundation
2015-12-07 18:43:27 +00:00
Konstantin Belousov
4d22d07a07 Add support for usermode (vdso-like) gettimeofday(2) and
clock_gettime(2) on ARMv7 and ARMv8 systems which have architectural
generic timer hardware. It is similar how the RDTSC timer is used in
userspace on x86.

Fix a permission problem where generic timer access from EL0 (or
userspace on v7) was not properly initialized on APs.

For ARMv7, mark the stack non-executable. The shared page is added for
all arms (including ARMv8 64bit), and the signal trampoline code is
moved to the page.

Reviewed by:	andrew
Discussed with:	emaste, mmel
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4209
2015-12-07 12:20:26 +00:00
Nathan Whitehorn
f19d421ac6 Missed header_supported call from r291020: make really, really sure the brand
likes the executable.
2015-12-01 17:00:31 +00:00
Nathan Whitehorn
686d2f317a Extend r270123 to run the brand info's header_supported() routine for
branded as well as unbranded binaries. This will be required to add
support for the new ELFv2 ABI on powerpc64, which is distinguished from
ELFv1 by the contents of the ELF header's flags field.

Reviewed by:	imp
MFC after:	2 weeks
2015-11-18 17:03:22 +00:00
Enji Cooper
9a12e28212 Define compress in __elfN(coredump) when #ifdef GZIO is true to mute
an -Wunused-but-set-variable warning

Reported by: FreeBSD_HEAD_amd64_gcc4.9 jenkins job
Sponsored by: EMC / Isilon Storage Division
2015-11-02 01:47:26 +00:00
Konstantin Belousov
6c775eb64e Allow PT_INTERP and PT_NOTES segments to be located anywhere in the
executable image.  Keep one page (arbitrary) limit on the max allowed
size of the PT_NOTES.

The ELF image activators still require that program headers of the
executable are fully contained in the first page of the image file.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D3871
2015-10-14 18:27:35 +00:00
Conrad Meyer
e6b95927f3 Fix core corruption caused by race in note_procstat_vmmap
This fix is spiritually similar to r287442 and was discovered thanks to
the KASSERT added in that revision.

NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' vm map
via vn_fullpath.  As vnodes may move during coredump, this is racy.

We do not remove the race, only prevent it from causing coredump
corruption.

- Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable
  kinfo packing for PROCSTAT_VMMAP notes.  This avoids VMMAP corruption
  and truncation, even if names change, at the cost of up to PATH_MAX
  bytes per mapped object.  The new sysctl is documented in core.5.

- Fix note_procstat_vmmap to self-limit in the second pass.  This
  addresses corruption, at the cost of sometimes producing a truncated
  result.

- Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste)
  to grok the new zero padding.

Reported by:	pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt)
Relnotes:	yes
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3824
2015-10-06 18:07:00 +00:00
Conrad Meyer
bcb60d52e6 Follow-up to r287442: Move sysctl to compiled-once file
Avoid duplicate sysctl nodes.

Found by:	tijl
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3586
2015-09-07 16:44:28 +00:00
Conrad Meyer
14bdbaf2e4 Detect badly behaved coredump note helpers
Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.

When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.

NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath.  As vnodes may move around during dump, this is racy.

So:

 - Detect badly behaved notes in putnote() and pad underfilled notes.

 - Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
   exercise the NT_PROCSTAT_FILES corruption.  It simply picks random
   lengths to expand or truncate paths to in fo_fill_kinfo_vnode().

 - Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
   disable kinfo packing for PROCSTAT_FILES notes.  This should avoid
   both FILES note corruption and truncation, even if filenames change,
   at the cost of about 1 kiB in padding bloat per open fd.  Document
   the new sysctl in core.5.

 - Fix note_procstat_files to self-limit in the 2nd pass.  Since
   sometimes this will result in a short write, pad up to our advertised
   size.  This addresses note corruption, at the risk of sometimes
   truncating the last several fd info entries.

 - Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
   zero padding.

With suggestions from:	bjk, jhb, kib, wblock
Approved by:	markj (mentor)
Relnotes:	yes
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3548
2015-09-03 20:32:10 +00:00
Mark Johnston
02d131ad11 Fix some error-handling bugs when core dump compression is enabled:
- Ensure that core dump parameters are initialized in the error path.
- Don't call gzio_fini() on a NULL stream.

Reported by:	rpaulo
2015-07-14 18:24:05 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Ed Maste
6b16d66497 Add user facing errors for exceeding process memory limits
Previously the process terminating with SIGABRT at startup was the
only notification.

PR:		200617
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2731
2015-06-08 16:07:07 +00:00
Warner Losh
ee960398a0 Fix typo in symbol name. It helps to hit save in all your buffers
before committing.
2015-05-22 21:10:14 +00:00
Warner Losh
d36eec691a Export the eflags field from the elf header. This allows better
discrimination between different subarch binaries, at least for mips
and arm. Arm is implemented, mips is still tbd, so not currently
exported. aarch64 does not export this because aarch64 binaries use
different tags and flags than arm.

Differential Revision: https://reviews.freebsd.org/D2611
2015-05-22 20:50:35 +00:00
Edward Tomasz Napierala
4b5c9cf62f Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision:	https://reviews.freebsd.org/D2369
Reviewed by:	kib@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-04-29 10:23:02 +00:00
Konstantin Belousov
316b384343 Implement support for binary to requesting specific stack size for the
initial thread.  It is read by the ELF image activator as the virtual
size of the PT_GNU_STACK program header entry, and can be specified by
the linker option -z stack-size in newer binutils.

The soft RLIMIT_STACK is auto-increased if possible, to satisfy the
binary' request.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-15 08:13:53 +00:00
Mark Johnston
aa14e9b7c9 Reimplement support for userland core dump compression using a new interface
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.

This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.

Differential Revision:	https://reviews.freebsd.org/D1832
Reviewed by:	rpaulo
Discussed with:	kib
2015-03-09 03:50:53 +00:00
Ian Lepore
b96bd95b85 Allow the kern.osrelease and kern.osreldate sysctl values to be set in a
jail's creation parameters.  This allows the kernel version to be reliably
spoofed within the jail whether examined directly with sysctl or
indirectly with the uname -r and -K options.

The values can only be set at jail creation time, to eliminate the need
for any locking when accessing the values via sysctl.

The overridden values are inherited by nested jails (unless the config for
the nested jails also overrides the values).

There is no sanity or range checking, other than disallowing an empty
release string or a zero release date, by design.  The system
administrator is trusted to set sane values.  Setting values that are
newer than the actual running kernel will likely cause compatibility
problems.

Differential Revision:	https://reviews.freebsd.org/D1948
Relnotes:	yes
2015-02-27 16:28:55 +00:00
John Baldwin
bc411bc2d0 Include OBJT_PHYS VM objects in ELF core dumps. In particular this
includes the shared page allowing debuggers to use the signal trampoline
code to identify signal frames in core dumps.

Differential Revision:	https://reviews.freebsd.org/D1828
Reviewed by:	alc, kib
MFC after:	1 week
2015-02-14 17:12:31 +00:00
Konstantin Belousov
64779280c9 The size value should be asserted when it is known.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
2014-11-22 18:15:02 +00:00
John Baldwin
180e57e5c7 Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
  to match what Linux does in that 1) it dumps the entire XSAVE area
  including the fxsave state, and 2) it stashes a copy of the current
  xsave mask in the unused padding between the fxsave state and the
  xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
  only the extra portion. This avoids having to always make two
  ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
  XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
  and the current XSAVE mask (%xcr0).

Differential Revision:	https://reviews.freebsd.org/D1193
Reviewed by:	kib
MFC after:	2 weeks
2014-11-21 20:53:17 +00:00
Konstantin Belousov
539c9eef12 Fixes for i/o during coredumping:
- Do not dump into system files.
- Do not acquire write reference to the mount point where img.core is
  written, in the coredump().  The vn_rdwr() calls from ELF imgact
  request the write ref from vn_rdwr().  Recursive acqusition of the
  write ref deadlocks with the unmount.
- Instead, take the range lock for the whole core file.  This prevents
  parallel dumping from two processes executing the same image,
  converting the useless interleaved dump into sequential dumping,
  with second core overwriting the first.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-10-04 18:35:00 +00:00