interpreter exactly matches the one requested by the activated image.
This change applies r295277, which did the same for note branding, to
the old brand selection, with the same reasoning of fixing compat32
interpreter substitution.
PR: 211837
Reported by: kenji@kens.fm
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
elf_load_section.
The values passed currently as vm_offset_t are phdr.p_offset, which
have the native Elf word size. Since elf_load_section interprets them
as the file offset, use vm object offset type.
Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This KPI explicitly indicates the intent of creating the mapping at
the fixed address, and incorporates the map locking into the callee.
Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Elf_map_insert() needs to create a mapping at a known fixed address.
vm_map_find(), on the other hand, assumes that any suitable address
space range at or above the specified hint is acceptable. Because it
operates on a fresh or cleared address space, vm_map_find() usually
creates the mapping starting exactly at the hint.
Switch to vm_map_insert() to clearly request a fixed mapping from
the VM.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
vm_map_insert() failure, drop the vnode lock around the call to
vm_object_deallocate().
Since the deallocated object is the vm object of the vnode, we might
recurse on the vnode lock there. In practice, it is almost impossible
to make vm_map_insert() fail there on a stock kernel.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
In the kernel, cache the machine and flags fields from the ELF header to
use in the ELF header of a core dump. For gcore, copy these fields over
from the ELF header in the binary.
This matters for platforms which encode ABI information in the flags field
(such as o32 vs n32 on MIPS).
Reviewed by: kib
Sponsored by: DARPA / AFRL
Differential Revision: https://reviews.freebsd.org/D9392
It's possible to get EFAULT when writing a segment backed by a file
if the segment extends beyond the file.
The core dump could still be useful if we skip the rest of the segment
and proceed to other segments.
The skipped segment (or a portion of it) will be zero-filled.
While there, use 'const' to signify that core_write() only reads the
buffer and use __DECONST before calling vn_rdwr_inchunks() because it
can be used for both reading and writing.
Before the change:
kernel: Failed to write core file for process mmap_trunc_core (error 14)
kernel: pid 77718 (mmap_trunc_core), uid 1001: exited on signal 6
After the change:
kernel: Failed to fully fault in a core file segment at VA 0x800645000 with size 0x4000 to be written at offset 0x29000 for process mmap_trunc_core
kernel: pid 4901 (mmap_trunc_core), uid 1001: exited on signal 6 (core dumped)
Reviewed by: julian, kib
Obtained from: Panzura (older version of the change)
MFC after: 5 days
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D9233
The each_writable_segment routine evaluates segments on a slightly more
nuanced metric than simply "writable" or not. Rename the function to more
closely match its behavior (each_dumpable_segment).
Suggested by: jhb
Sponsored by: EMC / Isilon Storage Division
The ELF e_phnum field is only 16 bits wide. To support more than 65535 segments
(program headers), Sun's "Linker and Libraries Guide" table 7-7 (or 12-7,
depending on document version) prescribes a special first section header where
sh_info represents the real number of program headers.
Test code to follow, when it is ready.
Reference: http://docs.oracle.com/cd/E18752_01/pdf/817-1984.pdf
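The extended numbering rule can be sketched as follows. This is only
an illustration: the ex_* structures are cut-down, hypothetical
stand-ins for the ELF file and section headers, and EX_PN_XNUM mirrors
the 0xffff escape value (PN_XNUM) stored in e_phnum when the real
count does not fit.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_PN_XNUM 0xffffu	/* escape value stored in e_phnum */

/* Hypothetical, cut-down stand-ins for the ELF headers. */
struct ex_ehdr {
	uint16_t e_phnum;	/* program header count, or EX_PN_XNUM */
};

struct ex_shdr {
	uint32_t sh_info;	/* in section 0: real program header count */
};

/*
 * Return the real number of program headers.  sh0 is the section
 * header at index 0, or NULL if the file has no section headers.
 * Returns 0 if the count is escaped but cannot be recovered.
 */
static uint32_t
ex_real_phnum(const struct ex_ehdr *eh, const struct ex_shdr *sh0)
{
	if (eh->e_phnum != EX_PN_XNUM)
		return (eh->e_phnum);
	if (sh0 == NULL)
		return (0);
	return (sh0->sh_info);
}
```

A consumer would call ex_real_phnum() once after reading the file
header and size its program header buffer from the result.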
Reviewed by: emaste, markj
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7255
When threads were added to the kernel, the pr_pid member of the
NT_PRSTATUS note was repurposed to store LWP IDs instead of process
IDs. However, the process ID was no longer recorded in core dumps.
This change adds a pr_pid field to prpsinfo (NT_PRSINFO). Rather than
bumping the prpsinfo version number, note parsers can use the note's
payload size to determine if pr_pid is present.
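A note parser might apply the size check roughly as below. The
ex_prpsinfo layout is a simplified, hypothetical stand-in and not the
real prpsinfo structure; only the idea of comparing the payload size
against the offset of the appended field is taken from the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified payload; not the real prpsinfo layout. */
struct ex_prpsinfo {
	int32_t	pr_version;
	char	pr_fname[20];
	char	pr_psargs[84];
	int32_t	pr_pid;		/* appended field; absent in older dumps */
};

/*
 * Extract pr_pid from a note payload of descsz bytes.  Returns false
 * when the payload is too short to contain the appended field.
 */
static bool
ex_get_pid(const void *desc, size_t descsz, int32_t *pid)
{
	const size_t off = offsetof(struct ex_prpsinfo, pr_pid);

	if (descsz < off + sizeof(*pid))
		return (false);
	memcpy(pid, (const char *)desc + off, sizeof(*pid));
	return (true);
}
```

Older dumps simply fail the size check, so the parser needs no
version-specific branch.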
Reviewed by: kib, emaste (older version)
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D7117
Fill in pr_psargs in the NT_PRSINFO ELF core dump note with command
line arguments.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D7116
have ACLE support built in. The ACLE (ARM C Language Extensions) defines
a set of standardized symbols which indicate the architecture version and
features available. ACLE support is built in to modern compilers (both
clang and gcc), but absent from gcc prior to 4.4.
ARM (the company) provides the acle-compat.h header file to define the
right symbols for older versions of gcc. Basically, acle-compat.h does
for arm about the same thing cdefs.h does for freebsd: defines
standardized macros that work no matter which compiler you use. If ARM
hadn't provided this file we would have ended up with a big #ifdef __arm__
section in cdefs.h with our own compatibility shims.
Remove #include <machine/acle-compat.h> from the zillion other places (an
ever-growing list) that it appears. Since style(9) requires sys/types.h
or sys/param.h early in the include list, and both of those lead to
including cdefs.h, only a couple special cases still need to include
acle-compat.h directly.
Loves it: imp
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
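For comparison, a sketch of the macro form next to the open-coded
expression it replaces. The macro bodies follow the power-of-two
helpers in sys/param.h; EX_PAGE_SIZE and the two wrapper functions
are assumptions made up for this example.

```c
#include <stdint.h>

/*
 * Power-of-two rounding helpers in the style of <sys/param.h>.
 * They are only valid when y is a power of two.
 */
#ifndef rounddown2
#define rounddown2(x, y)	((x) & ~((y) - 1))
#endif
#ifndef roundup2
#define roundup2(x, y)		(((x) + ((y) - 1)) & ~((y) - 1))
#endif

#define EX_PAGE_SIZE	((uintptr_t)4096)	/* assumed page size */

/* Open-coded truncation to a page boundary. */
static inline uintptr_t
ex_trunc_open_coded(uintptr_t addr)
{
	return (addr & ~(EX_PAGE_SIZE - 1));
}

/* The same truncation expressed with the macro. */
static inline uintptr_t
ex_trunc_with_macro(uintptr_t addr)
{
	return (rounddown2(addr, EX_PAGE_SIZE));
}
```

Both functions compute the same value; the trade-off is only between
naming the intent and keeping the line short.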
with interpreter name exactly matching one wanted by the binary. If
no such brand exists, return first brand which accepted the binary by
note.
The change fixes a regression after r292749, where e.g. our two ia32
compat brands, ia32_brand_info and ia32_brand_oinfo, differ only by
the interpreter path, and a binary matches a brand by linkage order.
Old binaries which require /usr/libexec/ld-elf.so.1 but matched
against ia32_brand_info with interp_path /libexec/ld-elf.so.1 were
then considered to require a non-standard interpreter name, and the
magic to force ld-elf32.so.1 did not happen.
Note that it might make sense to apply the same selection of brands
for other matching criteria, SCO EI_OSABI and 3.x string.
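The selection order described above can be sketched as below. The
ex_brand structure and its accepts_note flag are hypothetical
simplifications (the flag stands in for the result of the note
check); this is not the kernel's brand table or lookup code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified brand entry. */
struct ex_brand {
	const char *name;
	const char *interp_path;
	bool accepts_note;	/* stand-in for the note check result */
};

/*
 * Prefer a note-accepting brand whose interpreter path exactly matches
 * the one requested by the binary; otherwise fall back to the first
 * brand that accepted the binary by note.  Returns NULL if none did.
 */
static const struct ex_brand *
ex_select_brand(const struct ex_brand *brands, size_t n, const char *interp)
{
	const struct ex_brand *fallback = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct ex_brand *b = &brands[i];

		if (!b->accepts_note)
			continue;
		if (interp != NULL && b->interp_path != NULL &&
		    strcmp(interp, b->interp_path) == 0)
			return (b);
		if (fallback == NULL)
			fallback = b;
	}
	return (fallback);
}
```

With two brands that differ only by interpreter path, the result no
longer depends on their order in the table when the binary names one
of the two paths.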
Reported and tested by: dwmalone
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
different from the interpreter path requested by the binary.
Before this change, it was impossible to activate a non-default
interpreter for a 32-bit image on amd64 when the
/libexec/ld-elf32.so.1 file exists.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
clock_gettime(2) on ARMv7 and ARMv8 systems which have architectural
generic timer hardware. This is similar to how the RDTSC timer is used
in userspace on x86.
Fix a permission problem where generic timer access from EL0 (or
userspace on v7) was not properly initialized on APs.
For ARMv7, mark the stack non-executable. The shared page is added for
all arms (including ARMv8 64bit), and the signal trampoline code is
moved to the page.
Reviewed by: andrew
Discussed with: emaste, mmel
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4209
branded as well as unbranded binaries. This will be required to add
support for the new ELFv2 ABI on powerpc64, which is distinguished from
ELFv1 by the contents of the ELF header's flags field.
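A header check of this kind might look like the sketch below. It
assumes the powerpc64 convention that the low two bits of e_flags
carry the ABI version, with the value 2 meaning ELFv2; the ex_* names
are hypothetical and do not correspond to kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_PPC64_ABI_MASK	0x3u	/* assumed: ABI version bits */
#define EX_PPC64_ABI_ELFV2	2u	/* assumed: value meaning ELFv2 */

/* Extract the ABI version bits from the ELF header flags. */
static unsigned int
ex_ppc64_abi(uint32_t e_flags)
{
	return (e_flags & EX_PPC64_ABI_MASK);
}

/* Accept only images whose header flags declare the ELFv2 ABI. */
static bool
ex_elfv2_header_supported(uint32_t e_flags)
{
	return (ex_ppc64_abi(e_flags) == EX_PPC64_ABI_ELFV2);
}

/* Accept images that do not declare ELFv2 (unspecified or ELFv1). */
static bool
ex_elfv1_header_supported(uint32_t e_flags)
{
	return (ex_ppc64_abi(e_flags) != EX_PPC64_ABI_ELFV2);
}
```

Each brand would supply one such predicate, so two brands for the
same machine type can be told apart by header flags alone.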
Reviewed by: imp
MFC after: 2 weeks
executable image. Keep one page (arbitrary) limit on the max allowed
size of the PT_NOTES.
The ELF image activators still require that program headers of the
executable are fully contained in the first page of the image file.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D3871
This fix is spiritually similar to r287442 and was discovered thanks to
the KASSERT added in that revision.
NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' vm map
via vn_fullpath. As vnodes may move during coredump, this is racy.
We do not remove the race, only prevent it from causing coredump
corruption.
- Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable
kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption
and truncation, even if names change, at the cost of up to PATH_MAX
bytes per mapped object. The new sysctl is documented in core.5.
- Fix note_procstat_vmmap to self-limit in the second pass. This
addresses corruption, at the cost of sometimes producing a truncated
result.
- Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste)
to grok the new zero padding.
Reported by: pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3824
Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.
When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.
NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath. As vnodes may move around during dump, this is racy.
So:
- Detect badly behaved notes in putnote() and pad underfilled notes.
- Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
exercise the NT_PROCSTAT_FILES corruption. It simply picks random
lengths to expand or truncate paths to in fo_fill_kinfo_vnode().
- Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
disable kinfo packing for PROCSTAT_FILES notes. This should avoid
both FILES note corruption and truncation, even if filenames change,
at the cost of about 1 kiB in padding bloat per open fd. Document
the new sysctl in core.5.
- Fix note_procstat_files to self-limit in the 2nd pass. Since
sometimes this will result in a short write, pad up to our advertised
size. This addresses note corruption, at the risk of sometimes
truncating the last several fd info entries.
- Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
zero padding.
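The size-then-emit scheme with padding can be sketched in userland C
as below. The callback type, ex_put_note() and ex_path_note() are
hypothetical illustrations of the idea, not the kernel's putnote() or
note helper interfaces.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical note helper.  With buf == NULL it is a dry run and
 * returns the size it intends to emit.  Otherwise it writes at most
 * buflen bytes and returns the number of bytes written.
 */
typedef size_t (*ex_note_fn)(void *arg, char *buf, size_t buflen);

/*
 * Emit a note body of exactly 'advertised' bytes into out.  A helper
 * that comes up short on the second pass is padded with zeros, so the
 * layout computed from the dry run stays valid.
 */
static size_t
ex_put_note(ex_note_fn fn, void *arg, char *out, size_t advertised)
{
	size_t written;

	written = fn(arg, out, advertised);
	if (written > advertised)	/* helper is expected to self-limit */
		written = advertised;
	if (written < advertised)
		memset(out + written, 0, advertised - written);
	return (advertised);
}

/* Example helper: emits a path string, truncating to buflen. */
static size_t
ex_path_note(void *arg, char *buf, size_t buflen)
{
	const char *path = arg;
	size_t len = strlen(path);

	if (buf == NULL)
		return (len);
	if (len > buflen)
		len = buflen;
	memcpy(buf, path, len);
	return (len);
}
```

If the path shrinks between the two passes the tail of the note is
zero-filled; if it grows, the output is truncated to the advertised
size. Either way the note keeps the length recorded in the dry run.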
With suggestions from: bjk, jhb, kib, wblock
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3548
Use the same scheme implemented to manage credentials.
Code needing to look at the process's credentials (as opposed to the
thread's) is provided with *_proc variants of the relevant functions.
Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
Previously the process terminating with SIGABRT at startup was the
only notification.
PR: 200617
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2731
discrimination between different subarch binaries, at least for mips
and arm. Arm is implemented, mips is still tbd, so not currently
exported. aarch64 does not export this because aarch64 binaries use
different tags and flags than arm.
Differential Revision: https://reviews.freebsd.org/D2611
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).
Differential Revision: https://reviews.freebsd.org/D2369
Reviewed by: kib@
MFC after: 1 month
Relnotes: yes
Sponsored by: The FreeBSD Foundation
initial thread. It is read by the ELF image activator as the virtual
size of the PT_GNU_STACK program header entry, and can be specified by
the linker option -z stack-size in newer binutils.
The soft RLIMIT_STACK is auto-increased if possible, to satisfy the
binary's request.
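A rough sketch of the two steps follows. The ex_phdr structure is a
cut-down, hypothetical stand-in for a program header, and capping the
request at the hard limit is an assumption about what "if possible"
means here; 0x6474e551 is the PT_GNU_STACK segment type value.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_PT_GNU_STACK	0x6474e551u	/* PT_GNU_STACK segment type */

/* Hypothetical, cut-down program header. */
struct ex_phdr {
	uint32_t p_type;
	uint64_t p_memsz;	/* virtual size of the segment */
};

/* Return the stack size requested by the image, or 0 if none. */
static uint64_t
ex_requested_stack(const struct ex_phdr *ph, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ph[i].p_type == EX_PT_GNU_STACK)
			return (ph[i].p_memsz);
	}
	return (0);
}

/*
 * Compute the new soft limit: raise it to the request when the
 * request is larger, but never above the hard limit (assumed policy).
 */
static uint64_t
ex_new_soft_limit(uint64_t soft, uint64_t hard, uint64_t req)
{
	if (req <= soft)
		return (soft);
	return (req < hard ? req : hard);
}
```

A request of 0 (no size given in the header) leaves the soft limit
unchanged.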
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.
This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.
Differential Revision: https://reviews.freebsd.org/D1832
Reviewed by: rpaulo
Discussed with: kib
jail's creation parameters. This allows the kernel version to be reliably
spoofed within the jail whether examined directly with sysctl or
indirectly with the uname -r and -K options.
The values can only be set at jail creation time, to eliminate the need
for any locking when accessing the values via sysctl.
The overridden values are inherited by nested jails (unless the config for
the nested jails also overrides the values).
There is no sanity or range checking, other than disallowing an empty
release string or a zero release date, by design. The system
administrator is trusted to set sane values. Setting values that are
newer than the actual running kernel will likely cause compatibility
problems.
Differential Revision: https://reviews.freebsd.org/D1948
Relnotes: yes
includes the shared page allowing debuggers to use the signal trampoline
code to identify signal frames in core dumps.
Differential Revision: https://reviews.freebsd.org/D1828
Reviewed by: alc, kib
MFC after: 1 week
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
to match what Linux does in that 1) it dumps the entire XSAVE area
including the fxsave state, and 2) it stashes a copy of the current
xsave mask in the unused padding between the fxsave state and the
xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
only the extra portion. This avoids having to always make two
ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
and the current XSAVE mask (%xcr0).
Differential Revision: https://reviews.freebsd.org/D1193
Reviewed by: kib
MFC after: 2 weeks