249 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
72f7ddb587 linux: implement rt_sigsuspend(2) on arm64
... by making it architecture-independent.

Reviewed By:	dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D31259
2021-07-23 20:13:00 +00:00
Dmitry Chagin
cf8d74e3fe linux(4): Allow musl brand to use FUTEX_REQUEUE op.
Initial patch from submitter was adapted by me to prevent unconditional
FUTEX_REQUEUE use.

PR:			255947
Submitted by:		Philippe Michaud-Boudreault
Differential Revision:	https://reviews.freebsd.org/D30332
2021-07-20 14:39:20 +03:00
Dmitry Chagin
1ca6b15bbd Drop "All rights reserved" from my copyright statements.
Add email and fixup years while here.

Reviewed by:		imp
Differential Revision:	https://reviews.freebsd.org/D30912
MFC after:		2 weeks
2021-07-20 10:05:50 +03:00
Dmitry Chagin
ae8330b448 linux(4): Add arch name to the some printfs.
Reviewed by:		emaste
Differential revision:	https://reviews.freebsd.org/D30904
MFC after:		2 weeks
2021-07-20 10:05:08 +03:00
Dmitry Chagin
09cffde975 linux(4): Fixup the vDSO initialization order.
The vDSO initialisation order should be as follows:
- native abi init via exec_sysvec_init();
- vDSO symbols queued to the linux_vdso_syms list;
- linux_vdso_install();
- linux_exec_sysvec_init();

As the exec_sysvec_init() called with SI_ORDER_ANY (last) at SI_SUB_EXEC
order, move linux_vdso_install() and linux_exec_sysvec_init() to the
SI_SUB_EXEC+1 order.

Reviewed by:		trasz
Differential Revision:	https://reviews.freebsd.org/D30902
MFC after		2 weeks
2021-07-20 10:02:34 +03:00
Dmitry Chagin
a543556c81 linux(4): Constify vdso install/deinstall.
In order to reduce diff between arches constify vdso install/deinstall
functions like arm64.

Reviewed by:		emaste
Differential revision:	https://reviews.freebsd.org/D30901
MFC after:		2 weeks
2021-07-20 10:01:47 +03:00
Dmitry Chagin
9931033bbf linux(4); Almost complete the vDSO.
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.

The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
  implemented in the vDSO: for now gettimeofday, clock_gettime.

The first two have been implemented, so add the implementation of system
calls.

System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
  can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).

In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.

Tested by:		trasz (arm64)
Differential revision:  https://reviews.freebsd.org/D30900
MFC after:              2 weeks
2021-07-20 10:01:18 +03:00
Dmitry Chagin
5fd9cd53d2 linux(4): Modify sv_onexec hook to return an error.
Temporary add stubs to the Linux emulation layer which calls the existing hook.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D30911
MFC after:		2 weeks
2021-07-20 09:56:25 +03:00
Dmitry Chagin
815165be20 linux(4): Remove function prototypes from the vDSO.
In preparation for vDSO code revision get rid of incomplete vDSO methods
from locore, but leave .note.Linux section commented out.
.note.Linux section is used by glibc rtld to get the kernel version, that
saves one system call call. I'll try to implement it later, if figure out
how to use it with jails.

MFC after:	2 weeks
2021-07-20 09:52:08 +03:00
David Chisnall
cf98bc28d3 Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

This reapplies 3a522ba1bc852c3d4660a4fa32e4a94999d09a47 with a fix for
the static assertion failure on i386.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-16 18:06:44 +01:00
David Chisnall
d2b558281a Revert "Pass the syscall number to capsicum permission-denied signals"
This broke the i386 build.

This reverts commit 3a522ba1bc852c3d4660a4fa32e4a94999d09a47.
2021-07-10 20:26:01 +01:00
David Chisnall
3a522ba1bc Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-10 17:19:52 +01:00
Edward Tomasz Napierala
447636e43c linux(4): implement coredump support
Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables.  Some bits are still missing.

This is based on a prototype by chuck@.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30019
2021-06-30 22:45:06 +01:00
Edward Tomasz Napierala
435754a59e Add infrastructure required for Linux coredump support
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values.  It also updates all
of the ABI definitions to preserve current behaviour.

This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication.  It will be used
for Linux coredumps.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30921
2021-06-29 08:49:12 +01:00
Dmitry Chagin
c1da89fec2 linux(4): Retire linux_kplatform.
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().

I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.

This is the first stage of Linuxulator's vdso revision.

Reviewed by:		trasz, imp
Differential Revision:	https://reviews.freebsd.org/D30774
MFC after:		2 weeks
2021-06-22 08:36:21 +03:00
Edward Tomasz Napierala
135dd0cab5 linux: reduce differences between rt_sendsig() and sendsig()
This makes it easier to compare the two.  This involves moving
the mutex slightly lower down, but there should be no functional
changes.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30541
2021-06-21 17:51:56 +01:00
Dmitry Chagin
8fe8bb7cb5 linux(4): Regen for linux_poll system call.
MFC after:	2 weeks
2021-06-22 08:09:55 +03:00
Dmitry Chagin
2eff670fde linux(4): Implement poll system call via linux_common_ppol()
for the sake of converting events to/from native.

MFC after:	2 weeks
2021-06-22 08:07:46 +03:00
Dmitry Chagin
26795a0378 linux(4): Rework Linux ppoll system call.
For now the Linux emulation layer uses in kernel ppoll(2) without
conversion of user supplied fd 'events', and does not convert the
kernel supplied fd 'revents'.

At least POLLRDHUP is handled by FreeBSD differently than by
Linux. Seems that Linux silencly ignores POLLRDHUP on non socket fd's
unlike FreeBSD, which does more strictly check and fails.

Rework the Linux ppoll, using kern_poll and converting 'events'
and 'revents' values.
While here, move poll events defines to the MI part of code as they
mostly identical on all arches except arm.

Differential Revision:	https://reviews.freebsd.org/D30716
MFC after:		2 weeks
2021-06-22 08:06:05 +03:00
Konstantin Belousov
870e197d52 Add quirks for Linux ABI signals handling
Require queueing of the signals with default action, and disable
dequeueing SIGCHLD on wait for live process.

Reported and tested by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:01:35 +03:00
Dmitry Chagin
ee64d98204 linux(4): Regen for futex system call.
MFC after:	2 weeks
2021-06-10 14:16:40 +03:00
Dmitry Chagin
3c1de151e3 linux(4): Change Linux futex syscall definition to match Linux actual one.
MFC after:	2 weeks
2021-06-10 14:00:00 +03:00
Edward Tomasz Napierala
f102b61d0e linux: make sure to zero the l_siginfo structure for ptrace(2)
Reported By:	dchagin
Sponsored By:	EPSRC
2021-06-08 10:18:29 +01:00
Konstantin Belousov
598f6fb49c linuxolator: Add compat.linux.setid_allowed knob
PR:	21463
Reported by:	kris
Reviewed by:	dchagin
Tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:43:00 +03:00
Dmitry Chagin
f4e801085b linux(4): optimize ksiginfo to siginfo conversion.
Retire ksiginfo_to_lsiginfo function, use siginfo_to_lsiginfo instead.
Convert rt_sigtimedwait siginfo variables to well known names.

MFC after:	2 weeks
2021-06-07 06:06:17 +03:00
Dmitry Chagin
e29ea22f70 Regen for ('0f8dab45404f347752470579feccc6d2739b9570') Linux
rt_sigtimedwait system call.

MFC after:	2 weeks
2021-06-07 05:39:29 +03:00
Dmitry Chagin
0f8dab4540 linux(4): Fix timeout parameter of rt_sigtimedwait syscall, which is
timespec not a timeval.

MFC after:	2 weeks
2021-06-07 05:35:35 +03:00
Dmitry Chagin
19593f775c linux(4); Retire unnecessary __packed attribute from some struct's
definition.

Differential Revision:	https://reviews.freebsd.org/D30482
MFC after:		2 weeks
2021-05-31 21:56:34 +03:00
Edward Tomasz Napierala
c0f171736a Regen after 6d926e850d2.
Sponsored By:	EPSRC
2021-05-28 09:04:17 +01:00
Edward Tomasz Napierala
6d926e850d linux: add new syscall numbers
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30193
2021-05-28 09:02:16 +01:00
Konstantin Belousov
a59f028537 amd64/linux*: add required header to get the constant value
Otherwise asm silently interpret it as the external global symbol.

Reported by:	bz
Sponsored by:	The FreeBSD Foundation
Fixes:	91aae953cb80
2021-05-26 01:24:09 +03:00
Konstantin Belousov
91aae953cb amd64: clear PSL.AC in the right frame
If copyin family of routines fault, kernel does clear PSL.AC on the
fault entry, but the AC flag of the faulted frame is kept intact.  Since
onfault handler is effectively jump, AC survives until syscall exit.

Reported by:	m00nbsd, via Sony
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
admbugs:	975
2021-05-25 18:20:46 +03:00
Edward Tomasz Napierala
95c19e1d65 linux: refactor bsd_to_linux_regset() out of linux_ptrace.c
This will be used for Linux coredump support.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30365
2021-05-21 07:26:07 +01:00
Mark Johnston
fb58045145 linux: Fix SMAP-enabled futex routines
Some of them were dereferencing the user pointer before disabling SMAP.

PR:		255591
Reviewed by:	kib
Tested by:	pitwuu@gmail.com
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30276
2021-05-16 13:42:08 -04:00
Ed Maste
2c9764f36b regen syscall files after d51198d63b63 2021-05-13 14:09:58 -04:00
Edward Tomasz Napierala
916f3dba45 linux(4): make arch_prctl(2) support GET_CET_STATUS, report unknown codes
This is largely a no-op, to make future debugging slightly easier.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30035
2021-05-06 09:33:42 +01:00
Edward Tomasz Napierala
023bff7990 linux(4): fix ptrace(2) to properly handle orig_rax
This fixes strace(1) erroneously reporting return values
as "Function not implemented", combined with reporting the binary
ABI as X32.

Very similar code in linux_ptrace_getregs() is left as it is - it's
probably wrong too, but I don't have a way to test it.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29927
2021-05-04 15:21:06 +01:00
Edward Tomasz Napierala
77651151f3 linux: make ptrace(2) return EIO when trying to peek invalid address
Previously we've returned the error from native ptrace(2), ENOMEM.
This confused Linux strace(2).

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29925
2021-04-24 11:37:50 +01:00
Edward Tomasz Napierala
ca6e1fa3ce linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'.  This matches what
I've observed under Ubuntu Focal VM.

Reviewed By:	chuck, dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29609
2021-04-13 13:14:30 +01:00
Konstantin Belousov
2f15884747 amd64 linux64: use x86_clear_dbregs()
instead of manually inlining it

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
aa3ea612be x86: remove gcov kernel support
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D29529
2021-04-02 15:41:51 +03:00
John Baldwin
3b57ddb029 Rename linux_set_upcall_kse() to linux_set_upcall().
This matches the rename of cpu_set_upcall_kse() in
5c2cf818454375536fda522ba83cf67c50929e6b.

Reviewed by:	kib, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D29295
2021-03-18 12:14:34 -07:00
John Baldwin
704547ce1c amd64: Cleanups to setting TLS registers for Linux binaries.
- Use update_pcb_bases() when updating FS or GS base addresses to
  permit use of FSBASE and GSBASE in Linux processes.  This also sets
  PCB_FULL_IRET.  linux32 was setting PCB_32BIT which should be a
  no-op (exec sets it).

- Remove write-only variables to construct unused segment descriptors
  for linux32.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29026
2021-03-12 09:47:31 -08:00
Mark Johnston
0fc8a79672 linux: Unmap the VDSO page when unloading
linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO.  When destroying the VDSO object, we failed to
destroy the mapping and free KVA.  Fix this.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28696
2021-02-16 09:40:02 -05:00
Edward Tomasz Napierala
cc743b050a linux: drop unneeded casts
No functional changes.

Sponsored By:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28533
2021-02-15 13:14:15 +00:00
Edward Tomasz Napierala
c987d6a677 linux: map EBUSY returned by ptrace into ESRCH
The ptrace(2) Linux man page claims the syscall returns ESRCH,
if the tracee is not stopped; the native ptrace(2) returns EBUSY.

Sponsored by:	The FreeBSD Foundation
2021-01-19 11:21:55 +00:00
Edward Tomasz Napierala
47f7345bab linux: fix PTRACE_POKEDATA and PTRACE_POKETEXT.
Sponsored by:	The FreeBSD Foundation
2021-01-19 10:30:55 +00:00
Konstantin Belousov
4815f175d0 Linuxolator: Replace use of eventhandlers by sysent hooks.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27309
2020-11-23 18:18:16 +00:00
Conrad Meyer
77eb984147 'make sysent' for r367773
X-MFC-With:	r367773
2020-11-17 19:53:59 +00:00
Conrad Meyer
de774e422e linux(4): Implement name_to_handle_at(), open_by_handle_at()
They are similar to our getfhat(2) and fhopen(2) syscalls.

Differential Revision:	https://reviews.freebsd.org/D27111
2020-11-17 19:51:47 +00:00