The vDSO initialisation order should be as follows:
- native abi init via exec_sysvec_init();
- vDSO symbols queued to the linux_vdso_syms list;
- linux_vdso_install();
- linux_exec_sysvec_init();
As the exec_sysvec_init() called with SI_ORDER_ANY (last) at SI_SUB_EXEC
order, move linux_vdso_install() and linux_exec_sysvec_init() to the
SI_SUB_EXEC+1 order.
Reviewed by: trasz
Differential Revision: https://reviews.freebsd.org/D30902
MFC after 2 weeks
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.
The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
implemented in the vDSO: for now gettimeofday, clock_gettime.
The first two have been implemented, so add the implementation of system
calls.
System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).
In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.
Tested by: trasz (arm64)
Differential revision: https://reviews.freebsd.org/D30900
MFC after: 2 weeks
Temporary add stubs to the Linux emulation layer which calls the existing hook.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30911
MFC after: 2 weeks
In preparation for vDSO code revision get rid of incomplete vDSO methods
from locore, but leave .note.Linux section commented out.
.note.Linux section is used by glibc rtld to get the kernel version, that
saves one system call call. I'll try to implement it later, if figure out
how to use it with jails.
MFC after: 2 weeks
The underlying types for both are the same so arguably this doesn't
really matter, but using the wrong type is still confusing and
technically incorrect.
There is multiple reason for this :
- This makes it easier to see which driver is needed for each SoC
- This makes it easier to create a custom config for one SoC
- This really reduce boot time (which some people might want)
Some explaination about the files :
- std.arm64 contains all standard kernel option
- std.dev contains all the standard kernel devices
- std.<soc> contains all drivers needed to boot on this SoC family
- <SOC> includes std.arm64, std.dev and std.<soc>
- GENERIC includes std.arm64, std.dev and all std.<soc>
Sponsored by: Diablotin Systems
MFC After: 2 months
Reviewed by: mmel, cognet, imp
Differential Revision: https://reviews.freebsd.org/D30474
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc852c3d4660a4fa32e4a94999d09a47 with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
amd64 and 32-bit ARM already had assertions to this effect. Add them to
other pmaps.
Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31171
They are valid as of the ARMv8.7 XML.
While here remove SCTLR_RES0 as it's unused and depends on which CPU
the kernel is running on and switch to shifted values as they are
easier to compare with the documentation.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31120
They are valid as of the ARMv8.7 XML.
While here switch to use shifted values as they are easier to compare
with values in the Arm Reference Manual.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31093
pmap_copy() is used to speculatively create mappings, so those mappings
should not have their access bit preset.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31162
Reduce the live ranges for three variables so that they do not span the
call to PHYS_TO_VM_PAGE(). This enables the compiler to generate
slightly smaller machine code.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31161
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
Rather than extending syr827 for syr828 (as initially done in D31103)
switch to the Fairchild Semiconductor Corporation fan53555 implementation
which is in-tree but was not attached to the build. The fan53555
implementation also supports syr827/syr8278 already. [1]
Update NOTES and the arm64 GENERIC configuration for the switch.
syr827 for now stays in the tree but is not used by any
kernel configuration.
Suggested by: mmel [1]
Reviewed by: mmel, manu
Differential Revision: https://reviews.freebsd.org/D31112
The character between the E's was the letter O, however in the Arm
Documentation and XML the character is the number 0 (zero).
Sponsored by: The FreeBSD Foundation
When the initial fcmpset in pmap_promote_l2() fails, there is no need
to repeat the check for the physical address being 2MB aligned or for
the accessed bit being set. While the pmap is locked the hardware can
only transition the accessed bit from 0 to 1, and we have already
determined that it is 1 when the fcmpset fails.
MFC after: 1 week
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI. Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.
Requested by: dchagin
Reviewed by: dchagin, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
Add the missing macros and decode all the fields as described in the
Arm Architecture System Registers XML corresponding to Armv8.5.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30983
In a few places, on a failed compare-and-set, both the amd64 pmap and
the arm64 pmap repeat tests on bits that won't change state while the
pmap is locked. Eliminate some of these unnecessary tests.
Reviewed by: andrew, kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31014
The r intc interrupt controller seems to do a lot of things :
- It can handle the NMI interrupt
- It have local interrupts for some device that also can be muxed with GIC
- It can serve as an forwarder for the GIC
It's mostly used for deepsleep/wakeup if I understood correctly and we do not
support this on arm64.
For now just forward everything to the GIC so interrupts works again for device
which now have this interrupts controller set since dts v5.12
Sponsored by: Diablotin Systems
Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables. Some bits are still missing.
This is based on a prototype by chuck@.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30019
Make it possible to specify event codes without an offset of
PMC_EV_ARMV8_FIRST, by setting a machine-dependent flag. This is
required to make use of event definitions from pmu-events.
Reviewed by: ray (slightly earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30602
If the entry point for the binary executed is a thumb 2 entry point, make
sure we set the PSR_T bit, or the CPU will interpret it as arm32 code and
bad things will happen.
PR: 256899
MFC after: 1 week
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921
Eliminate some unnecessary unlocking and relocking when we have to retry
the operation to avoid deadlock. (All of the other pmap functions that
iterate over a PV list already implemented retries without these same
unlocking and relocking operations.)
Avoid a pointer dereference by using an existing local variable that
already holds the desired value.
Eliminate some unnecessary repetition of code on a failed fcmpset.
Specifically, there is no point in retesting the DBM bit because it
cannot change state while the pmap lock is held.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30931
The genet driver (RPi4 Ethernet) had code to pull headers into the
first mbuf if there was only an Ethernet header there. This was
originally needed for ICMPv6 replies, then for forwarded IPv6/TCP.
Now a situation has been found where it is needed for IPv4, when
using NAT with IPFW. Generalize to do this for all protocols.
Rather than using an IPv6-related definition for the length, move
the length to a variable that can be set with sysctl
(hw.genet.tx_hdr_min). Move an old tunable to a new RDTUN variable
with a better name.
PR: 25607
MFC after: 3 days
Reviewers: emaste
Differential Revision: https://reviews.freebsd.org/D30831
In the unlikely event that the 1 GB page mapping being demoted is used
to access the L1 page table page containing the 1 GB page mapping and
the vm_page_alloc() to allocate a new L2 page table page fails, we
would leak a page of kernel virtual address space. Fix this leak.
MFC after: 1 week
Remove an #if 0 that results in a compilation error if PV_STATS is
defined. Aside from this #if 0, there is nothing wrong with the
PV_STATS code.
MFC after: 1 week
ENETC it a gigabit Ethernet controller found on the LS1028A board.
It supports basic VLAN offloads - tag extraction, injection and hardware
filtering. Inband MDIO connectivity is used for link status
monitoring through the miibus interface. Fixed-link mode is also
supported, which allows for operation of internal cpu to switch port.
Since no admin interrupts are present in hardware, link status polling
has to be used.
Due to a hardware bug software reset of the NIC results in a external
abort. Because of that most of the hardware initialization is done
during attach. This also means that in the case of an fatal error full
board reset is required.
The enetc_hw.h header was imporoted from Linux. It is dual licensed.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30729
The "function entry" needs to be recorded with TSRAW() rather than
TSENTER() since curthread is not yet defined. (The same approach
is taken in amd64's hammer_time.)
The warning message "ERROR loading DTB" (for systems without a device
tree blob) is printed extremely early in the boot process -- among
other things, before curthread or other pcpu data has been set up.
Unfortunately, printf is instrumented with TSLOG, which cannot run
quite this early.
Wrap the printf in #ifndef TSLOG; the situations where the printf
will be useful are not ones where TSLOG would be in use.
Revise pmap_remove_l2() to use the constant-time function page_to_pvh()
instead of the linear-time function pa_to_pvh().
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30876
The page table entry for a 4KB page mapping must be valid if a PV entry
for the mapping exists, so there is no point in testing each page table
entry's validity when iterating over a PV list.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30875
When support for a sparse pv_table was added, the implementation of
pa_to_pvh() changed from a simple constant-time calculation to iterating
over the array vm_phys_segs[]. To mitigate this issue, an alternative
function, page_to_pvh(), was introduced that still runs in constant time
but requires the vm_page_t to be known. However, three cases where the
vm_page_t is known were not converted to page_to_pvh(). This change
converts those three cases.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30832
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().
I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.
This is the first stage of Linuxulator's vdso revision.
Reviewed by: trasz, imp
Differential Revision: https://reviews.freebsd.org/D30774
MFC after: 2 weeks