Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).
Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.
Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.
To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.
Reviewed by: kib, jhibbits
Differential Revision: https://reviews.freebsd.org/D13180
The vast majority of pmap_kextract() calls are looking for a physical memory
address, not a device address. By checking the page table first this saves
the formerly inevitable 64 (on e500mc and derivatives) iteration loop
through TLB1 in the most common cases.
Benchmarking this on the P5020 (e5500 core) yields a 300% throughput
improvement on dtsec(4) (115Mbit/s -> 460Mbit/s) measured with iperf.
Benchmarked on the P1022 (e500v2 core, 16 TLB1 entries) yields a 50%
throughput improvement on tsec(4) (~93Mbit/s -> 165Mbit/s) measured with
iperf.
MFC after: 1 week
Relnotes: Maybe (significant performance improvement)
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Initially, only tag files that use BSD 4-Clause "Original" license.
RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133
There's no need to special case 32-bit AIM to short circuit processing.
Some AIM CPUs can handle 36 bit addresses, and 64-bit CPUs can run 32-bit
OSes, so this will allow us to expand for that in the future if we desire.
The interrupt map wasn't being allocated properly, preventing IRQs from being
allocated to children of the PCIe bus. Fix this by cloning the ofw_pcib_pci
code, which handles all cases -- device tree and probed.
In the future this may become a subclass of the ofw_pcib_pci driver, but as
that's not an exported class, it's cloned for now.
MFC after: 3 weeks
* Check TLB1 in all mapdev cases, in case the memattr matches an existing
mapping (doesn't need to be MAP_DEFAULT).
* Fix mapping where the starting address is not a multiple of the widest size
base. For instance, it will now properly map 0xffffef000, size 0x11000 using
2 TLB entries, basing it at 0x****f000, instead of 0x***00000.
MFC after: 2 weeks
similar to the kernel memory allocator.
This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated. This also
reduces boilerplate code and simplifies callers.
A wait primitive is supplied for uma zones for similar reasons. This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.
Reviewed by: alc, kib, markj
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
According to EREF rlwinm is supposed to clear the upper 32 bits of the
register of 64-bit cores. However, from experience it seems there's a bug
in the e5500 which causes the result to be duplicated in the upper bits of
the register. This causes problems when applied to stashed SRR1 accessed
to retrieve context, as the upper bits are not masked out, so a
set_mcontext() fails. This causes sigreturn() to in turn return with
EINVAL, causing make(1) to exit with error.
This bit is unused in e500mc derivatives (including e5500), so could just be
conditional on non-powerpc64, but there may be other non-Freescale cores
which do use it. This is also the same as the POW bit on Book-S, so could
be cleared unconditionally with the only penalty being a few clock cycles
for these two interrupts.
When the segment count is > 16 it spills into an 'indirect descriptor list',
which immediately follows the main table, but the indirect list is entry 15, so
needs to be skipped for the general list.
The Freescale SATA controller has many similarities to AHCI controllers, so
this driver is a heavily modified AHCI driver. Currently it seems to only
do SATA 1.0 speeds (~100-150MB/s), so there is still room for improvement.
Still to be done:
* Address erratum SATA-A-006187 -- Spread Spectrum Support (intermittent
non-recoverable transient data integrity error seen when SSC enabled).
* Linux doesn't read the log page as it hangs on the P1022. See if that's
applicable to this, and address accordingly.
* Try to determine what's holding back performance, and address it.
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D6071
We already pass -many to the assembler, and -me500 drops 64-bit instruction
handling, for some reason only breaking module building for 64-bit kernels.
Additionally, build with CTF for dtrace.
gcc complains "cast to pointer from integer of different size". phandle_t is
*always* a uint32_t, so treat it as such, not as a pointer. Fixes 64-bit build.
This brings it closer to par with GENERIC64. In the future I hope to have a
GENERIC64-E and GENERIC-E kernels as Book-E analogues to the GENERIC64/GENERIC
AIM kernels.
Rework the dTSEC and FMan drivers to be more like a full bus relationship,
so that dtsec can use bus_alloc_resource() instead of trying to handle the
offset from the dts. This required taking some code from the sparc64 ebus
driver to allow subdividing the fman region for the dTSEC devices.
This adds some support for ARM as well as 64-bit. 64-bit on PowerPC is
currently not working, and ARM support has not been completed or tested on the
FreeBSD side.
As this was imported from a Linux tree, it includes some Linux-isms
(ioread/iowrite), so compile with the LinuxKPI for now. This may change in the
future.
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
- expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
same way as for AT_HWCAP.
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D12699
HEAD. Enable VIMAGE in GENERIC kernels and some others (where GENERIC does
not exist) on HEAD.
Disable building LINT-VIMAGE with VIMAGE being default.
This should give it a lot more exposure in the run-up to 12 to help
us evaluate whether to keep it on by default or not.
We are also hoping to get better performance testing.
The feature can be disabled using nooptions.
Requested by: many
Reviewed by: kristof, emaste, hiren
X-MFC after: never
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D12639
* Book-E can have Altivec exceptions, so move it out of the AIM-only block.
* Print the right DSI trap mode (read vs write) for Book-E
While here, fix some whitespace found while reviewing other diffs.
These devices bring the configs closer to a desktop-like (GENERIC) kernel
config.
* The Freescale DIU support was added to the config in r306358.
Without keyboard support video support is nearly pointless, so add ukbd and
ums.
* The AmigaOne X5000, and P1022 devboard, both use a variant of the ds1307 RTC
* cpufreq scaling is currently supported by the p1022. More SoCs will be added
eventually.
__builtin_frame_address with a non-zero argument is unsafe and rejected by
newer gcc. Since it doesn't seem to impact the stacktrace, don't bother
with gymnastics to unwind to a different frame for starting.
PR: kern/220118
MFC after: 2 weeks
All the Book-E world is no longer e500v{1,2}. e500mc the 64-bit derivatives do
not use the DOZE/NAP bits with MSR[WE], instead using the `wait' instruction to
wait for interrupts, and SoC plane controls (via CCSR) for power management.
MFC after: 1 week
A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'. A
process ABI can set this field to point to a value holding a mask of
architecture-specific CPU feature flags. If an ABI does not wish to
supply AT_HWCAP to processes the field can be left as NULL.
The support code for AT_EHDRFLAGS was already present on all systems,
just the #define was not present. This is a step towards unifying the
AT_* constants across platforms.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12290
P5040/P5021 have the same number of LAWs as P5020. There may be a better way of
getting the count from the FDT (fsl,num-laws property on soc/corenet-law or
soc/ecm-law), but that's not supported everywhere, so we still need this check
for those other cases.
These processors may not be supported yet, but add them for completion.
POWER9 is planned for support. e300 may work (based on 603e core).
P5040/P5021 are similar to P5020, so should work as well. One addition is
needed for P5040, to support the number of LAWs, and will be a separate commit.
redundant initializations.
Hard-code base = 0, height = (approx. 1/8 of the boot-time font height)
in all cases, and remove the BIOS/MD support for setting these values.
This asks for an underline cursor sized for the boot-time font instead
of various less hard-coded but worse values. I used that think that
the x86 BIOS always gave the same values as the above hard-coding, but
on 1 of my systems it gives the wrong value of base = 1.
The remaining BIOS fields are shift_state and bell_pitch. These are now
consistently not explicitly reinitialized to 0. All sc_get_bios_value()
functions except x86's are now empty, and the only useful thing that x86
returns is shift_state. This really belongs in atkbdc, but heavier
use of the BIOS to read the more useful typematic rate has been removed
there. fb still makes much heavier use of the BIOS.
P1022 and MPC8536 include a 'jog' feature for clock control
(jog being a slower form of run mode). This is done by changing the
PLL multiplier, and cannot be done if any core is in doze or sleep mode.
--Remove special-case handling of sparc64 bus_dmamap* functions.
Replace with a more generic mechanism that allows MD busdma
implementations to generate inline mapping functions by
defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This
is currently useful for sparc64, x86, and arm64, which all
implement non-load dmamap operations as simple wrappers
around map objects which may be bus- or device-specific.
--Remove NULL-checked bus_dmamap macros. Implement the
equivalent NULL checks in the inlined x86 implementation.
For non-x86 platforms, these checks are a minor pessimization
as those platforms do not currently allow NULL maps. NULL
maps were originally allowed on arm64, which appears to have
been the motivation behind adding arm[64]-specific barriers
to bus_dma.h, but that support was removed in r299463.
--Simplify the internal interface used by the bus_dmamap_load*
variants and move it to bus_dma_internal.h
--Fix some drivers that directly include sys/bus_dma.h
despite the recommendations of bus_dma(9)
Reviewed by: kib (previous revision), marius
Differential Revision: https://reviews.freebsd.org/D10729
Without disabling interrupts it's possible for another thread to preempt
and update the registers post-read (tlb1_read_entry) or pre-write
(tlb1_write_entry), and confuse the kernel with mixed register states.
MFC after: 2 weeks
AKA Make time_t 64 bits on powerpc(32).
PowerPC currently (until now) was one of two architectures with a 32-bit time_t
on 32-bit archs (the other being i386). This is an ABI breakage, so all ports,
and all local binaries, *must* be recompiled.
Tested by: andreast, others
MFC after: Never
Relnotes: Yes