Newer cores have the 'tlbilx' instruction, which doesn't broadcast over
CoreNet. This is significantly faster than walking the TLB to invalidate
the PID mappings. tlbilx with the arguments given takes 131 clock cycles to
complete, as opposed to 512 iterations through the loop plus tlbre/tlbwe at
each iteration.
MFC after: 3 weeks
Don't clobber the low part of the register restoring the high component of.
This could lead to very bad behavior if it's an ABI-affected register.
While here, also mark the asm volatile in the SPE high save case, to match
the load case.
Reported by: Branden Bergren (git_bdragon.rtk0.net)
MFC after: 1 week
Optimize the exception handler to only save and load the upper word of the
GPRs used in the emulating instruction. This reduces the save/load
overhead, and as a side effect does not overwrite the upper word of any
temporary register.
With this commit I am now able to run editors/abiword and math/gnumeric on a
e500-based system.
MFC after: 1 week
MFC With: r341752,r341751
The code was a near exact copy of the code in startup, but it doesn't need
the complexity since the kernel is already relocated. With
VM_MIN_KERNEL_ADDRESS as currently set to KERNBASE, this doesn't cause a
problem, because it's a zero offset. However, when KERNBASE is changed to a
physical load address, it then has a non-zero offset, and ends up with an
invalid stack pointer, causing the AP to hang.
The same behavior was moved to machdep.c, paired with AIM's relocation,
making this redundant. With this, it's now possible to boot FreeBSD with
ubldr on a uboot Book-E platform, even with a
KERNBASE != VM_MIN_KERNEL_ADDRESS.
The metadata pointer will almost never be at or above 'btext', as btext is a
relocated symbol, so will be based at VM_MIN_KERNEL_ADDRESS, not at
KERNBASE. Check the address against kernload, where the kernel is
physically loaded.
Book-E kernels really run at VM_MIN_KERNEL_ADDRESS, which currently happens to
be the same as KERNBASE. KERNBASE is the linked address, which the loader also
takes to be the physical load address. Treat KERNBASE as a physical address,
not a virtual, and change virtual address references for KERNBASE to use
something more appropriate.
debugf() is unnecessary for the TLB printing functions, as they're only
intended to be used from ddb. Instead, make them full DDB 'show'
commands, so now it can be written as 'show tlb1' and 'show tlb0'
instead of calling the function, hoping DEBUG has been defined.
The Signal Processing Engine (SPE) found in Freescale e500 cores (and
others) offloads IEEE-754 compliance (NaN, Inf handling, overflow,
underflow) to software, most likely as a means of simplifying the APU
silicon. Some software, like AbiWord, needs full IEEE-754 compliance,
including NaN handling. Implement the necessary bits to enable it.
Differential Revision: https://reviews.freebsd.org/D17446
This code caused more problems than it should have fixed (boot failures) on
the machines I tested, so has been commented out for a while now. Remove
it, and assume the errata fixups were done by the bootloader where they
belong.
Make int_external_input, int_decrementer, and int_performance_counter all
now use trap_common, just like on AIM. The effects of this are:
* All traps are now properly displayed in ddb. Previously traps from
external input, decrementer, and performance counters, would display as
just basic stack traces. Now the frame is displayed.
* External interrupts are now handled with interrupts enabled, so handling
can be preempted. This seems to fix a hang found post-r329882.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
TLB1 can handle ranges up to 4GB (through e5500, larger in e6500), but
ilog2() took a unsigned int, which maxes out at 4GB-1, but truncates
silently. Increase the input range to the largest supported, at least for
64-bit targets. This lets the DMAP be completely mapped, instead of only
1GB blocks with it assuming being fully mapped.
As with AIM64, map the DMAP at the beginning of the fourth "quadrant" of
memory, and move the KERNBASE to the the start of KVA.
Eventually we may run the kernel out of the DMAP, but for now, continue
booting as it has been.
assym is only to be included by other .s files, and should never
actually be assembled by itself.
Reviewed by: imp, bdrewery (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14180
A reservation granule on PowerPC is a cache line.
On e500mc and derivatives a cacheline size is 64 bytes, not 32. Allocate
the maximum size permitted, but only utilize the size that is needed. On
e500v1 and e500v2 the reservation granule will still be 32 bytes.
Make vm_wait() take the vm_object argument which specifies the domain
set to wait for the min condition pass. If there is no object
associated with the wait, use curthread' policy domainset. The
mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.
Eliminate pagedaemon_wait(). vm_domain_clear() handles the same
operations.
Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.
Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.
Scetched and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D14384
This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel. As more work is done, the struct may be optimized further.
Reviewed by: nwhitehorn
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.
Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.
Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000
threads from compile-time defines to global variables. This removes a
significant amount of duplicated runtime patches to the compile-time
defines, centralizing the conditional logic in the early startup code.
Reviewed by: jhibbits
used with hashed page tables on AIM and place it into a new, modular pmap
function called pmap_decode_kernel_ptr(). This function is the inverse
of pmap_map_user_ptr(). With POWER9 radix tables, which mapping to use
becomes more complex than just AIM/BOOKE and it is best to have it in
the same place as pmap_map_user_ptr().
Reviewed by: jhibbits
buffers into a new pmap-module function pmap_map_user_ptr() that can
be implemented by the respective modules. This is required to implement
non-segment-based AIM-ish MMU systems such as the radix-tree page tables
introduced by POWER ISA 3.0 and present on POWER9.
Reviewed by: jhibbits
the 32-bit cookie can be sign-extended on its way out of the loader and
through Open Firmware. If sign-extended, the in-kernel check of its value
would fail on 64-bit systems, resulting in a mountroot prompt. Solve this
by telling the kernel to ignore the high-order bits.
PR: kern/224437
Submitted by: Gustavo Romero
pmap_track_page() only works with physical memory pages, which have a
constant vm_page_t address. Microoptimize pmap_track_page() to perform one
less operation under the lock.
This was done in 32-bit mode, but not duplicated when 64-bit mode was
brought in. Without this, stale mappings can be left, leading to odd
crashes when the wrong VA is checked in XX_PhysToVirt() (dpaa(4)).
Devices aren't mapped within the KVA, and with the way 64-bit hashes the
addresses pte_vatopa() may not return a 0 physical address for a device.
MFC after: 1 week
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
This allows modules creating mappings to be loaded post-boot, after SMP has
started. Without this, the TLB1 mappings can become unsynchronized and lead
to kernel page faults when accessed on the alternate CPUs.
MFC after: 3 weeks
PowerPC kernels in r6 is actually metadata from loader(8) or gibberish
left in r6, which is not required to be anything under the
PAPR/ePAPR/CHRP/OF standards, by another boot loader.
Note that, as a result, systems need a new boot loader to boot PPC kernels
after this revision without ending up at a mountroot prompt. New boot
loaders are backwards compatible and can boot older kernels.
Reviewed by: jhibbits
MFC after: 2 months
The vast majority of pmap_kextract() calls are looking for a physical memory
address, not a device address. By checking the page table first this saves
the formerly inevitable 64 (on e500mc and derivatives) iteration loop
through TLB1 in the most common cases.
Benchmarking this on the P5020 (e5500 core) yields a 300% throughput
improvement on dtsec(4) (115Mbit/s -> 460Mbit/s) measured with iperf.
Benchmarked on the P1022 (e500v2 core, 16 TLB1 entries) yields a 50%
throughput improvement on tsec(4) (~93Mbit/s -> 165Mbit/s) measured with
iperf.
MFC after: 1 week
Relnotes: Maybe (significant performance improvement)
* Check TLB1 in all mapdev cases, in case the memattr matches an existing
mapping (doesn't need to be MAP_DEFAULT).
* Fix mapping where the starting address is not a multiple of the widest size
base. For instance, it will now properly map 0xffffef000, size 0x11000 using
2 TLB entries, basing it at 0x****f000, instead of 0x***00000.
MFC after: 2 weeks
According to EREF rlwinm is supposed to clear the upper 32 bits of the
register of 64-bit cores. However, from experience it seems there's a bug
in the e5500 which causes the result to be duplicated in the upper bits of
the register. This causes problems when applied to stashed SRR1 accessed
to retrieve context, as the upper bits are not masked out, so a
set_mcontext() fails. This causes sigreturn() to in turn return with
EINVAL, causing make(1) to exit with error.
This bit is unused in e500mc derivatives (including e5500), so could just be
conditional on non-powerpc64, but there may be other non-Freescale cores
which do use it. This is also the same as the POW bit on Book-S, so could
be cleared unconditionally with the only penalty being a few clock cycles
for these two interrupts.