Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.
Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26741
OPAL unconditionally enters secondary CPUs with only HV and SF set.
I tried writing a secondary entry point instead, but OPAL rejected it
and I am unsure why, so I resorted to making the system reset interrupt
endian-flexible.
This means we take a slight performance hit on wakeup on LE, but it is
a good stopgap until we can figure out a reliable way to make OPAL enter
where we want it to.
It probably makes sense to have it around anyway, because I can imagine
scenarios where the cpu resets itself to BE and does a software reset.
Sponsored by: Tag1 Consulting, Inc.
It's much easier to implement this in an endian-independent way when we
don't also have to worry about masking half of the dword off.
Given that this code ran on a machine that ran a poudriere bulk with no
kernel oddities, I am relatively certain it is correctly implemented. ;)
This should be a minor performance boost on BE as well.
Sponsored by: Tag1 Consulting, Inc.
For a body of code that had its endian conversion bits written blind without
the ability to test, moea64 was VERY close to being correct.
There were only four instances where the existing code was getting it wrong.
Sponsored by: Tag1 Consulting, Inc.
Implement pmap_mincore() for moea64.
This will need some slight tweaks when large page support in HPT lands.
Submitted by: Fernando Eckhardt Valle <fernando.valle@eldorado.org.br>
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D26314
When doing a build for a modern CPUTYPE, llvm will throw errors if obsolete
instructions are used, even if they will never run due to runtime checks.
Hiding the dssall instruction from the assembler fixes kernel build when
overriding CPUTYPE, without having any effect on the generated binary.
This has been in my local tree for over a year and is well tested across
a variety of machines.
Sponsored by: Tag1 Consulting, Inc.
Currently we use a single bit to indicate whether the virtual page is
part of a superpage. To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.
The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.
For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.
Reviewed by: alc, kib
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26238
After spending a lot of time trying to track down what was going on, I have
isolated the "black screen" failures when using boot1 to boot a G4 machine.
It turns out we were replacing the traps before installing the temporary
BAT entry for the bottom of physical memory. That meant that until the MMU
was bootstrapped, the cached translations were the only thing keeping us
from losing.
Throwing boot1 into the mix was affecting execution flow enough to cause us
to hit an uncached page and crash.
Fix this by properly setting up the initial BAT entry at the same time we
are replacing the OpenFirmware traps, so we can continue executing in
segment 0 until the rest of the DMAP has been set up.
A second thing discovered while researching this is that we were entering a
BAT region for segment 16. It turns out this range was a) considered part
of KVA, and b) has firmware mappings with varying attributes.
If we ever accessed an unmapped page in segment 16, it would cause a BAT
entry to be installed for the whole segment, which would bypass the
existing mappings until it was flushed out again.
Instead, translate the OFW memory attributes into VM memory attributes and
install the ranges into the kernel address space properly.
Reviewed by: adalava
MFC after: 3 weeks
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25547
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.
An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.
Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689
Use PVO_PADDR macro to get the physical address from a PVO, instead of
explicitly ANDing pvo_pte.pa with LPTE_RPGN where it is needed. Besides
improving readability, this is needed to support superpages (D25237), where
the steps to get the PA from a PVO are different.
Reviewed by: markj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25654
Summary:
Radix on AIM, and all of Book-E (currently), can do direct addressing of
user space, instead of needing to map user addresses into kernel space.
Take advantage of this to optimize the copy(9) functions for this
behavior, and avoid effectively NOP translations.
Test Plan: Tested on powerpcspe, powerpc64/booke, powerpc64/AIM
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D25129
Summary:
The point of this addition is to cache CPU behavior 'features', to avoid
having to recompute based on CPU, etc.
The first such use case is to avoid the unnecessary manipulation of the
SLBs (Segment Lookaside Buffers) when using the Radix pmap on POWER9.
Since we already get the PCPU pointer wherever we swap the SLB entries,
we can use a cached flag to check if it's necessary to perform the
operation anyway, and skip it when not.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24908
Found by running libc tests with radix enabled.
Detect unsigned integer wrapping with a postcondition.
Note: Radix MMU is not enabled by default yet.
Sponsored by: Tag1 Consulting, Inc.
With IFUNC support in the kernel, we can finally get rid of our poor-man's
ifunc for pmap, utilizing kobj. Since moea64 uses a second tier kobj as
well, for its own private methods, this adds a second pmap install function
(pmap_mmu_init()) to perform pmap 'post-install pre-bootstrap'
initialization, before the IFUNCs get initialized.
Reviewed by: bdragon
In this context, 0 actually means 0 (i.e. this is a li instruction).
While most assemblers will ignore this, I did have a compile failure at one
point when using an external toolchain.
In the future, we should use the li syntax to make this clearer.
Sponsored by: Tag1 Consulting, Inc.
Recent changes have caused the vmspace objects to start coming from KVA
instead of direct-mapped memory on powerpc. As far as I can tell, this is
not actually a problem, so we should stop arbitrarily asserting that it is.
I do not know why this was not being triggered before.
Approved by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Instead of crashing the user process when a D-ERAT multihit is detected, try
to flush the ERAT, and continue. This machine check indicates a likely PMAP
invalidation shortcoming that will need to be addressed, but it's
recoverable, so just recover. The recovery is pmap-specific to flush the
ERAT, so add a pmap function to do so, currently only implemented by the
POWER9 radix pmap.
x86 needs delayed TLB invalidation because invalidation requires an
expensive IPI. PowerPC has had a TLB invalidation instruction since the
POWER1 in 1990, so there's no need to delay anything.
A page (even physmem) can be marked as cache-inhibited. Attempting to use
'dcbz' to zero a page mapped cache-inhibited triggers an alignment
exception, which is fatal in kernel. This was seen when testing hardware
acceleration with X on POWER9.
At some point in the future, this should be changed to a more straight
forward zero loop instead of bzero(), and a similar change be made to the
other pmaps.
Reported by: pkubaj@
TRAP_ENTRY(0) should be TRAP_GENTRAP(0) here.
However, in practice, it doesn't matter, as the only time TRAP_ENTRY and
TRAP_GENTRAP can differ is when bridge mode is active, which is impossible
on the 64 bit kernel.
Fix it anyway in case we ever need to add a trap preamble on PPC64.
Summary:
POWER9 supports two MMU formats: traditional hashed page tables, and Radix
page tables, similar to what's presesnt on most other architectures. The
PowerISA also specifies a process table -- a table of page table pointers--
which on the POWER9 is only available with the Radix MMU, so we can take
advantage of it with the Radix MMU driver.
Written by Matt Macy.
Differential Revision: https://reviews.freebsd.org/D19516
Summary:
Some machine checks are process-recoverable, others are not. Let a
CPU-specific handler decide what to do.
This works around a machine check error hit while building www/firefox
and mail/thunderbird, which would otherwise cause the build to fail.
More work is needed to handle all possible machine check conditions, but
this is sufficient to unblock some ports building.
Differential Revision: https://reviews.freebsd.org/D23731
This is a general cleanup of the relocatable kernel support on powerpc,
needed to enable kernel ifuncs.
* Fix some relocatable issues in the kernel linker, and change to using
a RELOCATABLE_KERNEL #define instead of #ifdef __powerpc__ for parts that
other platforms can use in the future if they wish to have ET_DYN kernels.
* Get rid of the DB_STOFFS hack now that the kernel is relocated to the DMAP
properly across the board on powerpc64.
* Add powerpc64 and powerpc32 ifunc functionality.
* Allow AIM64 virtual mode OF kernels to run from the DMAP like other AIM64
by implementing a virtual mode restart. This fixes the runtime address on
PowerMac G5.
* Fix symbol relocation problems on post-relocation kernels by relocating
the symbol table.
* Add an undocumented method for supplying kernel symbols on powernv and
other powerpc machines using linux-style kernel/initrd loading -- If
you pass the kernel in as the initrd as well, the copy resident in initrd
will be used as a source for symbols when initializing the debugger.
This method is subject to removal once we have a better way of doing this.
Approved by: jhibbits
Relnotes: yes
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23156
Default moea64_bpvo_pool_size 327680 was insufficient for initial
memory mapping at boot time on systems with, for example, 64G and
no huge pages enabled.
Submitted by: Andre Silva <afscoelho@gmail.com>
Reviewed by: jhibbits, alfredo
Approved by: jhibbits (mentor)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D24102
PR: 244118
Reported by: Francis Little <oggy at farscape.co.uk>
Tested by: Francis Little, Mark Millard <marklmi at yahoo.com>
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23729
Remove mbuf_jumbo_alloc and let large mbuf zones use the new uma default
contig allocator (a copy of mbuf_jumbo_alloc). Tag other zones which
require contiguous objects, even if they don't use the new default
contig allocator, so that uma knows about their constraints.
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23238
In r356767, memcpy/memmove/bcopy optimizations were added to libc to
improve performance.
This exposed an existing kernel issue in VSX handling. The PSL_VSX flag was
not being excluded from the psl_userstatic set, which meant that any thread
that used these and then called swapcontext(3) would get an EINVAL error.
Fixing this exposed a second issue - in r344123, the FPU was being forced
off in set_mcontext(). However, this was neglecting to ensure VSX was turned
off at the same time.
While here, add some code comments to explain what's going on.
Reviewed by: jhibbits, luporl (earlier rev), pkubaj (earlier rev)
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23497
Summary:
This fixes kernel crashing when tunable "machdep.moea64_bpvo_pool_size" is
set to a value higher then 327680 (default value). Function
moea64_mid_bootstrap() relies on moea64_bpvo_pool_size, but at time of the
use the variable wan't yet updated with the new value provided by user.
Problem was detected after trying to use a VM with 64GB of RAM, and default
moea64_bpvo_pool_size is insufficient (kernel boot used more than 470000) .
I think default value must be discussed to address this use case, or find a
way to calculate pool size automatically based on amount of memory detected.
Test Plan: Tested on QEMU VM with 64GB of RAM using "set
machdep.moea64_bpvo_pool_size=655360" on loader prompt
Submitted by: Alfredo Dal'Ava Júnior (alfredo.junior_eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23233
In rS354701, I replaced text relocations with offsets from &generictrap.
Unfortunately, the magic variable I was using doesn't actually mean the
address of &generictrap, in bridge mode it actually means &generictrap64.
So, for bridge mode to work, it is necessary to differentiate between
"where do we need to branch to to handle a trap" and "where is &generictrap
for purposes of doing relative math".
Introduce a new TRAP_ENTRY and use it instead of TRAP_GENTRAP for doing
actual calls to the generic trap handler.
Reported by: Mark Millard <marklmi@yahoo.com>
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23057
Summary:
This makes the interface described in the definition file act like a
pseudo-IFUNC service, by caching the found method locally.
Applying this to the PowerPC MMU definitions, it yields a significant
(15-20%) performance improvement, seen in both a 'make buildworld' and a
parallel build of LLVM, on a POWER9 system.
Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D23245
Due to ppc32 building mmu_oea64.c (for use when in bridge mode on a G5), we
need to guard the new moea64_page_array_startup code behind __powerpc64__
to avoid a compile error, since vm_offset_t is not 64-bit on ppc32.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22782
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
Summary:
moea64_pte_sync_native() and moea64_pte_unset_native() don't need the
full PTE created, they only need to check that the PVO has a matching
PTE to the PTE in the page table. Don't waste time creating the full
PTE in this case.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D22341
Summary:
This matches r351198 from amd64. This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave similar to
existing. Otherwise it will allocate 16MB huge pages to hold the page
array, across all NUMA domains. On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain. After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short. This means
some inner domains may have pages accounted in earlier domains.
On Book-E the page array is setup at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead for touching the vm_page_array, which reduces up to one TLB
miss per array access.
Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21449
ENOENT is leftover from mmu_oea.c's moea_pvo_enter(), where it's used to
syncicache() on the first new mapping of a page. This sync is done
differently in OEA64.
Fix wrong section ordering that was causing a ".got is not contiguous with
other relro sections" lld error. This also brings ldscript.powerpc and
ldscript.powerpcspe closer to ldscript.powerpc64.
Also, remove unnecessary text relocs from the ppc32 AIM trap code.
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D22349
PowerISA 3.0 eliminated the 64-bit bridge mode which allowed 32-bit kernels
to run on 64-bit AIM/Book-S hardware. Since therefore only a 64-bit kernel
can run on this hardware, and 64-bit native always has the direct map, there
is no need to guard it.
In some scenarios, the 4K trapstk may overflow, corrupting tmpstk.
This was observed during remote debugging, with the following steps:
At remote host (R):
- enter kdb during boot
- switch to gdb backend
At local host (L):
- attach gdb to R
- try to read an invalid memory position
At R:
- a DSI trap occurs and kdb restarts (all this occurs on trapstk)
- while printing the stacktrace, trapstk overflows and corrupts tmpstk
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22200