The vast majority of pmap_kextract() calls are looking for a physical memory
address, not a device address. By checking the page table first this saves
the formerly inevitable 64 (on e500mc and derivatives) iteration loop
through TLB1 in the most common cases.
Benchmarking this on the P5020 (e5500 core) yields a 300% throughput
improvement on dtsec(4) (115Mbit/s -> 460Mbit/s) measured with iperf.
Benchmarked on the P1022 (e500v2 core, 16 TLB1 entries) yields a 50%
throughput improvement on tsec(4) (~93Mbit/s -> 165Mbit/s) measured with
iperf.
MFC after: 1 week
Relnotes: Maybe (significant performance improvement)
* Check TLB1 in all mapdev cases, in case the memattr matches an existing
mapping (doesn't need to be MAP_DEFAULT).
* Fix mapping where the starting address is not a multiple of the widest size
base. For instance, it will now properly map 0xffffef000, size 0x11000 using
2 TLB entries, basing it at 0x****f000, instead of 0x***00000.
MFC after: 2 weeks
According to EREF rlwinm is supposed to clear the upper 32 bits of the
register of 64-bit cores. However, from experience it seems there's a bug
in the e5500 which causes the result to be duplicated in the upper bits of
the register. This causes problems when applied to stashed SRR1 accessed
to retrieve context, as the upper bits are not masked out, so a
set_mcontext() fails. This causes sigreturn() to in turn return with
EINVAL, causing make(1) to exit with error.
This bit is unused in e500mc derivatives (including e5500), so could just be
conditional on non-powerpc64, but there may be other non-Freescale cores
which do use it. This is also the same as the POW bit on Book-S, so could
be cleared unconditionally with the only penalty being a few clock cycles
for these two interrupts.
Without disabling interrupts it's possible for another thread to preempt
and update the registers post-read (tlb1_read_entry) or pre-write
(tlb1_write_entry), and confuse the kernel with mixed register states.
MFC after: 2 weeks
The current method only sort of works, and usually doesn't work reliably.
Also, on Book-E the return address from DEBUG exceptions is not the sentinel
addresses, so it won't exit the loop correctly.
Fix this by better handling trap frames during unwinding, and using the
common trap handler for debug traps, as the code in that segment is
identical between the two.
MFC after: 1 week
Extend the Book-E pmap to support 64-bit operation. Much of this was taken from
Juniper's Junos FreeBSD port. It uses a 3-level page table (page directory
list -- PP2D, page directory, page table), but has gaps in the page directory
list where regions will repeat, due to the design of the PP2D hash (a 20-bit gap
between the two parts of the index). In practice this may not be a problem
given the expanded address space. However, an alternative to this would be to
use a 4-level page table, like Linux, and possibly reduce the available address
space; Linux appears to use a 46-bit address space. Alternatively, a cache of
page directory pointers could be used to keep the overall design as-is, but
remove the gaps in the address space.
This includes a new kernel config for 64-bit QorIQ SoCs, based on MPC85XX, with
the following notes:
* The DPAA driver has not yet been ported to 64-bit so is not included in the
kernel config.
* This has been tested on the AmigaOne X5000, using a MD_ROOT compiled in
(total size kernel+mdroot must be under 64MB).
* This can run both 32-bit and 64-bit processes, and has even been tested to run
a 32-bit init with 64-bit children.
Many thanks to stevek and marcel for getting Juniper's FreeBSD patches open
sourced to be used here, and to stevek for reviewing, and providing some
historical contexts on quirks of the code.
Reviewed by: stevek
Obtained from: Juniper (in part)
MFC after: 2 months
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D9433
Freescale added the E.D profile to e500mc and derivative cores. From
Freescale's EREF reference manual this is enabled by a bit in HID0 and should
otherwise default to traditional debug. However, none of the Freescale cores
support that bit, and instead always use E.D. This results in kernel panics
using the standard debug on e500mc+ cores.
Enhanced debug allows debugging of interrupts, including critical interrupts,
as it uses a different save/restore registers (srr*). At this time we don't use
this ability, so instead share the core of the debug handler code between both
handlers.
MFC after: 3 weeks
Before this, it would cause the one consumer of this API in powerpc usage
(dev/dpaa) to set the PTE WIMG flags to empty instead of --M-, making the
cache-enabled buffer portals non-coherent.
Drop the tracking down to the pmap layer, with optimizations to only track
necessary pages. This should give a (slight) performance improvement, as well
as a stability improvement, as the tracking is already mostly handled by the
pmap layer.
Summary:
The Freescale e500v2 PowerPC core does not use a standard FPU.
Instead, it uses a Signal Processing Engine (SPE)--a DSP-style vector processor
unit, which doubles as a FPU. The PowerPC SPE ABI is incompatible with the
stock powerpc ABI, so a new MACHINE_ARCH was created to deal with this.
Additionaly, the SPE opcodes overlap with Altivec, so these are mutually
exclusive. Taking advantage of this fact, a new file, powerpc/booke/spe.c, was
created with the same function set as in powerpc/powerpc/altivec.c, so it
becomes effectively a drop-in replacement. setjmp/longjmp were modified to save
the upper 32-bits of the now-64-bit GPRs (upper 32-bits are only accessible by
the SPE).
Note: This does _not_ support the SPE in the e500v1, as the e500v1 SPE does not
support double-precision floating point.
Also, without a new MACHINE_ARCH it would be impossible to provide binary
packages which utilize the SPE.
Additionally, no work has been done to support ports, work is needed for this.
This also means no newer gcc can yet be used. However, gcc's powerpc support
has been refactored which would make adding a powerpcspe-freebsd target very
easy.
Test Plan:
This was lightly tested on a RouterBoard RB800 and an AmigaOne A1222
(P1022-based) board, compiled against the new ABI. Base system utilities
(/bin/sh, /bin/ls, etc) still function appropriately, the system is able to boot
multiuser.
Reviewed By: bdrewery, imp
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D5683
Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is. Eliminate the archaic
"XXX" comment about its value. I don't believe that its exact value, e.g.,
5 versus 6, matters.
Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.
On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D7836
pmap_early_io_map()/pmap_early_io_unmap(), if used in pairs, should be used in
the form:
pmap_early_io_map()
..do stuff..
pmap_early_io_unmap()
Without other allocations in the middle. Without reclaiming memory this can
leave large holes in the device space.
While here, make a simple change to the unmap loop which now permits it to unmap
multiple TLB entries in the range.
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714
Summary:
Kernel maps only one page of FDT. When FDT is more than one page in size, data
TLB miss occurs on memmove() when FDT is moved to kernel storage
(sys/powerpc/booke/booke_machdep.c, booke_init())
This introduces a pmap_early_io_unmap() to complement pmap_early_io_map(), which
can be used for any early I/O mapping, but currently is only used when mapping
the fdt.
Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7605
Summary: Current booke/pmap code ignores mas7 and mas8 on e6500 CPU.
Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7606
Summary:
There is often a need at the debugger to print arbitrary special
purpose registers (SPRs) on PowerPC. Using a rewritable asm stub, print any SPR
provided on the command line.
Note, as there is no checking in this, attempting to print a nonexistent SPR
may cause a Program exception (illegal instruction, or boundedly undefined).
Note also that this relies on the kernel text pages being writable. If in the
future this is made not the case, this will need to be reworked.
Test Plan:
Printing the Processor Version Register (PVR, SPR 287):
db> show spr 11f
SPR 287(11f): 80240012
Differential Revision: https://reviews.freebsd.org/D7403
Summary:
u-boot, following the ePAPR specification, puts secondary cores into a
spinloop at boot, rather than leaving them shut off. It then relies on the host
OS to write the correct values to a special spin table, located in coherent
memory (on newer implementations), or noncoherent memory (older
implementations).
This supports both implementations of ePAPR, as well as continuing to support
non-ePAPR booting, by first attempting to use the spintable, and falling back to
expecting non-started CPUs.
Test Plan:
Booted on a P5020 board. Tested before and after the changes.
Before the changes, prints the error "SMP: CPU 1 already out of hold-off state!"
and panics shortly thereafter. After the changes, same boot method lets it
complete boot.
Reviewed by: nwhitehorn
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D7494
Though the chances of the code in these sections changing are low, future-proof
the sections and use label math.
Renumber the surrounding areas to avoid duplicate label numbers.
L3 cache is not defined by Book-E, so is platform specific. Since it was
already moved for e500-based devices into mpc85xx in r292903, just eliminate it
altogether. Any device that supports L3 cache should have its own platform
means to enable it.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
Summary:
PowerPC Book-E SMP is currently broken for unknown reasons. Pull in
Semihalf changes made c2012 for e500mc/e5500, which enables booting SMP.
This eliminates the shared software TLB1 table, replacing it with
tlb1_read_entry() function.
This does not yet support ePAPR SMP booting, and doesn't handle resetting CPUs
already released (ePAPR boot releases APs to a spin loop waiting on a specific
address). This will be addressed in the near future by using the MPIC to reset
the AP into our own alternate boot address.
This does include a change to the dpaa/dtsec(4) driver, to mark the portals as
CPU-private.
Test Plan:
Tested on Amiga X5000/20 (P5020). Boots, prints the following
messages:
Adding CPU 0, pir=0, awake=1
Waking up CPU 1 (dev=1)
Adding CPU 1, pir=20, awake=1
SMP: AP CPU #1 launched
top(1) shows CPU1 active.
Obtained from: Semihalf
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D5945
Summary:
There is currently a 1GB hole between user and kernel address spaces
into which direct (1:1 PA:VA) device mappings go. This appears to go largely
unused, leaving all devices to contend with the 128MB block at the end of the
32-bit space (0xf8000000-0xffffffff). This easily fills up, and needs to be
densely packed. However, dense packing wastes precious TLB1 space, of which
there are only 16 (e500v2) or 64(e5500) entries available.
Change this by using the 1GB space for all device mappings, and allow the kernel
to use the entire upper 1GB for KVA. This also allows us to use sparse device
mappings, freeing up TLB entries.
Test Plan: Boot tested on p5020.
Differential Revision: https://reviews.freebsd.org/D5832
Freescale's QorIQ line includes a new ethernet controller, based on their
Datapath Acceleration Architecture (DPAA). This uses a combination of a Frame
manager, Buffer manager, and Queue manager to improve performance across all
interfaces by being able to pass data directly between hardware acceleration
interfaces.
As part of this import, Freescale's Netcomm Software (ncsw) driver is imported.
This was an attempt by Freescale to create an OS-agnostic sub-driver for
managing the hardware, using shims to interface to the OS-specific APIs. This
work was abandoned, and Freescale's primary work is in the Linux driver (dual
BSD/GPL license). Hence, this was imported directly to sys/contrib, rather than
going through the vendor area. Going forward, FreeBSD-specific changes may be
made to the ncsw code, diverging from the upstream in potentially incompatible
ways. An alternative could be to import the Linux driver itself, using the
linuxKPI layer, as that would maintain parity with the vendor-maintained driver.
However, the Linux driver has not been evaluated for reliability yet, and may
have issues with the import, whereas the ncsw-based driver in this commit was
completed by Semihalf 4 years ago, and is very stable.
Other SoC modules based on DPAA, which could be added in the future:
* Security and Encryption engine (SEC4.x, SEC5.x)
* RAID engine
Additional work to be done:
* Implement polling mode
* Test vlan support
* Add support for the Pattern Matching Engine, which can do regular expression
matching on packets.
This driver has been tested on the P5020 QorIQ SoC. Others listed in the
dtsec(4) manual page are expected to work as the same DPAA engine is included in
all.
Obtained from: Semihalf
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Summary:
Some drivers need special memory requirements. X86 solves this with a
pmap_change_attr() API, which DRM uses for changing the mapping of the GART and
other memory regions. Implement the same function for PowerPC. AIM currently
does not need this, but will in the future for DRM, so a default is added for
that, for business as usual. Book-E has some drivers coming down that do
require non-default memory coherency. In this case, the Datapath Acceleration
Architecture (DPAA) based ethernet controller has 2 regions for the buffer
portals: cache-inhibited, and cache-enabled. By default, device memory is
cache-inhibited. If the cache-enabled memory regions are mapped
cache-inhibited, an alignment exception is thrown on access.
Test Plan:
Tested with a new driver to be added after this (DPAA dTSEC ethernet driver).
No alignment exceptions thrown, driver works as expected with this.
Reviewed By: nwhitehorn
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5471
Summary:
The revised Book-E spec, adding the specification for the MMUv2 and e6500,
includes a hardware PTE layout for indirect page tables. In order to support
this in the future, migrate the PTE format to match the MMUv2 hardware PTE
format.
Test Plan: Boot tested on a P5020 board. Booted to multiuser mode.
Differential Revision: https://reviews.freebsd.org/D5224
The only difference between dcbzl and dcbz is dcbzl operates on native cache
line lengths regardless of L1CSR0[DCBZ32]. Since we don't change the cache line
size, the cacheline_size variable will reflect the used cache line length, and
dcbz will work as expected.
By confining the page table management to a handful of functions it'll be
easier to modify the page table scheme without affecting other functions.
This will be necessary when 64-bit support is added, and page tables become
much larger.
On powerpc64, pointers are 64 bits, so casting from uint32_t changes the integer
width.
The alternative was to use register_t, but I didn't see register_t used as
argument type for any other functions, though didn't look too closely. u_long
was an acceptable alternative. On 64-bit it's 64 bits, on 32-bit it's 32 bits.
powerpc_init() initializes the mmu. Since this may clear pages via
pmap_zero_page(), set the cacheline size before calling into it, so
pmap_zero_page() has the right cacheline size. This isn't completely
necessary now, but will be when 64-bit book-e is completed.
This includes the following changes:
* SMP kickoff for QorIQ (tested on P5020)
* Errata fixes for some silicon revisions
* Enables L2 (and L3 if available) caches
Obtained from: Semihalf
Sponsored by: Alex Perez/Inertial Computing
There's no need for it to be in asm. Also, by writing in C, and marking it
static in pmap.c, it saves a branch to the function itself, as it's only used in
one location. The generated asm is virtually identical to the handwritten code.
Summary:
With some additional changes for AIM, that could also support much
larger physmem sizes. Given that 32-bit AIM is more or less obsolete, though,
it's not worth it at this time.
Differential Revision: https://reviews.freebsd.org/D4345