In theory the errors during llan_attach should be handled, but other
errors in llan_attach (e.g. bus_setup_intr) are already ignored, so
just remove the unused variable to preserve the status quo.
ISA 3.0 allows for nested radix translations with minimal to no
involvement of the hypervisor. This should make pseries signficantly
faster on POWER9 pseries instances, as fewer hypercalls are needed to
manage pmap now.
MFC after: 2 weeks
Relnotes: yes
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.
Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.
Document these new interfaces with man pages, and oversight from before.
Reviewed by: jhb, bcr
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29937
Commit 49c894ddce introduced an issue that prevented pseries boot,
when hugepages were not available to the guest. Now large page
info must be available before moea64_install is called, so this change
moves the code that scans large page sizes before the call.
Reviewed by: jhibbits (IRC)
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
It's a class0 driver that implements some pcib methods and creates
a pci bus as its children.
The "ofw_pci" name will be used by a new driver that will be a subclass
of the pci bus.
No functional changes intended.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30226
Summary:
Some methods are split between DMAP and non-DMAP, conditional on
hw_direct_map variable. Rather than checking this variable every time,
use it to install different functions via IFUNCs.
Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D30071
In uart_phyp_get(), when the internal buffer is empty, we make a
hypercall to retrieve up to 16 bytes of input data from the
hypervisor. As this is specified to be returned in BE format, we need
to do a 64-bit byte swap on the first and second half of the data.
If the buffer being passed in was insufficient to return the fetched
data, we store the remainder in the internal buffer and use it to
satisfy the following calls to uart_phyp_get() until it is drained.
However, in this case, we were accidentally byteswapping the internal
buffer again.
Move the byteswapping code to just after the hypercall so it only gets
swapped when we're filling the buffer.
Fixes arrow keys in qemu on pseries, among other console oddities.
Sponsored by: Tag1 Consulting, Inc.
MFC after: 3 days
Follow-up to r353959 and r368070: do the same for other architectures.
arm32 already seems to use its own .fnstart/.fnend directives, which
appear to be ARM-specific variants of the same thing. Likewise, MIPS
uses .frame directives.
Reviewed by: arichardson
Differential Revision: https://reviews.freebsd.org/D27387
This change adds support for transparent superpages for PowerPC64
systems using Hashed Page Tables (HPT). All pmap operations are
supported.
The changes were inspired by RISC-V implementation of superpages,
by @markj (r344106), but heavily adapted to fit PPC64 HPT architecture
and existing MMU OEA64 code.
While these changes are not better tested, superpages support is disabled by
default. To enable it, use vm.pmap.superpages_enabled=1.
In this initial implementation, when superpages are disabled, system
performance stays at the same level as without these changes. When
superpages are enabled, buildworld time increases a bit (~2%). However,
for workloads that put a heavy pressure on the TLB the performance boost
is much bigger (see HPC Challenge and pgbench on D25237).
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25237
Unlike virtio, which in legacy mode is guest endian, the hypervisor vscsi
interface operates in big endian, so we must convert back and forth in several
places.
These changes are enough to attach a rootdisk.
Sponsored by: Tag1 Consulting, Inc.
Since we will need to be able to take traps relatively early in the process,
ensure that the hypervisor changes our ILE for us as soon as we are ready.
Sponsored by: Tag1 Consulting, Inc.
With IFUNC support in the kernel, we can finally get rid of our poor-man's
ifunc for pmap, utilizing kobj. Since moea64 uses a second tier kobj as
well, for its own private methods, this adds a second pmap install function
(pmap_mmu_init()) to perform pmap 'post-install pre-bootstrap'
initialization, before the IFUNCs get initialized.
Reviewed by: bdragon
This change enables the use of OpenFirmware Console (ofwcons), even when VGA is
available, allowing early kernel messages to be seen, that is important in case
of crashes before VGA console initialization.
This is specially useful in virtualized environments, where the user/developer
doesn't have full control of the virtualization engine (e.g. OpenStack).
The old behavior is preserved by default and, in order to use ofwcons, a few
tunables that have been introduced need to be set:
- hw.ofwfb.disable=1 - disable OFW FrameBuffer device
- machdep.ofw.mtx_spin=1 - change PPC OFW mutex to SPIN type, to match kernel
console's mutex type
- debug.quiesce_ofw=0 - don't call OFW quiesce, needed to keep ofwcons I/O
working
More details can be found at differential revision D20640.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20640
This change makes it possible to use OPAL console as a GDB debug port.
Similar to uart and uart_phyp debug ports, it has to be enabled by
setting the hw.uart.dbgport variable to the serial console node
of the device tree.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22649
Since version 2.11.0, QEMU became bug-compatible with
PowerVM's vty implementation, by inserting a \0 after
every \r going to the guest. Guests are expected to
workaround this issue by removing every \0 immediately
following a \r.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22171
This change makes it possible to use a POWER Hypervisor virtual
terminal device (phyp vty) as a GDB debug port.
Similar to the uart debug port, it has to be enabled by setting
the hw.uart_phyp.dbgport variable to the vty node of the device
tree.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22205
Based on POWER9BSD implementation, with all POWER9 specific code removed and
addition of new methods in PPC64 MMU interface, to isolate platform specific
code. Currently, the new methods are implemented on pseries and PowerNV
(D21643).
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D21551
Summary:
It turns out statistics accounting is very expensive in the pmap driver,
and doesn't seem necessary in the common case. Make this optional
behind a MOEA64_STATS #define, which one can set if they really need
statistics.
This saves ~7-8% on buildworld time on a POWER9.
Found by bdragon.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D20903
On POWER9/pseries, QEMU passes several regions of memory,
instead of a single region containing all memory, as the
code was expecting.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20857
There was an issue in pseries llan driver, that resulted in the first 2 bytes
of the MAC address getting stripped, and the last 2 being always 0.
In most cases the network interface still worked, despite the MAC being
different of what was specified to QEMU, but when some other host or DHCP
server expected a specific MAC, this would fail.
This change fixes this by shifting right by 2 the local-mac-address read from
device tree, if its length is 6 instead of 8, as observed in QEMU DT, that
always presents a 6 bytes value for this property.
PR: 237471
Reported by: Alfredo Dal'Ava Junior
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20843
When building a kernel supporting PSERIES but not POWERNV,
the compiler would complain about an error variable being
possibly used before being initialized.
In practice, however, this should never happen. In any case, it
is now initialized to an error value.
This set of changes make it possible to run FreeBSD for PowerPC64/pseries,
under QEMU/KVM, without requiring the host to make hugepages available to the
guest.
While there was already this possibility, by means of setting hw_direct_map to
0, on PowerPC64 there were a couple of issues/wrong assumptions that prevented
this from working, before this changelist.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20522
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
As mphyp_pte_unset() can also remove PTE entries, and as this can
happen in parallel with PTEs evicted by mphyp_pte_insert(), there
is a (rare) chance the PTE being evicted gets removed before
mphyp_pte_insert() is able to do so. Thus, the KASSERT should
check wether the result is H_SUCCESS or H_NOT_FOUND, to avoid
panics if the situation described above occurs.
More details about this issue can be found in PR 237470.
PR: 237470
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20012
Summary: when using pseries-llan driver, Opkts and Oerrs counters (netstat
-i) are always zero. This patch adds an small error handling to increment
these counters.
Submitted by: alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20009
Some hypervisor calls, such as H_SEND_LOGICAL_LAN, take more arguments than
are traditionally passed in registers. The HCALL ABI will accept these
arguments in r11 and r12. With ELFv2 ABI, these arguments are 2
double-words lower than ELFv1 ABI, as two double-words in the stack frame
are no longer used, and therefore removed from the frame. Fix the offsets
for loading the registers for the HCALL. This fixes the phyp_llan driver
with ELFv2 kernel.
Submitted by: alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20008
When running several builders in parallel, on QEMU, with 8GB of
memory, a fatal kernel trap (0x300 (data storage interrupt))
caused by llan driver is sometimes observed, when the system
starts to run out of swap space.
This happens because, at llan_intr(), a phyp call to add a
logical LAN buffer is always made when llan_add_rxbuf() fails,
even if it fails to allocate a new buffer.
PR: 235489
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D19084
The XIVE (External Interrupt Virtualization Engine) is a new interrupt
controller present in IBM's POWER9 processor. It's a very powerful,
very complex device using queues and shared memory to improve interrupt
dispatch performance in a virtualized environment.
This yields a ~10% performance improvment over the XICS emulation mode,
measured in both buildworld, and 'dd' from nvme to /dev/null.
Currently, this only supports native access.
MFC after: 1 month
The powerpc_intr structure is not zero-initialized, so on an invariants
build would panic in the xics driver with an invalid pointer. Also fix the
xics driver to share the private data setup code between xics_enable() and
xics_bind().
Reported by: Leonardo Bianconi
The IPI vector is static, and happens to be the most common interrupt by far
on some systems. Rather than searching for the interrupt every time, cache
the index.
This appears to yield a small performance boost, of about 8% reduction in
buildworld times, on my POWER9 system, when paired with r342975.
The XICS and XIVE need extra data beyond irq and vector. Rather than
performing a separate search, it's better for the general interrupt facility
to hold a private pointer, since the search already must be done anyway at
that level.
In r342771, I introduced a regression in Power by abusing the platform
smp_topo() method as a shortcut for providing the MI information needed for
the stated sysctls. The smp_topo() method was already called later by
sched_ule (under the name cpu_topo()), and initializes a static array of
scheduler topology information. I had skimmed the smp_topo_foo() functions
and assumed they were idempotent; empirically, they are not (or at least,
detect re-initialization and panic).
Do the cleaner thing I should have done in the first place and add a
platform method specifically for core- and thread-count probing.
Reported by: luporl via jhibbits
Reviewed by: luporl
X-MFC-With: r342771
Differential Revision: https://reviews.freebsd.org/D18777
With new sysctls (to the best of our ability do detect them). Restructured
smp.4 slightly for clarity (keep relevant stuff closer to the top) while
documenting.
Reviewed by: markj, jhibbits (ppc parts)
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D18322
Discussing with Benjamin Herrenschmidt, OPAL_INT_GET_XIRR masks the
returned priority, so must be resumed before more interrupts can be
handled at this priority. Since there are only two priorities used in
FreeBSD, we know that the previous priority in an EOI will always be
0xff (lowest priority).
Reviewed by: nwhitehorn
Approved by: re(rgrimes)
Differential Revision: https://reviews.freebsd.org/D17361
Summary:
POWER9 systems use a new interrupt controller, XIVE, managed through OPAL
firmware calls. The OPAL firmware includes support for emulating the previous
generation XICS presentation layer in addition to a new "XIVE Exploitation"
mode. As a stopgap until we have XIVE exploitation mode, enable XICS emulation
mode so that we at least have an interrupt controller.
Since the CPPR is local to the current CPU, it cannot be updated for APs when
initializing on the BSP. This adds a new function, directly called by the
powernv platform code, to initialize the CPPR on AP bringup.
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15492