Commit Graph

62 Commits

Author SHA1 Message Date
Justin Hibbits
0effb2ccf3 powerpc/powernv: Only clear EEH freeze for some errors
Only clear an EEH freeze if an error occurs.  However, if an OPAL_HARDWARE
error is returned, this indicates a hardware failure which cannot be
unfrozen, and instead needs a hardware reset.  Attempting to unfreeze a
broken PCH will result in console spam for each attempt.  To avoid the spam,
just don't do it.
2019-08-01 03:59:25 +00:00
Justin Hibbits
fdb916d53e powernv: Port HMI handler to use the message framework
When an HMI occurs a message event also gets created with the details of the
exception.  Hook into the messaging framework to retrieve the HMI message.
Nothing is done with it yet, except to panic on unhandled exception.
2019-06-10 03:24:38 +00:00
Justin Hibbits
f433dab2de powerpc/powernv: Reduce the scope of the sensor guarding mutex
vmem_xalloc() cannot be called while holding a nonblocking mutex, warned
by WITNESS.  The lock may not be necessary in general, but it avoids
superfluous concurrent OPAL calls for the same sensor.

Reported by:	pkubaj
2019-06-10 03:16:55 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Justin Hibbits
b4698b7a6c powerpc: Drop OPAL_HANDLE_HMI2 for now, to avoid panicking
It's possible for a Hypervisor Maintenance Interrupt (HMI) to occur while in
the pmap code, holding locks.  This can cause WITNESS to panic due to lock
errors in calling pmap_kextract().  Since we don't yet handle the flags
returned by OPAL_HANDLE_HMI2, just stop using it, so that we don't call into
pmap_kextract().

Reported by:	pkubaj
2019-05-02 03:39:03 +00:00
Justin Hibbits
e2e3e7d28e powerpc: Make OPAL root node probe at bus pass
This way its children can attach earlier if needed, and some subsystems are
attached earlier, like the asynchronous token management.

MFC after:	2 weeks
2019-04-29 01:10:57 +00:00
Justin Hibbits
93096fecb6 powerpc64/powernv: Relax flash block write requirements
Since writes don't necessarily need to be on erase-block boundaries, we can
relax the block size and alignments down to sector size.  If it needs to be
erased, opalflash_erase() will check proper alignment and size.
2019-04-20 02:44:38 +00:00
Justin Hibbits
bc60451a47 powerpc/powernv: Make erasing before writes optional
If the OPAL flash driver supports writing without erase, it adds a
'no-erase' property to the flash device node.  Honor that property and don't
bother erasing if it exists.
2019-04-19 02:28:04 +00:00
Justin Hibbits
49d9a59783 Add NUMA support to powerpc
Summary:
Initial NUMA support:
    - associate CPU with domain
    - associate memory ranges with domain
    - identify domain for devices
    - limit device interrupt binding to appropriate domain

- Additionally fixes a bug in the setting of Maxmem which led to
  only memory attached to the first socket being enabled for DMA

A pmap variant can opt in to numa support by by calling `numa_mem_regions`
at the end of pmap_bootstrap - registering the corresponding ranges with the
VM.

This yields a ~20% improvement in build times of llvm on dual socket POWER9
over non-NUMA.

Original patch by mmacy.

Differential Revision: https://reviews.freebsd.org/D17933
2019-04-13 04:03:18 +00:00
Justin Hibbits
3c8c50f955 powerpc/powernv: Fix major bugs in opal_flash
* The BIO bio_data may not be page aligned.  Only the base address of each
  page worth of data is extracted to pass to OPAL.  Without page alignment
  it can scribble over random memory when finishing the page read.  Fix this
  by short-reading the first page to properly align for full page reads.
* Fix the definition of OPAL_FLASH_ERASE.
* Properly handle the async message result, as now returned from r345974.
2019-04-06 02:39:56 +00:00
Justin Hibbits
947079ebee powerpc/powernv: Fix issues in opal_async
* Properly return the full opal_msg from an async completion.
* Don't keep bugging OPAL, wait 100us or so.  With some minor changes to
  DELAY() to drop to very low priority, the thread won't hog the CPU while
  polling for the async completion.
2019-04-06 02:31:01 +00:00
Justin Hibbits
fbf7737949 powernv: Port OPAL asynchronous framework to use the new message framework
Since OPAL_GET_MSG does not discriminate between message types, asynchronous
completion events may be received in the OPAL_GET_MSG call, which dequeues
them from the list, thus preventing OPAL_CHECK_ASYNC_COMPLETION from
succeeding.  Handle this case by integrating with the messaging framework.
2019-04-02 04:02:57 +00:00
Justin Hibbits
911a92603e powerpc/powernv: Add OPAL heartbeat thread
Summary:
OPAL needs to be kicked periodically in order for the firmware to make
progress on its tasks.  To do so, create a heartbeat thread to perform this task
every N milliseconds, defined by the device tree.  This task is also a central
location to handle all messages received from OPAL.

Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D19743
2019-04-02 04:00:01 +00:00
Justin Hibbits
0499e9c619 powerpc64: Use medium code model in asm files for TOC references
Summary:
With a sufficiently large TOC, it's possible to index out of range, as
the immediate load instructions only permit 16-bit indices, allowing up
to 64kB range (signed) from the base pointer.  Allow +/- 2GB range, with
the medium code model TOC accesses in asm.

Patch originally by Brandon Bergren.  The issue appears to impact ELFv2
more than ELFv1.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D19708
2019-03-29 02:38:30 +00:00
Justin Hibbits
8af4cc4d5a powernv: Add Hypervisor Maintenance Interrupt handler
Attempting to build www/firefox on POWER9 resulted in a HMI exception being
thrown, a fatal trap currently.  This is typically caused by timer facility
errors, but examination of the Hypervisor Maintenance Exception Register
(HMER) yielded only that an exception had recovered, with no information of
the actual exception cause.

When an HMI occurs, OPAL_HANDLE_HMI or OPAL_HANDLE_HMI2 must be called to
handle the exception at the firmware level.  If the exception is handled, we
can continue.

This adds only the preliminary handler, enough to prevent package building
from panicking.  An enhancement in the future is to use the flags returned
by OPAL_HANDLE_HMI2 to print more useful error messages, and log maintenance
events.

Reviewed by:	luporl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19634
2019-03-23 03:23:20 +00:00
Justin Hibbits
6775dfdf54 powerpc/powernv: Add OPAL flash device driver
Firmware needed by petitboot, for example, GPU firmware, can be installed to
a partition in the flash filesystem.  This driver exposes the full flash
given by the device tree, letting the user manage firmware, etc, from
FreeBSD.

To use the partitions provided by the flash module, the fdt_slicer module is
needed, but the module isn't needed for raw access, so there's no direct
dependency link in here.

MFC after:	2 weeks
2019-03-01 04:36:55 +00:00
Justin Hibbits
dac618a648 powerpc/powernv: Add asynchronous token management for powernv
The OPAL firmware only supports a finite number of in-flight asynchronous
operations.  Rather than have each subsystem try to manage its own, use a
central management service to hand out tokens.

More work can be done to improve asynchronous behavior, such as funneling
things through a future OPAL heartbeat handler, but capabilities will be
added as needed.

Augment the existing consumers (i2c and sensors) to use this new API.

MFC after:	4 weeks
2019-03-01 02:49:47 +00:00
Justin Hibbits
d49fc192c1 powerpc/powernv: Add a driver for the POWER9 XIVE interrupt controller
The XIVE (External Interrupt Virtualization Engine) is a new interrupt
controller present in IBM's POWER9 processor.  It's a very powerful,
very complex device using queues and shared memory to improve interrupt
dispatch performance in a virtualized environment.

This yields a ~10% performance improvment over the XICS emulation mode,
measured in both buildworld, and 'dd' from nvme to /dev/null.

Currently, this only supports native access.

MFC after:	1 month
2019-02-02 04:15:16 +00:00
Justin Hibbits
56505ec016 powerpc: Add opaque 'private data' to interrupt vectors
The XICS and XIVE need extra data beyond irq and vector.  Rather than
performing a separate search, it's better for the general interrupt facility
to hold a private pointer, since the search already must be done anyway at
that level.
2019-01-12 22:05:42 +00:00
Conrad Meyer
bba9cbe374 powerpc: Fix regression introduced in r342771
In r342771, I introduced a regression in Power by abusing the platform
smp_topo() method as a shortcut for providing the MI information needed for
the stated sysctls.  The smp_topo() method was already called later by
sched_ule (under the name cpu_topo()), and initializes a static array of
scheduler topology information.  I had skimmed the smp_topo_foo() functions
and assumed they were idempotent; empirically, they are not (or at least,
detect re-initialization and panic).

Do the cleaner thing I should have done in the first place and add a
platform method specifically for core- and thread-count probing.

Reported by:	luporl via jhibbits
Reviewed by:	luporl
X-MFC-With:	r342771
Differential Revision:	https://reviews.freebsd.org/D18777
2019-01-07 19:39:31 +00:00
Conrad Meyer
6b83069e05 Expose threads-per-core and physical core count information
With new sysctls (to the best of our ability do detect them).  Restructured
smp.4 slightly for clarity (keep relevant stuff closer to the top) while
documenting.

Reviewed by:	markj, jhibbits (ppc parts)
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D18322
2019-01-04 18:31:17 +00:00
Justin Hibbits
ad39591ad2 powerpc/powernv: Restrict the busdma tag to only POWER8
It seems this tag is causing problems on POWER9 systems.  Since no POWER9 user
has encountered the problem fixed by r339589 just restrict it to POWER8 for now.
A better fix will likely be to update powerpc/busdma_machdep.c to handle the
window correctly.

Reported by:	mmacy, others
2018-11-08 20:31:12 +00:00
Leandro Lupori
d93e635a81 ppc64: limited 32-bit DMA address range
Further investigation of issues with 32-bit DMA on PowerNV revealed that
its window is hardcoded by OPAL (at least in skiboot version 5.4.9) and
cannot be changed by the OS.
Thus, now jhb suggestion of limiting the range in PCI DMA tag seems
the best way to deal with it.

Reviewed by:	jhibbits, nwhitehorn, sbruno
Approved by:	jhibbits(mentor)
Differential Revision:	https://reviews.freebsd.org/D17601
2018-10-22 13:40:50 +00:00
Justin Hibbits
27ef2ca86b powerpc64/powernv: Add pnpinfo strings to opal device children
This makes it easier to see what's left unattached as new drivers are
written, and to see what drivers get attached to what nodes.
2018-10-21 02:30:34 +00:00
Justin Hibbits
2756851a77 powerpc64/powernv:opal_pci: Fix the alignment of the TCE table
The TCE table need only be aligned to the size of the table, not the size of
the TCE segment.
2018-10-21 02:24:37 +00:00
Justin Hibbits
013cc176c9 powerpc64/powernv: Don't mask MSIs in OPAL
Summary:
Discussing with Benjamin Herrenschmidt, MSIs, and edge-triggered
interrupts in general, must not be masked in XICS and XIVE, else
subsequent interrupts may be ignored.

Testing locally on my Talos II (single CPU, 18-core POWER9), NVMe now
works with MSI, improving read throughput by ~70% (900MB/s -> 1.67GB/s,
with 64MB block size) over INTx interrupts, and snd_hda(4) now will
actually play music with MSI.  Previously, snd_hda(4) would not receive
interrupts, timing out, and declaring the channels dead.

This has also been tested by Kevin Bowling, and others, with great
success.  Kevin reported NVMe unusable on his Talos II prior to this
patch.

Reviewed by:	nwhitehorn, kbowling
Approved by:	re(rgrimes)
Differential Revision: https://reviews.freebsd.org/D17356
2018-10-06 03:20:26 +00:00
Breno Leitao
78f4e2fea0 powerpc64/powernv: re-read RTC after polling
If OPAL_RTC_READ is busy and does not return the information on the first run,
as returning OPAL_BUSY_EVENT, the system will crash since ymd and hmsm variable
will contain junk values.

This is happening because we were not calling OPAL_RTC_READ again after
OPAL_POLL_EVENTS' return, which would finally replace the old/junk hmsm and ymd
values.

The code was also mixing OPAL_RTC_READ and OPAL_POLL_EVENTS return values.

This patch fix this logic and guarantee that we call OPAL_RTC_READ after
OPAL_POLL_EVENTS return, and guarantee the code will only proceed if
OPAL_RTC_READ returns OPAL_SUCCESS.

Reviewed by: jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D16617
2018-08-08 21:19:07 +00:00
Justin Hibbits
0bf0bb832f Support building IPMI as a module on powerpc64
This still only supports IPMI via OPAL on powerpc64, but now it can be tested
with a GENERIC kernel.
2018-07-25 18:58:57 +00:00
Justin Hibbits
3395ab28eb powerpc/powernv: Make opal_i2c driver work with attached i2c drivers
* FreeBSD stores addresses in 8 bit format, but the OPAL API requires the 7-bit
  address, and encodes the direction elsewhere.  Behave like other i2c drivers,
  and shift accordingly.
* The OPAL API can already handle multiple requests in flight.  Change the async
  token to be private to the thread, so as not to stomp across i2c accesses,
  remove the limitation error message, and use the correct message index to
  transfer all messages in the list.
* Micro-optimize the async handler to not continuously call pmap_kextract() when
  spin-waiting for the operation to complete.

This has been tested by hexdumping an EEPROM attached via the icee(4) driver.
2018-07-09 20:33:48 +00:00
Justin Hibbits
fedd55f14b Let ofw_iicbus work its magic on OPAL i2c buses.
ofw_iicbus already has attachments on iichb.  Rather than adding an explicit
attachment onto opal_i2c, simply change the exposed name of the OPAL i2c bus
to 'iichb'.
2018-07-07 01:58:40 +00:00
Justin Hibbits
341679e1f2 Support multiple OPAL consoles, and don't crash if uart is not stdout
Summary: If the chosen console is not the OPAL uart, but OPAL uart devices
exist, the console device doesn't attach properly, and faults in the interrupt
handler, with a NULL pointer dereference.  To fix this, and as a byproduct, also
support multiple OPAL consoles, refactor to have the console getc callback use
the appropriate softc instead of the global console_sc, which may be NULL in the
case of a different device being the console.

Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D16071
2018-06-29 19:35:25 +00:00
Breno Leitao
5ecc8c2077 powerpc64/powernv: Avoid type promotion
There is a type promotion that transform count = -1 into a unsigned int causing
the default TCE SEG SIZE not being returned on a Boston POWER9 machine.

This machine does not have the 'ibm,supported-tce-sizes' entries, thus, count
is set to -1, and the function continue to execute instead of returning.

Reviewed by: jhibbits, wma
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15763
2018-06-12 19:50:33 +00:00
Justin Hibbits
e69b55eadb Remove a debug printf from opal_pci driver 2018-05-31 04:11:40 +00:00
Justin Hibbits
dceea51efe Make opal_pci driver work with POWER9
Summary:
Coupled with r334365, this makes PCI work on POWER9.  There is still more to
do to fully exploit the hardware capabilities, but this is sufficient to
enable USB and ethernet controllers on a POWER9 Talos II system.

Reviewed by:	nwhitehorn, leitao
Differential Revision: https://reviews.freebsd.org/D15566
2018-05-30 03:00:57 +00:00
Justin Hibbits
f07ee2a7c0 Cache the phandle of the PCI node in opal_pci_attach
Simple cleanup, no functional change.  This is related to the fixups needed
for POWER9 support.
2018-05-30 02:47:23 +00:00
Justin Hibbits
0b1f36b6c5 Make ALT_BREAK_TO_DEBUGGER work with OPAL console
Match other consoles by using the higher level cngetc() in the interrupt
handler, so that kdb_alt_break() can check for console break.
2018-05-28 01:59:48 +00:00
Justin Hibbits
9ae8a6d3d1 Fix a typo missed in r334232 2018-05-26 04:24:25 +00:00
Justin Hibbits
459e54f990 Correct a typo for opal temperature sensor type constant 2018-05-26 02:45:41 +00:00
Justin Hibbits
1a3eaf6cc8 Add an IPMI attachment for PowerNV systems
IPMI access on PowerNV systems is done through the OPAL firmware.  This adds a
simple attachment for communicating with the FSP/BMC on these machines.  This
has been tested on a Talos POWER9 workstation, only in the bootup phase, noting
the successful attachment messages:

...
ipmi0: IPMI device rev. 0, firmware rev. 2.00, version 2.0, device support mask 0
ipmi0: Number of channels 2
...

The ipmi device has not been added to GENERIC64, but may be after further
testing.  It may also eventually be added to the ipmi module at that point.
2018-05-22 03:57:32 +00:00
Justin Hibbits
9c6ba29de1 Basic OPAL sensor support for POWER9 platforms
Summary:
PowerNV architectures (in the test case POWER9) export sensors via the device
tree, which are accessed via OPAL calls.  This adds sysctl nodes for each
device in a generic fashion.  New sysctl nodes are:

dev.opal_sensor.N.sensor
dev.opal_sensor.N.sensor_min
dev.opal_sensor.N.sensor_max
dev.opal_sensor.N.type
dev.opal_sensor.N.label

These are rooted at a parent attachment under opal, called opalsens.  This does
not add support for the "sensor groups" defined in the device tree.

Reviewed by:	breno.leitao_gmail.com
Differential Revision: https://reviews.freebsd.org/D15362
2018-05-22 02:42:53 +00:00
Justin Hibbits
ef6da5e5c7 Add support for the XIVE XICS emulation mode for POWER9 systems
Summary:
POWER9 systems use a new interrupt controller, XIVE, managed through OPAL
firmware calls.  The OPAL firmware includes support for emulating the previous
generation XICS presentation layer in addition to a new "XIVE Exploitation"
mode.  As a stopgap until we have XIVE exploitation mode, enable XICS emulation
mode so that we at least have an interrupt controller.

Since the CPPR is local to the current CPU, it cannot be updated for APs when
initializing on the BSP.  This adds a new function, directly called by the
powernv platform code, to initialize the CPPR on AP bringup.

Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15492
2018-05-20 03:23:17 +00:00
Justin Hibbits
30f3b0f5f7 powerpc64: Add OPAL definitions
Summary:
Add additional OPAL PCI definitions and expand the code to use them in order to
ease the OPAL interface process for new comers.

These definitions came directly from the OPAL code and they are the same for
both PHB3 (POWER8) and PHB4 (POWER9).

Submitted by:	Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15432
2018-05-19 04:01:15 +00:00
Wojciech Macek
d90930743f Reverting r330925 for now 2018-03-15 06:19:45 +00:00
Wojciech Macek
22eedd96c7 PowerNV: Fix I2C to compile if FDT is disabled
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-03-14 09:20:03 +00:00
Nathan Whitehorn
72820025dd Fix use of unitialized variables. 2018-03-06 15:52:43 +00:00
Wojciech Macek
4ffd72e34c PowerNV: Initial support for OPAL I2C transfers
Add I2C OPAL driver and a set of dummy-ones to allow
all I2C things on Power8 to attach.

TODO: better async token management

Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-03-01 14:11:07 +00:00
Wojciech Macek
6d13fd638c PowerNV: Put processor to power-save state in idle thread
When processor enters power-save state it releases resources shared with other
cpu threads which makes other cores working much faster.

This patch also implements saving and restoring registers that might get
corrupted in power-save state.

Submitted by:          Patryk Duda <pdk@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           jhibbits, nwhitehorn, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14330
2018-02-21 14:28:40 +00:00
Wojciech Macek
eb96cc1364 PowerNV: add missing RTC_WRITE support
Add function which can store RTC values to OPAL.

Submitted by:          Wojciech Macek <wma@semihalf.org>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-02-21 08:13:17 +00:00
Wojciech Macek
70bb600a0a PowerNV: move LPCR and LPID altering to cpudep_ap_early_bootstrap
It turns out that under some circumstances we can get DSI or DSE before we set
LPCR and LPID so we should set it as early as possible.

Authored by:           Patryk Duda <pdk@semihalf.com>
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-01-29 09:27:02 +00:00
Wojciech Macek
f0393bbf34 PPC64: use hwref instead of cpuid
On CHRP and PowerNV, use the interrupt server number in the cpuref and pcpu
hwref field instead of the device-tree phandle and make the CPU IDs reported
to the scheduler dense and with the BSP at 0.

Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14011
2018-01-29 09:15:38 +00:00