freebsd-skq

Author	SHA1	Message	Date
Justin Hibbits	6df6aae9bd	powerpc/powernv: powernv_node_numa_domain() fix non-NUMA case If NUMA is not enabled in the kernel config, or is disabled at boot, this function should just return domain 0 regardless of what's in the device tree. Fixes a panic in iflib with NUMA disabled. Reported by: luporl	2020-03-03 03:22:00 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Leandro Lupori	a9d8f71f7b	[PPC64] Fix NUMA on POWER8 On some POWER8 machines, the 'ibm,associativity' property may have 6 cells, which would overflow the 5-cell buffer being used. There was also an issue with the "check if node is root" part, which has been fixed too. Reviewed by: jhibbits Differential Revision: https://reviews.freebsd.org/D23414	2020-01-29 18:13:44 +00:00
Justin Hibbits	490ebb8f35	powerpc: Fix the NUMA domain list on powernv Summary: Consolidate the NUMA associativity handling into a platform function. Non-NUMA platforms will just fall back to the default (0). Currently only implemented for powernv, which uses a lookup table to map the device tree associativity into a system NUMA domain. Fixes hangs on powernv after r356534, and corrects a fairly longstanding bug in powernv's NUMA handling, which ended up using domains 1 and 2 for devices and memory on power9, while CPUs were bound to domains 0 and 1. Reviewed by: bdragon, luporl Differential Revision: https://reviews.freebsd.org/D23220	2020-01-18 01:26:54 +00:00
Justin Hibbits	03b6e7a627	powerpc/powernv: Un-Giant-ify opal_nvram driver It may be possible to make this completely lock free, but for now it's using a statically allocated bounce buffer in the softc, so it needs to be guarded.	2020-01-10 01:24:49 +00:00
Brandon Bergren	9367fb301c	[PowerPC] Fix panic when attempting to handle an HMI from an idle thread In IRC, sfs_ finally managed to get a good trace of a kernel panic that was happening when attempting to use webengine. As it turns out, we were using vtophys() from interrupt context on an idle thread in opal_hmi_handler2(). Since this involves locking the kernel pmap on PPC64 at the moment, this ended up tripping a KASSERT in mtx_lock(), which then caused a parallel panic stampede. So, avoid this by preallocating the flags variable and storing it in PCPU. Fixes "panic: mtx_lock() by idle thread 0x... on sleep mutex kernelpmap". Differential Revision: https://reviews.freebsd.org/D22962	2019-12-30 02:56:47 +00:00
Justin Hibbits	1223b40eba	powerpc/powernv: Set the PTCR for the Nest MMU The Nest MMU manages address translation for accelerators on the POWER9. To do so, it needs a page table, so export the system page table to the Nest MMU. This will quietly fail on pre-POWER9 systems that do not have a NMMU. The NMMU is currently unused, so this change is currently effectively a NOP, but the NMMU and VAS will eventually be used.	2019-12-15 21:20:18 +00:00
Leandro Lupori	a16111e6a2	[PPC64] Enable opal console use as a GDB DBGPORT This change makes it possible to use OPAL console as a GDB debug port. Similar to uart and uart_phyp debug ports, it has to be enabled by setting the hw.uart.dbgport variable to the serial console node of the device tree. Reviewed by: jhibbits Differential Revision: https://reviews.freebsd.org/D22649	2019-12-09 13:09:32 +00:00
Justin Hibbits	0b4753405b	powerpc64/powernv: Use OPAL call for non-POWER8 PCI TCE reset According to the OPAL documentation, only the POWER8 (PHB3) should use the register write TCE reset method. All others should use the OPAL call. On POWER9 the call is semantically identical to the register write, with a wait for completion.	2019-11-10 04:24:36 +00:00
Justin Hibbits	1c56203bcf	powerpc64/powernv: Add opal NVRAM driver for PowerNV systems Add a very basic NVRAM driver for OPAL which can be used by the IBM powerpc-utils nvram utility, not to be confused with the base nvram utility, which only operates on powermac_nvram. The IBM utility handles all partitions itself, treating the nvram device as a plain store. An alternative would be to manage partitions in the kernel, and augment the base nvram utility to deal with different backing stores, but that complicates the driver significantly. Instead, present the same interface IBM's utility expects, and we get the usage for free. Tested by: bdragon	2019-09-14 03:30:34 +00:00
Justin Hibbits	84ce4f0375	powerpc/powernv: Fix OPAL cfgread/cfgwrite error handling Freeze clearing needs to happen any time an OPAL read returns an error (except OPAL_HARDWARE), and any time it returns 0xff for all bytes. For cfgwrite, any error that's not OPAL_HARDWARE should be cleaned up.	2019-08-03 01:55:51 +00:00
Justin Hibbits	0effb2ccf3	powerpc/powernv: Only clear EEH freeze for some errors Only clear an EEH freeze if an error occurs. However, if an OPAL_HARDWARE error is returned, this indicates a hardware failure which cannot be unfrozen, and instead needs a hardware reset. Attempting to unfreeze a broken PCH will result in console spam for each attempt. To avoid the spam, just don't do it.	2019-08-01 03:59:25 +00:00
Justin Hibbits	fdb916d53e	powernv: Port HMI handler to use the message framework When an HMI occurs a message event also gets created with the details of the exception. Hook into the messaging framework to retrieve the HMI message. Nothing is done with it yet, except to panic on unhandled exception.	2019-06-10 03:24:38 +00:00
Justin Hibbits	f433dab2de	powerpc/powernv: Reduce the scope of the sensor guarding mutex vmem_xalloc() cannot be called while holding a nonblocking mutex, warned by WITNESS. The lock may not be necessary in general, but it avoids superfluous concurrent OPAL calls for the same sensor. Reported by: pkubaj	2019-06-10 03:16:55 +00:00
Conrad Meyer	e2e050c8ef	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
Justin Hibbits	b4698b7a6c	powerpc: Drop OPAL_HANDLE_HMI2 for now, to avoid panicking It's possible for a Hypervisor Maintenance Interrupt (HMI) to occur while in the pmap code, holding locks. This can cause WITNESS to panic due to lock errors in calling pmap_kextract(). Since we don't yet handle the flags returned by OPAL_HANDLE_HMI2, just stop using it, so that we don't call into pmap_kextract(). Reported by: pkubaj	2019-05-02 03:39:03 +00:00
Justin Hibbits	e2e3e7d28e	powerpc: Make OPAL root node probe at bus pass This way its children can attach earlier if needed, and some subsystems are attached earlier, like the asynchronous token management. MFC after: 2 weeks	2019-04-29 01:10:57 +00:00
Justin Hibbits	93096fecb6	powerpc64/powernv: Relax flash block write requirements Since writes don't necessarily need to be on erase-block boundaries, we can relax the block size and alignments down to sector size. If it needs to be erased, opalflash_erase() will check proper alignment and size.	2019-04-20 02:44:38 +00:00
Justin Hibbits	bc60451a47	powerpc/powernv: Make erasing before writes optional If the OPAL flash driver supports writing without erase, it adds a 'no-erase' property to the flash device node. Honor that property and don't bother erasing if it exists.	2019-04-19 02:28:04 +00:00
Justin Hibbits	49d9a59783	Add NUMA support to powerpc Summary: Initial NUMA support: - associate CPU with domain - associate memory ranges with domain - identify domain for devices - limit device interrupt binding to appropriate domain - Additionally fixes a bug in the setting of Maxmem which led to only memory attached to the first socket being enabled for DMA A pmap variant can opt in to NUMA support by calling `numa_mem_regions` at the end of pmap_bootstrap, registering the corresponding ranges with the VM. This yields a ~20% improvement in build times of llvm on dual socket POWER9 over non-NUMA. Original patch by mmacy. Differential Revision: https://reviews.freebsd.org/D17933	2019-04-13 04:03:18 +00:00
Justin Hibbits	3c8c50f955	powerpc/powernv: Fix major bugs in opal_flash * The BIO bio_data may not be page aligned. Only the base address of each page worth of data is extracted to pass to OPAL. Without page alignment it can scribble over random memory when finishing the page read. Fix this by short-reading the first page to properly align for full page reads. * Fix the definition of OPAL_FLASH_ERASE. * Properly handle the async message result, as now returned from r345974.	2019-04-06 02:39:56 +00:00
Justin Hibbits	947079ebee	powerpc/powernv: Fix issues in opal_async * Properly return the full opal_msg from an async completion. * Don't keep bugging OPAL, wait 100us or so. With some minor changes to DELAY() to drop to very low priority, the thread won't hog the CPU while polling for the async completion.	2019-04-06 02:31:01 +00:00
Justin Hibbits	fbf7737949	powernv: Port OPAL asynchronous framework to use the new message framework Since OPAL_GET_MSG does not discriminate between message types, asynchronous completion events may be received in the OPAL_GET_MSG call, which dequeues them from the list, thus preventing OPAL_CHECK_ASYNC_COMPLETION from succeeding. Handle this case by integrating with the messaging framework.	2019-04-02 04:02:57 +00:00
Justin Hibbits	911a92603e	powerpc/powernv: Add OPAL heartbeat thread Summary: OPAL needs to be kicked periodically in order for the firmware to make progress on its tasks. To do so, create a heartbeat thread to perform this task every N milliseconds, defined by the device tree. This task is also a central location to handle all messages received from OPAL. Reviewed By: luporl Differential Revision: https://reviews.freebsd.org/D19743	2019-04-02 04:00:01 +00:00
Justin Hibbits	0499e9c619	powerpc64: Use medium code model in asm files for TOC references Summary: With a sufficiently large TOC, it's possible to index out of range, as the immediate load instructions only permit 16-bit indices, allowing up to 64kB range (signed) from the base pointer. Allow +/- 2GB range, with the medium code model TOC accesses in asm. Patch originally by Brandon Bergren. The issue appears to impact ELFv2 more than ELFv1. Reviewed by: luporl Differential Revision: https://reviews.freebsd.org/D19708	2019-03-29 02:38:30 +00:00
Justin Hibbits	8af4cc4d5a	powernv: Add Hypervisor Maintenance Interrupt handler Attempting to build www/firefox on POWER9 resulted in a HMI exception being thrown, a fatal trap currently. This is typically caused by timer facility errors, but examination of the Hypervisor Maintenance Exception Register (HMER) yielded only that an exception had recovered, with no information of the actual exception cause. When an HMI occurs, OPAL_HANDLE_HMI or OPAL_HANDLE_HMI2 must be called to handle the exception at the firmware level. If the exception is handled, we can continue. This adds only the preliminary handler, enough to prevent package building from panicking. An enhancement in the future is to use the flags returned by OPAL_HANDLE_HMI2 to print more useful error messages, and log maintenance events. Reviewed by: luporl MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19634	2019-03-23 03:23:20 +00:00
Justin Hibbits	6775dfdf54	powerpc/powernv: Add OPAL flash device driver Firmware needed by petitboot, for example, GPU firmware, can be installed to a partition in the flash filesystem. This driver exposes the full flash given by the device tree, letting the user manage firmware, etc, from FreeBSD. To use the partitions provided by the flash module, the fdt_slicer module is needed, but the module isn't needed for raw access, so there's no direct dependency link in here. MFC after: 2 weeks	2019-03-01 04:36:55 +00:00
Justin Hibbits	dac618a648	powerpc/powernv: Add asynchronous token management for powernv The OPAL firmware only supports a finite number of in-flight asynchronous operations. Rather than have each subsystem try to manage its own, use a central management service to hand out tokens. More work can be done to improve asynchronous behavior, such as funneling things through a future OPAL heartbeat handler, but capabilities will be added as needed. Augment the existing consumers (i2c and sensors) to use this new API. MFC after: 4 weeks	2019-03-01 02:49:47 +00:00
Justin Hibbits	d49fc192c1	powerpc/powernv: Add a driver for the POWER9 XIVE interrupt controller The XIVE (External Interrupt Virtualization Engine) is a new interrupt controller present in IBM's POWER9 processor. It's a very powerful, very complex device using queues and shared memory to improve interrupt dispatch performance in a virtualized environment. This yields a ~10% performance improvement over the XICS emulation mode, measured in both buildworld, and 'dd' from nvme to /dev/null. Currently, this only supports native access. MFC after: 1 month	2019-02-02 04:15:16 +00:00
Justin Hibbits	56505ec016	powerpc: Add opaque 'private data' to interrupt vectors The XICS and XIVE need extra data beyond irq and vector. Rather than performing a separate search, it's better for the general interrupt facility to hold a private pointer, since the search already must be done anyway at that level.	2019-01-12 22:05:42 +00:00
Conrad Meyer	bba9cbe374	powerpc: Fix regression introduced in r342771 In r342771, I introduced a regression in Power by abusing the platform smp_topo() method as a shortcut for providing the MI information needed for the stated sysctls. The smp_topo() method was already called later by sched_ule (under the name cpu_topo()), and initializes a static array of scheduler topology information. I had skimmed the smp_topo_foo() functions and assumed they were idempotent; empirically, they are not (or at least, detect re-initialization and panic). Do the cleaner thing I should have done in the first place and add a platform method specifically for core- and thread-count probing. Reported by: luporl via jhibbits Reviewed by: luporl X-MFC-With: r342771 Differential Revision: https://reviews.freebsd.org/D18777	2019-01-07 19:39:31 +00:00
Conrad Meyer	6b83069e05	Expose threads-per-core and physical core count information With new sysctls (to the best of our ability to detect them). Restructured smp.4 slightly for clarity (keep relevant stuff closer to the top) while documenting. Reviewed by: markj, jhibbits (ppc parts) MFC after: 3 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D18322	2019-01-04 18:31:17 +00:00
Justin Hibbits	ad39591ad2	powerpc/powernv: Restrict the busdma tag to only POWER8 It seems this tag is causing problems on POWER9 systems. Since no POWER9 user has encountered the problem fixed by r339589 just restrict it to POWER8 for now. A better fix will likely be to update powerpc/busdma_machdep.c to handle the window correctly. Reported by: mmacy, others	2018-11-08 20:31:12 +00:00
Leandro Lupori	d93e635a81	ppc64: limited 32-bit DMA address range Further investigation of issues with 32-bit DMA on PowerNV revealed that its window is hardcoded by OPAL (at least in skiboot version 5.4.9) and cannot be changed by the OS. Thus, now jhb suggestion of limiting the range in PCI DMA tag seems the best way to deal with it. Reviewed by: jhibbits, nwhitehorn, sbruno Approved by: jhibbits(mentor) Differential Revision: https://reviews.freebsd.org/D17601	2018-10-22 13:40:50 +00:00
Justin Hibbits	27ef2ca86b	powerpc64/powernv: Add pnpinfo strings to opal device children This makes it easier to see what's left unattached as new drivers are written, and to see what drivers get attached to what nodes.	2018-10-21 02:30:34 +00:00
Justin Hibbits	2756851a77	powerpc64/powernv:opal_pci: Fix the alignment of the TCE table The TCE table need only be aligned to the size of the table, not the size of the TCE segment.	2018-10-21 02:24:37 +00:00
Justin Hibbits	013cc176c9	powerpc64/powernv: Don't mask MSIs in OPAL Summary: Discussing with Benjamin Herrenschmidt, MSIs, and edge-triggered interrupts in general, must not be masked in XICS and XIVE, else subsequent interrupts may be ignored. Testing locally on my Talos II (single CPU, 18-core POWER9), NVMe now works with MSI, improving read throughput by ~70% (900MB/s -> 1.67GB/s, with 64MB block size) over INTx interrupts, and snd_hda(4) now will actually play music with MSI. Previously, snd_hda(4) would not receive interrupts, timing out, and declaring the channels dead. This has also been tested by Kevin Bowling, and others, with great success. Kevin reported NVMe unusable on his Talos II prior to this patch. Reviewed by: nwhitehorn, kbowling Approved by: re(rgrimes) Differential Revision: https://reviews.freebsd.org/D17356	2018-10-06 03:20:26 +00:00
Breno Leitao	78f4e2fea0	powerpc64/powernv: re-read RTC after polling If OPAL_RTC_READ is busy and does not return the information on the first run, returning OPAL_BUSY_EVENT instead, the system will crash since the ymd and hmsm variables will contain junk values. This happens because we were not calling OPAL_RTC_READ again after OPAL_POLL_EVENTS returned, which would finally replace the old/junk hmsm and ymd values. The code was also mixing up the OPAL_RTC_READ and OPAL_POLL_EVENTS return values. This patch fixes this logic and guarantees that we call OPAL_RTC_READ after OPAL_POLL_EVENTS returns, and that the code will only proceed if OPAL_RTC_READ returns OPAL_SUCCESS. Reviewed by: jhibbits Approved by: jhibbits (mentor) Differential Revision: https://reviews.freebsd.org/D16617	2018-08-08 21:19:07 +00:00
Justin Hibbits	0bf0bb832f	Support building IPMI as a module on powerpc64 This still only supports IPMI via OPAL on powerpc64, but now it can be tested with a GENERIC kernel.	2018-07-25 18:58:57 +00:00
Justin Hibbits	3395ab28eb	powerpc/powernv: Make opal_i2c driver work with attached i2c drivers * FreeBSD stores addresses in 8 bit format, but the OPAL API requires the 7-bit address, and encodes the direction elsewhere. Behave like other i2c drivers, and shift accordingly. * The OPAL API can already handle multiple requests in flight. Change the async token to be private to the thread, so as not to stomp across i2c accesses, remove the limitation error message, and use the correct message index to transfer all messages in the list. * Micro-optimize the async handler to not continuously call pmap_kextract() when spin-waiting for the operation to complete. This has been tested by hexdumping an EEPROM attached via the icee(4) driver.	2018-07-09 20:33:48 +00:00
Justin Hibbits	fedd55f14b	Let ofw_iicbus work its magic on OPAL i2c buses. ofw_iicbus already has attachments on iichb. Rather than adding an explicit attachment onto opal_i2c, simply change the exposed name of the OPAL i2c bus to 'iichb'.	2018-07-07 01:58:40 +00:00
Justin Hibbits	341679e1f2	Support multiple OPAL consoles, and don't crash if uart is not stdout Summary: If the chosen console is not the OPAL uart, but OPAL uart devices exist, the console device doesn't attach properly, and faults in the interrupt handler, with a NULL pointer dereference. To fix this, and as a byproduct, also support multiple OPAL consoles, refactor to have the console getc callback use the appropriate softc instead of the global console_sc, which may be NULL in the case of a different device being the console. Reviewed by: nwhitehorn Differential Revision: https://reviews.freebsd.org/D16071	2018-06-29 19:35:25 +00:00
Breno Leitao	5ecc8c2077	powerpc64/powernv: Avoid type promotion There is a type promotion that transforms count = -1 into an unsigned int, causing the default TCE SEG SIZE not to be returned on a Boston POWER9 machine. This machine does not have the 'ibm,supported-tce-sizes' entries, so count is set to -1 and the function continues to execute instead of returning. Reviewed by: jhibbits, wma Approved by: jhibbits (mentor) Differential Revision: https://reviews.freebsd.org/D15763	2018-06-12 19:50:33 +00:00
Justin Hibbits	e69b55eadb	Remove a debug printf from opal_pci driver	2018-05-31 04:11:40 +00:00
Justin Hibbits	dceea51efe	Make opal_pci driver work with POWER9 Summary: Coupled with r334365, this makes PCI work on POWER9. There is still more to do to fully exploit the hardware capabilities, but this is sufficient to enable USB and ethernet controllers on a POWER9 Talos II system. Reviewed by: nwhitehorn, leitao Differential Revision: https://reviews.freebsd.org/D15566	2018-05-30 03:00:57 +00:00
Justin Hibbits	f07ee2a7c0	Cache the phandle of the PCI node in opal_pci_attach Simple cleanup, no functional change. This is related to the fixups needed for POWER9 support.	2018-05-30 02:47:23 +00:00
Justin Hibbits	0b1f36b6c5	Make ALT_BREAK_TO_DEBUGGER work with OPAL console Match other consoles by using the higher level cngetc() in the interrupt handler, so that kdb_alt_break() can check for console break.	2018-05-28 01:59:48 +00:00
Justin Hibbits	9ae8a6d3d1	Fix a typo missed in r334232	2018-05-26 04:24:25 +00:00
Justin Hibbits	459e54f990	Correct a typo for opal temperature sensor type constant	2018-05-26 02:45:41 +00:00
Justin Hibbits	1a3eaf6cc8	Add an IPMI attachment for PowerNV systems IPMI access on PowerNV systems is done through the OPAL firmware. This adds a simple attachment for communicating with the FSP/BMC on these machines. This has been tested on a Talos POWER9 workstation, only in the bootup phase, noting the successful attachment messages: ... ipmi0: IPMI device rev. 0, firmware rev. 2.00, version 2.0, device support mask 0 ipmi0: Number of channels 2 ... The ipmi device has not been added to GENERIC64, but may be after further testing. It may also eventually be added to the ipmi module at that point.	2018-05-22 03:57:32 +00:00
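
The type promotion described in commit 5ecc8c2077 above can be illustrated with a small standalone sketch. This is not the opal_pci code: the function names and the `cell_t` typedef below are hypothetical, chosen only to show how a signed count of -1 behaves when compared against an unsigned `sizeof` result, and how a cast keeps the comparison signed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device tree cell; not taken from the driver. */
typedef uint32_t cell_t;

/*
 * Buggy check: sizeof yields a size_t, so 'count' is converted to an
 * unsigned type before the comparison. A count of -1 becomes a very
 * large value, the test is false, and the caller would keep going
 * instead of returning its default.
 */
static int
count_too_small_buggy(int count)
{
	return (count < sizeof(cell_t));
}

/*
 * Fixed check: casting sizeof to int keeps the comparison signed, so
 * -1 is correctly treated as "property missing or too short".
 */
static int
count_too_small_fixed(int count)
{
	return (count < (int)sizeof(cell_t));
}
```

With a count of -1 (the value described in the commit message for a missing property), the buggy check reports that the count is large enough, while the fixed check reports it as too small. Compilers can flag the first form with `-Wsign-compare`.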

73 Commits