freebsd-nq

Author	SHA1	Message	Date
Leandro Lupori	e2d6c417e3	Implement superpages for PowerPC64 (HPT) This change adds support for transparent superpages for PowerPC64 systems using Hashed Page Tables (HPT). All pmap operations are supported. The changes were inspired by RISC-V implementation of superpages, by @markj (r344106), but heavily adapted to fit PPC64 HPT architecture and existing MMU OEA64 code. Until these changes are better tested, superpages support is disabled by default. To enable it, use vm.pmap.superpages_enabled=1. In this initial implementation, when superpages are disabled, system performance stays at the same level as without these changes. When superpages are enabled, buildworld time increases a bit (~2%). However, for workloads that put a heavy pressure on the TLB the performance boost is much bigger (see HPC Challenge and pgbench on D25237). Reviewed by: jhibbits Sponsored by: Eldorado Research Institute (eldorado.org.br) Differential Revision: https://reviews.freebsd.org/D25237	2020-11-06 14:12:45 +00:00
Brandon Bergren	05c3051f86	[PowerPC64LE] Endian fix for opal_hmi.c Another boring one. We need to endian swap before checking flags. Sponsored by: Tag1 Consulting, Inc.	2020-09-23 01:51:01 +00:00
Brandon Bergren	f9acb7a818	[PowerPC64LE] Get XIVE up and running. More endian conversion. * Install TCEs correctly (i.e. in big endian) * Convert to big endian and back when setting up queue pages and IRQs. Sponsored by: Tag1 Consulting, Inc.	2020-09-23 01:49:37 +00:00
Brandon Bergren	bf933a83ec	[PowerPC64LE] Endian fix for opal_dev.c. Not much to say here, another missing be64toh() in memory that was written from OPAL. Sponsored by: Tag1 Consulting, Inc.	2020-09-23 01:41:51 +00:00
Brandon Bergren	9cbcb6ffce	[PowerPC64LE] Endian fixes for opal_pci.c. Since OPAL runs in big endian, any data being passed back and forth via memory instead of registers needs to be byteswapped. From my notes during development: "A good way to find candidates is to look for vtophys() in opal_call() parameters. The memory being passed will be written into in BE." Sponsored by: Tag1 Consulting, Inc.	2020-09-23 01:37:01 +00:00
Brandon Bergren	c16359cf66	[PowerPC64LE] powernv ILE setup code. When running without a hypervisor, we need to set the ILE bit in the LPCR ourselves. For the boot processor, handle it in powernv_attach() like we do for other LPCR bits. No change for the APs, as they will use the lpcr global to set up their own LPCR when they do their own cpudep_ap_early_bootstrap() and pick up this automatically. Sponsored by: Tag1 Consulting, Inc.	2020-09-23 00:32:50 +00:00
Brandon Bergren	dadfbc2e60	[PowerPC64LE] LE opal_call() implementation OPAL runs in big endian, so we need to rfid into it to switch endian atomically when branching to it, and we need to do the RETURN_TO_NATIVE_ENDIAN dance when it returns to us. Sponsored by: Tag1 Consulting, Inc.	2020-09-23 00:28:47 +00:00
Brandon Bergren	4efb1ca7d2	[PowerPC64LE] Work around qemu TCG bug in mtmsrd emulation. The TCG implementation of mtmsrd in qemu blindly copies the entire register to the MSR, instead of the specific bit positions listed in the ISA. This means that qemu will prematurely switch endian out from under the running code instead of waiting for the rfid, causing an immediate trap as it attempts to interpret the next instruction in the wrong endianness. To work around this, ensure PSL_LE is still set before doing the mtmsrd. In the future, we may wish to just turn off translation and unconditionally use rfid to switch to the ofmsr instead of quasi-switching to the ofmsr. Add a new platform option so this can be disabled. (And so that we can conditionalize additional QEMU-specific hacks in the platform code.) Sponsored by: Tag1 Consulting, Inc.	2020-09-23 00:09:29 +00:00
Brandon Bergren	15be37cb7f	[PowerPC64LE] Fix endianness issues in phyp and opal consoles. This applies to both pseries and powernv, which were tested at different points during the patchset development. Sponsored by: Tag1 Consulting, Inc.	2020-09-23 00:06:48 +00:00
Brandon Bergren	5c74d551d2	[PowerPC] Fix setting of time in OPAL There were multiple bugs in the OPAL RTC code which had never been discovered, as the default configuration of OPAL machines is to have the BMC / FSP control the RTC. * Fix calling convention for setting the time -- the variables are passed directly in CPU registers, not via memory. * Fix bug in the bcd encoding routines. (from jhibbits) Tested on POWER9 Talos II (BE) and POWER9 Blackbird (LE). Reviewed by: jhibbits (in irc) Sponsored by: Tag1 Consulting, Inc.	2020-09-10 01:49:53 +00:00
Brandon Bergren	6957645145	[PowerPC64] Fix xive order calculation in qemu TCG When emulating a single thread system for testing reasons, mp_maxid can be 0. This trips up our math for calculating the order. Account for this to fix xive attachment when emulating a single-thread core on qemu powernv (a configuration that doesn't exist in the real world.) Sponsored by: Tag1 Consulting, Inc.	2020-09-08 23:48:49 +00:00
Mateusz Guzik	b64b31338f	powerpc: clean up empty lines in .c and .h files	2020-09-01 21:20:08 +00:00
Brandon Bergren	b94b2fcd61	[PowerPC64] Fix invalid OPAL call in xive_bind(). This fixes spurious "XIVE[ IC 00 ] ISN 1 lead to invalid IVE !" messages generated by OPAL when running with the debug level cranked up. Discussed with jhibbits. Sponsored by: Tag1 Consulting, Inc.	2020-08-21 03:23:10 +00:00
Brandon Bergren	60185d8965	[PowerPC] XIVE dispatch tweaks * Only read the DPCPU pointer once per xive_dispatch call. * Optimize HE decoding for the common cases. Reported by: jhibbits (in irc) Reviewed by: jhibbits Sponsored by: Tag1 Consulting, Inc. Differential Revision: https://reviews.freebsd.org/D25545	2020-07-06 15:15:37 +00:00
Justin Hibbits	46e8ab5aa1	powerpc/powernv: Don't use the vmem quantum cache for OPAL PCI MSI allocations vmem quantum cache is only needed when doing a lot of concurrent allocations, which doesn't happen when allocating MSIs. This wastes memory for the cache zones. Avoid this waste and don't use the quantum cache. Reported by: markj	2020-06-10 04:08:16 +00:00
Justin Hibbits	e48f804f8c	powerpc/powernv: Don't configure disabled CPUs If the POWER firmware detects a bad CPU core, it will "GUARD" it out, marking it disabled. Any attempt to spin up a bad CPU will trigger a panic later on when waiting for threads on said core to wake up. Support limping along on fewer cores instead.	2020-06-08 02:28:00 +00:00
Justin Hibbits	6df6aae9bd	powerpc/powernv: powernv_node_numa_domain() fix non-NUMA case If NUMA is not enabled in the kernel config, or is disabled at boot, this function should just return domain 0 regardless of what's in the device tree. Fixes a panic in iflib with NUMA disabled. Reported by: luporl	2020-03-03 03:22:00 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Leandro Lupori	a9d8f71f7b	[PPC64] Fix NUMA on POWER8 On some POWER8 machines, 'ibm,associativity' property may have 6 cells, which would overflow the 5 cells buffer being used. There was also an issue with the "check if node is root" part, that has been fixed too. Reviewed by: jhibbits Differential Revision: https://reviews.freebsd.org/D23414	2020-01-29 18:13:44 +00:00
Justin Hibbits	490ebb8f35	powerpc: Fix the NUMA domain list on powernv Summary: Consolidate the NUMA associativity handling into a platform function. Non-NUMA platforms will just fall back to the default (0). Currently only implemented for powernv, which uses a lookup table to map the device tree associativity into a system NUMA domain. Fixes hangs on powernv after r356534, and corrects a fairly longstanding bug in powernv's NUMA handling, which ended up using domains 1 and 2 for devices and memory on power9, while CPUs were bound to domains 0 and 1. Reviewed by: bdragon, luporl Differential Revision: https://reviews.freebsd.org/D23220	2020-01-18 01:26:54 +00:00
Justin Hibbits	03b6e7a627	powerpc/powernv: Un-Giant-ify opal_nvram driver It may be possible to make this completely lock free, but for now it's using a statically allocated bounce buffer in the softc, so it needs to be guarded.	2020-01-10 01:24:49 +00:00
Brandon Bergren	9367fb301c	[PowerPC] Fix panic when attempting to handle an HMI from an idle thread In IRC, sfs_ finally managed to get a good trace of a kernel panic that was happening when attempting to use webengine. As it turns out, we were using vtophys() from interrupt context on an idle thread in opal_hmi_handler2(). Since this involves locking the kernel pmap on PPC64 at the moment, this ended up tripping a KASSERT in mtx_lock(), which then caused a parallel panic stampede. So, avoid this by preallocating the flags variable and storing it in PCPU. Fixes "panic: mtx_lock() by idle thread 0x... on sleep mutex kernelpmap". Differential Revision: https://reviews.freebsd.org/D22962	2019-12-30 02:56:47 +00:00
Justin Hibbits	1223b40eba	powerpc/powernv: Set the PTCR for the Nest MMU The Nest MMU manages address translation for accelerators on the POWER9. To do so, it needs a page table, so export the system page table to the Nest MMU. This will quietly fail on pre-POWER9 systems that do not have a NMMU. The NMMU is currently unused, so this change is currently effectively a NOP, but the NMMU and VAS will eventually be used.	2019-12-15 21:20:18 +00:00
Leandro Lupori	a16111e6a2	[PPC64] Enable opal console use as a GDB DBGPORT This change makes it possible to use OPAL console as a GDB debug port. Similar to uart and uart_phyp debug ports, it has to be enabled by setting the hw.uart.dbgport variable to the serial console node of the device tree. Reviewed by: jhibbits Differential Revision: https://reviews.freebsd.org/D22649	2019-12-09 13:09:32 +00:00
Justin Hibbits	0b4753405b	powerpc64/powernv: Use OPAL call for non-POWER8 PCI TCE reset According to the OPAL documentation, only the POWER8 (PHB3) should use the register write TCE reset method. All others should use the OPAL call. On POWER9 the call is semantically identical to the register write, with a wait for completion.	2019-11-10 04:24:36 +00:00
Justin Hibbits	1c56203bcf	powerpc64/powernv: Add opal NVRAM driver for PowerNV systems Add a very basic NVRAM driver for OPAL which can be used by the IBM powerpc-utils nvram utility, not to be confused with the base nvram utility, which only operates on powermac_nvram. The IBM utility handles all partitions itself, treating the nvram device as a plain store. An alternative would be to manage partitions in the kernel, and augment the base nvram utility to deal with different backing stores, but that complicates the driver significantly. Instead, present the same interface IBM's utility expects, and we get the usage for free. Tested by: bdragon	2019-09-14 03:30:34 +00:00
Justin Hibbits	84ce4f0375	powerpc/powernv: Fix OPAL cfgread/cfgwrite error handling Freeze clearing needs to happen any time OPAL reads return either an error (except OPAL_HARDWARE), AND any time it returns 0xff for all bytes. For cfgwrite, any error that's not OPAL_HARDWARE should be cleaned up.	2019-08-03 01:55:51 +00:00
Justin Hibbits	0effb2ccf3	powerpc/powernv: Only clear EEH freeze for some errors Only clear an EEH freeze if an error occurs. However, if an OPAL_HARDWARE error is returned, this indicates a hardware failure which cannot be unfrozen, and instead needs a hardware reset. Attempting to unfreeze a broken PCH will result in console spam for each attempt. To avoid the spam, just don't do it.	2019-08-01 03:59:25 +00:00
Justin Hibbits	fdb916d53e	powernv: Port HMI handler to use the message framework When an HMI occurs a message event also gets created with the details of the exception. Hook into the messaging framework to retrieve the HMI message. Nothing is done with it yet, except to panic on unhandled exception.	2019-06-10 03:24:38 +00:00
Justin Hibbits	f433dab2de	powerpc/powernv: Reduce the scope of the sensor guarding mutex vmem_xalloc() cannot be called while holding a nonblocking mutex, warned by WITNESS. The lock may not be necessary in general, but it avoids superfluous concurrent OPAL calls for the same sensor. Reported by: pkubaj	2019-06-10 03:16:55 +00:00
Conrad Meyer	e2e050c8ef	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
Justin Hibbits	b4698b7a6c	powerpc: Drop OPAL_HANDLE_HMI2 for now, to avoid panicking It's possible for a Hypervisor Maintenance Interrupt (HMI) to occur while in the pmap code, holding locks. This can cause WITNESS to panic due to lock errors in calling pmap_kextract(). Since we don't yet handle the flags returned by OPAL_HANDLE_HMI2, just stop using it, so that we don't call into pmap_kextract(). Reported by: pkubaj	2019-05-02 03:39:03 +00:00
Justin Hibbits	e2e3e7d28e	powerpc: Make OPAL root node probe at bus pass This way its children can attach earlier if needed, and some subsystems are attached earlier, like the asynchronous token management. MFC after: 2 weeks	2019-04-29 01:10:57 +00:00
Justin Hibbits	93096fecb6	powerpc64/powernv: Relax flash block write requirements Since writes don't necessarily need to be on erase-block boundaries, we can relax the block size and alignments down to sector size. If it needs to be erased, opalflash_erase() will check proper alignment and size.	2019-04-20 02:44:38 +00:00
Justin Hibbits	bc60451a47	powerpc/powernv: Make erasing before writes optional If the OPAL flash driver supports writing without erase, it adds a 'no-erase' property to the flash device node. Honor that property and don't bother erasing if it exists.	2019-04-19 02:28:04 +00:00
Justin Hibbits	49d9a59783	Add NUMA support to powerpc Summary: Initial NUMA support: - associate CPU with domain - associate memory ranges with domain - identify domain for devices - limit device interrupt binding to appropriate domain - Additionally fixes a bug in the setting of Maxmem which led to only memory attached to the first socket being enabled for DMA A pmap variant can opt in to numa support by calling `numa_mem_regions` at the end of pmap_bootstrap - registering the corresponding ranges with the VM. This yields a ~20% improvement in build times of llvm on dual socket POWER9 over non-NUMA. Original patch by mmacy. Differential Revision: https://reviews.freebsd.org/D17933	2019-04-13 04:03:18 +00:00
Justin Hibbits	3c8c50f955	powerpc/powernv: Fix major bugs in opal_flash * The BIO bio_data may not be page aligned. Only the base address of each page worth of data is extracted to pass to OPAL. Without page alignment it can scribble over random memory when finishing the page read. Fix this by short-reading the first page to properly align for full page reads. * Fix the definition of OPAL_FLASH_ERASE. * Properly handle the async message result, as now returned from r345974.	2019-04-06 02:39:56 +00:00
Justin Hibbits	947079ebee	powerpc/powernv: Fix issues in opal_async * Properly return the full opal_msg from an async completion. * Don't keep bugging OPAL, wait 100us or so. With some minor changes to DELAY() to drop to very low priority, the thread won't hog the CPU while polling for the async completion.	2019-04-06 02:31:01 +00:00
Justin Hibbits	fbf7737949	powernv: Port OPAL asynchronous framework to use the new message framework Since OPAL_GET_MSG does not discriminate between message types, asynchronous completion events may be received in the OPAL_GET_MSG call, which dequeues them from the list, thus preventing OPAL_CHECK_ASYNC_COMPLETION from succeeding. Handle this case by integrating with the messaging framework.	2019-04-02 04:02:57 +00:00
Justin Hibbits	911a92603e	powerpc/powernv: Add OPAL heartbeat thread Summary: OPAL needs to be kicked periodically in order for the firmware to make progress on its tasks. To do so, create a heartbeat thread to perform this task every N milliseconds, defined by the device tree. This task is also a central location to handle all messages received from OPAL. Reviewed By: luporl Differential Revision: https://reviews.freebsd.org/D19743	2019-04-02 04:00:01 +00:00
Justin Hibbits	0499e9c619	powerpc64: Use medium code model in asm files for TOC references Summary: With a sufficiently large TOC, it's possible to index out of range, as the immediate load instructions only permit 16-bit indices, allowing up to 64kB range (signed) from the base pointer. Allow +/- 2GB range, with the medium code model TOC accesses in asm. Patch originally by Brandon Bergren. The issue appears to impact ELFv2 more than ELFv1. Reviewed by: luporl Differential Revision: https://reviews.freebsd.org/D19708	2019-03-29 02:38:30 +00:00
Justin Hibbits	8af4cc4d5a	powernv: Add Hypervisor Maintenance Interrupt handler Attempting to build www/firefox on POWER9 resulted in a HMI exception being thrown, a fatal trap currently. This is typically caused by timer facility errors, but examination of the Hypervisor Maintenance Exception Register (HMER) yielded only that an exception had recovered, with no information of the actual exception cause. When an HMI occurs, OPAL_HANDLE_HMI or OPAL_HANDLE_HMI2 must be called to handle the exception at the firmware level. If the exception is handled, we can continue. This adds only the preliminary handler, enough to prevent package building from panicking. An enhancement in the future is to use the flags returned by OPAL_HANDLE_HMI2 to print more useful error messages, and log maintenance events. Reviewed by: luporl MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19634	2019-03-23 03:23:20 +00:00
Justin Hibbits	6775dfdf54	powerpc/powernv: Add OPAL flash device driver Firmware needed by petitboot, for example, GPU firmware, can be installed to a partition in the flash filesystem. This driver exposes the full flash given by the device tree, letting the user manage firmware, etc, from FreeBSD. To use the partitions provided by the flash module, the fdt_slicer module is needed, but the module isn't needed for raw access, so there's no direct dependency link in here. MFC after: 2 weeks	2019-03-01 04:36:55 +00:00
Justin Hibbits	dac618a648	powerpc/powernv: Add asynchronous token management for powernv The OPAL firmware only supports a finite number of in-flight asynchronous operations. Rather than have each subsystem try to manage its own, use a central management service to hand out tokens. More work can be done to improve asynchronous behavior, such as funneling things through a future OPAL heartbeat handler, but capabilities will be added as needed. Augment the existing consumers (i2c and sensors) to use this new API. MFC after: 4 weeks	2019-03-01 02:49:47 +00:00
Justin Hibbits	d49fc192c1	powerpc/powernv: Add a driver for the POWER9 XIVE interrupt controller The XIVE (External Interrupt Virtualization Engine) is a new interrupt controller present in IBM's POWER9 processor. It's a very powerful, very complex device using queues and shared memory to improve interrupt dispatch performance in a virtualized environment. This yields a ~10% performance improvement over the XICS emulation mode, measured in both buildworld, and 'dd' from nvme to /dev/null. Currently, this only supports native access. MFC after: 1 month	2019-02-02 04:15:16 +00:00
Justin Hibbits	56505ec016	powerpc: Add opaque 'private data' to interrupt vectors The XICS and XIVE need extra data beyond irq and vector. Rather than performing a separate search, it's better for the general interrupt facility to hold a private pointer, since the search already must be done anyway at that level.	2019-01-12 22:05:42 +00:00
Conrad Meyer	bba9cbe374	powerpc: Fix regression introduced in r342771 In r342771, I introduced a regression in Power by abusing the platform smp_topo() method as a shortcut for providing the MI information needed for the stated sysctls. The smp_topo() method was already called later by sched_ule (under the name cpu_topo()), and initializes a static array of scheduler topology information. I had skimmed the smp_topo_foo() functions and assumed they were idempotent; empirically, they are not (or at least, detect re-initialization and panic). Do the cleaner thing I should have done in the first place and add a platform method specifically for core- and thread-count probing. Reported by: luporl via jhibbits Reviewed by: luporl X-MFC-With: r342771 Differential Revision: https://reviews.freebsd.org/D18777	2019-01-07 19:39:31 +00:00
Conrad Meyer	6b83069e05	Expose threads-per-core and physical core count information With new sysctls (to the best of our ability to detect them). Restructured smp.4 slightly for clarity (keep relevant stuff closer to the top) while documenting. Reviewed by: markj, jhibbits (ppc parts) MFC after: 3 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D18322	2019-01-04 18:31:17 +00:00
Justin Hibbits	ad39591ad2	powerpc/powernv: Restrict the busdma tag to only POWER8 It seems this tag is causing problems on POWER9 systems. Since no POWER9 user has encountered the problem fixed by r339589 just restrict it to POWER8 for now. A better fix will likely be to update powerpc/busdma_machdep.c to handle the window correctly. Reported by: mmacy, others	2018-11-08 20:31:12 +00:00
Leandro Lupori	d93e635a81	ppc64: limited 32-bit DMA address range Further investigation of issues with 32-bit DMA on PowerNV revealed that its window is hardcoded by OPAL (at least in skiboot version 5.4.9) and cannot be changed by the OS. Thus, jhb's suggestion of limiting the range in the PCI DMA tag now seems the best way to deal with it. Reviewed by: jhibbits, nwhitehorn, sbruno Approved by: jhibbits(mentor) Differential Revision: https://reviews.freebsd.org/D17601	2018-10-22 13:40:50 +00:00
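
Several of the [PowerPC64LE] commits in this log apply the same rule: OPAL runs in big endian, so data it writes to memory must be converted to host byte order before flags are checked. The following is a minimal userland sketch of that idea, not the kernel code from those commits. The names example_be64_decode, example_be64_encode, example_flag_is_set and EXAMPLE_FLAG_RECOVERED are hypothetical and chosen only for illustration; the kernel itself would use be64toh() rather than a byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit, for illustration only (not an OPAL constant). */
#define EXAMPLE_FLAG_RECOVERED	UINT64_C(0x1)

/*
 * Decode a 64-bit big-endian value from raw bytes.
 * Working byte by byte gives the same result on LE and BE hosts.
 */
static uint64_t
example_be64_decode(const unsigned char *buf)
{
	uint64_t v = 0;

	assert(buf != NULL);
	for (int i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return (v);
}

/* Encode a host-order value as 8 big-endian bytes (most significant first). */
static void
example_be64_encode(unsigned char *buf, uint64_t v)
{
	assert(buf != NULL);
	for (int i = 7; i >= 0; i--) {
		buf[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/*
 * Test a flag in a word that firmware wrote in big endian.
 * The swap happens first; the flag test runs on the host-order value.
 */
static int
example_flag_is_set(const unsigned char *fw_word, uint64_t flag)
{
	return ((example_be64_decode(fw_word) & flag) != 0);
}
```

Reading the raw word on a little-endian host without the decode step would place the flag bit in the most significant byte, so the flag test would silently miss it.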


89 Commits