This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
Summary:
moea64_pte_sync_native() and moea64_pte_unset_native() don't need the
full PTE created; they only need to check that the PVO's PTE matches the
PTE in the page table. Don't waste time creating the full PTE in this
case.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D22341
Summary:
This matches r351198 from amd64. This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave similarly to the
existing code. Otherwise it will allocate 16MB huge pages to hold the page
array, across all NUMA domains. On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain. After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short. This means
some inner domains may have pages accounted in earlier domains.
On Book-E the page array is set up at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead for touching the vm_page_array, saving up to one TLB miss per
array access.
Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21449
ENOENT is a leftover from mmu_oea.c's moea_pvo_enter(), where it's used to
syncicache() on the first new mapping of a page. This sync is done
differently in OEA64.
Fix wrong section ordering that was causing a ".got is not contiguous with
other relro sections" lld error. This also brings ldscript.powerpc and
ldscript.powerpcspe closer to ldscript.powerpc64.
Also, remove unnecessary text relocs from the ppc32 AIM trap code.
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D22349
PowerISA 3.0 eliminated the 64-bit bridge mode which allowed 32-bit kernels
to run on 64-bit AIM/Book-S hardware. Therefore only a 64-bit kernel can
run on this hardware, and since 64-bit native always has the direct map,
there is no need to guard it.
In some scenarios, the 4K trapstk may overflow, corrupting tmpstk.
This was observed during remote debugging, with the following steps:
At remote host (R):
- enter kdb during boot
- switch to gdb backend
At local host (L):
- attach gdb to R
- try to read an invalid memory position
At R:
- a DSI trap occurs and kdb restarts (all this occurs on trapstk)
- while printing the stacktrace, trapstk overflows and corrupts tmpstk
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22200
moea_pvo_remove() might remove the last mapping of a page, in which case
it is clearly no longer writeable. This can happen via pmap_remove(),
or when a CoW fault removes the last mapping of the old page.
Reported and tested by: bdragon
Reviewed by: alc, bdragon, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22044
The VM_PAGE_OBJECT_BUSY_ASSERT() in the pmap_enter() implementation should
only be asserted when the code is executed as a result of pmap_enter(),
not when the same code is entered from e.g. pmap_enter_quick(). This
is relevant for all PowerPC pmap variants, because mmu_*_enter() is
used as the backend, and the assert is located there.
Add a PowerPC private pmap_enter() PMAP_ENTER_QUICK_LOCKED flag to
indicate that the call is not from pmap_enter(). For non-quick-locked
calls, assert that the object is locked.
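In the shared backend this reduces to roughly:

    if ((flags & PMAP_ENTER_QUICK_LOCKED) == 0)
            VM_PAGE_OBJECT_BUSY_ASSERT(m);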
Reported and tested by: bdragon
Reviewed by: alc, bdragon, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22041
callers hold it.
This simplifies pmap code and removes a dependency on the object lock.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596
busy acquires while held.
This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
Based on the POWER9BSD implementation, with all POWER9-specific code
removed and new methods added to the PPC64 MMU interface, to isolate
platform-specific code. Currently, the new methods are implemented on
pseries and PowerNV
(D21643).
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D21551
As pointed out by mjg, without the parentheses the calculations done against
these macros are incorrect, resulting in only 1/3 of locks being used.
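For illustration, with hypothetical macro names, the failure mode is:

    #define PV_LOCK_COUNT   PA_LOCK_COUNT * 3       /* buggy: no parens */
    /* pa_index(pa) % PV_LOCK_COUNT then expands to
     * (pa_index(pa) % PA_LOCK_COUNT) * 3, so only every third lock in
     * the pool is ever selected. The fix: */
    #define PV_LOCK_COUNT   (PA_LOCK_COUNT * 3)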
Reported by: mjg
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
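The pmap_extract_and_hold() pattern becomes, roughly:

    m = PHYS_TO_VM_PAGE(pa);
    if (m != NULL && !vm_page_wire_mapped(m))
            m = NULL;       /* raced with unmapping; fall back to vm_fault() */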
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
We only call alloc_pvo_entry() with M_WAITOK from one location. However,
this can be called while holding nonsleepable locks. Rather than passing
M_WAITOK down, use vm_wait() and loop.
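A minimal sketch of the loop, assuming alloc_pvo_entry() takes a flags
argument whose zero value means a nonsleeping allocation:

    for (;;) {
            pvo = alloc_pvo_entry(0);       /* may fail under pressure */
            if (pvo != NULL)
                    break;
            vm_wait(NULL);                  /* sleep until pages are freed */
    }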
Summary:
MOEA64_PTE_REPLACE() is called often with the pmap lock held, and
sometimes with the page pv lock held. The less work done while holding
a lock, the better. Since we are intending to replace the same PTE
(same hash index), we don't need to recalculate anything, just flat
replace the PTE. This cuts more than 200 instructions off the
invalidating code path. In addition, we don't need to replace a PTE
that's not occupied by this PVO.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21515
doing so adds more flexibility with less redundant code.
Reviewed by: jhb, markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21250
Summary:
Although it's convenient to reuse the pvo_plist for deletion, RB_TREE
insertion and removal is not free, and can result in a lot of extra work
to rebalance the tree. Instead, use a SLIST as a LIFO delete queue,
which gives us almost free insertion, deletion, and traversal.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21061
Added allocation retry loop in alloc_pvo_entry(), to wait for
memory to become available if the caller specifies the M_WAITOK flag.
Also, the loop in moea64_enter() was removed, as moea64_pvo_enter()
never returns ENOMEM. It is the memory allocation in alloc_pvo_entry()
that can fail and must be retried.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D21035
Summary:
It turns out statistics accounting is very expensive in the pmap driver,
and doesn't seem necessary in the common case. Make this optional
behind a MOEA64_STATS #define, which can be set if statistics are
really needed.
This saves ~7-8% on buildworld time on a POWER9.
Found by bdragon.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D20903
oldpvo is never explicitly NULL'd by moea64_pvo_enter(), so don't key any
behavior off a NULL check; check only the error return.
PR: 239372
Reported by: Francis Little
Summary:
Instead of searching for a PVO entry before adding, take advantage of
the fact that RB_INSERT() returns NULL when it inserts, and the existing
entry, without inserting a new one, when one already exists. This saves
an extra tree
traversal in the cases where the PVO does not exist.
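The insertion-side pattern, roughly (tree and head names assumed):

    old = RB_INSERT(pvo_tree, &pmap->pmap_pvo, pvo);
    if (old != NULL)
            return (EEXIST);        /* entry existed; nothing was inserted */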
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D20944
The only consumer of moea64_pvo_remove_from_page_locked() already has the
page in hand, so there is no need to search for the page while holding the
lock. Drop the wrapper, and rename _moea64_pvo_remove_from_page_locked().
Reported by: alc
Summary:
Since the 'page pv' lock is one of the most highly contended locks, we
need to try to do as much work outside of the lock as we can. The
moea64_pvo_remove_from_page() path is low-hanging fruit, where we can
do some heavy work (PHYS_TO_VM_PAGE()) outside of the lock if needed.
In one path, moea64_remove_all(), the PV lock is already held and can't
be swizzled, so we provide two ways to perform the locked operation: one
that can call PHYS_TO_VM_PAGE outside the lock, and one for callers that
already hold the lock.
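Sketched with hypothetical names (pvo_paddr() standing in for extracting
the PVO's physical address), the unlocked entry point becomes:

    pg = PHYS_TO_VM_PAGE(pvo_paddr(pvo));   /* heavy work, lock not held */
    PV_LOCK(pvo_paddr(pvo));
    moea64_pvo_remove_from_page_locked(mmu, pvo, pg);
    PV_UNLOCK(pvo_paddr(pvo));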
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D20694
Summary:
If an illegal instruction is encountered in a process running on a
powerpc64 kernel, the kernel attempts to sync the cache before retrying
the instruction "just in case". However, since curpmap is not set, when
moea64_sync_icache() attempts to lock the pmap, it locks a NULL pointer,
triggering a panic. Fix this by adding an (assumed unnecessary) fallback
to curthread's pmap in moea64_sync_icache().
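The fallback amounts to roughly:

    pm = PCPU_GET(curpmap);
    if (__predict_false(pm == NULL))
            pm = vmspace_pmap(curthread->td_proc->p_vmspace);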
Reported by: alfredo.junior_eldorado.org.br
Reviewed by: luporl, alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20911
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.
This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.
No functional change is intended. __FreeBSD_version is bumped.
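A sketch of the caller-side pattern after the change (exact arguments
vary by consumer):

    n = vm_fault_quick_hold_pages(map, va, len, VM_PROT_READ, ma, count);
    if (n < 0)
            return (EFAULT);
    /* ... operate on the wired pages ... */
    for (i = 0; i < n; i++)
            vm_page_unwire(ma[i], PQ_ACTIVE);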
Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247
Although PPC SLB code doesn't handle allocation failures,
which are rare, in most places it asserts that the pointer
returned by uma_zalloc() is not NULL, making it easier to
identify the failure and avoiding an invalid pointer dereference.
This change simply adds a missing KASSERT in SLB code.
r348783 changed the behavior of the kernel mappings and broke booting on G5.
- Split the kernel mapping logic out so that the case where we are
running from the wrong memory space is handled using identity
mappings, and the case where we are not using a DMAP is handled by
forcibly mapping the kernel into the dmap range as intended by
r348783.
Reported by: Mikael Urankar
Reviewed by: luporl
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D20608
This set of changes make it possible to run FreeBSD for PowerPC64/pseries,
under QEMU/KVM, without requiring the host to make hugepages available to the
guest.
While there was already this possibility, by means of setting
hw_direct_map to 0, on PowerPC64 a couple of issues/wrong assumptions
prevented this from working before this changelist.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20522
Summary:
moea64_insert_pteg_native()'s invalidation only works by happenstance.
The purpose of the shifts and XORs is to extract the VSID in order to
reverse-engineer the lower bits of the VPN. Currently a segment size is 256MB
(2**28), and ADDR_API_SHFT64 is 16, so ADDR_PIDX_SHIFT is equivalent. However,
it's semantically incorrect, in that we don't want to shift by the page shift
size, we want to shift to get to the VSID.
Tested by: bdragon
Differential Revision: https://reviews.freebsd.org/D20467
It was found during building llvm that the page pv lock pool was seeing very
high contention. Since the pmap is already NUMA aware, it was surmised that
the domains were referencing similar pages in the different domains. This
reduces contention to the point of noise in a lockstat(8) run (~51% down to
under 5%), reducing build times by up to 20%.
This doesn't do a perfect domain alignment, just a best-guess based on
hardware available, that the domain is roughly specified in the upper bits
of the PA. Trying to be more clever would more than likely cost more in
added work than it would gain in performance.
MFC after: 2 weeks
Since we now have a much larger KVA on powerpc64, it's possible to get SLB
traps earlier in boot, possibly even before the HIOR is properly configured
for us. Move the HIOR setup to immediately after reset, so that we use our
exception handlers instead of Open Firmware's.
PR: 233863
Submitted by: Mark Millard (partial)
Reported by: Mark Millard
MFC after: 2 weeks
The POWER8NVL (POWER8 NVLink) architecturally behaves identically to the
POWER8, with a different PVR identifier. Mark it as such, so it shows up
appropriately to the user.
Reported by: Alexey Kardashevskiy
MFC after: 2 weeks
mtmsr and mtsr require context synchronizing instructions to follow. Without
a CSI, there's a chance for a machine check exception. This reportedly does
occur on an MPC750 (PowerMac G3).
Reported by: Mark Millard
Summary:
Initial NUMA support:
- associate CPU with domain
- associate memory ranges with domain
- identify domain for devices
- limit device interrupt binding to appropriate domain
- Additionally fixes a bug in the setting of Maxmem which led to
only memory attached to the first socket being enabled for DMA
A pmap variant can opt in to NUMA support by calling numa_mem_regions()
at the end of pmap_bootstrap(), registering the corresponding ranges with
the VM.
This yields a ~20% improvement in build times of llvm on dual socket POWER9
over non-NUMA.
Original patch by mmacy.
Differential Revision: https://reviews.freebsd.org/D17933
Summary:
With a sufficiently large TOC, it's possible to index out of range, as
the immediate load instructions only permit 16-bit indices, allowing up
to a 64kB range (signed) from the base pointer. Allow a +/- 2GB range by
using medium code model TOC accesses in asm.
Patch originally by Brandon Bergren. The issue appears to impact ELFv2
more than ELFv1.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D19708
* Cache moea64_need_lock in a local variable; gcc generates slightly better
code this way, as it doesn't need to reload the value from memory on each
read.
* VPN cropping is only needed on PowerPC ISA 2.02 and older cores, a subset
of those that need serialization, so move this under the need_lock check,
so those that don't need the lock don't even need to check this.
r345402 fixed the bug that led to the split of the ISA 3.0 HPT handling from
the existing manager. The cause of the bug was gcc moving the register
holding VPN to a different register (not r0), which triggered bizarre
behaviors. With the fix, things work, so they can be re-merged. No
performance is lost with the merge.
By happenstance gcc4 puts 'vpn' into r0 in all uses of TLBIE(), but modern
gcc does not. Also, the single-argument form of tlbie zeros all unused
arguments, making the modern tlbie instruction use r0 as the RS field
(LPID).
The vpn argument has the bottom 12 bits cleared (the input having been
left-shifted by 12 bits), which just so happens, on the POWER9 and previous
incarnations, to be the number of LPID bits supported. With those bits
being zero, the instruction:
tlbie r0, r0
will invalidate the VPN in r0, in LPAR 0 (ignoring the upper bits of r0 for
the RS field). One build with gcc8 yields:
tlbie r9, r0
with r0 having arbitrary contents, not equal to r9. This leads to strange
crashes, behaviors, and panics, due to the requested TLB entry not actually
being invalidated.
As moea64_native must work on both old and new CPUs, we explicitly zero
out r0 so that it works with only the single argument, built with both
base gcc and modern gcc. isa3_hashtb takes a different approach, encoding
the two-argument form, so as not to explicitly clobber r0, instead
letting the compiler decide.
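A simplified sketch of the moea64_native approach (the real TLBIE() also
brackets this with ptesync/eieio/tlbsync):

    /* Pin r0 to zero so the one-operand encoding gets RS = 0 (LPID 0). */
    __asm __volatile("li 0, 0; tlbie %0" :: "r"(vpn) : "r0", "memory");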
Reported by: Brandon Bergren
Tested by: Brandon Bergren
MFC after: 1 week
The check for early exit should be checking the SLB entry itself. As
currently written it was checking the address of the SLB, which is always
non-zero, so it would always go through the kernel SR restore loop.
Submitted by: mmacy
MFC after: 2 weeks
At moea64_sync_icache(), when the 'va' argument has page size
alignment, round_page() will return the same value as 'va'.
This would cause 'len' to be 0 and thus an infinite loop.
With this change, 'lim' will always point to the next page boundary.
This issue occurred especially during debugging sessions, when a breakpoint
was placed on an exact page-aligned offset, for instance.
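The fix amounts to roughly:

    lim = round_page(va + 1);       /* always the next page boundary */
    len = MIN(lim - va, sz);        /* nonzero even for page-aligned va */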
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D19149
Currently, the trap code switches to the temporary stack in the dbtrap
section. This works in most cases, but at the beginning of execution the
temporary stack is already in use, starting in the powerpc_init() code.
In this scenario, the stack is overwritten, which causes the return from
breakpoint() to take an abnormal execution path.
This patchset creates a small stack for use by the dbtrap: code path,
avoiding corruption of the temporary stack.
PR: 224872
Submitted by: breno.leitao_gmail.com
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D14484
This change adds a hypervisor trap handler for exception 0x1500 (soft patch),
normalizing all VSX registers and returning.
This avoids a kernel panic due to an unknown exception.
Change made with the collaboration of leonardo.bianconi_eldorado.org.br,
who found out that this is a hypervisor exception and not a supervisor
one, and fixed this in the code.
Reviewed by: jhibbits, sbruno
Differential Revision: https://reviews.freebsd.org/D17806
A related future change, which changes KERNBASE for Book-E, for some
reason causes a "KERNBASE redefined" error with assym.inc, even though it
only changes the value of KERNBASE and nothing else. Since
machine/vmparam.h is already
included in booke/locore.S, and the requisite guards are already in place for
properly handling KERNBASE in vmparam.h, just remove it from genassym, and
include vmparam.h in the AIM locore files.
Maxmem is the highest address for physical memory in the system. It's
measured in pages which, since max() returns a u_int, should allow for up to
2^44 bytes of memory addressable by the system. However, on POWER9 systems
at least, memory addressed by additional socketed CPUs begins at addresses
far above the 2^44 mark, causing issues with memory accesses and DMA, when
memory is addressed on the auxiliary CPUs. Use the MAX() macro instead,
which doesn't convert arguments, so retains Maxmem and all calculations as
its defined long type (64-bit on powerpc64), keeping the maximum address
correct.
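For illustration (the exact expression differs; region_end is a stand-in):

    /* max() is u_int max(u_int, u_int): page counts above 2^32 truncate. */
    Maxmem = max(Maxmem, btoc(region_end));
    /* MAX() is a plain macro and evaluates in the operands' own type. */
    Maxmem = MAX(Maxmem, btoc(region_end));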
Submitted by: mmacy
r279252 inverted the logic in moea64_scan_init, such that instead of
terminating when reaching a dead page, it terminates when reaching a live
page, ostensibly preserving exactly one page of KVA.
r330610 relocated the DMAP from the base of memory to the base of the fourth
quadrant of memory. This broke synthetic traps, such as KDB forced
breakpoints. Use GET_TOCBASE() so the DMAP offset is handled.
Submitted by: git_bdragon.rkt0.net
Differential Revision: https://reviews.freebsd.org/D15973
PowerISA 3.0 makes several changes to not only the format of the HPT but
also the behavior surrounding it. For instance, TLBIE no longer requires
serialization. Removing this lock cuts buildworld time in half on an
18-core/72-thread POWER9 system, demonstrating that this lock is highly
contended on such a system.
There was odd behavior observed trying to make this change in a
backwards-compatible manner in moea64_native.c, so the best option was to
fully split it, and largely revert the original changes adding POWER9
support to the original file.
Suggested by: nwhitehorn
On very large memory systems 'size' can become 2GB or larger, resulting in a
negative value being formatted. Also, moea64_pteg_count is already a long, so
format it as such.
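For illustration:

    printf("moea64: PTEG count: %ld, PTEG table size: %lu bytes\n",
        moea64_pteg_count, (u_long)size);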
POWER9 supports Radix page tables in addition to Hashed page tables. When
Radix page tables are in use, the TLB is cut in half, so that half of the
TLB is used for the page walk cache. This is the default behavior, however
FreeBSD currently does not support Radix tables. Clear this bit so that we
can use the full TLB. Do this in the MMU logic so that configuration can be
localized to the specific translation format. Once we do support Radix
tables, the setup for that will be localized to the Radix MMU kobj.
Summary:
PowerISA 2.03 and later require bits 14:65 in the RB register argument,
which is the full value of the vpn argument post-shift. Only POWER4, POWER4+,
and PPC970* need the upper 16 bits cropped.
With this change FreeBSD can boot to multi-user on POWER9.
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15581
Summary:
POWER9 systems use a new interrupt controller, XIVE, managed through OPAL
firmware calls. The OPAL firmware includes support for emulating the previous
generation XICS presentation layer in addition to a new "XIVE Exploitation"
mode. As a stopgap until we have XIVE exploitation mode, enable XICS emulation
mode so that we at least have an interrupt controller.
Since the CPPR is local to the current CPU, it cannot be updated for APs when
initializing on the BSP. This adds a new function, directly called by the
powernv platform code, to initialize the CPPR on AP bringup.
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15492
Summary:
Some hypervisor exceptions on the POWER architecture only save state to
HSRR0/HSRR1.
Until we have bhyve on POWER, use a lightweight exception frontend which copies
HSRR0/HSRR1 into SRR0/SRR1, and run the normal trap handler.
The first user of this is the Hypervisor Virtualization Interrupt, which targets
the XIVE interrupt controller on POWER9.
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15487
r333273 and partially reverted with r333594.
Older CPUs implement addition of offsets into the page table by a
bitwise OR rather than actual addition, which only works if the table is
aligned at a multiple of its own size (they also require it to be aligned
at a multiple of 256KB). Newer ones do not have that requirement, but it
hardly matters to enforce it anyway.
The original code was failing on newer systems with huge amounts of RAM
(> 512 GB), in which the page table was 4 GB in size. Because the
bootstrap memory allocator took its alignment parameter as an int, this
turned into a 0, removing any alignment constraint at all and making
the MMU fail. The first round of this patch (r333273) fixed this case by
aligning it at 256 KB, which broke older CPUs. Fix this instead by widening
the alignment parameter.
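The truncation is easy to see, and the widened signature is sketched
below (assuming the allocator is moea64_bootstrap_alloc()):

    u_int align32 = (u_int)0x100000000UL;   /* 4 GB alignment becomes 0 */

    /* Widened: the alignment no longer passes through an int. */
    static void *moea64_bootstrap_alloc(vm_size_t size, vm_size_t align);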
The POWER9 MMU (PowerISA 3.0) is slightly different from current
configurations, using a partition table even for hypervisor mode, and
dropping the SDR1 register. Key off the newly early-enabled CPU features
flags for the new architecture, and configure the MMU appropriately.
The POWER9 MMU ignores the "PSIZ" field in the PTCR, and expects a 64kB
table. As we are enabled for powernv (hypervisor mode, no VMs), only
initialize partition table entry 0, and zero out the rest. The actual
contents of the register are identical to SDR1 from previous architectures.
Along with this, fix a bug in the page table allocation with very large
memory. The table can be allocated on any 256k boundary. The
bootstrap_alloc alignment argument is an int, and with large amounts of
memory passing the size of the table as the alignment will overflow an
integer. Hard-code the alignment at 256k as wider alignment is not
necessary.
Reviewed by: nwhitehorn
Tested by: Breno Leitao
Relnotes: Yes
POWER8 and POWER9 have similar configuration requirements for hypervisor setup,
and in the cases here they're identical. Add the POWER9 constant to the POWER8
list so it's initialized correctly.
Reviewed by: nwhitehorn
Summary:
POWER9 also contains 32 SLB entries, as explained by the POWER9 User Manual:
"For HPT translation, the POWER9 core contains a unified (combined for both
instruction and data), 32-entry, fully-associative SLB per thread"
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15128
region marked "available" by firmware is contained entirely in the kernel.
This had a tendency to happen with FDTs passed by loader, though it could
happen for other reasons as well, and would result in the kernel slowly
cannibalizing
itself for other purposes, eventually resulting in a crash.
A similar fix is needed for mmu_oea.c and should probably just be rolled
at that point into some generic code in platform.c for taking a mem_region
list and removing chunks.
PR: 226974
Submitted by: leandro.lupori@gmail.com
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15121
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more, so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
assym is only to be included by other .s files, and should never
actually be assembled by itself.
Reviewed by: imp, bdrewery (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14180
When the kernel can be in real mode in early boot, we can execute from
high addresses aliased to the kernel's physical memory. If that high
address has the first two bits set to 1 (0xc...), those addresses will
automatically become part of the direct map. This reduces page table
pressure from the kernel and it sets up the kernel to be used with
radix translation, for which it has to be up here.
This is accomplished by exploiting the fact that all PowerPC kernels are
built as position-independent executables and relocate themselves
on start. Before this patch, the kernel runs at 1:1 VA:PA, but that
VA/PA is random and set by the bootloader. Very early, it processes
its ELF relocations to operate wherever it happens to find itself.
This patch uses that mechanism to re-enter and re-relocate the kernel
a second time with a new base address set up in the early parts of
powerpc_init().
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D14647
accomplishes a few things:
- Makes NULL an invalid address in the kernel, which is useful for catching
bugs.
- Lays groundwork for radix-tree translation on POWER9, which requires the
direct map be at high memory.
- Similarly lays groundwork for a direct map on 64-bit Book-E.
The new base address is chosen as the base of the fourth radix quadrant
(the minimum kernel address in this translation mode) and because all
supported CPUs ignore at least the first two bits of addresses in real
mode, allowing direct-map addresses to be used in real-mode handlers.
This is required by Linux and is part of the architecture standard
starting in POWER ISA 3, so can be relied upon.
Reviewed by: jhibbits, Breno Leitao
Differential Revision: https://reviews.freebsd.org/D14499
When a processor thread enters a power-save state it releases resources
shared with other CPU threads, which makes the other threads work much
faster.
This patch also implements saving and restoring registers that might get
corrupted in power-save state.
Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Reviewed by: jhibbits, nwhitehorn, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14330
Make vm_wait() take the vm_object argument which specifies the domain
set to wait on for the min condition to pass. If there is no object
associated with the wait, use curthread's policy domainset. The
mechanics of the wait in vm_wait() and vm_wait_domain() are supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait on for the min condition to pass.
Eliminate pagedaemon_wait(). vm_domain_clear() handles the same
operations.
Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.
Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.
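Usage after the conversion:

    vm_wait(obj);           /* wait using obj's domainset policy */
    vm_wait(NULL);          /* wait using curthread's policy */
    vm_wait_domain(domain); /* wait for one specific domain */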
Sketched and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D14384
This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel. As more work is done, the struct may be optimized further.
Reviewed by: nwhitehorn
threads from compile-time defines to global variables. This removes a
significant amount of duplicated runtime patching of the compile-time
defines, centralizing the conditional logic in the early startup code.
Reviewed by: jhibbits
It turns out that under some circumstances we can get DSI or DSE before we
set LPCR and LPID, so we should set them as early as possible.
Authored by: Patryk Duda <pdk@semihalf.com>
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
used with hashed page tables on AIM and place it into a new, modular pmap
function called pmap_decode_kernel_ptr(). This function is the inverse
of pmap_map_user_ptr(). With POWER9 radix tables, which mapping to use
becomes more complex than just AIM/BOOKE and it is best to have it in
the same place as pmap_map_user_ptr().
Reviewed by: jhibbits
to which it is specific, rather than in the generic AIM startup code. This
will be required to support the radix-table-based MMU introduced with POWER9.
buffers into a new pmap-module function pmap_map_user_ptr() that can
be implemented by the respective modules. This is required to implement
non-segment-based AIM-ish MMU systems such as the radix-tree page tables
introduced by POWER ISA 3.0 and present on POWER9.
Reviewed by: jhibbits
using a new macro PHYS_TO_DMAP, which deliberately has the same name as the
equivalent macro on amd64. This also sets the stage for moving the direct
map to another base address.
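The macros reduce to roughly the following sketch (the base was 0 at this
point; r330610 later moved it to the fourth quadrant):

    #define PHYS_TO_DMAP(pa)        ((pa) | DMAP_BASE_ADDRESS)
    #define DMAP_TO_PHYS(va)        ((va) & ~DMAP_BASE_ADDRESS)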
domains can be done by the _domain() API variants. UMA also supports a
first-touch policy via the NUMA zone flag.
The slab layer is now segregated by VM domains and is precise. It handles
iteration for round-robin directly. The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed.
Well-behaved clients can achieve perfect locality with no performance
penalty.
The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.
Reviewed by: Attilio (a slightly older version)
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
is used as the bootloader on a number of PPC64 platforms. This involves the
following pieces:
- Making the first instruction a valid kernel entry point, since kexec
ignores the ELF entry value. This requires a separate section and linker
magic to prevent the linker from filling the beginning of the section
with stubs.
- Adding an entry point at 0x60 past the first instruction for systems
lacking firmware CPU shutdown support (notably PS3).
- Linker script changes to support the above.
MFC after: 1 month
If these are not aligned, the linker has to emit a different type of
relocation that the early boot self-relocation code cannot handle, even
in principle, resulting in them being set to zero and the kernel crashing.
MFC after: 1 week
Mainly focus on files that use the BSD 2-Clause license; however, the tool
I was using misidentified many licenses, so this was mostly a manual,
error-prone task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
PowerPC kernels in r6 is actually metadata from loader(8) or gibberish
left in r6, which is not required to be anything under the
PAPR/ePAPR/CHRP/OF standards, by another boot loader.
Note that, as a result, systems need a new boot loader to boot PPC kernels
after this revision without ending up at a mountroot prompt. New boot
loaders are backwards compatible and can boot older kernels.
Reviewed by: jhibbits
MFC after: 2 months