Commit Graph

Mark Johnston
5cff1f4dc3 Introduce vm_page_astate.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state.  The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields.  No functional change intended.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22650
2019-12-10 18:14:50 +00:00
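
To make the cmpset pattern concrete, here is a minimal user-space sketch of the idea: snapshot the 32-bit state word into a stack variable, modify the copy, and publish it with a compare-and-set loop. The field layout, names, and helper are illustrative, not FreeBSD's actual definitions.

	#include <stdatomic.h>
	#include <stdint.h>

	/* A 32-bit snapshot of a page's queue state (fields illustrative). */
	union page_astate {
		struct {
			uint16_t flags;
			uint8_t  queue;
			uint8_t  act_count;
		};
		uint32_t bits;
	};

	struct page {
		_Atomic uint32_t astate;	/* embedded 32-bit state word */
	};

	/* Move a page to another queue without holding a page lock. */
	static void
	page_set_queue(struct page *m, uint8_t queue)
	{
		union page_astate old, new;

		old.bits = atomic_load(&m->astate);
		do {
			new = old;		/* work on a stack snapshot */
			new.queue = queue;
		} while (!atomic_compare_exchange_weak(&m->astate,
		    &old.bits, new.bits));	/* old is reloaded on failure */
	}
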
Justin Hibbits
a795401110 powerpc64/pmap: micro-optimize some PVO-PTE logic
Summary:
moea64_pte_sync_native() and moea64_pte_unset_native() don't need the
full PTE created; they only need to check that the PVO has a PTE matching
the one in the page table.  Don't waste time creating the full PTE in this
case.

Reviewed by:	luporl
Differential Revision:	https://reviews.freebsd.org/D22341
2019-12-08 04:17:04 +00:00
Justin Hibbits
caef3e1280 powerpc/pmap: NUMA-ize vm_page_array on powerpc
Summary:
This matches r351198 from amd64.  This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave the same as the
existing code.  Otherwise it will allocate 16MB huge pages to hold the page
array, across all NUMA domains.  On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain.  After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short.  This means
some inner domains may have pages accounted in earlier domains.

On Book-E the page array is set up at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit.  This reduces the TLB0
overhead of touching the vm_page_array, eliminating up to one TLB miss
per array access.

Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.

Reviewed by:	luporl
Differential Revision:	https://reviews.freebsd.org/D21449
2019-12-07 03:34:03 +00:00
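
A rough sketch of the "upper-align" arithmetic for the first domain, under the assumption that the domain's slice of the page array is rounded up to a 16MB huge page and the base is then shifted up by the slack (the helper name is invented for illustration):

	#include <stdint.h>

	#define HPAGE_SIZE	(16UL * 1024 * 1024)	/* 16MB huge page */

	/* Given the bytes of page array domain 0 needs, return how far to
	 * shift the array base up inside the 16MB-rounded allocation, so
	 * the slack sits in front rather than spilling into domain 1. */
	static uint64_t
	first_domain_base_shift(uint64_t need)
	{
		uint64_t alloc = (need + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);

		return (alloc - need);	/* slack becomes leading padding */
	}
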
Justin Hibbits
309d2bc890 powerpc/pmap: Remove an unused error from moea64_pvo_enter()
ENOENT is left over from mmu_oea.c's moea_pvo_enter(), where it's used to
syncicache() on the first new mapping of a page.  This sync is done
differently in OEA64.
2019-11-19 02:00:13 +00:00
Brandon Bergren
6d515b0cc7 powerpc: Kernel fixes for ppc32 and powerpcspe w/ lld
Fix wrong section ordering that was causing a ".got is not contiguous with
other relro sections" lld error. This also brings ldscript.powerpc and
ldscript.powerpcspe closer to ldscript.powerpc64.

Also, remove unnecessary text relocs from the ppc32 AIM trap code.

Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D22349
2019-11-14 04:34:17 +00:00
Justin Hibbits
cf33fa7e80 powerpc64: Don't guard ISA 3.0 partition table setup with hw_direct_map
PowerISA 3.0 eliminated the 64-bit bridge mode which allowed 32-bit kernels
to run on 64-bit AIM/Book-S hardware.  Since only a 64-bit kernel can run on
this hardware, and 64-bit native always has the direct map, there
is no need to guard it.
2019-11-13 02:22:00 +00:00
Leandro Lupori
d7271ace1d [PPC64] Fix trapstk overflow
In some scenarios, the 4K trapstk may overflow, corrupting tmpstk.

This was observed during remote debugging, with the following steps:

At remote host (R):
- enter kdb during boot
- switch to gdb backend

At local host (L):
- attach gdb to R
- try to read an invalid memory position

At R:
- a DSI trap occurs and kdb restarts (all this occurs on trapstk)
- while printing the stacktrace, trapstk overflows and corrupts tmpstk

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D22200
2019-10-31 11:59:00 +00:00
Leandro Lupori
95ca4720f0 [PPC64] Add minidump support to PowerNV
Implementation of PowerNV specific minidump code.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D21643
2019-10-21 11:56:57 +00:00
Justin Hibbits
f1d4707c31 powerpc/aim: Fix comment typo 2019-10-19 02:47:32 +00:00
Mark Johnston
b4efea53e0 Clear PGA_WRITEABLE in moea_pvo_remove().
moea_pvo_remove() might remove the last mapping of a page, in which case
it is clearly no longer writeable.  This can happen via pmap_remove(),
or when a CoW fault removes the last mapping of the old page.

Reported and tested by:	bdragon
Reviewed by:	alc, bdragon, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22044
2019-10-16 15:50:12 +00:00
Konstantin Belousov
2a499f92ba Fix assert in PowerPC pmaps after introduction of object busy.
The VM_PAGE_OBJECT_BUSY_ASSERT() in pmap_enter() implementation should
be only asserted when the code is executed as result of pmap_enter(),
not when the same code is entered from e.g. pmap_enter_quick().  This
is relevant for all PowerPC pmap variants, because mmu_*_enter() is
used as the backend, and assert is located there.

Add a PowerPC private pmap_enter() PMAP_ENTER_QUICK_LOCKED flag to
indicate that the call is not from pmap_enter().  For non-quick-locked
calls, assert that the object is locked.

Reported and tested by:	bdragon
Reviewed by:	alc, bdragon, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D22041
2019-10-16 07:09:15 +00:00
Jeff Roberson
638f867814 (6/6) Convert pmap to expect busy in write-related operations now that all
callers hold it.

This simplifies pmap code and removes a dependency on the object lock.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21596
2019-10-15 03:51:46 +00:00
Jeff Roberson
205be21d99 (3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held.  This comes at the cost of false positives, but never false
negatives, which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by:    kib
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21592
2019-10-15 03:41:36 +00:00
Leandro Lupori
fa14f7f1b7 Fix powerpc/powerpcspe builds
Revision 353489 introduced some new function calls in common powerpc code,
but these must be called only on powerpc64.
2019-10-14 19:06:17 +00:00
Leandro Lupori
0ecc478b74 [PPC64] Initial kernel minidump implementation
Based on POWER9BSD implementation, with all POWER9 specific code removed and
addition of new methods in PPC64 MMU interface, to isolate platform specific
code. Currently, the new methods are implemented on pseries and PowerNV
(D21643).

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D21551
2019-10-14 13:04:04 +00:00
Justin Hibbits
02e7952133 powerpc64/pmap: Fix release order to match lock order in moea64_enter()
Page PV lock is always taken first, so should be released last.  This also
(trivially) shortens the hold time of the pmap lock.

Submitted by:	mjg
2019-10-07 02:36:42 +00:00
Justin Hibbits
d137ff5521 powerpc/pmap64: Properly parenthesize PV_LOCK_COUNT macros
As pointed out by mjg, without the parentheses the calculations done against
these macros are incorrect, resulting in only 1/3 of locks being used.

Reported by:	mjg
2019-10-06 19:11:01 +00:00
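
A self-contained illustration of this bug class, with made-up constants: without the parentheses, a modulo applied to the macro binds before the multiplication, so only every third lock index is ever produced.

	#include <stdio.h>

	#define LOCKS			8
	#define PV_LOCK_COUNT_BAD	LOCKS * 3	/* unparenthesized */
	#define PV_LOCK_COUNT_OK	(LOCKS * 3)	/* fixed */

	int
	main(void)
	{
		/* hash % LOCKS * 3 parses as (hash % LOCKS) * 3. */
		for (unsigned int hash = 0; hash < 5; hash++)
			printf("%u -> bad %u, ok %u\n", hash,
			    hash % PV_LOCK_COUNT_BAD,
			    hash % PV_LOCK_COUNT_OK);
		return (0);
	}
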
Mark Johnston
e8bcf6966b Revert r352406, which contained changes I didn't intend to commit. 2019-09-16 15:04:45 +00:00
Mark Johnston
41fd4b9422 Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21639
2019-09-16 15:03:12 +00:00
Mark Johnston
fee2a2fa39 Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accommodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
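
A sketch of the flag-bit scheme: the commit describes an object-reference bit and a blocking bit packed into the counter, with pmap-side wiring failing while the blocking bit is set. The names below follow that description, but the code is an illustration, not FreeBSD's implementation.

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdint.h>

	#define VPRC_OBJREF	0x40000000u	/* object holds a reference */
	#define VPRC_BLOCKED	0x80000000u	/* new refs via pmap blocked */

	/* Acquire a wiring unless mappings are being torn down; a false
	 * return typically sends the caller to the fault handler. */
	static bool
	wire_mapped(_Atomic uint32_t *ref_count)
	{
		uint32_t old = atomic_load(ref_count);

		do {
			if ((old & VPRC_BLOCKED) != 0)
				return (false);
		} while (!atomic_compare_exchange_weak(ref_count, &old,
		    old + 1));
		return (true);
	}
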
Justin Hibbits
e0b5c15c54 powerpc64/pmap: Fix a WITNESS error in alloc_pvo_entry()
We only call alloc_pvo_entry() with M_WAITOK from one location.  However,
this can be called while holding nonsleepable locks.  Rather than passing
M_WAITOK down, use vm_wait() and loop.
2019-09-06 03:02:12 +00:00
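
The shape of the fix, sketched with user-space stand-ins (vm_wait_stub() and malloc() model vm_wait(NULL) and an M_NOWAIT allocation): allocate without sleeping, and retry after waiting, instead of sleeping inside the allocator while nonsleepable locks are held.

	#include <stdlib.h>

	/* Stand-in for vm_wait(NULL): block until memory is reclaimed. */
	static void
	vm_wait_stub(void)
	{
	}

	/* M_WAITOK semantics built from a nonsleeping allocation. */
	static void *
	alloc_pvo_waitok(size_t sz)
	{
		void *pvo;

		while ((pvo = malloc(sz)) == NULL)	/* M_NOWAIT analogue */
			vm_wait_stub();
		return (pvo);
	}
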
Justin Hibbits
197a7e48c9 powerpc64/pmap: Simplify the code path for moea64_pte_replace_native()
Summary:
MOEA64_PTE_REPLACE() is called often with the pmap lock held, and
sometimes with the page pv lock held.  The less work done while holding
a lock, the better.  Since we are intending to replace the same PTE
(same hash index), we don't need to recalculate anything, just flat
replace the PTE.  This cuts more than 200 instructions off the
invalidating code path.  In addition, we don't need to replace a PTE
that's not occupied by this PVO.

Reviewed by:	luporl
Differential Revision:	https://reviews.freebsd.org/D21515
2019-09-06 02:45:46 +00:00
Jeff Roberson
2194393787 Move phys_avail definition into MI code. It is consumed in the MI layer,
and moving it there adds more flexibility with less redundant code.

Reviewed by:	jhb, markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21250
2019-08-16 00:45:14 +00:00
Justin Hibbits
be01018809 powerpc64/mmu: Use a SLIST for the PVO delete list, instead of a RB_TREE
Summary:
Although it's convenient to reuse the pvo_plist for deletion, RB_TREE
insertion and removal are not free, and can result in a lot of extra work
to rebalance the tree.  Instead, use a SLIST as a LIFO delete queue,
which gives us almost free insertion, deletion, and traversal.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D21061
2019-08-01 03:55:58 +00:00
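
A runnable illustration of the data-structure swap, using sys/queue.h (BSD): SLIST head insertion is O(1) with no rebalancing work on the delete path. The struct is a stand-in for the real PVO.

	#include <sys/queue.h>
	#include <stdio.h>

	struct pvo {
		SLIST_ENTRY(pvo) dlink;
		int n;
	};

	int
	main(void)
	{
		SLIST_HEAD(, pvo) delq = SLIST_HEAD_INITIALIZER(delq);
		struct pvo a = { .n = 1 }, b = { .n = 2 }, *p;

		SLIST_INSERT_HEAD(&delq, &a, dlink);	/* LIFO, O(1) */
		SLIST_INSERT_HEAD(&delq, &b, dlink);
		SLIST_FOREACH(p, &delq, dlink)
			printf("delete pvo %d\n", p->n);
		return (0);
	}
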
Leandro Lupori
6962841780 powerpc: Improve pvo allocation code
Added an allocation retry loop in alloc_pvo_entry(), to wait for
memory to become available if the caller specifies the M_WAITOK flag.

Also, the loop in moea64_enter() was removed, as moea64_pvo_enter()
never returns ENOMEM. It is the alloc_pvo_entry() memory allocation that
can fail and must be retried.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D21035
2019-07-25 15:27:05 +00:00
Justin Hibbits
7c382eea30 powerpc/pmap64: Make moea64 statistics optional
Summary:
It turns out statistics accounting is very expensive in the pmap driver,
and doesn't seem necessary in the common case.  Make this optional
behind a MOEA64_STATS #define, which can be set if statistics are really
needed.

This saves ~7-8% on buildworld time on a POWER9.

Found by bdragon.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D20903
2019-07-25 03:47:27 +00:00
Justin Hibbits
b75bd60a04 powerpc: Unbreak 64-bit pmap from 350206
oldpvo is never explicitly NULL'd by moea64_pvo_enter(), so don't check it
for NULL to decide what to do; only check the error.

PR:		239372
Reported by:	Francis Little
2019-07-22 22:59:50 +00:00
Justin Hibbits
5db86748b5 powerpc64/mmu: Make moea64_pvo_enter() return if an entry already exists
Summary:
Instead of searching for a PVO entry before adding, take advantage of
the fact that RB_INSERT() returns NULL if it inserts, and the existing entry if
an entry exists, without inserting a new entry.  This saves an extra tree
traversal in the cases where the PVO does not exist.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D20944
2019-07-22 03:11:54 +00:00
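
A runnable demonstration of the RB_INSERT() contract from sys/tree.h (BSD) that the commit exploits: NULL on successful insertion, or a pointer to the pre-existing node on a duplicate key, with a single traversal either way.

	#include <sys/tree.h>
	#include <stdio.h>

	struct node {
		RB_ENTRY(node) link;
		int key;
	};

	static int
	nodecmp(struct node *a, struct node *b)
	{
		return (a->key < b->key ? -1 : a->key > b->key);
	}

	RB_HEAD(tree, node);
	RB_GENERATE_STATIC(tree, node, link, nodecmp)

	int
	main(void)
	{
		struct tree head = RB_INITIALIZER(&head);
		struct node a = { .key = 1 }, b = { .key = 1 };

		/* NULL: inserted.  Non-NULL: the pre-existing duplicate. */
		printf("first:  %p\n", (void *)RB_INSERT(tree, &head, &a));
		printf("second: %p\n", (void *)RB_INSERT(tree, &head, &b));
		return (0);
	}
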
Justin Hibbits
b982c7ee20 powerpc: Remove an unnecessary #ifdef guard from slb.c
slb.c is only compiled for powerpc64, so no need for the #ifdef in this block.
2019-07-21 03:19:54 +00:00
Justin Hibbits
07b57507c9 powerpc64/pmap: No need for moea64_pvo_remove_from_page_locked() wrapper
The only consumer of moea64_pvo_remove_from_page_locked() already has the
page in hand, so there is no need to search for the page while holding the
lock.  Drop the wrapper, and rename _moea64_pvo_remove_from_page_locked().

Reported by:	alc
2019-07-13 03:39:46 +00:00
Justin Hibbits
a7e6ec601a powerpc64/pmap: Reduce scope of PV_LOCK in remove path
Summary:
Since the 'page pv' lock is one of the most highly contended locks, we
need to try to do as much work outside of the lock as we can.  The
moea64_pvo_remove_from_page() path is low-hanging fruit, where we can
do some heavy work (PHYS_TO_VM_PAGE()) outside of the lock if needed.
In one path, moea64_remove_all(), the PV lock is already held and can't
be swizzled, so we provide two ways to perform the locked operation, one
that can call PHYS_TO_VM_PAGE outside the lock, and one that calls with
the lock already held.

Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D20694
2019-07-13 03:02:11 +00:00
Justin Hibbits
21e47be25e Set pcpu curpmap for powerpc64
Summary:
If an illegal instruction is encountered in a process running on a
powerpc64 kernel, the kernel attempts to sync the cache before retrying the
instruction "just in case".  However, since curpmap is not set, when
moea64_sync_icache() attempts to lock the pmap, it locks a NULL pointer,
triggering a panic.  Fix this by adding an (assumed unnecessary) fallback to
curthread's pmap in moea64_sync_icache().

Reported by:	alfredo.junior_eldorado.org.br
Reviewed by:	luporl, alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20911
2019-07-13 00:19:57 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Leandro Lupori
54fdf3bf19 [PPC] Add missing SLB allocation KASSERT
Although PPC SLB code doesn't handle allocation failures,
which are rare, in most places it asserts that the pointer
returned by uma_zalloc() is not NULL, making it easier to
identify the failure and avoiding an invalid pointer dereference.

This change simply adds a missing KASSERT in SLB code.
2019-07-08 13:01:54 +00:00
Brandon Bergren
664104b4af Fix PPC970 boot after r348783
r348783 changed the behavior of the kernel mappings and broke booting on G5.

- Split the kernel mapping logic out so that the case where we are
running from the wrong memory space is handled using identity
mappings, and the case where we are not using a DMAP is handled by
forcibly mapping the kernel into the dmap range as intended by
r348783.

Reported by:	Mikael Urankar
Reviewed by:	luporl
Approved by:	jhibbits (mentor)
Differential Revision:	https://reviews.freebsd.org/D20608
2019-06-12 15:58:11 +00:00
Justin Hibbits
988d63af1c powerpc/pmap: Move the SLB spill handlers to a better place
The SLB spill handlers are AIM-specific, and belong better with the rest of
the SLB code anyway.  No functional change.
2019-06-08 03:07:08 +00:00
Justin Hibbits
b7918b86b3 powerpc/aim: Use nitems() for calculating size of phys_avail in AIM pmaps
Same thing was already done in r347164 for Book-E pmap.
2019-06-08 02:36:07 +00:00
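
For reference, nitems() is just the array-element-count idiom; using it avoids a hand-maintained constant drifting from the array definition. A portable sketch (array size invented):

	#include <stdio.h>

	#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))

	static unsigned long phys_avail[2 * 16 + 2];	/* size illustrative */

	int
	main(void)
	{
		printf("phys_avail entries: %zu\n", nitems(phys_avail));
		return (0);
	}
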
Leandro Lupori
b934fc7468 [PPC64] Support QEMU/KVM pseries without hugepages
This set of changes make it possible to run FreeBSD for PowerPC64/pseries,
under QEMU/KVM, without requiring the host to make hugepages available to the
guest.

While this was already possible, by means of setting hw_direct_map to
0, on PowerPC64 there were a couple of issues/wrong assumptions that prevented
it from working before this changelist.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D20522
2019-06-07 17:58:59 +00:00
Justin Hibbits
4420fc895f powerpc/moea: Fix moea64 native VA invalidation
Summary:
moea64_insert_pteg_native()'s invalidation only works by happenstance.
The purpose of the shifts and XORs is to extract the VSID in order to
reverse-engineer the lower bits of the VPN.  Currently a segment size is 256MB
(2**28), and ADDR_API_SHFT64 is 16, so ADDR_PIDX_SHIFT is equivalent.  However,
it's semantically incorrect, in that we don't want to shift by the page shift
size, we want to shift to get to the VSID.

Tested by:	bdragon
Differential Revision: https://reviews.freebsd.org/D20467
2019-06-01 01:40:14 +00:00
Justin Hibbits
8cd3016c00 powerpc64/pmap: Reapply r334235 to OEA64 pmap, clearing HID0_RADIX
This was lost in the re-merger of ISA3 MMU into moea64_native.
2019-05-25 04:56:06 +00:00
Leandro Lupori
0632bb89db Fix PPC64 kernel build with clang8 + lld8
This patch fixes the following lld link errors:

- unsupported dynamic relocations on read-only sections
- out-of-range TOC references

Submitted by:	git_bdragon.rtk0.net
Reviewed by:	jhibbits, luporl
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19352
2019-05-22 15:56:41 +00:00
Justin Hibbits
c3acaec564 powerpc: Fix moea64 pmap from 347952
vm_paddr_t is only 32 bits on AIM32 (currently), causing a build failure with
the shifting.

MFC after:	2 weeks
MFC with:	r347952
2019-05-18 14:55:59 +00:00
Justin Hibbits
a57ec59f9c powerpc64/pmap: NUMA-ize the page pv lock pool to reduce contention
It was found during building llvm that the page pv lock pool was seeing very
high contention.  Since the pmap is already NUMA aware, it was surmised that
the domains were referencing similar pages in the different domains.  This
reduces contention to the point of noise in a lockstat(8) run (~51% down to
under 5%), reducing build times by up to 20%.

This doesn't do a perfect domain alignment, just a best guess, based on the
available hardware, that the domain is roughly encoded in the upper bits
of the PA.  Trying to be more clever would more than likely reduce
performance just from the extra work needed.

MFC after:	2 weeks
2019-05-18 11:14:43 +00:00
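
A sketch of the best-guess domain folding described above: derive the domain from the upper physical-address bits and give each domain its own slice of the lock pool. The shift and pool size are invented for illustration.

	#include <stdint.h>

	#define PV_LOCKS_PER_DOMAIN	256	/* pool slice, illustrative */
	#define PA_DOMAIN_SHIFT		38	/* guess: domain in upper bits */

	static unsigned int
	pv_lock_index(uint64_t pa, unsigned int ndomains)
	{
		unsigned int dom = (pa >> PA_DOMAIN_SHIFT) % ndomains;
		unsigned int hash = (pa >> 12) % PV_LOCKS_PER_DOMAIN;

		/* Pages in the same domain hash into the same slice. */
		return (dom * PV_LOCKS_PER_DOMAIN + hash);
	}
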
Justin Hibbits
f04019c3c6 powerpc: Initialize the Hardware Interrupt Offset Register (HIOR) earlier for ppc970
Since we now have a much larger KVA on powerpc64, it's possible to get SLB
traps earlier in boot, possibly even before the HIOR is properly configured
for us.  Move the HIOR setup to immediately after reset, so that we use our
exception handlers instead of Open Firmware's.

PR:		233863
Submitted by:	Mark Millard (partial)
Reported by:	Mark Millard
MFC after:	2 weeks
2019-05-10 19:36:14 +00:00
Justin Hibbits
f074eff155 powerpc: Add POWER8NVL definition
The POWER8NVL (POWER8 NVLink) architecturally behaves identically to the
POWER8, with a different PVR identifier.  Mark it as such, so it shows up
appropriately to the user.

Reported by:	Alexey Kardashevskiy
MFC after:	2 weeks
2019-04-27 02:33:49 +00:00
Justin Hibbits
17b72853f4 powerpc64: Clear FSCR SPR, so that it's in a known state
This now turns any access to the DSCR SPR into a SIGILL.  Later commits will
make the DSCR work correctly on POWER8 and POWER9.

PR:		237208
2019-04-26 03:18:49 +00:00
Justin Hibbits
19b86243f4 powerpc: Add a couple missing isyncs
mtmsr and mtsr require context synchronizing instructions to follow.  Without
a CSI, there's a chance of a machine check exception.  This reportedly does
occur on an MPC750 (PowerMac G3).

Reported by:	Mark Millard
2019-04-24 02:51:58 +00:00
Justin Hibbits
49d9a59783 Add NUMA support to powerpc
Summary:
Initial NUMA support:
    - associate CPU with domain
    - associate memory ranges with domain
    - identify domain for devices
    - limit device interrupt binding to appropriate domain

- Additionally fixes a bug in the setting of Maxmem which led to
  only memory attached to the first socket being enabled for DMA

A pmap variant can opt in to NUMA support by calling `numa_mem_regions`
at the end of pmap_bootstrap, registering the corresponding ranges with the
VM.

This yields a ~20% improvement in build times of llvm on dual socket POWER9
over non-NUMA.

Original patch by mmacy.

Differential Revision: https://reviews.freebsd.org/D17933
2019-04-13 04:03:18 +00:00
Justin Hibbits
0499e9c619 powerpc64: Use medium code model in asm files for TOC references
Summary:
With a sufficiently large TOC, it's possible to index out of range, as
the immediate load instructions only permit 16-bit indices, allowing up
to 64kB range (signed) from the base pointer.  Allow +/- 2GB range, with
the medium code model TOC accesses in asm.

Patch originally by Brandon Bergren.  The issue appears to impact ELFv2
more than ELFv1.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D19708
2019-03-29 02:38:30 +00:00
Justin Hibbits
9f1a007da7 powerpc64: Micro-optimize moea64 native pmap tlbie
* Cache moea64_need_lock in a local variable; gcc generates slightly better
  code this way, as it doesn't need to reload the value from memory on each read.
* VPN cropping is only needed on PowerPC ISA 2.02 and older cores, a subset
  of those that need serialization, so move this under the need_lock check,
  so those that don't need the lock don't even need to check this.
2019-03-26 02:53:35 +00:00
Justin Hibbits
bc94b70098 powerpc: Re-merge isa3 HPT with moea64 native HPT
r345402 fixed the bug that led to the split of the ISA 3.0 HPT handling from
the existing manager.  The cause of the bug was gcc moving the register
holding VPN to a different register (not r0), which triggered bizarre
behaviors.  With the fix, things work, so they can be re-merged.  No
performance is lost with the merge.
2019-03-22 22:14:14 +00:00
Justin Hibbits
091a23cbf8 powerpc64: Handle the modern (2.05+) implementation of tlbie
By happenstance gcc4 puts 'vpn' into r0 in all uses of TLBIE(), but modern
gcc does not.  Also, the single-argument form of tlbie zeros all unused
arguments, making the modern tlbie instruction use r0 as the RS field
(LPID).

The vpn argument has the bottom 12 bits cleared (the input having been
left-shifted by 12 bits), which just so happens, on the POWER9 and previous
incarnations, to be the number of LPID bits supported.  With those bits
being zero, the instruction:

	tlbie r0, r0

will invalidate the VPN in r0, in LPAR 0 (ignoring the upper bits of r0 for
the RS field).  One build with gcc8 yields:

	tlbie r9, r0

with r0 having arbitrary contents, not equal to r9.  This leads to strange
crashes, behaviors, and panics, due to the requested TLB entry not actually
being invalidated.

As moea64_native must work on both old and new, we explicitly zero out
r0 so that it can work with only the single argument, built with base gcc
and modern gcc.  isa3_hashtb takes a different approach, encoding the
two-argument form so as not to explicitly clobber r0, instead letting the
compiler decide.

Reported by:	Brandon Bergren
Tested by:	Brandon Bergren
MFC after:	1 week
2019-03-22 01:43:31 +00:00
Justin Hibbits
1cd7081eb1 powerpc64: Fix early exit with invalid kernel SLB entries
The check for early exit should be checking the SLB entry itself.  As
currently written, it was checking the address of the SLB, which is always
non-zero, so it would go through the kernel SR restore loop regardless.

Submitted by:	mmacy
MFC after:	2 weeks
2019-03-08 04:20:33 +00:00
Leandro Lupori
b8efbfb9d3 [ppc64] prevent infinite loop on icache sync
At moea64_sync_icache(), when the 'va' argument has page size
alignment, round_page() will return the same value as 'va'.
This would cause 'len' to be 0 and thus an infinite loop.

With this change, 'lim' will always point to the next page boundary.

This issue occurred especially during debugging sessions, when a breakpoint
was placed on an exact page-aligned offset, for instance.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D19149
2019-02-12 11:29:03 +00:00
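
The arithmetic of the bug, runnable with FreeBSD's standard round_page()/trunc_page() definitions: rounding an already-aligned va up returns va itself, so the loop limit must instead be the next page boundary.

	#include <stdio.h>
	#include <stdint.h>

	#define PAGE_SIZE	4096UL
	#define round_page(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
	#define trunc_page(x)	((x) & ~(PAGE_SIZE - 1))

	int
	main(void)
	{
		uint64_t va = 0x10000;	/* page-aligned breakpoint address */

		/* Bug: lim == va, so len == 0 and the sync loops forever. */
		printf("round_page(va) - va = %llu\n",
		    (unsigned long long)(round_page(va) - va));
		/* Fix: always advance to the next page boundary. */
		printf("next boundary - va  = %llu\n",
		    (unsigned long long)(trunc_page(va) + PAGE_SIZE - va));
		return (0);
	}
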
Leandro Lupori
6174048251 powerpc64: Add a trap stack area
Currently, the trap code switches to the temporary stack in the dbtrap
section.  This works in most cases, but at the beginning of execution the
temporary stack is already in use, starting in the powerpc_init() code.

In that scenario, the stack gets overwritten, which causes the return from
breakpoint() to take an abnormal execution path.

This patch creates a small stack for use by the dbtrap: code path,
avoiding corruption of the temporary stack.

PR:		224872
Submitted by:	breno.leitao_gmail.com
Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D14484
2019-02-04 16:02:03 +00:00
Leandro Lupori
be2bd024de ppc64: handle exception 0x1500 (soft patch)
This change adds a hypervisor trap handler for exception 0x1500 (soft patch),
normalizing all VSX registers and returning.
This avoids a kernel panic due to an unknown exception.

This change was made in collaboration with leonardo.bianconi_eldorado.org.br,
who found out that this is a hypervisor exception and not a supervisor one,
and fixed this in the code.

Reviewed by:	jhibbits, sbruno
Differential Revision:	https://reviews.freebsd.org/D17806
2018-12-10 14:54:28 +00:00
Justin Hibbits
8f69e36d87 powerpc: Don't include KERNBASE in genassym, it's unnecessary
A related future change, which changes KERNBASE for Book-E, for some reason
causes a "KERNBASE redefined" error with assym.inc, even though it only
changes the value of KERNBASE and nothing else.  Since machine/vmparam.h is
already included in booke/locore.S, and the requisite guards are already in
place for properly handling KERNBASE in vmparam.h, just remove it from
genassym and include vmparam.h in the AIM locore files.
2018-11-28 16:00:52 +00:00
Justin Hibbits
266b2aa146 powerpc: Use MAX() macro instead of max() inline function to calculate Maxmem
Maxmem is the highest address for physical memory in the system.  It's
measured in pages which, since max() returns a u_int, should allow for up to
2^44 bytes of memory addressable by the system.  However, on POWER9 systems
at least, memory addressed by additional socketed CPUs begins at addresses
far above the 2^44 mark, causing issues with memory accesses and DMA, when
memory is addressed on the auxiliary CPUs.  Use the MAX() macro instead,
which doesn't convert arguments, so retains Maxmem and all calculations as
its defined long type (64-bit on powerpc64), keeping the maximum address
correct.

Submitted by:	mmacy
2018-11-10 02:37:56 +00:00
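
A runnable model of the truncation (MAX() as in sys/param.h; umax() models the old u_int max() inline): a page count above 2^32 survives the macro but is silently truncated by the inline.

	#include <stdio.h>

	#define MAX(a, b)	(((a) > (b)) ? (a) : (b))	/* type-preserving */

	/* Models the old max() inline: arguments convert to u_int. */
	static unsigned int
	umax(unsigned int a, unsigned int b)
	{
		return (a > b ? a : b);
	}

	int
	main(void)
	{
		long maxmem = 0;
		long page = 0x123456789AL;	/* page index above 2^32 */

		printf("inline max(): 0x%lx\n", (long)umax(maxmem, page));
		printf("MAX() macro:  0x%lx\n", MAX(maxmem, page));
		return (0);
	}
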
Justin Hibbits
d692cd43c4 powerpc64/pmap: Correct the logic for minidump KVA chunk
r279252 inverted the logic in moea64_scan_init, such that instead of
terminating when reaching a dead page, it terminates when reaching a live
page, ostensibly preserving exactly one page of KVA.
2018-10-21 02:28:04 +00:00
Justin Hibbits
ab42fbe2e9 powerpc64: Fix stack setup in dbtrap
r330610 relocated the DMAP from the base of memory to the base of the fourth
quadrant of memory.  This broke synthetic traps, such as KDB forced
breakpoints.  Use GET_TOCBASE() so the DMAP offset is handled.

Submitted by:	git_bdragon.rkt0.net
Differential Revision:	https://reviews.freebsd.org/D15973
2018-06-23 01:42:34 +00:00
Justin Hibbits
ebf95d96d9 Split the PowerISA 3.0 HPT implementation from historic
PowerISA 3.0 makes several changes to not only the format of the HPT but
also the behavior surrounding it.  For instance, TLBIE no longer requires
serialization.  Removing this lock cuts buildworld time in half on a
18-core/72-thread POWER9 system, demonstrating that this lock is highly
contended on such a system.

There was odd behavior observed trying to make this change in a
backwards-compatible manner in moea64_native.c, so the best option was to
fully split it, and largely revert the original changes adding POWER9
support to the original file.

Suggested by:	nwhitehorn
2018-06-14 17:23:51 +00:00
Justin Hibbits
402c7806cb Fix CTR formatting for moea64_native bootstrap
On very large memory systems 'size' can become 2GB or larger, resulting in a
negative value being formatted.  Also, moea64_pteg_count is already a long, so
format it as such.
2018-06-14 16:01:11 +00:00
Justin Hibbits
5ab39b6552 On POWER9 clear the HID0_RADIX before enabling the page tables
POWER9 supports Radix page tables in addition to Hashed page tables.  When
Radix page tables are in use, the TLB is cut in half, so that half of the
TLB is used for the page walk cache.  This is the default behavior, however
FreeBSD currently does not support Radix tables.  Clear this bit so that we
can use the full TLB.  Do this in the MMU logic so that configuration can be
localized to the specific translation format.  Once we do support Radix
tables, the setup for that will be localized to the Radix MMU kobj.
2018-05-26 04:33:19 +00:00
Justin Hibbits
204d74320d Only crop the VPN on POWER4 and derivatives for TLBIE operations
Summary:
PowerISA 2.03 and later require bits 14:65 in the RB register argument,
which is the full value of the vpn argument post-shift.  Only POWER4, POWER4+,
and PPC970* need the upper 16 bits cropped.

With this change FreeBSD can boot to multi-user on POWER9.

Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15581
2018-05-26 00:41:50 +00:00
Justin Hibbits
ef6da5e5c7 Add support for the XIVE XICS emulation mode for POWER9 systems
Summary:
POWER9 systems use a new interrupt controller, XIVE, managed through OPAL
firmware calls.  The OPAL firmware includes support for emulating the previous
generation XICS presentation layer in addition to a new "XIVE Exploitation"
mode.  As a stopgap until we have XIVE exploitation mode, enable XICS emulation
mode so that we at least have an interrupt controller.

Since the CPPR is local to the current CPU, it cannot be updated for APs when
initializing on the BSP.  This adds a new function, directly called by the
powernv platform code, to initialize the CPPR on AP bringup.

Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15492
2018-05-20 03:23:17 +00:00
Justin Hibbits
5321c01b50 Add hypervisor trap handling, using HSRR0/HSRR1
Summary:
Some hypervisor exceptions on POWER architecture only save state to HSRR0/HSRR1.
Until we have bhyve on POWER, use a lightweight exception frontend which copies
HSRR0/HSRR1 into SRR0/SRR1, and run the normal trap handler.

The first user of this is the Hypervisor Virtualization Interrupt, which targets
the XIVE interrupt controller on POWER9.

Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15487
2018-05-19 04:21:50 +00:00
Nathan Whitehorn
b00df92b1f Final fix for alignment issues with the page table first patched with
r333273 and partially reverted with r333594.

Older CPUs implement addition of offsets into the page table by a
bitwise OR rather than actual addition, which only works if the table is
aligned at a multiple of its own size (they also require it to be aligned
at a multiple of 256KB). Newer ones do not have that requirement, but it
hardly matters to enforce it anyway.

The original code was failing on newer systems with huge amounts of RAM
(> 512 GB), in which the page table was 4 GB in size. Because the
bootstrap memory allocator took its alignment parameter as an int, this
turned into a 0, removing any alignment constraint at all and making
the MMU fail. The first round of this patch (r333273) fixed this case by
aligning it at 256 KB, which broke older CPUs. Fix this instead by widening
the alignment parameter.
2018-05-14 04:00:52 +00:00
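
The failure mode in miniature: a 4GB value passed through an int parameter truncates (to 0 on typical implementations), silently dropping the alignment constraint, hence widening the parameter. The prototype below is a stand-in, not the real allocator.

	#include <stdio.h>
	#include <stdint.h>

	/* The old prototype took the alignment as an int. */
	static void
	bootstrap_alloc_old(uint64_t size, int align)
	{
		printf("aligning a 0x%llx-byte table to %d\n",
		    (unsigned long long)size, align);
	}

	int
	main(void)
	{
		uint64_t table_size = 4ULL << 30;	/* 4GB page table */

		/* Aligning the table to its own size truncates to 0. */
		bootstrap_alloc_old(table_size, (int)table_size);
		return (0);
	}
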
Nathan Whitehorn
b9ff14e6e9 Revert changes to hash table alignment in r333273, which broke booting on
all G5 systems, pending further analysis.
2018-05-13 23:56:43 +00:00
Justin Hibbits
10d0cdfc6e Add support for powernv POWER9 MMU initialization
The POWER9 MMU (PowerISA 3.0) is slightly different from current
configurations, using a partition table even for hypervisor mode, and
dropping the SDR1 register.  Key off the newly early-enabled CPU features
flags for the new architecture, and configure the MMU appropriately.

The POWER9 MMU ignores the "PSIZ" field in the PTCR, and expects a 64kB
table.  As we are enabled for powernv (hypervisor mode, no VMs), only
initialize partition table entry 0, and zero out the rest.  The actual
contents of the register are identical to SDR1 from previous architectures.

Along with this, fix a bug in the page table allocation with very large
memory.  The table can be allocated on any 256k boundary.  The
bootstrap_alloc alignment argument is an int, and with large amounts of
memory passing the size of the table as the alignment will overflow an
integer.  Hard-code the alignment at 256k as wider alignment is not
necessary.

Reviewed by:	nwhitehorn
Tested by:	Breno Leitao
Relnotes:	Yes
2018-05-05 16:00:02 +00:00
Justin Hibbits
4f4f92c58f Add POWER9 to the POWER8 bootstrap case blocks
POWER8 and POWER9 have similar configuration requirements for hypervisor setup,
and in the cases here they're identical.  Add the POWER9 constant to the POWER8
list so it's initialized correctly.

Reviewed by:	nwhitehorn
2018-05-05 15:42:58 +00:00
Justin Hibbits
567dd766f6 powerpc64: Set n_slbs = 32 for POWER9
Summary:
POWER9 also contains 32 slbs entries as explained by the POWER9 User Manual:

 "For HPT translation, the POWER9 core contains a unified (combined for both
   instruction and data), 32-entry, fully-associative SLB per thread"

Submitted by:	Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15128
2018-04-20 03:23:19 +00:00
Nathan Whitehorn
323e673945 Fix detection of memory overlap with the kernel in the case where a memory
region marked "available" by firmware is contained entirely in the kernel.

This had a tendency to happen with FDTs passed by loader, though it could
occur for other reasons as well, and would result in the kernel slowly cannibalizing
itself for other purposes, eventually resulting in a crash.

A similar fix is needed for mmu_oea.c and should probably just be rolled
at that point into some generic code in platform.c for taking a mem_region
list and removing chunks.

PR:		226974
Submitted by:	leandro.lupori@gmail.com
Reviewed by:	jhibbits
Differential Revision:	D15121
2018-04-19 18:34:38 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more, so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Ed Maste
fc2a8776a2 Rename assym.s to assym.inc
assym is only to be included by other .s files, and should never
actually be assembled by itself.

Reviewed by:	imp, bdrewery (earlier)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14180
2018-03-20 17:58:51 +00:00
Nathan Whitehorn
9b0ec025d4 Restore missing temporary variable, deleted by accident in r330845. This
unbreaks the ppc32 AIM build.

Reported by:	jhibbits
2018-03-13 18:24:21 +00:00
Nathan Whitehorn
8864f35942 Execute PowerPC64/AIM kernel from direct map region when possible.
When the kernel can be in real mode in early boot, we can execute from
high addresses aliased to the kernel's physical memory. If that high
address has the first two bits set to 1 (0xc...), those addresses will
automatically become part of the direct map. This reduces page table
pressure from the kernel and it sets up the kernel to be used with
radix translation, for which it has to be up here.

This is accomplished by exploiting the fact that all PowerPC kernels are
built as position-independent executables and relocate themselves
on start. Before this patch, the kernel runs at 1:1 VA:PA, but that
VA/PA is random and set by the bootloader. Very early, it processes
its ELF relocations to operate wherever it happens to find itself.
This patch uses that mechanism to re-enter and re-relocate the kernel
a second time with a new base address set up in the early parts of
powerpc_init().

Reviewed by:	jhibbits
Differential Revision:	D14647
2018-03-13 15:03:58 +00:00
Nathan Whitehorn
f9edb09d70 Move the powerpc64 direct map base address from zero to high memory. This
accomplishes a few things:
- Makes NULL an invalid address in the kernel, which is useful for catching
  bugs.
- Lays groundwork for radix-tree translation on POWER9, which requires the
  direct map be at high memory.
- Similarly lays groundwork for a direct map on 64-bit Book-E.

The new base address is chosen as the base of the fourth radix quadrant
(the minimum kernel address in this translation mode) and because all
supported CPUs ignore at least the first two bits of addresses in real
mode, allowing direct-map addresses to be used in real-mode handlers.
This is required by Linux and is part of the architecture standard
starting in POWER ISA 3, so can be relied upon.

Reviewed by:	jhibbits, Breno Leitao
Differential Revision:	D14499
2018-03-07 17:08:07 +00:00
Wojciech Macek
6d13fd638c PowerNV: Put processor to power-save state in idle thread
When a processor thread enters the power-save state, it releases resources
shared with other CPU threads, which makes the other threads run much faster.

This patch also implements saving and restoring registers that might get
corrupted in power-save state.

Submitted by:          Patryk Duda <pdk@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           jhibbits, nwhitehorn, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14330
2018-02-21 14:28:40 +00:00
Konstantin Belousov
2c0f13aa59 vm_wait() rework.
Make vm_wait() take a vm_object argument, which specifies the domain
set to wait on for the min condition to pass.  If there is no object
associated with the wait, use curthread's policy domainset.  The
mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.

Eliminate pagedaemon_wait().  vm_domain_clear() handles the same
operations.

Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.

Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.

Sketched and reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation, Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D14384
2018-02-20 10:13:13 +00:00
Justin Hibbits
bce6d88bc1 Merge AIM and Book-E PCPU fields
This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel.  As more work is done, the struct may be optimized further.

Reviewed by:	nwhitehorn
2018-02-17 20:59:12 +00:00
Steve Wills
e1782bae5f Correct longjmp
Reviewed by:	nwhitehorn
Differential Revision:	https://reviews.freebsd.org/D14159
2018-02-02 02:28:25 +00:00
Nathan Whitehorn
619282986d Change the default MSR values used when starting userland and kernel
threads from compile-time defines to global variables. This removes a
significant amount of duplicated runtime patches to the compile-time
defines, centralizing the conditional logic in the early startup code.

Reviewed by:	jhibbits
2018-02-01 05:31:24 +00:00
Nathan Whitehorn
564ac41556 Fix build on 32-bit PowerPC, broken in r328537. 2018-02-01 05:28:02 +00:00
Wojciech Macek
d32802f0c3 PowerNV: fix compilation on non-NV platforms
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-01-31 06:42:01 +00:00
Wojciech Macek
70bb600a0a PowerNV: move LPCR and LPID altering to cpudep_ap_early_bootstrap
It turns out that under some circumstances we can get a DSI or DSE before we
set LPCR and LPID, so we should set them as early as possible.

Authored by:           Patryk Duda <pdk@semihalf.com>
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-01-29 09:27:02 +00:00
Nathan Whitehorn
eb1baf72ae Remove hard-coded trap-handling logic involving the segmented memory model
used with hashed page tables on AIM and place it into a new, modular pmap
function called pmap_decode_kernel_ptr(). This function is the inverse
of pmap_map_user_ptr(). With POWER9 radix tables, which mapping to use
becomes more complex than just AIM/BOOKE and it is best to have it in
the same place as pmap_map_user_ptr().

Reviewed by:	jhibbits
2018-01-29 04:33:41 +00:00
Nathan Whitehorn
7790e46cf0 On AIM systems without a software-managed SLB, such as POWER9 systems using
either hardware segment tables or radix-tree-based page tables, do not try
to install SLB entries at trap boundaries.
2018-01-19 22:19:50 +00:00
Nathan Whitehorn
fc8ea4be2a Install the SLB miss trap-handling code in the SLB-based MMU driver set up,
to which it is specific, rather than in the generic AIM startup code. This
will be required to support the radix-table-based MMU introduced with POWER9.
2018-01-15 16:08:34 +00:00
Nathan Whitehorn
04329fa708 Move the pmap-specific code in copyinout.c that gets pointers to userland
buffers into a new pmap-module function pmap_map_user_ptr() that can
be implemented by the respective modules. This is required to implement
non-segment-based AIM-ish MMU systems such as the radix-tree page tables
introduced by POWER ISA 3.0 and present on POWER9.

Reviewed by:	jhibbits
2018-01-15 06:46:33 +00:00
Nathan Whitehorn
68b9c019aa Document places we assume that physical memory is direct-mapped at zero by
using a new macro PHYS_TO_DMAP, which deliberately has the same name as the
equivalent macro on amd64. This also sets the stage for moving the direct
map to another base address.
2018-01-13 23:14:53 +00:00
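
A sketch of the direct-map translation pair once the base is nonzero. The 0xc000000000000000 base is the fourth-quadrant address described in the DMAP-move commit above; the macro shapes are illustrative, not FreeBSD's exact definitions.

	#include <stdint.h>

	/* Fourth-quadrant base: top two address bits set, ignored by the
	 * CPU in real mode, so these addresses work in real-mode handlers. */
	#define DMAP_BASE_ADDRESS	0xc000000000000000UL

	#define PHYS_TO_DMAP(pa)	((uint64_t)(pa) | DMAP_BASE_ADDRESS)
	#define DMAP_TO_PHYS(va)	((uint64_t)(va) & ~DMAP_BASE_ADDRESS)
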
Jeff Roberson
ab3185d15e Implement NUMA support in uma(9) and malloc(9). Allocations from specific
domains can be done by the _domain() API variants.  UMA also supports a
first-touch policy via the NUMA zone flag.

The slab layer is now segregated by VM domains and is precise.  It handles
iteration for round-robin directly.  The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed.  Well
behaved clients can achieve perfect locality with no performance penalty.

The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.

Reviewed by:	Attilio (a slightly older version)
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-01-12 23:25:05 +00:00
Nathan Whitehorn
a891d21aac Make sure the first instruction of the low-memory spinloop is in the
cacheline being invalidated.

MFC after:	1 month
2017-12-31 05:38:19 +00:00
Nathan Whitehorn
70f654991a Add support for 64-bit PowerPC kernels to be directly loaded by kexec, which
is used as the bootloader on a number of PPC64 platforms. This involves the
following pieces:
- Making the first instruction a valid kernel entry point, since kexec
  ignores the ELF entry value. This requires a separate section and linker
  magic to prevent the linker from filling the beginning of the section
  with stubs.
- Adding an entry point at 0x60 past the first instruction for systems
  lacking firmware CPU shutdown support (notably PS3).
- Linker script changes to support the above.

MFC after:	1 month
2017-12-29 20:30:10 +00:00
Nathan Whitehorn
8469e0fe35 Maintain alignment of in-code 64-bit quantities by design rather than luck.
If these are not aligned, the linker has to emit a different type of
relocation that the early boot self-relocation code cannot handle, even
in principle, resulting in them being set to zero and the kernel crashing.

MFC after:	1 week
2017-12-29 20:25:15 +00:00
Pedro F. Giffuni
71e3c3083b sys/powerpc: further adoption of SPDX licensing ID tags.
This mainly focuses on files that use the BSD 2-Clause license; however, the
tool I was using misidentified many licenses, so this was mostly a manual -
and error-prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 15:09:59 +00:00
Nathan Whitehorn
47f69f4f2b Use the cookie now set by loader to determine whether the value passed to
PowerPC kernels in r6 is actually metadata from loader(8), or gibberish
left in r6 by another boot loader, since r6 is not required to be anything
under the PAPR/ePAPR/CHRP/OF standards.

Note that, as a result, systems need a new boot loader to boot PPC kernels
after this revision without ending up at a mountroot prompt. New boot
loaders are backwards compatible and can boot older kernels.

Reviewed by:	jhibbits
MFC after:	2 months
2017-11-26 03:53:20 +00:00
Nathan Whitehorn
50d82d6f6a Missed gate on __powerpc64__ for setting LPCR in r326207.
MFC after:	3 weeks
X-MFC-with:	r326207
2017-11-25 22:15:56 +00:00
Nathan Whitehorn
5bcc3e4277 Allow platform modules to set the size of large pages, as potentially
discovered from firmware, and better handle highly-discontiguous memory
and CPU maps.

MFC after:	3 weeks
2017-11-25 22:13:19 +00:00
Nathan Whitehorn
312fb3d8dd Invalidate TLB at boot using the correct IS settings on newer-than-POWER5
CPUs.

MFC after:	3 weeks
2017-11-25 22:10:10 +00:00
Nathan Whitehorn
66d6978c27 Missed platform_smp_timebase_sync() in r326205.
MFC after:	3 weeks
X-MFC-With:	r326205
2017-11-25 22:06:40 +00:00