Commit Graph

389 Commits

Author SHA1 Message Date
Mark Johnston
b119329d81 Complete the removal of the "wire_count" field from struct vm_page.
Convert all remaining references to that field to "ref_count" and update
comments accordingly.  No functional change intended.

Reviewed by:	alc, kib
Sponsored by:	Intel, Netflix
Differential Revision:	https://reviews.freebsd.org/D21768
2019-09-25 16:11:35 +00:00
Mark Johnston
e8bcf6966b Revert r352406, which contained changes I didn't intend to commit. 2019-09-16 15:04:45 +00:00
Mark Johnston
41fd4b9422 Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21639
2019-09-16 15:03:12 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Justin Hibbits
7038069e9c Revert a part of r350883 that should never have gone in
The wire_count change is not part of the unification, and doesn't even make
sense.

Reported by:	markj
2019-08-27 14:04:32 +00:00
Justin Hibbits
8da08d5123 powerpc/booke: Clean up pmap a little for 64-bit
64-bit Book-E pmap doesn't need copy and zero bounce pages, nor the mutex.
Don't initialize them or reserve space for them.
2019-08-25 20:11:35 +00:00
Justin Hibbits
e7f2151c75 powerpc/booke: Use the DMAP if possible in pmap_map()
This avoids unnecessary TLB usage for statically mapped regions, such as
vm_page_array.
2019-08-25 20:08:48 +00:00
Justin Hibbits
6793e5b23d powerpc: Link Book-E kernels at the same address as AIM kernels
Summary:
Reduce the diff between AIM and Book-E even more.  This also cleans up
vmparam.h significantly.

Reviewed by:	luporl
Differential Revision:	https://reviews.freebsd.org/D21301
2019-08-20 01:26:02 +00:00
Jeff Roberson
2194393787 Move phys_avail definition into MI code. It is consumed in the MI layer and
doing so adds more flexibility with less redundant code.

Reviewed by:	jhb, markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21250
2019-08-16 00:45:14 +00:00
Justin Hibbits
141a0ab012 powerpc/pmap: Enable UMA_MD_SMALL_ALLOC for 64-bit booke
The only thing blocking UMA_MD_SMALL_ALLOC from working on 64-bit booke
powerpc was a missing check in pmap_kextract().  Adding DMAP handling into
pmap_kextract(), we can now use UMA_MD_SMALL_ALLOC.  This should improve
performance and stability a bit, since DMAP is always mapped in TLB1, so
this relieves pressure on TLB0.

MFC after:	3 weeks
2019-08-15 03:42:15 +00:00
Justin Hibbits
3be09f300d powerpc: Unify pmap definitions between AIM and Book-E
This is part 2 of r347078, pulling the page directory out of the Book-E
pmap.  This breaks KBI for anything that uses struct pmap (such as vm_map)
so any modules that access this must be rebuilt.
2019-08-12 03:03:56 +00:00
Justin Hibbits
937e8f20af powerpc/pmap: Minor optimizations to 64-bit booke pmap
Don't recalculate the VM page of the page table pages, just pass them down
to free.  Also, use the pmap's page zero function instead of bzero().
2019-08-08 03:18:35 +00:00
Justin Hibbits
8b1531eca8 Fix build from r350622
It helps if my local kernel build has INVARIANTS.
2019-08-06 03:49:40 +00:00
Justin Hibbits
dc825fed55 powerpc/pmap: Simplify Book-E 64-bit page table management
There is no need for the 64-bit pmap to have a fixed number of page table
buffers.  Since the 64-bit pmap has a DMAP, we can effectively have user
page tables limited only by total RAM size.
2019-08-06 03:16:06 +00:00
Justin Hibbits
cafceaebea powerpc/SPE: Enable SPV bit for EFSCFD instruction emulation
EFSCFD (floating point single convert from double) emulation requires saving
the high word of the register, which uses SPE instructions.  Enable the SPE
to avoid an SPV Unavailable exception.

MFC after:	1 week
2019-07-20 18:22:01 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Justin Hibbits
72e58595b1 powerpc/booke: It helps to set variables before using them
Actually set the source and destination VA's before using them.  Fixes a
bizarre panic on 32-bit Book-E.  Not sure why this wasn't caught by the
compiler.
2019-05-23 03:40:48 +00:00
Justin Hibbits
b4b2e7a7b9 powerpc/booke: Use wrtee instead of msr to restore EE bit
The MSR[EE] bit does not require synchronization when changing.  This is a
trivial micro-optimization, removing the trailing isync from mtmsr().

MFC after:	1 week
2019-05-22 02:43:17 +00:00
Justin Hibbits
2b03b6bd45 powerpc/booke: Rewrite pmap_sync_icache() a bit
* Make mmu_booke_sync_icache() use the DMAP on 64-bit prcoesses, no need to
  map the page into the user's address space.  This removes the
  pvh_global_lock from the equation on 64-bit.
* Don't map the page with user-readability on 32-bit.  I don't know what the
  chance of a given user process being able to access the NULL page when
  another process's page is added there, but it doesn't seem like a good
  idea to map it to NULL with user read permissions.
* Only sync as much as we need to.  There are only two significant places
  where pmap_sync_icache is used: proc_rwmem(), and the SIGILL second-chance
  for powerpc.  The SIGILL second chance is likely the most common, and only
  syncs 4 bytes, so avoid the other 127 loop iterations (4096 / 32 byte
  cacheline) in __syncicache().
2019-05-08 16:15:28 +00:00
Justin Hibbits
4023311a29 powerpc/booke: Do as much work outside of TLB locks as possible
Reduce the surface area of the TLB locks.  Unfortunately the same trick for
serializing the tlbie instruction on OEA64 cannot be used here to reduce the
scope of the tlbivax mutex to the tlbsync only, as the mutex also serializes
the TLB miss lock as a side effect, so contention on this lock may not be
reducible any further.
2019-05-08 16:05:18 +00:00
Justin Hibbits
2154a866b6 powerpc/booke: Use #ifdef __powerpc64__ instead of hw_direct_map in places
Since the DMAP is only available on powerpc64, and is *always* available on
Book-E powerpc64, don't penalize either side (32-bit or 64-bit) by always
checking hw_direct_map to perform operations.  This saves 5-10% time on
various ports builds, and on buildworld+buildkernel on Book-E hardware.

MFC after:	3 weeks
2019-05-05 20:23:43 +00:00
Justin Hibbits
bfd0787769 powerpc/booke: Fix size check for phys_avail in pmap bootstrap
Use the nitems() macro instead of the expansion, a'la r298352.  Also, fix the
location of this check to after initializing availmem_regions_sz, so that the
check isn't always against 0, thus always failing (nitems(phys_avail) is always
more than 0).
2019-05-05 20:05:50 +00:00
Justin Hibbits
0af5d6f7d9 powerpc: Stop pretending we run on e500v1 cores
Unconditional writing to MAS7, which doesn't exist on the e500v1 core, in a
TLB miss handler has been in the code for several years now.  Since this has
gone unnoticed for so long, it's easily concluded that e500v1 is not in use
with FreeBSD.  Simplify the code path a bit, by unconditionally zeroing MAS7
instead of calling a subroutine to do it.
2019-04-30 03:45:46 +00:00
Justin Hibbits
0499e9c619 powerpc64: Use medium code model in asm files for TOC references
Summary:
With a sufficiently large TOC, it's possible to index out of range, as
the immediate load instructions only permit 16-bit indices, allowing up
to 64kB range (signed) from the base pointer.  Allow +/- 2GB range, with
the medium code model TOC accesses in asm.

Patch originally by Brandon Bergren.  The issue appears to impact ELFv2
more than ELFv1.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D19708
2019-03-29 02:38:30 +00:00
Justin Hibbits
5b4c63b781 powerpc/booke: Depessimize MAS register updates even more
Remove isyncs between MAS register updates in the TLB miss handler, since
it's only needed before the TLB update instructions.
2019-03-02 20:59:18 +00:00
Justin Hibbits
a9b033c2f3 powerpc/booke: Fix 32-bit build
MFC after:	2 weeks
MFC with:	344202
2019-02-16 04:47:33 +00:00
Justin Hibbits
0454ed9794 powerpc/booke: depessimize MAS register updates
We only need to isync before we actually use the MAS registers, so before and
after the TLB read/write/sync/search operations.

MFC after:	2 weeks
2019-02-16 04:38:34 +00:00
Justin Hibbits
18f7e2b45e powerpc/booke: Use DMAP where possible for page copy and zeroing
This avoids several locks and pmap_kenter()'s, improving performance
marginally.

MFC after:	2 weeks
2019-02-16 04:16:10 +00:00
Justin Hibbits
64143619ab powerpc/booke: Use the 'tlbilx' instruction on newer cores
Newer cores have the 'tlbilx' instruction, which doesn't broadcast over
CoreNet.  This is significantly faster than walking the TLB to invalidate
the PID mappings.  tlbilx with the arguments given takes 131 clock cycles to
complete, as opposed to 512 iterations through the loop plus tlbre/tlbwe at
each iteration.

MFC after:	3 weeks
2019-02-13 03:11:12 +00:00
Justin Hibbits
2da4e52d79 powerpcspe: Correct SPE high-component loading
Don't clobber the low part of the register restoring the high component of.
This could lead to very bad behavior if it's an ABI-affected register.

While here, also mark the asm volatile in the SPE high save case, to match
the load case.

Reported by:	Branden Bergren (git_bdragon.rtk0.net)
MFC after:	1 week
2019-01-13 04:51:24 +00:00
Justin Hibbits
3067a880ce powerpcspe: Fix GPR handling in SPE exception handler
Optimize the exception handler to only save and load the upper word of the
GPRs used in the emulating instruction.  This reduces the save/load
overhead, and as a side effect does not overwrite the upper word of any
temporary register.

With this commit I am now able to run editors/abiword and math/gnumeric on a
e500-based system.

MFC after:	1 week
MFC With:	r341752,r341751
2018-12-13 04:48:28 +00:00
Justin Hibbits
9d720d45c9 powerpc/booke: Don't get and use the load offset for TOC on APs
The code was a near exact copy of the code in startup, but it doesn't need
the complexity since the kernel is already relocated.  With
VM_MIN_KERNEL_ADDRESS as currently set to KERNBASE, this doesn't cause a
problem, because it's a zero offset.  However, when KERNBASE is changed to a
physical load address, it then has a non-zero offset, and ends up with an
invalid stack pointer, causing the AP to hang.
2018-12-11 02:03:00 +00:00
Justin Hibbits
ddc6c1fa3d powerpc/SPE: Copy lower part of source register to target for efdabs/efdnabs/efdneg
MFC after:	1 week
MFC With:	r341751
2018-12-09 04:54:55 +00:00
Justin Hibbits
3d6bebd3a2 powerpc/SPE: Reload vector registers after efdabs/efdnabs/efdneg
While here, also style(9)-adjust indents around this code.
2018-12-09 04:13:14 +00:00
Justin Hibbits
f1e0cb5ef1 powerpc: preload_addr_relocate is no longer necessary for booke
The same behavior was moved to machdep.c, paired with AIM's relocation,
making this redundant.  With this, it's now possible to boot FreeBSD with
ubldr on a uboot Book-E platform, even with a
KERNBASE != VM_MIN_KERNEL_ADDRESS.
2018-12-04 03:51:10 +00:00
Justin Hibbits
3c0b081966 powerpc/booke: Check for the metadata address by physical address
The metadata pointer will almost never be at or above 'btext', as btext is a
relocated symbol, so will be based at VM_MIN_KERNEL_ADDRESS, not at
KERNBASE.  Check the address against kernload, where the kernel is
physically loaded.
2018-12-03 04:47:28 +00:00
Justin Hibbits
b996435337 powerpc/booke: Fix debug printfs in pmap
Add missing '%'s so printf formats are actually handled.
2018-11-28 04:02:26 +00:00
Justin Hibbits
ea32838af0 powerpc: Prepare Book-E kernels for KERNBASE != run base
Book-E kernels really run at VM_MIN_KERNEL_ADDRESS, which currently happens to
be the same as KERNBASE.  KERNBASE is the linked address, which the loader also
takes to be the physical load address.  Treat KERNBASE as a physical address,
not a virtual, and change virtual address references for KERNBASE to use
something more appropriate.
2018-11-28 02:00:27 +00:00
Justin Hibbits
6f4827aca7 powerpc/booke: Turn tlb*_print_tlbentries() into 'show tlb*' DDB commands
debugf() is unnecessary for the TLB printing functions, as they're only
intended to be used from ddb.  Instead, make them full DDB 'show'
commands, so now it can be written as 'show tlb1' and 'show tlb0'
instead of calling the function, hoping DEBUG has been defined.
2018-10-22 00:21:27 +00:00
Justin Hibbits
289041e2cb powerpcspe: Implement SPE exception handling
The Signal Processing Engine (SPE) found in Freescale e500 cores (and
others) offloads IEEE-754 compliance (NaN, Inf handling, overflow,
underflow) to software, most likely as a means of simplifying the APU
silicon.  Some software, like AbiWord, needs full IEEE-754 compliance,
including NaN handling.  Implement the necessary bits to enable it.

Differential Revision: https://reviews.freebsd.org/D17446
2018-10-21 00:43:27 +00:00
Justin Hibbits
340a810bf0 booke pmap: hide debug-ish printf behind bootverbose
It's not necessary during normal operation to know the mapped region size
and wasted space.
2018-08-19 18:54:43 +00:00
Justin Hibbits
38cfc8c393 Print the full-width pointer values in hex.
PRI0ptrX is used to print a zero-padded hex value of the architecture's bitness,
so on 64-bit architectures it'll print the full 64 bit address.
2018-05-28 00:19:08 +00:00
Justin Hibbits
971b5e4da8 Remove dead errata fixup code
This code caused more problems than it should have fixed (boot failures) on
the machines I tested, so has been commented out for a while now.  Remove
it, and assume the errata fixups were done by the bootloader where they
belong.
2018-05-01 04:31:17 +00:00
Justin Hibbits
99adcecf38 Call through powerpc_interrupt for all Book-E interrupts
Make int_external_input, int_decrementer, and int_performance_counter all
now use trap_common, just like on AIM.  The effects of this are:

* All traps are now properly displayed in ddb.  Previously traps from
  external input, decrementer, and performance counters, would display as
  just basic stack traces.  Now the frame is displayed.

* External interrupts are now handled with interrupts enabled, so handling
  can be preempted.  This seems to fix a hang found post-r329882.
2018-04-10 17:32:27 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Justin Hibbits
9225dfbdf0 Correct the ilog2() for calculating memory sizes.
TLB1 can handle ranges up to 4GB (through e5500, larger in e6500), but
ilog2() took a unsigned int, which maxes out at 4GB-1, but truncates
silently.  Increase the input range to the largest supported, at least for
64-bit targets.  This lets the DMAP be completely mapped, instead of only
1GB blocks with it assuming being fully mapped.
2018-04-04 02:13:27 +00:00
Justin Hibbits
9f5b999aca Add support for a pmap direct map for 64-bit Book-E
As with AIM64, map the DMAP at the beginning of the fourth "quadrant" of
memory, and move the KERNBASE to the the start of KVA.

Eventually we may run the kernel out of the DMAP, but for now, continue
booting as it has been.
2018-04-03 00:45:38 +00:00
Ed Maste
fc2a8776a2 Rename assym.s to assym.inc
assym is only to be included by other .s files, and should never
actually be assembled by itself.

Reviewed by:	imp, bdrewery (earlier)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14180
2018-03-20 17:58:51 +00:00
Justin Hibbits
a029f84189 Fix powerpc Book-E build post-331018/331048.
pagedaemon_wakeup() was moved from vm_pageout.h to vm_pagequeue.h.
2018-03-20 01:07:22 +00:00
Justin Hibbits
5903f5954a Fix the psl_userset32 definition.
It should be based on psl_userset, not psl_kernset.  As kernset, it would
inherit kernel config, including privilege level.
2018-03-01 04:44:17 +00:00
Justin Hibbits
2d5320a818 Increase the size of a reservation granule for TLB locks
A reservation granule on PowerPC is a cache line.

On e500mc and derivatives a cacheline size is 64 bytes, not 32.  Allocate
the maximum size permitted, but only utilize the size that is needed.  On
e500v1 and e500v2 the reservation granule will still be 32 bytes.
2018-02-27 04:38:27 +00:00
Justin Hibbits
e6939726ef Correct a copy&paste-o -- altivec assist interrupt, not watchdog 2018-02-26 03:05:36 +00:00
Konstantin Belousov
2c0f13aa59 vm_wait() rework.
Make vm_wait() take the vm_object argument which specifies the domain
set to wait for the min condition pass.  If there is no object
associated with the wait, use curthread' policy domainset.  The
mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.

Eliminate pagedaemon_wait().  vm_domain_clear() handles the same
operations.

Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.

Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.

Scetched and reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation, Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D14384
2018-02-20 10:13:13 +00:00
Justin Hibbits
bce6d88bc1 Merge AIM and Book-E PCPU fields
This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel.  As more work is done, the struct may be optimized further.

Reviewed by:	nwhitehorn
2018-02-17 20:59:12 +00:00
Jeff Roberson
e958ad4cf3 Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc().  Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by:	markj
Discussed with:	kib, glebius
Tested by:	pho (earlier version)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14273
2018-02-12 22:53:00 +00:00
Jeff Roberson
e2068d0bcd Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
Nathan Whitehorn
619282986d Change the default MSR values used when starting userland and kernel
threads from compile-time defines to global variables. This removes a
significant amount of duplicated runtime patches to the compile-time
defines, centralizing the conditional logic in the early startup code.

Reviewed by:	jhibbits
2018-02-01 05:31:24 +00:00
Nathan Whitehorn
eb1baf72ae Remove hard-coded trap-handling logic involving the segmented memory model
used with hashed page tables on AIM and place it into a new, modular pmap
function called pmap_decode_kernel_ptr(). This function is the inverse
of pmap_map_user_ptr(). With POWER9 radix tables, which mapping to use
becomes more complex than just AIM/BOOKE and it is best to have it in
the same place as pmap_map_user_ptr().

Reviewed by:	jhibbits
2018-01-29 04:33:41 +00:00
Nathan Whitehorn
04329fa708 Move the pmap-specific code in copyinout.c that gets pointers to userland
buffers into a new pmap-module function pmap_map_user_ptr() that can
be implemented by the respective modules. This is required to implement
non-segment-based AIM-ish MMU systems such as the radix-tree page tables
introduced by POWER ISA 3.0 and present on POWER9.

Reviewed by:	jhibbits
2018-01-15 06:46:33 +00:00
Eitan Adler
caa7e52f3f kernel: Fix several typos and minor errors
- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by:	imp, benno
2017-12-27 03:23:21 +00:00
Nathan Whitehorn
d6716aa2af The highest-order bit of the bootloader cookie is 1, with the result that
the 32-bit cookie can be sign-extended on its way out of the loader and
through Open Firmware. If sign-extended, the in-kernel check of its value
would fail on 64-bit systems, resulting in a mountroot prompt. Solve this
by telling the kernel to ignore the high-order bits.

PR:		kern/224437
Submitted by:	Gustavo Romero
2017-12-19 16:45:40 +00:00
Justin Hibbits
713e844971 Retrieve the page outside of holding locks
pmap_track_page() only works with physical memory pages, which have a
constant vm_page_t address.  Microoptimize pmap_track_page() to perform one
less operation under the lock.
2017-12-10 04:43:27 +00:00
Justin Hibbits
94a9d7c3b9 Remove PTE VA mappings for tracked pages in 64-bit mode
This was done in 32-bit mode, but not duplicated when 64-bit mode was
brought in.  Without this, stale mappings can be left, leading to odd
crashes when the wrong VA is checked in XX_PhysToVirt() (dpaa(4)).
2017-12-08 03:49:53 +00:00
Justin Hibbits
3de971a61a Only check the page tables if within the KVA.
Devices aren't mapped within the KVA, and with the way 64-bit hashes the
addresses pte_vatopa() may not return a 0 physical address for a device.

MFC after:	1 week
2017-11-29 01:26:07 +00:00
Pedro F. Giffuni
71e3c3083b sys/powerpc: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:09:59 +00:00
Justin Hibbits
41eeef87ca Synchronize TLB1 mappings when created
This allows modules creating mappings to be loaded post-boot, after SMP has
started.  Without this, the TLB1 mappings can become unsynchronized and lead
to kernel page faults when accessed on the alternate CPUs.

MFC after:	3 weeks
2017-11-26 20:30:02 +00:00
Nathan Whitehorn
47f69f4f2b Use the cookie now set by loader to determine whether the value passed to
PowerPC kernels in r6 is actually metadata from loader(8) or gibberish
left in r6, which is not required to be anything under the
PAPR/ePAPR/CHRP/OF standards, by another boot loader.

Note that, as a result, systems need a new boot loader to boot PPC kernels
after this revision without ending up at a mountroot prompt. New boot
loaders are backwards compatible and can boot older kernels.

Reviewed by:	jhibbits
MFC after:	2 months
2017-11-26 03:53:20 +00:00
Justin Hibbits
1ccb14588b Check the page table before TLB1 in pmap_kextract()
The vast majority of pmap_kextract() calls are looking for a physical memory
address, not a device address.  By checking the page table first this saves
the formerly inevitable 64 (on e500mc and derivatives) iteration loop
through TLB1 in the most common cases.

Benchmarking this on the P5020 (e5500 core) yields a 300% throughput
improvement on dtsec(4) (115Mbit/s -> 460Mbit/s) measured with iperf.

Benchmarked on the P1022 (e500v2 core, 16 TLB1 entries) yields a 50%
throughput improvement on tsec(4) (~93Mbit/s -> 165Mbit/s) measured with
iperf.

MFC after:	1 week
Relnotes:	Maybe (significant performance improvement)
2017-11-21 03:12:16 +00:00
Justin Hibbits
2d968f6dd4 Properly initialize the full md_page structure 2017-11-10 04:23:58 +00:00
Justin Hibbits
06ba753a7a Book-E pmap_mapdev_attr() improvements
* Check TLB1 in all mapdev cases, in case the memattr matches an existing
  mapping (doesn't need to be MAP_DEFAULT).
* Fix mapping where the starting address is not a multiple of the widest size
  base.  For instance, it will now properly map 0xffffef000, size 0x11000 using
  2 TLB entries, basing it at 0x****f000, instead of 0x***00000.

MFC after:	2 weeks
2017-11-10 04:14:48 +00:00
Justin Hibbits
bc3acf82cc Clear the WE bit in C code rather than the asm
According to EREF rlwinm is supposed to clear the upper 32 bits of the
register of 64-bit cores.  However, from experience it seems there's a bug
in the e5500 which causes the result to be duplicated in the upper bits of
the register.  This causes problems when applied to stashed SRR1 accessed
to retrieve context, as the upper bits are not masked out, so a
set_mcontext() fails.  This causes sigreturn() to in turn return with
EINVAL, causing make(1) to exit with error.

This bit is unused in e500mc derivatives (including e5500), so could just be
conditional on non-powerpc64, but there may be other non-Freescale cores
which do use it.  This is also the same as the POW bit on Book-S, so could
be cleared unconditionally with the only penalty being a few clock cycles
for these two interrupts.
2017-11-08 01:23:37 +00:00
Justin Hibbits
61b9e7ef6a Fix debug interrupts on 64-bit Book-E
Use a WORD_SIZE macro to define the correct offset to the second word
needed.  This corrects the offset calculation in 64-bit builds.
2017-11-01 02:40:15 +00:00
Justin Hibbits
d41742b585 Expand the TLB nest level mask to 3 bits to match the 32-bit mask
This really doesn't change anything right now, because BOOKE_TLB_MAXNEST is only
3, which fits into the 2 bits currently used.
2017-10-20 03:31:23 +00:00
Justin Hibbits
da7266dd6b Remove an obsolete comment
This has been wrong for well over a year, we support the full 36-bit
(or more) PA space.
2017-07-05 02:20:03 +00:00
Justin Hibbits
d7fd731d06 Use the more common Book-E idiom for disabling interrupts.
Book-E has the wrteei/wrtee instructions for writing the PSL_EE bit, ignoring
all others.  Use this instead of the AIM-typical mtmsr.

MFC with:	r320392
2017-06-30 02:11:32 +00:00
Justin Hibbits
3d1357108a Disable interrupts when updating the TLB
Without disabling interrupts it's possible for another thread to preempt
and update the registers post-read (tlb1_read_entry) or pre-write
(tlb1_write_entry), and confuse the kernel with mixed register states.

MFC after:	2 weeks
2017-06-27 01:57:22 +00:00
Justin Hibbits
675cad71e7 Fix stack tracing in dtrace for powerpc
The current method only sort of works, and usually doesn't work reliably.
Also, on Book-E the return address from DEBUG exceptions is not the sentinel
addresses, so it won't exit the loop correctly.

Fix this by better handling trap frames during unwinding, and using the
common trap handler for debug traps, as the code in that segment is
identical between the two.

MFC after:	1 week
2017-05-11 00:23:51 +00:00
Justin Hibbits
e683c328f8 Introduce 64-bit PowerPC Book-E support
Extend the Book-E pmap to support 64-bit operation.  Much of this was taken from
Juniper's Junos FreeBSD port.  It uses a 3-level page table (page directory
list -- PP2D, page directory, page table), but has gaps in the page directory
list where regions will repeat, due to the design of the PP2D hash (a 20-bit gap
between the two parts of the index).  In practice this may not be a problem
given the expanded address space.  However, an alternative to this would be to
use a 4-level page table, like Linux, and possibly reduce the available address
space; Linux appears to use a 46-bit address space.  Alternatively, a cache of
page directory pointers could be used to keep the overall design as-is, but
remove the gaps in the address space.

This includes a new kernel config for 64-bit QorIQ SoCs, based on MPC85XX, with
the following notes:
* The DPAA driver has not yet been ported to 64-bit so is not included in the
  kernel config.
* This has been tested on the AmigaOne X5000, using a MD_ROOT compiled in
  (total size kernel+mdroot must be under 64MB).
* This can run both 32-bit and 64-bit processes, and has even been tested to run
  a 32-bit init with 64-bit children.

Many thanks to stevek and marcel for getting Juniper's FreeBSD patches open
sourced to be used here, and to stevek for reviewing, and providing some
historical contexts on quirks of the code.

Reviewed by:	stevek
Obtained from:	Juniper (in part)
MFC after:	2 months
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D9433
2017-03-17 21:40:14 +00:00
Justin Hibbits
27da2007da Correct the return value for pmap_change_attr()
pmap_change_attr() returns an error code, not a paddr.  This function is
currently unused for powerpc.

MFC after:	2 weeks
2017-02-21 05:08:07 +00:00
Justin Hibbits
91722a2f0f Add Book-E Enhanced Debug (E.D) profile debug support
Freescale added the E.D profile to e500mc and derivative cores.  From
Freescale's EREF reference manual this is enabled by a bit in HID0 and should
otherwise default to traditional debug.  However, none of the Freescale cores
support that bit, and instead always use E.D.  This results in kernel panics
using the standard debug on e500mc+ cores.

Enhanced debug allows debugging of interrupts, including critical interrupts,
as it uses a different save/restore registers (srr*).  At this time we don't use
this ability, so instead share the core of the debug handler code between both
handlers.

MFC after:	3 weeks
2017-02-01 03:29:13 +00:00
Justin Hibbits
0a82e6f09a Use trunc_page() instead of rolling my own in pmap_track_page() 2016-12-05 02:27:50 +00:00
Justin Hibbits
aa38c69b74 Fix a typo (move parenthesis to correct location in the line).
Before this, it would cause the one consumer of this API in powerpc usage
(dev/dpaa) to set the PTE WIMG flags to empty instead of --M-, making the
cache-enabled buffer portals non-coherent.
2016-12-04 02:15:46 +00:00
Justin Hibbits
b2f831c009 Simplify the page tracking for VA<->PA translations.
Drop the tracking down to the pmap layer, with optimizations to only track
necessary pages.  This should give a (slight) performance improvement, as well
as a stability improvement, as the tracking is already mostly handled by the
pmap layer.
2016-11-16 05:24:42 +00:00
Justin Hibbits
dc9b124d66 Create a new MACHINE_ARCH for Freescale PowerPC e500v2
Summary:
The Freescale e500v2 PowerPC core does not use a standard FPU.
Instead, it uses a Signal Processing Engine (SPE)--a DSP-style vector processor
unit, which doubles as a FPU.  The PowerPC SPE ABI is incompatible with the
stock powerpc ABI, so a new MACHINE_ARCH was created to deal with this.
Additionaly, the SPE opcodes overlap with Altivec, so these are mutually
exclusive.  Taking advantage of this fact, a new file, powerpc/booke/spe.c, was
created with the same function set as in powerpc/powerpc/altivec.c, so it
becomes effectively a drop-in replacement.  setjmp/longjmp were modified to save
the upper 32-bits of the now-64-bit GPRs (upper 32-bits are only accessible by
the SPE).

Note: This does _not_ support the SPE in the e500v1, as the e500v1 SPE does not
support double-precision floating point.

Also, without a new MACHINE_ARCH it would be impossible to provide binary
packages which utilize the SPE.

Additionally, no work has been done to support ports, work is needed for this.
This also means no newer gcc can yet be used.  However, gcc's powerpc support
has been refactored which would make adding a powerpcspe-freebsd target very
easy.

Test Plan:
This was lightly tested on a RouterBoard RB800 and an AmigaOne A1222
(P1022-based) board, compiled against the new ABI.  Base system utilities
(/bin/sh, /bin/ls, etc) still function appropriately, the system is able to boot
multiuser.

Reviewed By:	bdrewery, imp
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D5683
2016-10-22 01:57:15 +00:00
Alan Cox
8cb0c1029d Various changes to pmap_ts_referenced()
Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is.  Eliminate the archaic
"XXX" comment about its value.  I don't believe that its exact value, e.g.,
5 versus 6, matters.

Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.

On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D7836
2016-09-10 16:49:25 +00:00
Justin Hibbits
bdee3435fa Allow pmap_early_io_unmap() to reclaim memory
pmap_early_io_map()/pmap_early_io_unmap(), if used in pairs, should be used in
the form:

pmap_early_io_map()
..do stuff..
pmap_early_io_unmap()

Without other allocations in the middle.  Without reclaiming memory this can
leave large holes in the device space.

While here, make a simple change to the unmap loop which now permits it to unmap
multiple TLB entries in the range.
2016-09-07 03:26:55 +00:00
Mark Johnston
dbbaf04f1e Remove support for idle page zeroing.
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by:	alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D7714
2016-09-03 20:38:13 +00:00
Justin Hibbits
60152a4037 Fix system hang when large FDT is in use
Summary:
Kernel maps only one page of FDT. When FDT is more than one page in size, data
TLB miss occurs on memmove() when FDT is moved to kernel storage
(sys/powerpc/booke/booke_machdep.c, booke_init())

This introduces a pmap_early_io_unmap() to complement pmap_early_io_map(), which
can be used for any early I/O mapping, but currently is only used when mapping
the fdt.

Submitted by:	Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7605
2016-08-24 03:51:40 +00:00
Justin Hibbits
81ef73fb1b Take into account mas7/8 when reading/writing TLB entries on e6500
Summary: Current booke/pmap code ignores mas7 and mas8 on e6500 CPU.

Submitted by:	Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7606
2016-08-23 04:26:30 +00:00
Justin Hibbits
beacb09864 Skip HID1 initialization on e6500 cores, it doesn't exist.
With this, and some drivers removed, a T2080 dev board boots to mountroot.

Submitted by:	Ivan Krivonos <int0dster_AT_gmail.com>
2016-08-20 00:55:58 +00:00
Justin Hibbits
791578a84f Add missing pmap_kenter() method for book-e.
This isn't added to AIM yet, because it's not yet needed.  It's needed for
Book-E ePAPR boot support.

X-MFC With:	r304047
2016-08-13 18:57:14 +00:00
Justin Hibbits
cbc3c68d9a Add a kdb show command to print arbitrary SPRs on PowerPC
Summary:
There is often a need at the debugger to print arbitrary special
purpose registers (SPRs) on PowerPC.  Using a rewritable asm stub, print any SPR
provided on the command line.

Note, as there is no checking in this, attempting to print a nonexistent SPR
may cause a Program exception (illegal instruction, or boundedly undefined).

Note also that this relies on the kernel text pages being writable.  If in the
future this is made not the case, this will need to be reworked.

Test Plan:
Printing the Processor Version Register (PVR, SPR 287):

db> show spr 11f
SPR 287(11f): 80240012

Differential Revision: https://reviews.freebsd.org/D7403
2016-08-13 18:46:49 +00:00
Justin Hibbits
253902b44c Add ePAPR boot support for PowerPC book-E (MPC85xx) hardware
Summary:
u-boot, following the ePAPR specification, puts secondary cores into a
spinloop at boot, rather than leaving them shut off.  It then relies on the host
OS to write the correct values to a special spin table, located in coherent
memory (on newer implementations), or noncoherent memory (older
implementations).

This supports both implementations of ePAPR, as well as continuing to support
non-ePAPR booting, by first attempting to use the spintable, and falling back to
expecting non-started CPUs.

Test Plan:
Booted on a P5020 board.  Tested before and after the changes.
Before the changes, prints the error "SMP: CPU 1 already out of hold-off state!"
and panics shortly thereafter.  After the changes, same boot method lets it
complete boot.

Reviewed by:	nwhitehorn
MFC after:	2 weeks
Relnotes:	Yes
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D7494
2016-08-13 16:16:02 +00:00
Justin Hibbits
d071aa6df2 Use label math instead of hard-coding offsets for return addresses.
Though the chances of the code in these sections changing are low, future-proof
the sections and use label math.

Renumber the surrounding areas to avoid duplicate label numbers.
2016-07-23 02:27:42 +00:00
Justin Hibbits
78cc33fd32 Remove booke_enable_l3_cache declaration and remaining definition.
L3 cache is not defined by Book-E, so is platform specific.  Since it was
already moved for e500-based devices into mpc85xx in r292903, just eliminate it
altogether.  Any device that supports L3 cache should have its own platform
means to enable it.
2016-07-17 19:24:28 +00:00
Justin Hibbits
b926a14ca7 No need to include mpc85xx.h anymore, so remove it. 2016-07-17 19:19:50 +00:00
Pedro F. Giffuni
910c079886 sys/powerpc: make use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 14:44:49 +00:00
Pedro F. Giffuni
d9c9c81c08 sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
Justin Hibbits
f60708c9f7 Fix SMP booting for PowerPC Book-E
Summary:
PowerPC Book-E SMP is currently broken for unknown reasons.  Pull in
Semihalf changes made c2012 for e500mc/e5500, which enables booting SMP.

This eliminates the shared software TLB1 table, replacing it with
tlb1_read_entry() function.

This does not yet support ePAPR SMP booting, and doesn't handle resetting CPUs
already released (ePAPR boot releases APs to a spin loop waiting on a specific
address).  This will be addressed in the near future by using the MPIC to reset
the AP into our own alternate boot address.

This does include a change to the dpaa/dtsec(4) driver, to mark the portals as
CPU-private.

Test Plan:
Tested on Amiga X5000/20 (P5020).  Boots, prints the following
messages:

 Adding CPU 0, pir=0, awake=1
 Waking up CPU 1 (dev=1)
 Adding CPU 1, pir=20, awake=1
 SMP: AP CPU #1 launched

top(1) shows CPU1 active.

Obtained from:	Semihalf
Relnotes:	Yes
Differential Revision: https://reviews.freebsd.org/D5945
2016-04-19 01:48:18 +00:00
Justin Hibbits
f00e990465 VM_MAXUSER_ADDRESS is highest page start, not highest address.
In case a single page mapping is requested first, which might overlap the user
address space, fix the device map block to the next page.
2016-04-10 15:50:45 +00:00