r353489 added minidump support for powerpc64, but it added a dependency on
the dump_avail array. Leaving it uninitialized caused breakage in late
boot. Initialize dump_avail anyway; even though the 64-bit booke pmap
doesn't yet support minidumps, it will in the future.
MAS8 is hypervisor-privileged, selecting the logical partition (VM) that
TLB accesses operate on. It's already guaranteed to be cleared when
booting bare metal (bootloader needs it zeroed to work), and we can't touch
it from a guest. Assume that if/when we eventually port bhyve to PowerPC
(and Book-E) the hypervisor module will take care of managing MAS8. This
saves several tens of clocks on each TLB miss.
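The removed sequence looks roughly like this (a sketch; the register
choice and surrounding handler code are illustrative, not the exact diff):

    /* Formerly executed on every TLB miss to force LPID 0: */
    li      %r26, 0
    mtspr   SPR_MAS8, %r26

Since MAS8 is already zero when booting bare metal and untouchable from a
guest, the miss handler can simply omit the mtspr.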
MFC after: 2 weeks
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.
The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.
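A minimal sketch of the re-lookup pattern, assuming the page was first
found without any locks held (the real pmap_mincore()/mincore() code
differs in detail):

    object = m->object;
    if (object != NULL) {
            VM_OBJECT_WLOCK(object);
            /* Identity is stable only while the object lock is held. */
            m = vm_page_lookup(object, pindex);
            if (m != NULL) {
                    /* ... inspect the page's dirty/referenced state ... */
            }
            VM_OBJECT_WUNLOCK(object);
    }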
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823
The VM_PAGE_OBJECT_BUSY_ASSERT() in the pmap_enter() implementation should
only be asserted when the code is executed as a result of pmap_enter(),
not when the same code is entered from e.g. pmap_enter_quick(). This is
relevant for all PowerPC pmap variants, because mmu_*_enter() is used as
the backend, and the assert is located there.
Add a PowerPC private pmap_enter() PMAP_ENTER_QUICK_LOCKED flag to
indicate that the call is not from pmap_enter(). For non-quick-locked
calls, assert that the object is locked.
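A sketch of the resulting check in the shared backend (flag name from
this change; the surrounding mmu_*_enter() code is elided):

    if ((flags & PMAP_ENTER_QUICK_LOCKED) == 0)
            VM_PAGE_OBJECT_BUSY_ASSERT(m);

pmap_enter_quick() then passes PMAP_ENTER_QUICK_LOCKED (roughly, alongside
PMAP_ENTER_NOSLEEP) when calling into the backend, so the assert is
skipped on that path.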
Reported and tested by: bdragon
Reviewed by: alc, bdragon, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22041
callers hold it.
This simplifies pmap code and removes a dependency on the object lock.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596
busy acquires while held.
This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives, but never false
negatives, which the single consumer, vm_fault_soft_fast(), handles.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
There are cases where there's no vm_page_t structure for a given physical
address, such as the CCSR. In this case, trying to obtain the
md.page_tracked struct member would lead to a NULL dereference and a panic.
Tighten this up by checking for kernel_pmap AND that the page structure
actually exists before dereferencing. The flag can only be set when it's
tracked in the kernel pmap anyway.
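The guard amounts to something like the following (a sketch; the field
name follows the commit text):

    m = PHYS_TO_VM_PAGE(pa);
    /* CCSR and similar device ranges have no vm_page, so m is NULL. */
    if (pmap == kernel_pmap && m != NULL && m->md.page_tracked) {
            /* ... safe to consult the tracking state ... */
    }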
MFC after: 3 weeks
Clang9/LLD9 appears to get quite confused by the instruction stream used
to obtain the tmpstack pointer, almost as though it thinks this is a C
function, and tries to optimize it. Since the AIM64 method doesn't use the
TOC to obtain the tmpstack, just follow that model, and lld won't get
confused.
Reported by: bdragon
MFC after: 2 weeks
Convert all remaining references to that field to "ref_count" and update
comments accordingly. No functional change intended.
Reviewed by: alc, kib
Sponsored by: Intel, Netflix
Differential Revision: https://reviews.freebsd.org/D21768
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
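The interlock used by pmap_extract_and_hold() reduces to a single atomic
loop; a simplified rendition of vm_page_wire_mapped() (VPRC_BLOCKED names
the blocked flag bit described above):

    bool
    vm_page_wire_mapped(vm_page_t m)
    {
            u_int old;

            old = m->ref_count;
            do {
                    KASSERT(old > 0, ("page %p has no references", m));
                    /* The page's mappings are being destroyed. */
                    if ((old & VPRC_BLOCKED) != 0)
                            return (false);
            } while (!atomic_fcmpset_int(&m->ref_count, &old, old + 1));
            return (true);
    }

A failure here sends the caller back through the fault handler rather
than handing out a reference to a page being unmapped.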
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
Summary:
Reduce the diff between AIM and Book-E even more. This also cleans up
vmparam.h significantly.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21301
doing so adds more flexibility with less redundant code.
Reviewed by: jhb, markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21250
The only thing blocking UMA_MD_SMALL_ALLOC from working on 64-bit booke
powerpc was a missing check in pmap_kextract(). By adding DMAP handling to
pmap_kextract(), we can now use UMA_MD_SMALL_ALLOC. This should improve
performance and stability a bit, since DMAP is always mapped in TLB1, so
this relieves pressure on TLB0.
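The added check is essentially (a sketch, using the powerpc DMAP macros):

    if (hw_direct_map && va >= DMAP_BASE_ADDRESS && va <= DMAP_MAX_ADDRESS)
            return (DMAP_TO_PHYS(va));

before falling back to the usual page table walk.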
MFC after: 3 weeks
This is part 2 of r347078, pulling the page directory out of the Book-E
pmap. This breaks KBI for anything that uses struct pmap (such as vm_map)
so any modules that access this must be rebuilt.
There is no need for the 64-bit pmap to have a fixed number of page table
buffers. Since the 64-bit pmap has a DMAP, we can effectively have user
page tables limited only by total RAM size.
EFSCFD (floating point single convert from double) emulation requires saving
the high word of the register, which uses SPE instructions. Enable the SPE
to avoid an SPV Unavailable exception.
MFC after: 1 week
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last-unhold semantic, whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.
This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.
No functional change is intended. __FreeBSD_version is bumped.
Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247
Actually set the source and destination VAs before using them. Fixes a
bizarre panic on 32-bit Book-E. Not sure why this wasn't caught by the
compiler.
The MSR[EE] bit does not require synchronization when changing. This is a
trivial micro-optimization, removing the trailing isync from mtmsr().
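Roughly, in the mtmsr() wrapper (a sketch of the idea, not the exact
cpufunc.h diff):

    static __inline void
    mtmsr(register_t value)
    {
            /* No trailing isync: MSR[EE] updates are execution-synchronized. */
            __asm __volatile ("mtmsr %0" :: "r"(value));
    }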
MFC after: 1 week
* Make mmu_booke_sync_icache() use the DMAP on 64-bit processes; there is
no need to map the page into the user's address space. This removes the
pvh_global_lock from the equation on 64-bit.
* Don't map the page with user-readability on 32-bit. I don't know what
the chance is of a given user process being able to access the NULL page
when another process's page is added there, but it doesn't seem like a good
idea to map it to NULL with user read permissions.
* Only sync as much as we need to. There are only two significant places
where pmap_sync_icache is used: proc_rwmem(), and the SIGILL second-chance
for powerpc. The SIGILL second chance is likely the most common, and only
syncs 4 bytes, so avoid the other 127 loop iterations (4096 bytes at a
32-byte cache line is 128 iterations) in __syncicache().
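For the SIGILL case that means a call along these lines (a sketch;
fault_addr is a placeholder for the faulting PC, and the actual call site
differs):

    /* Sync only the faulting instruction word, not the whole page. */
    pmap_sync_icache(p->p_vmspace->vm_map.pmap, fault_addr,
        sizeof(uint32_t));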
Reduce the surface area of the TLB locks. Unfortunately the same trick for
serializing the tlbie instruction on OEA64 cannot be used here to reduce the
scope of the tlbivax mutex to the tlbsync only, as the mutex also serializes
the TLB miss lock as a side effect, so contention on this lock may not be
reducible any further.
Since the DMAP is only available on powerpc64, and is *always* available on
Book-E powerpc64, don't penalize either side (32-bit or 64-bit) by always
checking hw_direct_map to perform operations. This saves 5-10% time on
various ports builds, and on buildworld+buildkernel on Book-E hardware.
MFC after: 3 weeks
Use the nitems() macro instead of its expansion, a la r298352. Also, move
this check to after availmem_regions_sz is initialized, so that the check
isn't always against 0, and thus always failing (nitems(phys_avail) is
always more than 0).
Unconditionally writing to MAS7, which doesn't exist on the e500v1 core, in a
TLB miss handler has been in the code for several years now. Since this has
gone unnoticed for so long, it's easily concluded that e500v1 is not in use
with FreeBSD. Simplify the code path a bit, by unconditionally zeroing MAS7
instead of calling a subroutine to do it.
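The simplification is on the order of the following (illustrative only;
the subroutine name is a placeholder):

    /* Before: branch to a helper that zeroed MAS7. */
    bl      zero_mas7
    /* After: zero it inline, unconditionally. */
    li      %r27, 0
    mtspr   SPR_MAS7, %r27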
Summary:
With a sufficiently large TOC, it's possible to index out of range, as
the immediate load instructions only permit 16-bit displacements, allowing
a 64kB range (signed) from the base pointer. Allow a +/-2GB range by using
medium code model TOC accesses in the asm.
Patch originally by Brandon Bergren. The issue appears to impact ELFv2
more than ELFv1.
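The difference in the generated access pattern, with a placeholder symbol:

    /* small code model: 16-bit displacement from the TOC pointer (r2) */
    ld      %r3, sym@toc(%r2)

    /* medium code model: addis extends the reach to +/-2GB */
    addis   %r3, %r2, sym@toc@ha
    ld      %r3, sym@toc@l(%r3)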
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D19708
Newer cores have the 'tlbilx' instruction, which doesn't broadcast over
CoreNet. This is significantly faster than walking the TLB to invalidate
the PID mappings. tlbilx with the arguments given takes 131 clock cycles to
complete, as opposed to 512 iterations through the loop plus tlbre/tlbwe at
each iteration.
MFC after: 3 weeks
Don't clobber the low part of the register whose high component is being
restored. This could lead to very bad behavior if it's an ABI-affected
register. While here, also mark the asm as volatile in the SPE high-word
save case, to match the load case.
Reported by: Brandon Bergren (git_bdragon.rtk0.net)
MFC after: 1 week
Optimize the exception handler to only save and load the upper word of the
GPRs used by the emulated instruction. This reduces the save/load
overhead, and as a side effect does not overwrite the upper word of any
temporary register.
With this commit I am now able to run editors/abiword and math/gnumeric
on an e500-based system.
MFC after: 1 week
MFC With: r341752,r341751
The code was a near exact copy of the code in startup, but it doesn't need
the complexity since the kernel is already relocated. With
VM_MIN_KERNEL_ADDRESS as currently set to KERNBASE, this doesn't cause a
problem, because it's a zero offset. However, when KERNBASE is changed to a
physical load address, it then has a non-zero offset, and ends up with an
invalid stack pointer, causing the AP to hang.
The same behavior was moved to machdep.c, paired with AIM's relocation,
making this redundant. With this, it's now possible to boot FreeBSD with
ubldr on a U-Boot Book-E platform, even with
KERNBASE != VM_MIN_KERNEL_ADDRESS.
The metadata pointer will almost never be at or above 'btext', as btext is
a relocated symbol, so it will be based at VM_MIN_KERNEL_ADDRESS, not at
KERNBASE. Check the address against kernload, where the kernel is
physically loaded.
Book-E kernels really run at VM_MIN_KERNEL_ADDRESS, which currently happens to
be the same as KERNBASE. KERNBASE is the linked address, which the loader also
takes to be the physical load address. Treat KERNBASE as a physical address,
not a virtual one, and change the virtual-address uses of KERNBASE to use
something more appropriate.
debugf() is unnecessary for the TLB printing functions, as they're only
intended to be used from ddb. Instead, make them full DDB 'show'
commands, so they can now be invoked as 'show tlb1' and 'show tlb0'
instead of calling the functions and hoping DEBUG has been defined.
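The conversion uses the stock ddb command macro, roughly (body elided):

    DB_SHOW_COMMAND(tlb1, tlb1_print)
    {
            /* walk TLB1 with tlbre and print each valid entry */
            ...
    }

after which 'show tlb1' is available from the debugger prompt whether or
not DEBUG was defined.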
The Signal Processing Engine (SPE) found in Freescale e500 cores (and
others) offloads IEEE-754 compliance (NaN, Inf handling, overflow,
underflow) to software, most likely as a means of simplifying the APU
silicon. Some software, like AbiWord, needs full IEEE-754 compliance,
including NaN handling. Implement the necessary bits to enable it.
Differential Revision: https://reviews.freebsd.org/D17446