32-bit Book-E doesn't set UMA_MD_SMALL_ALLOC, and 32-bit OEA platforms
have a 32-bit vm_paddr_t. Moreover, this code was wrong in that it
leaked the page if the check failed.
Reviewed by: jhibbits
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23991
Port aacraid driver to big-endian (BE) hosts.
The immediate goal of this change is to make it possible to use the
aacraid driver on PowerPC64 machines that have Adaptec Series 8 SAS
controllers.
Adapters supported by this driver expect FIB contents in little-endian
(LE) byte order. All FIBs have a fixed header part as well as a data
part that depends on the command being issued to the controller.
In this way, on BE hosts, the FIB header and all FIB data structures
used in aacraid.c and aacraid_cam.c need to be converted to LE before
being sent to the adapter and converted to BE when coming from it.
The functions to convert each struct are on aacraid_endian.c.
For little-endian (LE) targets, they are macros that expand
to nothing.
In some cases, when only a few fields of a large structure are used,
the fields are converted inline, by the code using them.
PR: 237463
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23887
Fix panic "Freeing UMA block at 0xn with no associated page".
Also replaces pmap_remove call by pmap_kremove, for symmetry.
Reviewed by: jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D23931
If NUMA is not enabled in the kernel config, or is disabled at boot, this
function should just return domain 0 regardless of what's in the device
tree.
Fixes a panic in iflib with NUMA disabled.
Reported by: luporl
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Since powerpc64 has such a large virtual address space, significantly larger
than its physical address space, take advantage of this, and create yet
another DMAP-like instance for the device mappings. In this case, the
device mapping "DMAP" is in the 0x8000000000000000 - 0xc000000000000000
range, so as not to overlap the physical memory DMAP.
This will allow us to add TLB1 entry coalescing in the future, especially
useful for things like the radeonkms driver, which maps parts of the GPU at
a time, but eventually maps all of it, using up a lot of TLB1 entries (~40).
ptbl_alloc() is expected to return with the pvh_global_lock and pmap
lock held. However, it will return with them unlocked if nosleep is
specified.
Along with this, fix lock ordering of pvh_global_lock with respect to
the pmap lock in other places.
Differential Revision: https://reviews.freebsd.org/D23692
PR: 244118
Reported by: Francis Little <oggy at farscape.co.uk>
Tested by: Francis Little, Mark Millard <marklmi at yahoo.com>
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23729
We somewhat blindly copy the srr1 from the new context to the trap frame,
but disable FPU and VSX unconditionally, relying on the trap to re-enable
them. This works because the FPU manages the VSX extended FP registers,
which is governed by the PCB_FPFREGS flag. However, with altivec, we
would blindly disable PSL_VEC, without touching PCB_VEC. Handle this case
by disabling altivec in both srr1 and pcb_flags, if the mcontext doesn't
have _MC_AV_VALID set.
Reported by: pkubaj
This reverts r177661. The change is no longer very useful since
out-of-tree KLDs will be built to target SMP kernels anyway. Moveover
it breaks the KBI in !SMP builds since cpuset_t's layout depends on the
value of MAXCPU, and several kernel interfaces, notably
smp_rendezvous_cpus(), take a cpuset_t as a parameter.
PR: 243711
Reviewed by: jhb, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23512
Remove mbuf_jumbo_alloc and let large mbuf zones use the new uma default
contig allocator (a copy of mbuf_jumbo_alloc). Tag other zones which
require contiguous objects, even if they don't use the new default
contig allocator, so that uma knows about their constraints.
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23238
In r356767, memcpy/memmove/bcopy optimizations were added to libc to
improve performance.
This exposed an existing kernel issue in VSX handling. The PSL_VSX flag was
not being excluded from the psl_userstatic set, which meant that any thread
that used these and then called swapcontext(3) would get an EINVAL error.
Fixing this exposed a second issue - in r344123, the FPU was being forced
off in set_mcontext(). However, this was neglecting to ensure VSX was turned
off at the same time.
While here, add some code comments to explain what's going on.
Reviewed by: jhibbits, luporl (earlier rev), pkubaj (earlier rev)
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23497
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delievered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.
Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.
Simplify the KPIs. Remove stack_save_td_running() and add a return
value to stack_save_td(). On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP. If the
target thread is running in user mode, stack_save_td() returns EBUSY.
Reviewed by: kib
Reported by: mjg, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23355
On some POWER8 machines, 'ibm,associativity' property may have 6
cells, which would overflow the 5 cells buffer being used.
There was also an issue with the "check if node is root" part,
that have been fixed too.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D23414
Summary:
The CPLD is the communications medium between the CPU and the XMOS
"Xena" event coprocessor. It provides a mailbox communication feature,
along with dual-port RAM to be used between the CPU and XMOS. Also, it
provides basic board stats as well, such as PCIe presence, JTAG signals,
and CPU fan speed reporting (in revolutions per second). Only fan speed
reading is handled, as a sysctl.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D23136
Summary:
This fixes kernel crashing when tunable "machdep.moea64_bpvo_pool_size" is
set to a value higher then 327680 (default value). Function
moea64_mid_bootstrap() relies on moea64_bpvo_pool_size, but at time of the
use the variable wan't yet updated with the new value provided by user.
Problem was detected after trying to use a VM with 64GB of RAM, and default
moea64_bpvo_pool_size is insufficient (kernel boot used more than 470000) .
I think default value must be discussed to address this use case, or find a
way to calculate pool size automatically based on amount of memory detected.
Test Plan: Tested on QEMU VM with 64GB of RAM using "set
machdep.moea64_bpvo_pool_size=655360" on loader prompt
Submitted by: Alfredo Dal'Ava Júnior (alfredo.junior_eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23233
In rS354701, I replaced text relocations with offsets from &generictrap.
Unfortunately, the magic variable I was using doesn't actually mean the
address of &generictrap, in bridge mode it actually means &generictrap64.
So, for bridge mode to work, it is necessary to differentiate between
"where do we need to branch to to handle a trap" and "where is &generictrap
for purposes of doing relative math".
Introduce a new TRAP_ENTRY and use it instead of TRAP_GENTRAP for doing
actual calls to the generic trap handler.
Reported by: Mark Millard <marklmi@yahoo.com>
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23057
Summary:
This makes the interface described in the definition file act like a
pseudo-IFUNC service, by caching the found method locally.
Applying this to the PowerPC MMU definitions, it yields a significant
(15-20%) performance improvement, seen in both a 'make buildworld' and a
parallel build of LLVM, on a POWER9 system.
Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D23245
Summary:
Consolidate the NUMA associativity handling into a platform function.
Non-NUMA platforms will just fall back to the default (0). Currently
only implemented for powernv, which uses a lookup table to map the
device tree associativity into a system NUMA domain.
Fixes hangs on powernv after r356534, and corrects a fairly longstanding
bug in powernv's NUMA handling, which ended up using domains 1 and 2 for
devices and memory on power9, while CPUs were bound to domains 0 and 1.
Reviewed by: bdragon, luporl
Differential Revision: https://reviews.freebsd.org/D23220
It turns out the maximum TLB1 page size on e5500 is 4G, despite the format
being defined for up to 1TB.
So, we need to clamp the DMAP TLB1 entries to not attempt to create 16G or
larger entries.
Fixes boot on my X5000 in which I just installed 16G of RAM.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23244
In r291668, an instruction was added to sigcode64.S without the nop pad at
the end being taken out.
Due to alignment, this means that a dword is being wasted on the shared
page for no reason.
Take out this nop, and add some comments while I'm here.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23055
This enables virtio modules on PowerPC* target.
On PowerPC64, drivers are also kernel builtin.
QEMU currently needs to be patched to in order to work on LE hosts due to known
issue affecting pre-1.0 (legacy) virtio drivers.
The patch was submitted to QEMU mail list by @afscoelho_gmail.com, available at
https://lists.nongnu.org/archive/html/qemu-devel/2020-01/msg01496.html
Submitted by: Alfredo Dal'Ava Junior <alfredo.junior@eldorado.org.br>
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D22833
r302340, as an attempt to fix the localbus child handling post-rman change,
actually broke child resource allocation, due to typos in
fdt_lbc_reg_decode(). This went unnoticed because there aren't any drivers
currently in tree that use localbus.
hw.floatingpoint and hw.altivec are effectively runtime constants (bits from
the cpu_feature bitfield), so don't need Giant, or any locking for that
matter.
It may be possible to make this completely lock free, but for now it's using
a statically allocated bounce buffer in the softc, so it needs to be
guarded.
This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.
This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.
The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.
Submitted by: jhibbits (original patch), kevans (mips bits)
Reviewed by: jhibbits, jeff, kevans
Differential Revision: https://reviews.freebsd.org/D22976
In IRC, sfs_ finally managed to get a good trace of a kernel panic that was
happening when attempting to use webengine.
As it turns out, we were using vtophys() from interrupt context on an idle
thread in opal_hmi_handler2().
Since this involves locking the kernel pmap on PPC64 at the moment, this
ended up tripping a KASSERT in mtx_lock(), which then caused a parallel
panic stampede.
So, avoid this by preallocating the flags variable and storing it in PCPU.
Fixes "panic: mtx_lock() by idle thread 0x... on sleep mutex kernelpmap".
Differential Revision: https://reviews.freebsd.org/D22962
Due to a bug in clang 9.0.0 source tracking, the trap vector copying will
always trigger a fortify-source warning.
The destination buffers are 0x2f00 bytes, and the bcopy region is 0x2e00
bytes, so there is not an overflow here.
(I have been running with this patch since September.)
Summary:
r356113 used an older patch, which predated the
freebsd_copyout_auxargs() addition. Fix this by using a private
powerpc_copyout_auxargs() instead, and keep it private to powerpc, not in MI
files.
Reviewed by: kib, bdragon
Differential Revision: https://reviews.freebsd.org/D22935
This is a prerequisite for anything IFUNC in the ELFv2 / clang switch.
Since probing cpu info on powerpc is a privileged operation, define that we
pass AT_HWCAP / AT_HWCAP2 through as cpu_features and cpu_features2 to ifunc
resolvers.
This is particularly important when dealing with non-PLT GNU IFUNC, which is
not allowed to PLT call from resolvers and therefore can't access global
variables.
The naming convention "cpu_features"/"cpu_features2" is an existing FreeBSD
PowerPC convention and matches the way we treat these variables in
machine/cpu.h.
The underlying variables are u_long, however, as per the commit message for
r332868, only the low 32 bits are ever used, so the underlying flags are
compatible across all of PowerPC.
The resolver prototype is defined to reserve the maximum number of
register-passed parameters the various PowerPC ABIs allow. This leaves
plenty of room for growth without needing to resort to passing via the
stack in the future.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22787
Due to clang and LLD's tendency to use a PLT for builtins, and as they
don't have full support for EABI, we sometimes have to deal with a PLT in
.ko files in a clang-built kernel.
As such, augment the in-kernel linker to support jump table processing.
As there is no particular reason to support lazy binding in kernel modules,
only implement Secure-PLT immediate binding.
As part of these changes, add elf_cpu_parse_dynamic() to the MD API of the
in-kernel linker (except on platforms that use raw object files.)
The new function will allow MD code to act on MD tags in _DYNAMIC.
Use this new function in the PowerPC MD code to ensure BSS-PLT modules using
PLT will be rejected during insertion, and to poison the runtime resolver to
ensure we get a clear panic reason if a call is made to the resolver.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22608
off the stack, initialized to default values, and then filled in with
driver-specific values, all without having to worry about the numerous
other fields in the tag. The resulting template is then passed into
busdma and the normal opaque tag object created. See the man page for
details on how to initialize a template.
Templates do not support tag filters. Filters have been broken for many
years, and only existed for an ancient make/model of hardware that had a
quirky DMA engine. Instead of breaking the ABI/API and changing the
arugment signature of bus_dma_tag_create() to remove the filter arguments,
templates allow us to ignore them, and also significantly reduce the
complexity of creating and managing tags.
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D22906
As far as I can tell, these are an artifact of times when linker sets
couldn't be empty, otherwise the kernel build would fail due to unresolved
symbols. hselasky fixed this in r268138, and I've audited the kbd portions
to make sure nothing would blow up due to the empty linker set and
successfully compiled+ran a kernel with no keyboard support at all.
Kill them off now since they're no longer required.
MFC after: 1 week
fix an assert violation introduced in r355784. Without this spinlock_exit()
may see owepreempt and switch before reducing the spinlock count. amd64
had been optimized to do a single critical enter/exit regardless of the
number of spinlocks which avoided the problem and this optimization had
not been applied elsewhere.
Reported by: emaste
Suggested by: rlibby
Discussed with: jhb, rlibby
Tested by: manu (arm64)
On PowerPC, this is needed in order for the debugger to find out
the memory offset where the kernel image was loaded on the remote
target.
This fixes symbol resolution when remote debugging a PowerPC kernel.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D22767
The Nest MMU manages address translation for accelerators on the POWER9. To
do so, it needs a page table, so export the system page table to the Nest
MMU. This will quietly fail on pre-POWER9 systems that do not have a NMMU.
The NMMU is currently unused, so this change is currently effectively a NOP,
but the NMMU and VAS will eventually be used.
Fix multiple problems in the powerpcspe floating point code.
* Endianness handling of the SPEFSCR in fenv.h was completely broken.
* Ensure SPEFSCR synchronization requirements are being met.
The __r.__d -> __r transformations were written by jhibbits.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22526
Due to ppc32 building mmu_oea64.c (for use when in bridge mode on a G5), we
need to guard the new moea64_page_array_startup code behind __powerpc64__
to avoid a compile error, since vm_offset_t is not 64-bit on ppc32.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22782
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
This change enables the use of OpenFirmware Console (ofwcons), even when VGA is
available, allowing early kernel messages to be seen, that is important in case
of crashes before VGA console initialization.
This is specially useful in virtualized environments, where the user/developer
doesn't have full control of the virtualization engine (e.g. OpenStack).
The old behavior is preserved by default and, in order to use ofwcons, a few
tunables that have been introduced need to be set:
- hw.ofwfb.disable=1 - disable OFW FrameBuffer device
- machdep.ofw.mtx_spin=1 - change PPC OFW mutex to SPIN type, to match kernel
console's mutex type
- debug.quiesce_ofw=0 - don't call OFW quiesce, needed to keep ofwcons I/O
working
More details can be found at differential revision D20640.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20640
This change makes it possible to use OPAL console as a GDB debug port.
Similar to uart and uart_phyp debug ports, it has to be enabled by
setting the hw.uart.dbgport variable to the serial console node
of the device tree.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22649
Summary:
There's no need to use the fallback fls() and flsl() libkern functions
when the PowerISA includes instructions that already do the bulk of the
work. Take advantage of this through the GCC builtins __builtin_clz()
and __builtin_clzl().
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D22340
Summary:
moea64_pte_sync_native() and moea64_pte_unset_native() don't need the
full PTE created, they only need to check that the PVO has a matching
PTE to the PTE in the page table. Don't waste time creating the full
PTE in this case.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D22341
Summary:
This matches r351198 from amd64. This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave similar to
existing. Otherwise it will allocate 16MB huge pages to hold the page
array, across all NUMA domains. On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain. After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short. This means
some inner domains may have pages accounted in earlier domains.
On Book-E the page array is setup at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead for touching the vm_page_array, which reduces up to one TLB
miss per array access.
Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21449
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
All Rights Reserved on same line as other copyright holders (but not
me). Other such holders are also listed last where it's clear.
Use the right formats for the types given (vm_offset_t and vm_size_t are
both uint32_t on 32-bit platforms, and uint64_t on 64-bit platforms, and
match size_t in size, so we can use the size_t format as we do in other
similar code).
These were found by clang.
r354266 changed the type of bp_kernload to vm_paddr_t in platform_mpc85xx.c,
but not the variable itself in locore.S. This caused the AP to not come up,
due to overwriting the following variable (bp_virtaddr). Also, properly
load bp_kernload into MAS3 and MAS7. Prior to r354266, we required loading
into the low 4GB, but now we can load from anywhere in memory that ubldr can
access.
- Use ustringp for the location of the argv and environment strings
and allow destp to travel further down the stack for the stackgap
and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs. This
used to hold translated system call arguments, but hasn't been used
since r159992.
Reviewed by: kib
Tested on: md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22501
This lets us print, for example, the user's trap frame when a panic occurs.
The frame address is given in the backtrace at the trap point, which can
then be passed to 'show frame'. This is useful for debugging as it can show
inputs that lead to a panic or fault. It can also be used to print trap
frames from other CPUs that get stuck.
i386 already has a similar command, but no others do.
As part of my journey to make it easy to determine what's relying on tty
bits, remove a couple more. Some of these just outright didn't need it,
while others did rely on <sys/tty.h> pollution for mutex headers.
Since version 2.11.0, QEMU became bug-compatible with
PowerVM's vty implementation, by inserting a \0 after
every \r going to the guest. Guests are expected to
workaround this issue by removing every \0 immediately
following a \r.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22171
This change makes it possible to use a POWER Hypervisor virtual
terminal device (phyp vty) as a GDB debug port.
Similar to the uart debug port, it has to be enabled by setting
the hw.uart_phyp.dbgport variable to the vty node of the device
tree.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22205
Switch from "evaddumiaaw 0,0" to "evmwumiaa 0,0,0" when persisting the
accumulator. This has the benefit of actually being implemented in QEMU
as it is the form Linux uses for the same task.
Both instructions are functionally equivilent, as we are using them for
their side effect of copying the accumulator to GPRs rather than for the
actual math operation that they are performing.
Reviewed by: jhibbits
SPE registers are already exported in core dumps with the VMX note, so use
the same interface for live access.
Instead of simply guarding out in #ifndef __SPE__ the cpu_feature check, I
chose to keep the check and check against PPC_FEATURE_SPE, on the off-chance
someone decides to run a SPE kernel on a non-SPE device (which is possible,
though highly unlikely, and would be no different from running a MPC85XX
kernel in that instance).
ENOENT is leftover from mmu_oea.c's moea_pvo_enter(), where it's used to
syncicache() on the first new mapping of a page. This sync is done
differently in OEA64.
'tlbilxpid' is 'tlbilx 1, 0', while the existing form is 'tlbilx 0, 0',
which translates to 'tlbilxlpid', invalidating a LDPID. This effectively
invalidates the entire TLB, causing unnecessary reloads.
The purpose of this option is to make it easier to track down memory
corruption bugs by reducing the number of malloc(9) types that might
have recently been associated with a given chunk of memory. However, it
increases fragmentation and is disabled in release kernels.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Since the DPAA code is from a third party, with minimal edits, there is no
intent to fix these specific warnings at this time. Hide these warnings to
prevent the noise from hiding real warnings.
save_vec_int() for SPE saves off only the high word of the register, leaving
the low word as "garbage", but really containing whatever was in the kernel
register at the time. This leaks into core dumps, and in a near future
commit also into ptrace. Instead, save the GPR in the low word in
save_vec_nodrop(), which is used only for core dumps and ptrace.
Modern gcc errors that "'vec[0]' is used uninitialized in this function"
without us telling it that vec is clobbered. Neither clang nor gcc 4.2.1
error on the existing construct.
Submitted by: bdragon
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook. In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland. This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.
Reviewed by: brooks, emaste
Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22355
Summary:
This is a more optimal way of doing atomic_compset_masked() than the
fallback in sys/_atomic_subword.h. There's also an override for
_atomic_fcmpset_masked_word(), which may or may not be necessary, and is
unused for powerpc.
Reviewed by: kevans, kib
Differential Revision: https://reviews.freebsd.org/D22359
Fix wrong section ordering that was causing a ".got is not contiguous with
other relro sections" lld error. This also brings ldscript.powerpc and
ldscript.powerpcspe closer to ldscript.powerpc64.
Also, remove unnecessary text relocs from the ppc32 AIM trap code.
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D22349
PowerISA 3.0 eliminated the 64-bit bridge mode which allowed 32-bit kernels
to run on 64-bit AIM/Book-S hardware. Since therefore only a 64-bit kernel
can run on this hardware, and 64-bit native always has the direct map, there
is no need to guard it.
According to the OPAL documentation, only the POWER8 (PHB3) should use
the register write TCE reset method. All others should use the OPAL
call.
On POWER9 the call is semantically identical to the register write, with
a wait for completion.
The memory range between VM_MAXUSER_ADDRESS and VM_MIN_KERNEL_ADDRESS is
reserved for devices currently, which are always mapped in TLB1, and
therefore do not exist in the kernel page table. Any page fault in this
range is therefore automatically a fatal fault.
Since TLB_MAXNEST is 3, the insert mask should only be 2 bits. Given that 2
bits counts to 4, and that we already have plenty of space wasted in
padding, make the nest level 4 to match the mask.
Freescale SoCs use a set of IRQs at the high end of the OpenPIC IRQ
list, not counted in the NIRQs of the Feature reporting register. Some
SoCs include a MSI inbound window in the PCIe controller configuration
registers as well, but some don't. Currently, this only handles the
SoCs *with* the MSI window.
There are 256 MSIs per MSI bank (32 per MSI IRQ, 8 IRQs per MSI bank).
The P5020 has 3 banks, yielding up to 768 MSIs; older SoCs have only one
bank.
Also, fix pmap_change_attr() to ignore non-kernel mappings.
* Fix a masking bug in mmu_booke_mapdev_attr() which caused it to align
mappings to the smallest mapping alignment, instead of the largest. This
caused mappings to be potentially pessimally aligned, using more TLB
entries than necessary.
* Return existing mappings from mmu_booke_mapdev_attr() that span more than
one TLB1 entry. The drm-current-kmod drivers map discontiguous segments
of the GPU, resulting in more than one TLB entry being used to satisfy the
mapping.
* Ignore non-kernel mappings in mmu_booke_change_attr(). There's a bug in
the linuxkpi layer that causes it to actually try to change physical
address mappings, instead of virtual addresses. amd64 doesn't encounter
this because it ignores non-kernel mappings.
With this it's possible to use drm-current-kmod on Book-E.
tlb1_mapin_region() and pmap_mapdev_attr() do roughly the same thing -- map
a chunk of physical address space(memory or MMIO) into virtual, but do it in
differing ways. Unify the code, settling on pmap_mapdev_attr()'s algorithm,
to simplify and unify the logic. This fixes a bug with growing the kernel
mappings in mmu_booke_bootstrap(), where part of the mapping was not getting
done, leading to a hang when the unmapped VAs were accessed.
The "alternate format" character 'I' previously had the same behavior as
the "display as an instruction" character 'i'. With this change, it will now
prefix each disassembled instruction with the raw hex value.
As PowerPC instructions are always 32 bits and always aligned, and there are
no alternate modes that would affect instruction decoding or display, this
seemed to me to be the obvious interpretation of "alternate format".
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D22223
This involved several changes:
* Since lld does not like text relocations, replace SMP boot page text relocs
in booke/locore.S with position-independent math, and track the virtual base
in the SMP boot page header.
* As some SPRs are interpreted differently on clang due to the way it handles
platform-specific SPRs, switch m*dear and m*esr mnemonics out for regular
m*spr. Add both forms of SPR_DEAR to spr.h so the correct encoding is selected.
* Change some hardcoded 32 bit things in the boot page to be pointer-sized, and
fix alignment.
* Fix 64-bit build of booke/pmap.c when enabling pmap debugging.
Additionally, I took the opportunity to document how the SMP boot page works.
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D21999
It's possible, with per-CPU mappings, for TLB1 indices to get out of sync.
This presents a problem when trying to insert an entry into TLB1 of all
CPUs. Currently that's done by assuming (hoping) that the TLBs are
perfectly synced, and inserting to the same index for all CPUs. However,
with aforementioned private mappings, this can result in overwriting
mappings on the other CPUs.
An example:
CPU0 CPU1
<setup all mappings> <idle>
3 private mappings
kick off CPU 1
initialize shared mappings (3 indices low)
Load kernel module, triggers 20 new mappings
Sync mappings at N-3
initialize 3 private mappings.
At this point, CPU 1 has all the correct mappings, while CPU 0 is missing 3
mappings that were shared across to CPU 1. When CPU 0 tries to access
memory in one of the overwritten mappings, it hangs while tripping through
the TLB miss handler. Device mappings are not stored in any page table.
This fixes by introducing a '-1' index for tlb1_write_entry_int(), so each
CPU searches for an available index private to itself.
MFC after: 3 weeks
There was a couple issues with GDB machdep code for PPC/PPC64, the main ones being:
- wrong register sizes being returned
- pcb_context index was wrong (this affects all PPC variants)
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22201
In some scenarios, the 4K trapstk may overflow, corrupting tmpstk.
This was observed during remote debugging, with the following steps:
At remote host (R):
- enter kdb during boot
- switch to gdb backend
At local host (L):
- attach gdb to R
- try to read an invalid memory position
At R:
- a DSI trap occurs and kdb restarts (all this occurs on trapstk)
- while printing the stacktrace, trapstk overflows and corrupts tmpstk
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22200
Summary:
Due to bugs in the enumeration code, fsl_pcib_init() was not configuring
sub-bridges properly, so devices hanging off a separate bridge would not
be found. Since the generic PCI code already supports probing child
buses, just delete this code and initialize only the device itself,
letting the generic code handle all the additional probing and
initializing.
This also deletes setup for some PCI peripherals found on some MPC85XX
evaluation boards. The code can be resurrected if needed, but overly
complicated this code in the first place.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D22050
r353489 added minidump support for powerpc64, but it added a dependency on
the dump_avail array. Leaving it uninitialized caused breakage in late
boot. Initialize dump_avail, even though the 64-bit booke pmap doesn't yet
support minidumps, but will in the future.
On POWER8 systems with only one memory domain, the "ibm,associativity"
number that corresponds to it is 0, unlike POWER9 systems with two
or more domains, in which the minimum value is 1.
In POWER8 case, subtracting 1 causes an underflow on the unsigned domain
variable and a subsequent index out-of-bounds access.
Reviewed by: jhibbits
Tested by: bdragon, luporl