These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25371
Due to a check that should have been an endian check being an #if 0,
the wrong checksum mask table was being used on LE, which was causing
extreme strangeness in DNS resolution -- *some* hosts would be resolvable,
but most would not.
This fixes DNS resolution.
(I am committing some parts of the LE patchset ahead of time to reduce the
amount of work I have to do while committing the main patchset.)
Sponsored by: Tag1 Consulting, Inc.
On ibm,extended-clock-frequency, ensure we be64toh() the value.
On clock-frequency, remove the right-shifting hack (which was needed due to
reading a 32 bit value into a 64 bit variable) and switch to OF_getencprop()
for reading (which will handle endian conversion internally.)
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
that can be extended, but also ensure compile-time type checking. Refactor
common code out of arch-specific implementations. Move the mpr and mps
drivers to this new API. The template type remains visible to the consumer
so that it can be allocated on the stack, but should be considered opaque.
To make it easier to work with this in the future, convert to c99
designated initializer syntax.
Tested on powerpc, powerpc64, and powerpc64le. No functional change.
Sponsored by: Tag1 Consulting, Inc.
The intention of the bus_be naming was for those to be the no-endian-swapping
and for the bus_le to be endian-swapping in all the functions.
This naming breaks down when we're actually are running in LE and need to
use the opposite sense.
As such, rename bs_be_* to native_bs_* and rename bs_le_* to swapped_bs_*.
No functional change.
Sponsored by: Tag1 Consulting, Inc.
Swap the BE and LE bus_space tags when on LE, and adjust the nexus tag
to match.
This is prep for a a followup that makes the powerpc bus_space macros easier
to maintain in the future.
Sponsored by: Tag1 Consulting, Inc.
Implement pmap_mincore() for moea64.
This will need some slight tweaks when large page support in HPT lands.
Submitted by: Fernando Eckhardt Valle <fernando.valle@eldorado.org.br>
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D26314
* Add LOAD_LR_NIA define. This is preferred to "bl 1f; 1:" because it
doesn't pollute the branch predictor.
* Add magic sequence to return the CPU to the correct endianness after
jumping to cross-endian code, similar to the sequence from Linux.
Sponsored by: Tag1 Consulting, Inc.
There were multiple bugs in the OPAL RTC code which had never been
discovered, as the default configuration of OPAL machines is to
have the BMC / FSP control the RTC.
* Fix calling convention for setting the time -- the variables are passed
directly in CPU registers, not via memory.
* Fix bug in the bcd encoding routines. (from jhibbits)
Tested on POWER9 Talos II (BE) and POWER9 Blackbird (LE).
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
When emulating a single thread system for testing reasons, mp_maxid can
be 0. This trips up our math for calculating the order.
Account for this to fix xive attachment when emulating a single-thread
core on qemu powernv (a configuration that doesn't exist in the real world.)
Sponsored by: Tag1 Consulting, Inc.
When doing a build for a modern CPUTYPE, llvm will throw errors if obsolete
instructions are used, even if they will never run due to runtime checks.
Hiding the dssall instruction from the assembler fixes kernel build when
overriding CPUTYPE, without having any effect on the generated binary.
This has been in my local tree for over a year and is well tested across
a variety of machines.
Sponsored by: Tag1 Consulting, Inc.
When enabling an interrupt, assert that we do in fact have a root PIC.
This would have saved me some debugging effort.
Sponsored by: Tag1 Consulting, Inc.
Implement the remaining pieces needed to allow userland timestamp reading.
Rewritten based on an intial essay into the problem by Justin Hibbits.
(Copyright changed to my own on his request.)
Tested on ppc64 (POWER9 Talos II), powerpcspe (e500v2 RB800), and
powerpc (g4 PowerBook).
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D26347
In order to enable VDSO timekeeping, it is necessary that there be exactly
one primary FreeBSD sysvec for each of the host and (optionally) compat32.
So, switch ELFv1 to being a secondary sysvec of ELFv2, so it does not get
double-allocated in the shared page.
Since secondary sysvecs use the same sigcode allocation as the primary,
define both to use the main sigcode64, and adjust the sv_sigcode_base on
ELFv2 after initialization to point to the correct offset.
This has the desirable side effect of avoiding having a separate copy of
the signal trampoline in the shared page. Our sigcode64 was already written
to take advantage of trampoline sharing, it was just not being allocated
that way until now.
Submitted by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Currently we use a single bit to indicate whether the virtual page is
part of a superpage. To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.
The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.
For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.
Reviewed by: alc, kib
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26238
This allows privileged userspace processes to find information about the
physical page backing a given mapping. It is useful in applications
such as DPDK which perform some of their own memory management.
Reviewed by: kib, jhb (previous version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D26237
PMCLOG macros were always using 32-bit addresses, even on PPC64.
This resulted in truncated addresses in logs, when running on 64-bit PPC
machines.
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26112
Calling pmc_hook inside a critical section may result in a panic.
This happens when the user callchain is fetched, because it uses
pmap_map_user_ptr, that tries to get the (sleepable) pmap lock when the
needed vsid is not found.
Judging by the implementation in other platforms, intr_irq_handler in
kern/subr_intr.c and what pmc_hook do, it seems safe to move pmc_hook
outside the critical section.
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26111
When SMP support for powerpc was added in r178628, the last callers of this
function were removed. All code that needs to manipulate the task priority
just does it directly instead.
Noticed while reading through the lint logs.
Sponsored by: Tag1 Consulting, Inc.
Assume ELF images without OSREL use the new auxv format.
This is specially important for rtld, that is not tagged. Using
direct exec mode with new (ELFv2) binaries that expect the new auxv
format would result in crashes otherwise.
Unfortunately, this may break direct exec'ing old binaries,
but it seems better to correctly support new binaries by default,
considering the transition to ELFv2 happened quite some time
ago. If needed, a sysctl may be added to allow old auxv format to
be used when OSREL is not found.
Reviewed by: bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25651
Currently, we parse notes for the values of ELF FreeBSD feature flags
and osrel. Knowing these values, or knowing that image does not carry
the note if pointers are NULL, is useful to decide which ABI variant
(brand) we want to activate for the image.
Right now this is only a plumbing change
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25273
After spending a lot of time trying to track down what was going on, I have
isolated the "black screen" failures when using boot1 to boot a G4 machine.
It turns out we were replacing the traps before installing the temporary
BAT entry for the bottom of physical memory. That meant that until the MMU
was bootstrapped, the cached translations were the only thing keeping us
from losing.
Throwing boot1 into the mix was affecting execution flow enough to cause us
to hit an uncached page and crash.
Fix this by properly setting up the initial BAT entry at the same time we
are replacing the OpenFirmware traps, so we can continue executing in
segment 0 until the rest of the DMAP has been set up.
A second thing discovered while researching this is that we were entering a
BAT region for segment 16. It turns out this range was a) considered part
of KVA, and b) has firmware mappings with varying attributes.
If we ever accessed an unmapped page in segment 16, it would cause a BAT
entry to be installed for the whole segment, which would bypass the
existing mappings until it was flushed out again.
Instead, translate the OFW memory attributes into VM memory attributes and
install the ranges into the kernel address space properly.
Reviewed by: adalava
MFC after: 3 weeks
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25547
This fixes spurious "XIVE[ IC 00 ] ISN 1 lead to invalid IVE !" messages
generated by OPAL when running with the debug level cranked up.
Discussed with jhibbits.
Sponsored by: Tag1 Consulting, Inc.
Being able to use tmpfs without kernel modules is very useful when building
small MFS_ROOT kernels without a real file system.
Including TMPFS also matches arm/GENERIC and the MIPS std.MALTA configs.
Compiling TMPFS only adds 4 .c files so this should not make much of a
difference to NO_MODULES build times (as we do for our minimal RISC-V
images).
Reviewed By: br (earlier version for riscv), brooks, emaste
Differential Revision: https://reviews.freebsd.org/D25317
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.
An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.
Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689
This removes SCTP from in-tree kernel configuration files. Now, SCTP
can be enabled by simply loading the module, as discussed on
freebsd-net@.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25611
Use PVO_PADDR macro to get the physical address from a PVO, instead of
explicitly ANDing pvo_pte.pa with LPTE_RPGN where it is needed. Besides
improving readability, this is needed to support superpages (D25237), where
the steps to get the PA from a PVO are different.
Reviewed by: markj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25654
* Only read the DPCPU pointer once per xive_dispatch call.
* Optimize HE decoding for the common cases.
Reported by: jhibbits (in irc)
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25545
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.
This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.
Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.
This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)
* Remove the rest of the stoffs hack.
* Remove my half-baked displace_symbol_table() function.
* Extend ddb initialization to cope with having a relocation offset on the
kernel symbol table.
* Fix my kernel-as-initrd hack to work with booke64 by using a temporary
mapping to access the data.
* Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
* Change the behavior or X_db_symbol_values to apply the relocation base
when updating valp, to match link_elf_symbol_values() behavior.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25223
Due to kldxref not being able to generate hints for nonnative platforms,
any cross built VM images do not have /boot/kernel/linker.hints.
This prevents the virtio modules from being loaded, as the fallback code
will always fail the version check when the hints are missing.
Since we want to be able to generate VM images for 32 bit powerpc, add the
virtio modules to GENERIC like we do on powerpc64.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25271
Since qemu does not implement the L2 cache, we get stuck forever waiting
for a bit to be set when trying to invalidate it.
To prevent that, we should bail out if the L2 cache is missing.
One easy way to check this is L2CFG0 == 0 (since L2CSIZE always has at
least one bit set in a valid implementation)
(tested on qemu, rb800, and x5000)
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25225
After r361988 fixed the reference count leak on booke64, it became possible
for an iteration somewhere in the middle of a page to become stale, with the
page vanishing (correctly) due to all PTEs on that page going away.
pte_find_next() would start at that iterator, and move along 'higher' order
directory pages until it finds a valid one, without zeroing out the lower
order pages. For instance:
/* Find next pte at or above 0x10002000. */
pte = pte_find_next(pmap, &(0x10002000));
pte_remove(pmap, pte);
/* This pte was the last reference in the page table page, page is
* gone.
*/
pte = pte_find_next(pmap, 0x10002000);
/* pte_find_next will see 0x10002000's page is gone, and jump to the
* next one, but starting iteration at the '0x2000' slot, skipping
* 0x0000 and 0x1000.
*/
This caused some processes, like git, to trip the KASSERT() in
pmap_release().
Fix this by zeroing all lower order iterators at each level.
vmem quantum cache is only needed when doing a lot of concurrent allocations,
which doesn't happen when allocating MSIs. This wastes memory for the cache
zones. Avoid this waste and don't use the quantum cache.
Reported by: markj
Properly handle reference counts in the 64-bit pmap page directories.
Otherwise all page table pages would leak due to over-referencing. This
would cause a quick enter to swap on a desktop system (AmigaOne X5000) when
quitting and rerunning applications, or just building world.
Add an INVARIANTS check to validate no leakage at pmap release time.
If the POWER firmware detects a bad CPU core, it will "GUARD" it out,
marking it disabled. Any attempt to spin up a bad CPU will trigger a panic
later on when waiting for threads on said core to wake up. Support limping
along on fewer cores instead.