Add a new 'debugger_on_trap' knob separate from 'debugger_on_panic'
and make the calls to kdb_trap() in MD fatal trap handlers prior to
calling panic() conditional on this new knob instead of
'debugger_on_panic'. Disable the new knob by default. Developers who
wish to recover from a fatal fault by adjusting saved register state
and retrying the faulting instruction can still do so by enabling the
new knob. However, for the more common case this makes the user
experience for panics due to a fatal fault match the user experience
for other panics, e.g. 'c' in DDB will generate a crash dump and
reboot the system rather than being stuck in an infinite loop of fatal
fault messages and DDB prompts.
Reviewed by: kib, avg
MFC after: 2 months
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D17768
The loader tunable 'debug.verbose_sysinit' may be used to toggle verbosity.
This is added to the debugging section of these kernconfs to be turned off
in stable branches for clarity of intent.
MFC after: never
The number of fans on a PowerMac7,3 with liquid cooling is 9.
Reviewed by: andreast@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17754
It seems if a Radeon card is already initialized by u-boot, it won't be
reinitialized by the kernel, and the DRM module will fail to attach. This
steals the reset code from mips/octopci.c to blindly reset the bus on attach.
This was tested on a AmigaOne X5000/20, such that it can be booted from the
local video console, and get a video console in FreeBSD.
All platforms except powerpc use the same values and powerpc shares a
majority of them.
Go ahead and declare AT_NOTELF, AT_UID, and AT_EUID in favor of the
unused AT_DCACHEBSIZE, AT_ICACHEBSIZE, and AT_UCACHEBSIZE for powerpc.
Reviewed by: jhb, imp
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17397
Further investigation of issues with 32-bit DMA on PowerNV revealed that
its window is hardcoded by OPAL (at least in skiboot version 5.4.9) and
cannot be changed by the OS.
Thus, now jhb suggestion of limiting the range in PCI DMA tag seems
the best way to deal with it.
Reviewed by: jhibbits, nwhitehorn, sbruno
Approved by: jhibbits(mentor)
Differential Revision: https://reviews.freebsd.org/D17601
si_addr is the address of the instruction executing at the time the
signal was sent. Populate this field with srr0, which, though not
always the case, is most often the instruction that triggered the fault.
debugf() is unnecessary for the TLB printing functions, as they're only
intended to be used from ddb. Instead, make them full DDB 'show'
commands, so now it can be written as 'show tlb1' and 'show tlb0'
instead of calling the function, hoping DEBUG has been defined.
This driver was already 99% identical to the ofw_pcib_pci driver, except for
the attachment. Since ofw_pcib_pci is already a subclass of pcib, this
creates a private declaration of that class, to use for the base class for
this driver.
At some point in the future, ofw_pcib_pci_driver should probably be exported
to a header, so we're not tracking the softc struct contents, but for now,
since there's only this one other driver, it's not a pressing issue.
r279252 inverted the logic in moea64_scan_init, such that instead of
terminating when reaching a dead page, it terminates when reaching a live
page, ostensibly preserving exactly one page of KVA.
The Signal Processing Engine (SPE) found in Freescale e500 cores (and
others) offloads IEEE-754 compliance (NaN, Inf handling, overflow,
underflow) to software, most likely as a means of simplifying the APU
silicon. Some software, like AbiWord, needs full IEEE-754 compliance,
including NaN handling. Implement the necessary bits to enable it.
Differential Revision: https://reviews.freebsd.org/D17446
At early boot, PCPU_GET(), that obtains a pointer from SPRG0, was being
used with SPRG0 not yet initialized. If it pointed to an invalid
address, the machine would hang.
Approved by: re(gjb), jhibbits(mentor)
Discussing with Benjamin Herrenschmidt, OPAL_INT_GET_XIRR masks the
returned priority, so must be resumed before more interrupts can be
handled at this priority. Since there are only two priorities used in
FreeBSD, we know that the previous priority in an EOI will always be
0xff (lowest priority).
Reviewed by: nwhitehorn
Approved by: re(rgrimes)
Differential Revision: https://reviews.freebsd.org/D17361
Summary:
Discussing with Benjamin Herrenschmidt, MSIs, and edge-triggered
interrupts in general, must not be masked in XICS and XIVE, else
subsequent interrupts may be ignored.
Testing locally on my Talos II (single CPU, 18-core POWER9), NVMe now
works with MSI, improving read throughput by ~70% (900MB/s -> 1.67GB/s,
with 64MB block size) over INTx interrupts, and snd_hda(4) now will
actually play music with MSI. Previously, snd_hda(4) would not receive
interrupts, timing out, and declaring the channels dead.
This has also been tested by Kevin Bowling, and others, with great
success. Kevin reported NVMe unusable on his Talos II prior to this
patch.
Reviewed by: nwhitehorn, kbowling
Approved by: re(rgrimes)
Differential Revision: https://reviews.freebsd.org/D17356
The PHB4 host bridge used by the POWER9 uses a 64kB range in 32-bit
space at the address 0xffff0000-0xffffffff. Reserve this range so that
DMA memory cannot be allocated within this range. This fixes seemingly
random crashes on a POWER9 system. Ideally this range will have been
reserved by the firmware, but as of now this is not the case.
Submitted by: git_bdragon.rtk0.net
Reviewed by: nwhitehorn
Approved by: re(kib)
Differential Revision: https://reviews.freebsd.org/D17183
This patch adds the very initial support for HTM that might come at FreeBSD
version 12.1. This basic support defines a new kABI, so, we do not need to change
it later during 12.1 time frame, when the full implementation will come.
Reviewed by: jhibbits
Approved by: re(marius), jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D16889
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().
Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions. Use this flag to determine which
arena the kmem virtual addresses are returned to.
Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it
redundant.
Update the nearby comment for UMA_SLAB_KERNEL.
Reviewed by: kib, markj
Discussed with: jeff
Approved by: re (marius)
Differential Revision: https://reviews.freebsd.org/D16845
The boot-time ifunc resolver assumes that it only needs to apply
IRELATIVE relocations to PLT entries. With an upcoming optimization,
this assumption no longer holds, so add the support required to handle
PC-relative relocations targeting GNU_IFUNC symbols.
- Provide a custom symbol lookup routine that can be used in early boot.
The default lookup routine uses kobj, which is not functional at that
point.
- Apply all existing relocations during boot rather than filtering
IRELATIVE relocations.
- Ensure that we continue to apply ifunc relocations in a second pass
when loading a kernel module.
Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16749
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)
Reviewed by: kib, markj
Discussed with: jeff, re@
Differential Revision: https://reviews.freebsd.org/D16825
This is an amalgam of a patch by Doug Ambrisko to
generalize uart_acpi_find_device, imp moving the
ACPI table to uart_dev_ns8250.c and advice by jhb
to work around a bug in the EPYC 3151 BIOS
(the BIOS incorrectly marks the serial ports as
disabled)
Reviewed by: imp
MFC after: 8 weeks
Differential Revision: https://reviews.freebsd.org/D16432
- In configurations with a pseudo devices section, move 'device crypto'
into that section.
- Use a consistent comment. Note that other things common in kernel
configs such as GELI also require 'device crypto', not just IPSEC.
Reviewed by: rgrimes, cem, imp
Differential Revision: https://reviews.freebsd.org/D16775
The canonical form of sync is:
sync L, E (if Category Elemental Memory Barriers implemented)
The L bits (2) denote the type of sync:
0 -- hwsync
1 -- lwsync
2 -- ptesync or hwsync
It's been found that most 32-bit CPUs designed prior to the introduction of
lwsync will ignore the L bits. However, some cores, particularly the e500 core,
will trigger an illegal instruction exception. Adding these variants will make
it easier to see which sync variant is actually being used in case of a trap.
If OPAL_RTC_READ is busy and does not return the information on the first run,
as returning OPAL_BUSY_EVENT, the system will crash since ymd and hmsm variable
will contain junk values.
This is happening because we were not calling OPAL_RTC_READ again after
OPAL_POLL_EVENTS' return, which would finally replace the old/junk hmsm and ymd
values.
The code was also mixing OPAL_RTC_READ and OPAL_POLL_EVENTS return values.
This patch fix this logic and guarantee that we call OPAL_RTC_READ after
OPAL_POLL_EVENTS return, and guarantee the code will only proceed if
OPAL_RTC_READ returns OPAL_SUCCESS.
Reviewed by: jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D16617
This is an OFW initrd module that would load the initrd from device tree
parameters and give the to the md driver.
With this patch, it is possible to pass a rootfs image through kexec in PowerNV
mode (powerpc64). In order to user it, you should set the MD_ROOT_MEM option in
your kernel configuration.
Reviewed by: jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15705
I had naively assumed that building kernel would be sufficient to test that
the header is sane. However, it turns out this now needs -fms-extensions to
build. Rather than sprinkling -fms-extensions all over the place, revert
for now, and revisit with a better fix.
Summary:
Ports like sysutils/lsof troll through kernel structures, and
therefore include kernel headers and all the dirty secrets involved. struct
vm_page includes the struct md_page inline, which currently is only defined
if AIM or BOOKE is defined. Thus, by default, sysutils/lsof cannot build,
due to the struct md_page having an incomplete type. Fix this by merging
the two struct definitions into an anonymous struct-union.
A similar change could be made to unify the pmap structures as well.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D16232
* FreeBSD stores addresses in 8 bit format, but the OPAL API requires the 7-bit
address, and encodes the direction elsewhere. Behave like other i2c drivers,
and shift accordingly.
* The OPAL API can already handle multiple requests in flight. Change the async
token to be private to the thread, so as not to stomp across i2c accesses,
remove the limitation error message, and use the correct message index to
transfer all messages in the list.
* Micro-optimize the async handler to not continuously call pmap_kextract() when
spin-waiting for the operation to complete.
This has been tested by hexdumping an EEPROM attached via the icee(4) driver.
ofw_iicbus already has attachments on iichb. Rather than adding an explicit
attachment onto opal_i2c, simply change the exposed name of the OPAL i2c bus
to 'iichb'.
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
(defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)
- Allocate page from the correct domain for a given cpu.
- Don't initialize pc_domain to non-zero value if NUMA is not defined
There are some misconceptions surrounding this field. It is the
_VM_ NUMA domain and should only ever correspond to valid domain
values as understood by the VM.
The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.
Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15933
On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.
In preparation to fix this create two macros depending on if the data is
global or static.
Reviewed by: bz, emaste, markj
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D16140
Summary: If the chosen console is not the OPAL uart, but OPAL uart devices
exist, the console device doesn't attach properly, and faults in the interrupt
handler, with a NULL pointer dereference. To fix this, and as a byproduct, also
support multiple OPAL consoles, refactor to have the console getc callback use
the appropriate softc instead of the global console_sc, which may be NULL in the
case of a different device being the console.
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D16071
Summary: In r220638, stoppcbs started being tracked. This never got exposed to
ddb though, so kdb_thr_ctx() didn't know how to look them up.
This allows switching to threads on stopped CPUs in kdb.
Submitted by: Brandon Bergren <git_bdragon.rkt0.net>
Differential Revision: https://reviews.freebsd.org/D15986
r330610 relocated the DMAP from the base of memory to the base of the fourth
quadrant of memory. This broke synthetic traps, such as KDB forced
breakpoints. Use GET_TOCBASE() so the DMAP offset is handled.
Submitted by: git_bdragon.rkt0.net
Differential Revision: https://reviews.freebsd.org/D15973
Summary: POWER8 and POWER9 use a single CPU register, per core, to change clock
speed. Everything else is handled by the on-chip controller. This change
necessitates a change to the cpufreq global kernel driver to bump supported
levels, as the device tree for these systems can have theoretically 256
different options. On my POWER9 Talos, the list consists of 100 items. At
16.67MHz intervals, that allows for a change of roughly 1.67GHz between lowest
and highest.
This has only been tested on the POWER9. However, since they're similar, this
should work on POWER8 as well.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15932
PowerISA 3.0 makes several changes to not only the format of the HPT but
also the behavior surrounding it. For instance, TLBIE no longer requires
serialization. Removing this lock cuts buildworld time in half on a
18-core/72-thread POWER9 system, demonstrating that this lock is highly
contended on such a system.
There was odd behavior observed trying to make this change in a
backwards-compatible manner in moea64_native.c, so the best option was to
fully split it, and largely revert the original changes adding POWER9
support to the original file.
Suggested by: nwhitehorn
On very large memory systems 'size' can become 2GB or larger, resulting in a
negative value being formatted. Also, moea64_pteg_count is already a long, so
format it as such.
There is a type promotion that transform count = -1 into a unsigned int causing
the default TCE SEG SIZE not being returned on a Boston POWER9 machine.
This machine does not have the 'ibm,supported-tce-sizes' entries, thus, count
is set to -1, and the function continue to execute instead of returning.
Reviewed by: jhibbits, wma
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15763
pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.
While facially a reasonable cleanup this change was motivated
by the need to workaround a compiler bug.
core2_intr(cpu, tf) ->
pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
pmc_add_sample(cpu, ring, pm, tf, inuserspace)
In the process of optimizing the tail call the tf pointer was getting
clobbered:
(kgdb) up
at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709 pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
resulting in a crash in pmc_save_kernel_callchain.
Changed excise_initrd_region to support both 32- and 64-bit
values for linux,initrd-start and linux,initrd-end.
This fixes the boot problem on some machines after rS334485.
Submitted by: Luis Pires <lffpires@ruabrasil.org>
Reviewed by: jhibbits, leitao
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15667
Summary: Included VSX registers in powerpc core dumps (both kernel and gcore)
Submitted by: Luis Pires
Differential Revision: https://reviews.freebsd.org/D15512
Summary:
Added ptrace support for getting/setting the remaining part of the VSX registers
(the part that's not already covered by FPR or VR registers).
This is necessary to add support for VSX registers in debuggers.
Submitted by: Luis Pires
Differential Revision: https://reviews.freebsd.org/D15458
This will let us use much more KVA for ZFS ARC where needed. This may be
incresed in the future if memory requirements increase.
Discussed with: nwhitehorn
Recently a change was made which broke loading 32-bit binaries on powerpc64,
with an assertion in ld-elf32.so.1:
ld-elf32.so.1: assert failed:
/usr/local/poudriere/jails/ppc64/usr/src/libexec/rtld-elf/rtld.c:390
It turns out Elf32_AuxInfo was broken for a very long time on powerpc64, as
it uses long and pointers, which are both 64 bits on powerpc64, and only
manifested with the recent work on auxargs.
Currently kexec loads an initrd file into the main memory but does not
mark that region as reserved, thus the area is not protected.
If any initrd/md file is loaded from kexec/petitboot, the region might become
corarupted/overwritten since FreeBSD does not know the region is 'reserved'.
This patch simply adds the initrd area as a reserved memory region.
Approved by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15610
Summary:
Coupled with r334365, this makes PCI work on POWER9. There is still more to
do to fully exploit the hardware capabilities, but this is sufficient to
enable USB and ethernet controllers on a POWER9 Talos II system.
Reviewed by: nwhitehorn, leitao
Differential Revision: https://reviews.freebsd.org/D15566
This reduces the CPU cycle wastage on power9, which is SMT4. Any idle
thread that's spinning is simply starving working threads on the same core
of valuable resources.
This can be reduced further by taking more advantage of the PSSCR supported
states, as well as permitting state loss, as is currently done for power8.
The currently implemented stop state is the lowest latency, which may still
consume resources.
POWER9 supports Radix page tables in addition to Hashed page tables. When
Radix page tables are in use, the TLB is cut in half, so that half of the
TLB is used for the page walk cache. This is the default behavior, however
FreeBSD currently does not support Radix tables. Clear this bit so that we
can use the full TLB. Do this in the MMU logic so that configuration can be
localized to the specific translation format. Once we do support Radix
tables, the setup for that will be localized to the Radix MMU kobj.
Summary:
PowerISA 2.03 and later require bits 14:65 in the RB register argument,
which is the full value of the vpn argument post-shift. Only POWER4, POWER4+,
and PPC970* need the upper 16 bits cropped.
With this change FreeBSD can boot to multi-user on POWER9.
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15581
IPMI access on PowerNV systems is done through the OPAL firmware. This adds a
simple attachment for communicating with the FSP/BMC on these machines. This
has been tested on a Talos POWER9 workstation, only in the bootup phase, noting
the successful attachment messages:
...
ipmi0: IPMI device rev. 0, firmware rev. 2.00, version 2.0, device support mask 0
ipmi0: Number of channels 2
...
The ipmi device has not been added to GENERIC64, but may be after further
testing. It may also eventually be added to the ipmi module at that point.
Summary:
PowerNV architectures (in the test case POWER9) export sensors via the device
tree, which are accessed via OPAL calls. This adds sysctl nodes for each
device in a generic fashion. New sysctl nodes are:
dev.opal_sensor.N.sensor
dev.opal_sensor.N.sensor_min
dev.opal_sensor.N.sensor_max
dev.opal_sensor.N.type
dev.opal_sensor.N.label
These are rooted at a parent attachment under opal, called opalsens. This does
not add support for the "sensor groups" defined in the device tree.
Reviewed by: breno.leitao_gmail.com
Differential Revision: https://reviews.freebsd.org/D15362
Summary:
POWER9 systems use a new interrupt controller, XIVE, managed through OPAL
firmware calls. The OPAL firmware includes support for emulating the previous
generation XICS presentation layer in addition to a new "XIVE Exploitation"
mode. As a stopgap until we have XIVE exploitation mode, enable XICS emulation
mode so that we at least have an interrupt controller.
Since the CPPR is local to the current CPU, it cannot be updated for APs when
initializing on the BSP. This adds a new function, directly called by the
powernv platform code, to initialize the CPPR on AP bringup.
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15492
This turns on support for kernel dump encryption and compression, and
netdump. arm and mips platforms are omitted for now, since they are more
constrained and don't benefit as much from these features.
Reviewed by: cem, manu, rgrimes
Tested by: manu (arm64)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D15465
Summary:
Some hypervisor exceptions on POWER architecture only save state to HSRR0/HSRR1.
Until we have bhyve on POWER, use a lightweight exception frontend which copies
HSRR0/HSRR1 into SRR0/SRR1, and run the normal trap handler.
The first user of this is the Hypervisor Virtualization Interrupt, which targets
the XIVE interrupt controller on POWER9.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15487
Summary:
Add additional OPAL PCI definitions and expand the code to use them in order to
ease the OPAL interface process for new comers.
These definitions came directly from the OPAL code and they are the same for
both PHB3 (POWER8) and PHB4 (POWER9).
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15432
On some POWER9 systems, 'reg' denotes the full memory in the system, while
'linux,usable-memory' denotes the usable memory. Some memory is reserved for
NVLink usage, so is partitioned off.
Submitted by: Breno Leitao
r333273 and partially reverted with r333594.
Older CPUs implement addition of offsets into the page table by a
bitwise OR rather than actual addition, which only works if the table is
aligned at a multiple of its own size (they also require it to be aligned
at a multiple of 256KB). Newer ones do not have that requirement, but it
hardly matters to enforce it anyway.
The original code was failing on newer systems with huge amounts of RAM
(> 512 GB), in which the page table was 4 GB in size. Because the
bootstrap memory allocator took its alignment parameter as an int, this
turned into a 0, removing any alignment constraint at all and making
the MMU fail. The first round of this patch (r333273) fixed this case by
aligning it at 256 KB, which broke older CPUs. Fix this instead by widening
the alignment parameter.
Summary:
There were 2 issues that were preventing correct symbol resolution
on PowerPC/pseries:
1- memory corruption at chrp_attach() - this caused the inital
part of the symbol table to become zeroed, which would cause
the kernel linker to fail to parse it.
(this was probably zeroing out other memory parts as well)
2- DDB symbol resolution wasn't working because symtab contained
not relocated addresses but it was given relocated offsets.
Although relocating the symbol table fixed this, it broke the
linker, that already handled this case.
Thus, the fix for this consists in adding a new DDB macro:
DB_STOFFS(offs) that converts a (potentially) relocated offset
into one that can be compared with symbol table values.
PR: 227093
Submitted by: Leandro Lupori <leandro.lupori_gmail.com>
Differential Revision: https://reviews.freebsd.org/D15372
riscv and powerpc have nearly identical bcopy.c that's
supposed to be mostly MI. Move it to the MI libkern.
Differential Revision: https://reviews.freebsd.org/D15374
Summary:
chrp_cpuref_init() was relying on the boot strap processor to be
the first child of /cpus. That was not always the case, specially
on pseries with FDT.
This change uses the "reg" property of each CPU instead and also
adds several sanity checks to avoid unexpected behavior (maybe
too many panics?).
The main observed symptom was interrupts being missed by the main
processor, leading to timeouts and the kernel aborting the boot.
Submitted by: Leandro Lupori
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15174
The POWER9 MMU (PowerISA 3.0) is slightly different from current
configurations, using a partition table even for hypervisor mode, and
dropping the SDR1 register. Key off the newly early-enabled CPU features
flags for the new architecture, and configure the MMU appropriately.
The POWER9 MMU ignores the "PSIZ" field in the PTCR, and expects a 64kB
table. As we are enabled for powernv (hypervisor mode, no VMs), only
initialize partition table entry 0, and zero out the rest. The actual
contents of the register are identical to SDR1 from previous architectures.
Along with this, fix a bug in the page table allocation with very large
memory. The table can be allocated on any 256k boundary. The
bootstrap_alloc alignment argument is an int, and with large amounts of
memory passing the size of the table as the alignment will overflow an
integer. Hard-code the alignment at 256k as wider alignment is not
necessary.
Reviewed by: nwhitehorn
Tested by: Breno Leitao
Relnotes: Yes
The new POWER9 MMU configuration is slightly different from current setups.
Rather than special-casing on POWER9, move the initialization of cpu_features
and cpu_features2 to as early as possible, so that platform and MMU
configuration can be based upon CPU features instead of specific CPUs if at all
possible.
Reviewed by: nwhitehorn
POWER8 and POWER9 have similar configuration requirements for hypervisor setup,
and in the cases here they're identical. Add the POWER9 constant to the POWER8
list so it's initialized correctly.
Reviewed by: nwhitehorn
This code caused more problems than it should have fixed (boot failures) on
the machines I tested, so has been commented out for a while now. Remove
it, and assume the errata fixups were done by the bootloader where they
belong.
Discussing with others, this needs to be at least 20 to boot on some POWER9
nodes. Linux made a similar change for the same reason, so increase to 32
to give us some extra breathing room as well. The input and output arrays
are sized at 256, so much greater than the increase in the property array
size.
sysentvec::sv_hwcap/sv_hwcap2 are pointers to u_long, so cpu_features* need
to be u_long to use the pointers. This also requires a temporary cast in
printing the bitfields, which is fine because the feature flag fields are
only 32 bits anyway.
FreeBSD exports the AT_HWCAP* auxvec items if provided by the ELF sysentvec
structure. Add the CPU features to be exported, so user space can more
easily check for them without using the hw.cpu_features and hw.cpu_features2
sysctls.
Not all feature flags are synced. Those for processors we don't currently
support are ignored currently. Those that are supported are synced best I
can tell. One flag was renamed to match the Linux flag name
(PPC_FEATURE2_VCRYPTO -> PPC_FEATURE2_VEC_CRYPTO).
Summary:
POWER9 also contains 32 slbs entries as explained by the POWER9 User Manual:
"For HPT translation, the POWER9 core contains a unified (combined for both
instruction and data), 32-entry, fully-associative SLB per thread"
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15128
Summary:
Powerpc64 has support for a register called Data Stream Control Register
(DSCR), which basically controls how the hardware controls the caching and
prefetch for stream operations.
Since mfdscr and mtdscr are privileged instructions, we need to emulate them,
and
keep the custom DSCR configuration per thread.
The purpose of this feature is to change DSCR depending on the operation, set
to DSCR Default Prefetch Depth to deepest on string operations, as memcpy.
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15081
region marked "available" by firmware is contained entirely in the kernel.
This had a tendency to happen with FDTs passed by loader, though could for
other reasons as well, and would result in the kernel slowly cannibalizing
itself for other purposes, eventually resulting in a crash.
A similar fix is needed for mmu_oea.c and should probably just be rolled
at that point into some generic code in platform.c for taking a mem_region
list and removing chunks.
PR: 226974
Submitted by: leandro.lupori@gmail.com
Reviewed by: jhibbits
Differential Revision: D15121
This will allow to hook a ddb script to "kdb.enter.trap" event.
Previously there was no specific name for this event, so it could only
be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks.
Both are very unspecific.
Having a specific event is useful because the fatal trap condition is
very similar to panic but it has an additional property that the current
stack frame is the frame where the trap occurred. So, both a register
dump and a stack bottom dump have additional information that can help
analyze the problem.
I have added the event only on architectures that have trap_fatal()
function defined. I haven't looked at other architectures. Their
maintainers can add support for the event later.
Sample script:
kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20
Reviewed by: kib, jhb, markj
MFC after: 11 days
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D15093
Half of implementations always failed (returned (-1)) and they were
previously used in only one place.
Reviewed by: kib, andrew
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15102
This makes it more consistent with FreeBSD norms, rather than using Linux's
norms. Now, instead of needing an environment variable
video-mode=fslfb:1280x1024@60
Now one would use a hint:
hint.fb.0.mode=1280x1024@60
Most other architectures already re-enter KDB on faults, powerpc and mips
are the only outliers. Correct this for powerpc, so that now bad addresses
can be handled gracefully instead of panicking.
Make int_external_input, int_decrementer, and int_performance_counter all
now use trap_common, just like on AIM. The effects of this are:
* All traps are now properly displayed in ddb. Previously traps from
external input, decrementer, and performance counters, would display as
just basic stack traces. Now the frame is displayed.
* External interrupts are now handled with interrupts enabled, so handling
can be preempted. This seems to fix a hang found post-r329882.
Change OF_getencprop_alloc semantics to be combination of malloc and
OF_getencprop and return size of the property, not number of elements
allocated.
For the use cases where number of elements is preferred introduce
OF_getencprop_alloc_multi helper function that copies semantics
of OF_getencprop_alloc prior to this change.
This is to make OF_getencprop_alloc and OF_getencprop_alloc_multi
function signatures consistent with OF_getencprop_alloc and
OF_getencprop_alloc_multi.
Functionality-wise this patch is mostly rename of OF_getencprop_alloc
to OF_getencprop_alloc_multi except two calls in ofw_bus_setup_iinfo
where 1 was used as a block size.
OF_getprop_alloc takes element size argument and returns number of
elements in the property. There are valid use cases for such behavior
but mostly API consumers pass 1 as element size to get string
properties. What API users would expect from OF_getprop_alloc is to be
a combination of malloc + OF_getprop with the same semantic of return
value. This patch modifies API signature to match these expectations.
For the valid use cases with element size != 1 and to reduce
modification scope new OF_getprop_alloc_multi function has been
introduced that behaves the same way OF_getprop_alloc behaved prior to
this patch.
Reviewed by: ian, manu
Differential Revision: https://reviews.freebsd.org/D14850
Summary:
This code adds the basic infrastructure for the facility subsystem. A facility
trap is raised when an unavailable instruction is executed. One example is
executing a Hardware Transactional Memory instruction while the MSR[TM] is
disabled. In the past, there was a specific interrupt for it (FP, VEC), but the
new instructions seem to be multiplexed on this facility interrupt.
The root cause of the trap is provided on Facility Status and Control Register
(FSCR) register.
Submitted by: Breno Leitao
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D14566
Summary:
Print current MSR on printtrap(). Currently, printtrap just prints srr1, which
contains part of the MSR prior to the exception. I find useful to dump the
current value of the MSR, since it changes when there is an interruption.
With this patch, this is the new printtrap model:
handled user trap:
exception = 0x700 (program)
srr0 = 0x100008a0 (0x100008a0)
srr1 = 0x800000000002f032
current msr = 0x8000000000009032
lr = 0x1000089c (0x1000089c)
curthread = 0x7a50000
pid = 714, comm = ttrap2
Submitted by: Breno Leitao
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D14600
Summary:
It is not necessary to call isync() after calling mtmsr() function, mainly
because the mtmsr() calls 'isync' internally to synchronize the machine state
register. Other than that, isync() just calls the 'isync' instruction, thus,
the 'isync' instruction is being called twice, and that seems to be unnecessary.
This patch just remove the unecessary calls to isync() after mtmsr().
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D14583
Summary:
Currently ofw_real_bounce_alloc() is requesting memory, using WAITOK, holding a
non-sleepable locks, called 'OF Bounce Page'.
Fix this by allocating the pages outside of the lock, and only updating the
global variables while holding the lock.
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D14955
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
TLB1 can handle ranges up to 4GB (through e5500, larger in e6500), but
ilog2() took a unsigned int, which maxes out at 4GB-1, but truncates
silently. Increase the input range to the largest supported, at least for
64-bit targets. This lets the DMAP be completely mapped, instead of only
1GB blocks with it assuming being fully mapped.
As with AIM64, map the DMAP at the beginning of the fourth "quadrant" of
memory, and move the KERNBASE to the the start of KVA.
Eventually we may run the kernel out of the DMAP, but for now, continue
booting as it has been.
assym is only to be included by other .s files, and should never
actually be assembled by itself.
Reviewed by: imp, bdrewery (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14180
OF_finddevices returns ((phandle_t)-1) in case of failure. Some code
in existing drivers checked return value to be equal to 0 or
less/equal to 0 which is also wrong because phandle_t is unsigned
type. Most of these checks were for negative cases that were never
triggered so trhere was no impact on functionality.
Reviewed by: nwhitehorn
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14645
When the kernel can be in real mode in early boot, we can execute from
high addresses aliased to the kernel's physical memory. If that high
address has the first two bits set to 1 (0xc...), those addresses will
automatically become part of the direct map. This reduces page table
pressure from the kernel and it sets up the kernel to be used with
radix translation, for which it has to be up here.
This is accomplished by exploiting the fact that all PowerPC kernels are
built as position-independent executables and relocate themselves
on start. Before this patch, the kernel runs at 1:1 VA:PA, but that
VA/PA is random and set by the bootloader. Very early, it processes
its ELF relocations to operate wherever it happens to find itself.
This patch uses that mechanism to re-enter and re-relocate the kernel
a second time witha new base address set up in the early parts of
powerpc_init().
Reviewed by: jhibbits
Differential Revision: D14647
accomplishes a few things:
- Makes NULL an invalid address in the kernel, which is useful for catching
bugs.
- Lays groundwork for radix-tree translation on POWER9, which requires the
direct map be at high memory.
- Similarly lays groundwork for a direct map on 64-bit Book-E.
The new base address is chosen as the base of the fourth radix quadrant
(the minimum kernel address in this translation mode) and because all
supported CPUs ignore at least the first two bits of addresses in real
mode, allowing direct-map addresses to be used in real-mode handlers.
This is required by Linux and is part of the architecture standard
starting in POWER ISA 3, so can be relied upon.
Reviewed by: jhibbits, Breno Leitao
Differential Revision: D14499
correctly for the data contained on each memory page.
There are several components to this change:
* Add a variable to indicate the start of the R/W portion of the
initial memory.
* Stop detecting NX bit support for each AP. Instead, use the value
from the BSP and, if supported, activate the feature on the other
APs just before loading the correct page table. (Functionally, we
already assume that the BSP and all APs had the same support or
lack of support for the NX bit.)
* Set the RW and NX bits correctly for the kernel text, data, and
BSS (subject to some caveats below).
* Ensure DDB can write to memory when necessary (such as to set a
breakpoint).
* Ensure GDB can write to memory when necessary (such as to set a
breakpoint). For this purpose, add new MD functions gdb_begin_write()
and gdb_end_write() which the GDB support code can call before and
after writing to memory.
This change is not comprehensive:
* It doesn't do anything to protect modules.
* It doesn't do anything for kernel memory allocated after the kernel
starts running.
* In order to avoid excessive memory inefficiency, it may let multiple
types of data share a 2M page, and assigns the most permissions
needed for data on that page.
Reviewed by: jhb, kib
Discussed with: emaste
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14282
Add I2C OPAL driver and a set of dummy-ones to allow
all I2C things on Power8 to attach.
TODO: better async token management
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
A reservation granule on PowerPC is a cache line.
On e500mc and derivatives a cacheline size is 64 bytes, not 32. Allocate
the maximum size permitted, but only utilize the size that is needed. On
e500v1 and e500v2 the reservation granule will still be 32 bytes.
These interfaces were put in place to let QorIQ SoCs dictate CPU idling
semantics, in order to support capabilities such as NAP mode and deep sleep.
However, this never stabilized, and the idling support reverted back to
CPU-level rather than SoC level. Move this code back to cpu.c instead. If
at a later date the lower power modes do come to fruition, it should be done
by overriding the cpu_idle_hook instead of this platform hook.
NVMe support is ready and should be compiled-in
to the ppc64 kernel.
Submitted by: Wojciech Macek <wma@semihalf.org>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
We don't support float in the boot loaders, so don't include
interfaces for float or double in systems headers. In addition, take
the unusual step of spiking double and float to prevent any more
accidental seepage.
When processor enters power-save state it releases resources shared with other
cpu threads which makes other cores working much faster.
This patch also implements saving and restoring registers that might get
corrupted in power-save state.
Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Reviewed by: jhibbits, nwhitehorn, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14330
Add function which can store RTC values to OPAL.
Submitted by: Wojciech Macek <wma@semihalf.org>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
Summary:
This compartmentalizes the CPU-specific trap components into its own
function, rather than littering the general printtrap() with various checks.
This will let us replace a series of #ifdef's with a runtime conditional check
in the future.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D14416
Make vm_wait() take the vm_object argument which specifies the domain
set to wait for the min condition pass. If there is no object
associated with the wait, use curthread' policy domainset. The
mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.
Eliminate pagedaemon_wait(). vm_domain_clear() handles the same
operations.
Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.
Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.
Scetched and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D14384
On POWER8 architecture there is a timer with 512Mhz frequency.
It has about 1,95ns period, but it is rounded to 1ns which is not accurate.
Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Reviewed by: wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14433
Currently Hypervisor Emulation Assistance interrupt is unhandled.
Executing an undefined instruction in userland triggers kernel panic.
Handle this the same way as Facility Unavailable Interrupt - send
SIGILL signal to userspace.
Submitted by: Michal Stanek <mst@semihalf.com>
Obtained from: Semihalf
Reviewed by: nwhitehorn, pdk@semihalf.com, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14437
zero, matching the IEEE 1275 standard. Since these internal error paths
have never, to my knowledge, been taken, behavior is unchanged.
Reported by: gonzo
MFC after: 2 weeks
This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel. As more work is done, the struct may be optimized further.
Reviewed by: nwhitehorn
Summary:
After revision rS328534('PPC64: use hwref instead of cpuid'), FreeBSD on
powerpc64 virtual machine panics since it is unable to read the
timebase, showing the following error:
get-property for timebase-frequency on zero phandle
panic: Unable to determine timebase frequency!
With the change above, cpuref->cr_hwref does not contain the phandle
anymore, thus, it never reads the proper CPU entry in OF.
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D14204
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.
Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273
We don't support older compilers. Most of the code in these files is
for pre-3.0 gcc, which is at least 15 years obsolete. Move to using
phk's sys/_stdargs.h for all these platforms.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14323
r328113 and affecting SMP systems.
The way the time is set on PowerMacs is racy and relies on all the
CPUs in the system setting a register simultaneously in a rendezvous. A
few-cycle delay can result in out-of-sync times, which can break the
scheduler and result in calls like mtx_sleep() and pause() never timing out
if the thread is migrated while sleeping. r328113 added a call to a no-op
function between the beginning of the rendezvous and setting the time that
was only called on APs and added enough cycles to cause a problematic offset.
For some reason, the fan-management code was the first place this appeared.
Clue from: andreast
Reported by: many
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.
Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000
The L3 cache controller (Corenet Platform Cache) is listed with one of its
compatible strings as "cache", which this driver can't attach to. Restrict
to a known list of primary cache controller strings, as found in the l2cache
devicetree binding.
threads from compile-time defines to global variables. This removes a
significant amount of duplicated runtime patches to the compile-time
defines, centralizing the conditional logic in the early startup code.
Reviewed by: jhibbits
It turns out that under some circumstances we can get DSI or DSE before we set
LPCR and LPID so we should set it as early as possible.
Authored by: Patryk Duda <pdk@semihalf.com>
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
On CHRP and PowerNV, use the interrupt server number in the cpuref and pcpu
hwref field instead of the device-tree phandle and make the CPU IDs reported
to the scheduler dense and with the BSP at 0.
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14011
Cleaning up AP startup routines. This is a mix of changes
required to make PowerNV running and to modify the code
to be more robust. Previously, some races were seen if more
than 90CPUs were online.
Authored by: Patryk Duda <pdk@semihalf.com>
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14026
used with hashed page tables on AIM and place it into a new, modular pmap
function called pmap_decode_kernel_ptr(). This function is the inverse
of pmap_map_user_ptr(). With POWER9 radix tables, which mapping to use
becomes more complex than just AIM/BOOKE and it is best to have it in
the same place as pmap_map_user_ptr().
Reviewed by: jhibbits
There's no reason not to build modules for 64-bit QorIQ devices. This
config has evolved to be analogous to the AIM GENERIC64 kernel, so will grow
to match it in more ways as well.
Summary:
Rather than duplicating the checks for programmatic traps all over the code, put
it all in one function. This helps to remove some of the #ifdefs between AIM
and Book-E.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D14082
In a corner case we could fall into OOB error.
Authored by: Patryk Duda <pdk@semihalf.com>
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
MSI/MSI-x interrupts are edge-triggered. If an interrupt
arrives when IRQ line is masked, it will be lost and will
never recover. Perform MSI_EOI always after unmask to give
a chance for PHB/XICS to send an interrupt again if MSI/MSI-x
pending bit is set in MSI/MSI-x BAR space.
Submitted by: Wojciech Macek <wma@semihalf.org>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
Commits r326203 and r326978 broke 64-bit booke kernels by introducing a 1MB
zero-pad between the ELF header and the start of the kernel. This didn't
cause a build failure, but caused kernels to need to be loaded into memory
1MB lower, which could easily break scripts expecting previous behavior.
This change matches the similar change made to AIM in r327358.
Uses of mallocarray(9).
The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.
Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.
Reported by: wosch
PR: 225197
kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and
powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be
evaluated at runtime to determine if the architecture has a direct map;
if it does not (or does) unconditionally and PMAP_HAS_DMAP is either 0 or
1, the compiler can remove the conditional logic.
As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which had
similar things but spelled differently. 32-bit MIPS has a partial direct-map
that maps poorly to this concept and is unchanged.
Reviewed by: kib
Suggestions from: marius, alc, kib
Runtime tested on: amd64, powerpc64, powerpc, mips64
In platform_smp_ap_init we are doing some crucial code (eg. set LPCR register)
which have influence over further execution.
Practiculary in PowerNV platform we have experienced Data Storage Interrupt
before we set apropriate LPCR. It caused code execution from location which was
legal in bootloader (petitboot based on linux) but illegal in FreeBSD
Set stack pointer to correct value after thread's stack pointer restore
Restoring new thread's stack pointer caused stack corruption because
restored stack pointer didn't point to callee (cpu_switch) stack frame but
caller stack frame.
As a result we had mysterious errors in caller function (sched_switch).
Solution: simply set stack pointer to correct value
Also, initialize TOC to a valid pointer once the thread is being
created.
Created by: Patryk Duda <pdk@semihalf.com>
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Reviewed by: nwhitehorn
Differential revision: https://reviews.freebsd.org/D13947
Sponsored by: QCM Technologies
Zero BSS always. The only case when this operation is
ommitted is when booting on BookE.
Created by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Reviewed by: imp, nwhitehorn
Differential revision: https://reviews.freebsd.org/D13948
Sponsored by: QCM Technologies
Add missing little-endian 64-bit read and write. Since there
is no direct ASM opcode for this, perform byte swap if
necessary.
Created by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: QCM Technologies
Use current userspace address for segment mapping. Previously,
there was a bug which made the funciton constantly using the userspace
base address which could cause data integrity issues.
Created by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: QCM Technologies
Add CXGBE driver which is required for PowerNV system.
Also, remove AHCI which does not work in BigEndian.
Created by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: QCM Technologies
FreeBSD prints text char-by-char, which is not what OPAL
is designed to. Poll events more frequently to avoid buffer
overflow and loosing data.
Created by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: QCM Technologies
Fixes:
- map all devices to PE0
- use 1:1 TCE mapping
- provide the same TCE mapping for all PEs (not only PE0)
- add TCE reset and alignment (required by OPAL)
Created by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: QCM Technologies
Make XICS to be OPAL-aware.
Created by: Nathan Whitehorn <nwhitehorn@freebsd.org>
Submitted by: Wojciech Macek <wma@semihalf.com>
Sponsored by: FreeBSD Foundation
P1022 SATA controller may set the wrong CCR bit for a command completion.
This would previously cause an interrupt storm. Solve this by marking all
commands complete, and letting the end_transaction deal with the successes.
Causes no problems on P5020.
While here, fix a minor bug in collision detection. The Freescale SATA
controller only has 16 slots, not 32.
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.
X-Differential revision: https://reviews.freebsd.org/D13837
to which it is specific, rather than in the generic AIM startup code. This
will be required to support the radix-table-based MMU introduced with POWER9.
buffers into a new pmap-module function pmap_map_user_ptr() that can
be implemented by the respective modules. This is required to implement
non-segment-based AIM-ish MMU systems such as the radix-tree page tables
introduced by POWER ISA 3.0 and present on POWER9.
Reviewed by: jhibbits
using a new macro PHYS_TO_DMAP, which deliberately has the same name as the
equivalent macro on amd64. This also sets the stage for moving the direct
map to another base address.
Some PowerQUICC and QorIQ platforms have a L2 cache managed via the
memory-mapped configuration registers, and appear as a node in the device
tree. This adds basic support to enable the cache.
allocated with a tag to come from the specified domain if it meets the
other constraints provided by the tag. Automatically create a tag at
the root of each bus specifying the domain local to that bus if
available.
Reviewed by: jhb, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13545
domains can be done by the _domain() API variants. UMA also supports a
first-touch policy via the NUMA zone flag.
The slab layer is now segregated by VM domains and is precise. It handles
iteration for round-robin directly. The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed. Well
behaved clients can achieve perfect locality with no performance penalty.
The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.
Reviewed by: Attilio (a slightly older version)
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Provide initial support for PCIe host controller as
well as for IOMMU mapping. This commit allows proper
bus enumeration, but does not guarantee DMA operations
are working.
Created by: Nathan Whitehorn <nwhitehorn@freebsd.org>
Submitted by: Wojciech Macek <wma@semihalf.com>
Sponsored by: FreeBSD Foundation
Avoid the lock in vtophys() by providing a static direct-mapped
spinlock- protected output buffer to use when the console driver
cannot acquire locks for some reason. This allows the idle thread
to use printf() (e.g. the SMP startup messages) without crashing
the kernel.
Created by: Nathan Whitehorn <nwhitehorn@freebsd.org>
Submitted by: Wojciech Macek <wma@freebsd.org>
Sponsored by: FreeBSD Foundation
Make sure to set LPCR[LPES] so that external interrupts set SRR0 and SRR1
instead of HSRR0 and HSRR1. Without this, external interrupt handlers would
get the wrong MSR value when executing, causing eventual madness.
Created by: Nathan Whitehorn <nwhitehorn@freebsd.org>
Submitted by: Wojciech Macek <wma@freebsd.org>
Sponsored by: FreeBSD Foundation
Fix AP startup, which was broken.
Created by: Nathan Whitehorn <nwhitehorn@freebsd.org>
Submitted by: Wojciech Macek <wma@freebsd.org>
Sponsored by: FreeBSD Foundation
Add basic power control (reset, power off) and bind
ttyuX to opal console so that init will start login there.
Created by: Nathan Whitehorn <nw@freebsd.org>
Submitted by: Wojciech Macek <wma@freebsd.org>
Sponsored by: FreeBSD Foundation
OPAL is a dedicated firmware acting as a hypervisor.
Add generic functions to provide all access.
Created by: Nathan Whitehorn <nw@freebsd.org>
Submitted by: Wojciech Macek <wma@freebsd.org>
- Call resource_int_value() once during attach, rather than within the
pci_(read|write)_config() code path; this avoids taking a blocking mutex
to read kenv variables.
- Use a spin lock to protect non-atomic config space accesses; this matches
the behavior of Darwin's AppleMacRiscPCI driver.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D13839
supported on newer POWER hardware and in graphical VMs run on the same,
which are typically XHCI-only. The 32-bit GENERIC kernel, which
does not run on hardware made in the last decade and is unlikely to
encounter XHCI devices, is left unchanged.
PR: kern/224940
Submitted by: Gustavo Romero
MFC after: 1 week
The data segement was too big.
Add a fix-up function like on ia32 for MAXDSIZ.
While here, bring also the MAXSSIZ closer to amd64 and add an equal fix-up
function for MAXSSIZ.
Reviewed by: jhibbits@
Obtained from: jhibbits@
Differential Revision: https://reviews.freebsd.org/D13753
is of limited utility outside of platform-specific code and can vary
at runtime when running as a hypervisor guest, so does not even have the
virtue of being a static identifier.
Reviewed by: jhibbits
instead of hard-coding a default. This information is passed implicitly by
the PS3 firmware and can be relied upon. Also adjust the default mode, if
somehow firmware doesn't pass one, to 1920x1080 from 720x480 since it is
2017.
MFC after: 2 weeks
- Densely number CPUs to avoid systems with CPUs with very high ID numbers
- Always have the BSP be CPU 0 to avoid remnant brokenness with non-0 BSPs
in other parts of the kernel.
- Improve parsing of the device tree CPU listings on SMT systems.
- Allow reboot via RTAS as well as OF for pSeries systems booted by FDT
without functioning Open Firmware.
Obtained from: projects/powernv
MFC after: 3 weeks
is used as the bootloader on a number of PPC64 platforms. This involves the
following pieces:
- Making the first instruction a valid kernel entry point, since kexec
ignores the ELF entry value. This requires a separate section and linker
magic to prevent the linker from filling the beginning of the section
with stubs.
- Adding an entry point at 0x60 past the first instruction for systems
lacking firmware CPU shutdown support (notably PS3).
- Linker script changes to support the above.
MFC after: 1 month
If these are not aligned, the linker has to emit a different type of
relocation that the early boot self-relocation code cannot handle, even
in principle, resulting in them being set to zero and the kernel crashing.
MFC after: 1 week
draft of a never-finalized standard (CHRP) and is irrelevant in practice
on FreeBSD since we load the kernel with loader(8) on Open Firmware
platforms anyway. Moreover, loader(8), which is directly loaded by Open
Firmware, has never had an equivalent note.
MFC after: 2 weeks
the 32-bit cookie can be sign-extended on its way out of the loader and
through Open Firmware. If sign-extended, the in-kernel check of its value
would fail on 64-bit systems, resulting in a mountroot prompt. Solve this
by telling the kernel to ignore the high-order bits.
PR: kern/224437
Submitted by: Gustavo Romero
They provide relaxed-ordered atomic access semantic. Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses. The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.
The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations. It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.
Suggested by: jhb
Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13534
Currently Facility Unavailable is absent and once an application
tries to use or access a register from a feature disabled in the
CPU it causes a kernel panic.
A simple test-case is:
int main() { asm volatile ("tbegin.;"); }
which will use TM (Hardware Transactional Memory) feature which
is not supported by the kernel and so will trigger the following
kernel panic:
----
fatal user trap:
exception = 0xf60 (unknown)
srr0 = 0x10000890
srr1 = 0x800000000000f032
lr = 0x100004e4
curthread = 0x5f93000
pid = 1021, comm = htm
panic: unknown trap
cpuid = 40
KDB: stack backtrace:
Uptime: 3m18s
Dumping 10 MB (3 chunks)
chunk 0: 11MB (2648 pages) ... ok
chunk 1: 1MB (24 pages) ... ok
chunk 2: 1MB (2 pages)panic: IOMMU mapping error: -4
cpuid = 40
Uptime: 3m18s
----
Since Hardware Transactional Memory is not yet supported by FreeBSD, treat
this as an illegal instruction.
PR: 224350
Submitted by: Gustavo Romero <gromero_AT_ibm_DOT_com>
MFC after: 2 weeks
Without the identifier in the list booting FreeBSD results in printing the
following (from a PowerKVM boot):
cpu0: Unknown PowerPC CPU revision 0x1201, 2550.00 MHz
For now, add the same feature list as POWER8. As new capabilities are added to
support POWER9 specific features, they will be added to this.
PR: 224344
Submitted by: Breno Leitao <breno_DOT_leitao_AT_gmail_DOT_com>
Decode on Book-E:
* ESR (Exception Syndrome Register)
* MCSR (Machine Check Status Register)
On AIM:
* MSSSR (Memory Subsystem Status Register)
Makes it easier to tell at a glance the type of trap and machine check
conditions now.
The DTrace fasttrap entry points expect a struct reg containing the
register values of the calling thread. Perform the conversion in
fasttrap rather than in the trap handler: this reduces the number of
ifdefs and avoids wasting stack space for traps that don't involve
DTrace.
MFC after: 2 weeks
pmap_track_page() only works with physical memory pages, which have a
constant vm_page_t address. Microoptimize pmap_track_page() to perform one
less operation under the lock.
This was done in 32-bit mode, but not duplicated when 64-bit mode was
brought in. Without this, stale mappings can be left, leading to odd
crashes when the wrong VA is checked in XX_PhysToVirt() (dpaa(4)).
This variable should be pure MI except possibly for reading it in MD
dump routines. Its initialization was pure MD in 4.4BSD, but FreeBSD
changed this in r36441 in 1998. There were many imperfections in
r36441. This commit fixes only a small one, to simplify fixing the
others 1 arch at a time. (r47678 added support for
special/early/multiple message buffer initialization which I want in
a more general form, but this was too fragile to use because hacking
on the msgbufp global corrupted it, and was only used for 5 hours in
-current...)
The Display Interface Unit (DIU) uses main memory for the framebuffer, which
is already mapped as cache coherent physical memory. Prevent mmap() from
using its own attributes which may otherwise conflict.