This is a general cleanup of the relocatable kernel support on powerpc,
needed to enable kernel ifuncs.
* Fix some relocatable issues in the kernel linker, and change to using
a RELOCATABLE_KERNEL #define instead of #ifdef __powerpc__ for parts that
other platforms can use in the future if they wish to have ET_DYN kernels.
* Get rid of the DB_STOFFS hack now that the kernel is relocated to the DMAP
properly across the board on powerpc64.
* Add powerpc64 and powerpc32 ifunc functionality.
* Allow AIM64 virtual mode OF kernels to run from the DMAP like other AIM64
by implementing a virtual mode restart. This fixes the runtime address on
PowerMac G5.
* Fix symbol relocation problems on post-relocation kernels by relocating
the symbol table.
* Add an undocumented method for supplying kernel symbols on powernv and
other powerpc machines using linux-style kernel/initrd loading -- If
you pass the kernel in as the initrd as well, the copy resident in initrd
will be used as a source for symbols when initializing the debugger.
This method is subject to removal once we have a better way of doing this.
Approved by: jhibbits
Relnotes: yes
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23156
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Summary:
Consolidate the NUMA associativity handling into a platform function.
Non-NUMA platforms will just fall back to the default (0). Currently
only implemented for powernv, which uses a lookup table to map the
device tree associativity into a system NUMA domain.
Fixes hangs on powernv after r356534, and corrects a fairly longstanding
bug in powernv's NUMA handling, which ended up using domains 1 and 2 for
devices and memory on power9, while CPUs were bound to domains 0 and 1.
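A minimal sketch of the lookup-table idea, not the actual powernv code. The names `assoc_to_domain` and `MAX_DOMAINS` are hypothetical. It hands out dense domain numbers starting at 0 in order of first appearance, so CPUs, memory and devices that report the same associativity ID end up in the same domain:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DOMAINS 8                    /* hypothetical limit for this sketch */

static uint32_t assoc_ids[MAX_DOMAINS];  /* firmware IDs seen so far */
static int      ndomains;                /* number of IDs recorded */

/*
 * Map a firmware associativity ID to a dense domain index.
 * The first ID seen becomes domain 0, the next new ID domain 1, and so on.
 * Falls back to the default domain (0) when the table is full.
 */
int
assoc_to_domain(uint32_t assoc_id)
{
	int i;

	for (i = 0; i < ndomains; i++)
		if (assoc_ids[i] == assoc_id)
			return (i);
	if (ndomains == MAX_DOMAINS)
		return (0);
	assoc_ids[ndomains] = assoc_id;
	return (ndomains++);
}
```

Because every consumer goes through the same table, the numbering cannot drift between CPUs and devices the way two independent computations can.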
Reviewed by: bdragon, luporl
Differential Revision: https://reviews.freebsd.org/D23220
Due to a bug in clang 9.0.0 source tracking, the trap vector copying will
always trigger a fortify-source warning.
The destination buffers are 0x2f00 bytes, and the bcopy region is 0x2e00
bytes, so there is not an overflow here.
(I have been running with this patch since September.)
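The size argument above can be checked at compile time. This is an illustrative sketch only: the macro and function names are hypothetical, and only the two sizes come from the text.

```c
#include <assert.h>
#include <string.h>

#define TRAP_AREA_SIZE 0x2f00   /* destination buffer size quoted above */
#define TRAP_COPY_SIZE 0x2e00   /* bytes copied, as quoted above */

/* Compile-time proof that the copy fits in the destination. */
_Static_assert(TRAP_COPY_SIZE <= TRAP_AREA_SIZE, "trap copy overflows");

/* Copy the vector image and return the unused space left at the end. */
size_t
copy_trap_area(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, TRAP_COPY_SIZE);
	return (TRAP_AREA_SIZE - TRAP_COPY_SIZE);
}
```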
As far as I can tell, these are an artifact of times when linker sets
couldn't be empty, otherwise the kernel build would fail due to unresolved
symbols. hselasky fixed this in r268138, and I've audited the kbd portions
to make sure nothing would blow up due to the empty linker set and
successfully compiled+ran a kernel with no keyboard support at all.
Kill them off now since they're no longer required.
MFC after: 1 week
This change enables the use of the OpenFirmware console (ofwcons) even when VGA
is available, allowing early kernel messages to be seen, which is important in
case of crashes before VGA console initialization.
This is especially useful in virtualized environments, where the user/developer
doesn't have full control of the virtualization engine (e.g. OpenStack).
The old behavior is preserved by default and, in order to use ofwcons, a few
tunables that have been introduced need to be set:
- hw.ofwfb.disable=1 - disable OFW FrameBuffer device
- machdep.ofw.mtx_spin=1 - change PPC OFW mutex to SPIN type, to match kernel
console's mutex type
- debug.quiesce_ofw=0 - don't call OFW quiesce, needed to keep ofwcons I/O
working
More details can be found at differential revision D20640.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20640
Freescale SoCs use a set of IRQs at the high end of the OpenPIC IRQ
list, not counted in the NIRQs of the Feature reporting register. Some
SoCs include a MSI inbound window in the PCIe controller configuration
registers as well, but some don't. Currently, this only handles the
SoCs *with* the MSI window.
There are 256 MSIs per MSI bank (32 per MSI IRQ, 8 IRQs per MSI bank).
The P5020 has 3 banks, yielding up to 768 MSIs; older SoCs have only one
bank.
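The MSI counts above follow from simple multiplication. A small sketch, with hypothetical names, that also splits a flat MSI number into bank, IRQ and bit (that numbering scheme is an assumption made for the example, not taken from the driver):

```c
#include <assert.h>

#define MSIS_PER_IRQ   32   /* MSIs multiplexed onto one MSI IRQ */
#define IRQS_PER_BANK   8   /* MSI IRQs in one bank */
#define MSIS_PER_BANK  (MSIS_PER_IRQ * IRQS_PER_BANK)

/* Total MSIs available for a given number of banks. */
unsigned int
msi_total(unsigned int nbanks)
{
	return (nbanks * MSIS_PER_BANK);
}

/* Split a flat MSI number into (bank, irq, bit); assumed layout. */
void
msi_decode(unsigned int msi, unsigned int *bank, unsigned int *irq,
    unsigned int *bit)
{
	*bank = msi / MSIS_PER_BANK;
	*irq = (msi % MSIS_PER_BANK) / MSIS_PER_IRQ;
	*bit = msi % MSIS_PER_IRQ;
}
```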
On POWER8 systems with only one memory domain, the "ibm,associativity"
number that corresponds to it is 0, unlike POWER9 systems with two
or more domains, in which the minimum value is 1.
In the POWER8 case, subtracting 1 causes an underflow on the unsigned domain
variable and a subsequent index out-of-bounds access.
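A minimal sketch of the failure mode and one possible guard. The function names and `NDOMAINS` are hypothetical and do not reflect the actual fix:

```c
#include <assert.h>
#include <stdint.h>

#define NDOMAINS 2  /* hypothetical table size for this sketch */

/* Buggy form: assumes IDs start at 1, so 0 - 1 wraps to UINT32_MAX. */
uint32_t
domain_unchecked(uint32_t assoc_id)
{
	return (assoc_id - 1);
}

/* Guarded form: returns -1 instead of an out-of-range index. */
int
domain_checked(uint32_t assoc_id, uint32_t min_id)
{
	if (assoc_id < min_id || assoc_id - min_id >= NDOMAINS)
		return (-1);
	return ((int)(assoc_id - min_id));
}
```

With `min_id` set to the smallest ID the platform actually reports (0 on the POWER8 case described, 1 on the POWER9 case), the index stays inside the table.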
Reviewed by: jhibbits
Tested by: bdragon, luporl
These are the last few changes needed before 32-bit AIM builds with secure-PLT
with base GCC. Because ofwcall32.S and swtch32.S were branching to the GOT,
they could not use secure PLT.
Before this change, OFW initrd (as md) handling code was simulating an ofwbus
device. But as there isn't really a Device Tree (DT) node representing OFW
initrd (it is specified in 2 properties under /chosen), its driver was in fact
stealing another driver's DT node. This was noticed after MD_ROOT_MEM became
default and QEMU's USB keyboard stopped working under VNC.
This change simplifies the detection and mapping of initrd memory, turning it
into a simple startup step instead of trying to simulate a device.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20553
This allows replacing "sys/eventhandler.h" includes with "sys/_eventhandler.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
Summary:
Initial NUMA support:
- associate CPU with domain
- associate memory ranges with domain
- identify domain for devices
- limit device interrupt binding to appropriate domain
- Additionally fixes a bug in the setting of Maxmem which led to
only memory attached to the first socket being enabled for DMA
A pmap variant can opt in to NUMA support by calling `numa_mem_regions`
at the end of pmap_bootstrap - registering the corresponding ranges with the
VM.
This yields a ~20% improvement in build times of llvm on dual socket POWER9
over non-NUMA.
Original patch by mmacy.
Differential Revision: https://reviews.freebsd.org/D17933
Summary:
With a sufficiently large TOC, it's possible to index out of range, as
the immediate load instructions only permit 16-bit indices, allowing up
to 64kB range (signed) from the base pointer. Allow +/- 2GB range, with
the medium code model TOC accesses in asm.
Patch originally by Brandon Bergren. The issue appears to impact ELFv2
more than ELFv1.
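To illustrate the range limits described above: a single load carries a signed 16-bit displacement, while a medium-model access splits the offset into a high-adjusted half (for `addis`) and a sign-extended low half. The helper names below are hypothetical; this is a sketch of the arithmetic, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* True if the offset fits the signed 16-bit field of a single load. */
int
toc_fits_small(int64_t off)
{
	return (off >= INT16_MIN && off <= INT16_MAX);
}

/*
 * Split a 32-bit TOC offset into high-adjusted and low halves so that
 * ha * 65536 + lo == off, as an addis/load pair would use them.
 */
void
toc_split(int32_t off, int32_t *ha, int32_t *lo)
{
	*lo = (int16_t)(off & 0xffff);               /* sign-extended low half */
	*ha = (int32_t)(((int64_t)off - *lo) / 65536); /* exact division */
}
```

The high half is adjusted upward whenever the low half is negative, which is why the two halves cannot simply be taken with shifts and masks.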
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D19708
The QorIQ SoCs don't actually support multicast interrupts, and the
references state explicitly that multicast is undefined behavior. Avoid the
undefined behavior by binding to only a single CPU, using a quirk to
determine if this is necessary.
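One way such a quirk could narrow a destination mask is sketched below. The flag name and the choice of the lowest-numbered CPU are assumptions for illustration, not the driver's actual policy:

```c
#include <assert.h>
#include <stdint.h>

#define QUIRK_SINGLE_BIND 0x1  /* hypothetical quirk flag */

/*
 * Return the destination mask to program. With the quirk set, keep
 * only the lowest-numbered CPU from the requested set.
 */
uint32_t
irq_dest_mask(uint32_t cpumask, unsigned int quirks)
{
	if (quirks & QUIRK_SINGLE_BIND)
		return (cpumask & (~cpumask + 1)); /* isolate lowest set bit */
	return (cpumask);
}
```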
MFC after: 3 weeks
Mark some buses as BUS_PASS_BUS, and some resources as BUS_PASS_RESOURCE.
This also decouples some resource attachment orderings from races caused by
device tree ordering, instead relying on the bus pass to provide the
ordering.
This was originally intended to support multipass suspend/resume, but it's
also needed on PowerMacs when using fdt, as the device tree seems to get
created in reverse of the OFW tree.
Reviewed by: nwhitehorn (long ago)
Differential Revision: https://reviews.freebsd.org/D918
The PHB4 host bridge used by the POWER9 uses a 64kB range in 32-bit
space at the address 0xffff0000-0xffffffff. Reserve this range so that
DMA memory cannot be allocated within this range. This fixes seemingly
random crashes on a POWER9 system. Ideally this range will have been
reserved by the firmware, but as of now this is not the case.
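A sketch of the overlap test a DMA allocator would need against the reserved window. The macro and function names are hypothetical; only the address range comes from the text:

```c
#include <assert.h>
#include <stdint.h>

#define PHB4_RSV_START 0xffff0000ULL  /* range quoted above */
#define PHB4_RSV_END   0xffffffffULL  /* inclusive */

/* True if [start, start + len) touches the reserved window. */
int
overlaps_phb4_window(uint64_t start, uint64_t len)
{
	uint64_t end;

	if (len == 0)
		return (0);
	end = start + len - 1;  /* inclusive end of the candidate range */
	return (start <= PHB4_RSV_END && end >= PHB4_RSV_START);
}
```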
Submitted by: git_bdragon.rtk0.net
Reviewed by: nwhitehorn
Approved by: re(kib)
Differential Revision: https://reviews.freebsd.org/D17183
This is an OFW initrd module that loads the initrd from device tree
parameters and gives it to the md driver.
With this patch, it is possible to pass a rootfs image through kexec in PowerNV
mode (powerpc64). In order to use it, you should set the MD_ROOT_MEM option in
your kernel configuration.
Reviewed by: jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15705
Changed excise_initrd_region to support both 32- and 64-bit
values for linux,initrd-start and linux,initrd-end.
This fixes the boot problem on some machines after rS334485.
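Handling both widths amounts to decoding a big-endian property of either one cell (4 bytes) or two cells (8 bytes). A minimal sketch, with a hypothetical helper name, that is not the code in excise_initrd_region itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a big-endian device tree property that may be one cell
 * (4 bytes) or two cells (8 bytes). Returns 0 on success, -1 if the
 * length is neither.
 */
int
decode_prop_addr(const uint8_t *buf, size_t len, uint64_t *out)
{
	uint64_t v = 0;
	size_t i;

	if (len != 4 && len != 8)
		return (-1);
	for (i = 0; i < len; i++)
		v = (v << 8) | buf[i];
	*out = v;
	return (0);
}
```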
Submitted by: Luis Pires <lffpires@ruabrasil.org>
Reviewed by: jhibbits, leitao
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15667
Currently kexec loads an initrd file into the main memory but does not
mark that region as reserved, thus the area is not protected.
If any initrd/md file is loaded from kexec/petitboot, the region might become
corrupted/overwritten since FreeBSD does not know the region is 'reserved'.
This patch simply adds the initrd area as a reserved memory region.
Approved by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15610
On some POWER9 systems, 'reg' denotes the full memory in the system, while
'linux,usable-memory' denotes the usable memory. Some memory is reserved for
NVLink usage, so is partitioned off.
Submitted by: Breno Leitao
Discussing with others, this needs to be at least 20 to boot on some POWER9
nodes. Linux made a similar change for the same reason, so increase to 32
to give us some extra breathing room as well. The input and output arrays
are sized at 256, so much greater than the increase in the property array
size.
Summary:
Currently ofw_real_bounce_alloc() requests memory using WAITOK while holding a
non-sleepable lock called 'OF Bounce Page'.
Fix this by allocating the pages outside of the lock, and only updating the
global variables while holding the lock.
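The pattern can be sketched in userland C. This is an analogy under stated assumptions, not the kernel change: a C11 `atomic_flag` stands in for the non-sleepable lock, `malloc` stands in for the blocking allocation, and all names are hypothetical. Freeing a previous buffer is an addition made only to keep the example leak-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

static atomic_flag bounce_lock = ATOMIC_FLAG_INIT; /* stand-in spin lock */
static void  *bounce_page;   /* shared state protected by bounce_lock */
static size_t bounce_size;

/* Returns 0 on success, -1 if the allocation failed. */
int
bounce_alloc(size_t size)
{
	void *new_page, *old_page;

	/* Allocate first: this may block, so no lock is held yet. */
	new_page = malloc(size);
	if (new_page == NULL)
		return (-1);

	/* Take the lock only to publish the result. */
	while (atomic_flag_test_and_set(&bounce_lock))
		;
	old_page = bounce_page;
	bounce_page = new_page;
	bounce_size = size;
	atomic_flag_clear(&bounce_lock);

	free(old_page);  /* release any previous buffer, also unlocked */
	return (0);
}
```

The critical section shrinks to three assignments, none of which can sleep.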
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D14955
When the kernel can be in real mode in early boot, we can execute from
high addresses aliased to the kernel's physical memory. If that high
address has the first two bits set to 1 (0xc...), those addresses will
automatically become part of the direct map. This reduces page table
pressure from the kernel and it sets up the kernel to be used with
radix translation, for which it has to be up here.
This is accomplished by exploiting the fact that all PowerPC kernels are
built as position-independent executables and relocate themselves
on start. Before this patch, the kernel runs at 1:1 VA:PA, but that
VA/PA is random and set by the bootloader. Very early, it processes
its ELF relocations to operate wherever it happens to find itself.
This patch uses that mechanism to re-enter and re-relocate the kernel
a second time with a new base address set up in the early parts of
powerpc_init().
Reviewed by: jhibbits
Differential Revision: D14647
accomplishes a few things:
- Makes NULL an invalid address in the kernel, which is useful for catching
bugs.
- Lays groundwork for radix-tree translation on POWER9, which requires the
direct map be at high memory.
- Similarly lays groundwork for a direct map on 64-bit Book-E.
The new base address is chosen as the base of the fourth radix quadrant
(the minimum kernel address in this translation mode) and because all
supported CPUs ignore at least the first two bits of addresses in real
mode, allowing direct-map addresses to be used in real-mode handlers.
This is required by Linux and is part of the architecture standard
starting in POWER ISA 3, so can be relied upon.
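The address relationship described above can be sketched as follows. The macro names are hypothetical; the base value follows from "first two bits set" and the mask from "ignore at least the first two bits":

```c
#include <assert.h>
#include <stdint.h>

#define DMAP_BASE      0xc000000000000000ULL /* top two bits set */
#define REAL_MODE_MASK 0x3fffffffffffffffULL /* drops the top two bits */

/* Direct-map virtual address for a physical address. */
uint64_t
phys_to_dmap(uint64_t pa)
{
	return (pa | DMAP_BASE);
}

/* Address as seen in real mode, where the top two bits are ignored. */
uint64_t
real_mode_addr(uint64_t va)
{
	return (va & REAL_MODE_MASK);
}
```

A direct-map address therefore resolves to the same physical location whether translation is on or off, which is what lets real-mode handlers use it unchanged.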
Reviewed by: jhibbits, Breno Leitao
Differential Revision: D14499
zero, matching the IEEE 1275 standard. Since these internal error paths
have never, to my knowledge, been taken, behavior is unchanged.
Reported by: gonzo
MFC after: 2 weeks
draft of a never-finalized standard (CHRP) and is irrelevant in practice
on FreeBSD since we load the kernel with loader(8) on Open Firmware
platforms anyway. Moreover, loader(8), which is directly loaded by Open
Firmware, has never had an equivalent note.
MFC after: 2 weeks
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
There's no need to special case 32-bit AIM to short circuit processing.
Some AIM CPUs can handle 36 bit addresses, and 64-bit CPUs can run 32-bit
OSes, so this will allow us to expand for that in the future if we desire.
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().
Reviewed by: gnn, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10313
===
From Nathan Whitehorn:
Open Firmware runs in virtual mode on the Powermac G5. This runs inside the
kernel page table, which preserves all address translations made by OF before
the kernel starts; as a result, the kernel address space is a strict superset of
OF's.
Where this explodes is if OF uses an unmapped SLB entry. The SLB fault handler
runs in real mode and refers to the PCPU pointer in SPRG0, which blows up the
kernel. Having a value of SPRG0 that works for the kernel is less fatal than
preserving OF's value in this case.
===
The result of this is seemingly random panics from NULL dereferences, or hangs
immediately upon boot. By not restoring SPRG0 for Open Firmware entry the
kernel PCPU pointer is preserved and SLB faults are successful, resulting in a
stable kernel.
PR: 205458
Reported by: several (over bugzilla, lists, IRC)
Reviewed by: andreast
Tested by: many (various forms)
MFC after: 2 weeks
Convert PCIe hot plug support over to asking the firmware, if any, for
permission to use the HotPlug hardware. Implement pci_request_feature
for ACPI. All other host PCI connections default to allowing all valid feature
requests.
Sponsored by: Netflix
Summary:
If the environment variable is set, U-boot adds a 'bootargs' property to
/chosen. This is already handled by ARM and MIPS, but should be handled in a
central location. For now, ofw_subr.c is a good place until we determine if it
should be moved to init_main.c, or somewhere more central to all architectures.
Eventually arm and mips should be modified to use ofw_parse_bootargs() as well,
rather than keeping their own duplicate code.
Reviewed By: adrian
Differential Revision: https://reviews.freebsd.org/D7846
This allows the PCI-PCI bridge driver to save a reference to the child
device in its softc.
Note that this required moving the "pci" device creation out of
acpi_pcib_attach(). Instead, acpi_pcib_attach() is renamed to
acpi_pcib_fetch_prt() as its sole action now is to fetch the PCI
interrupt routing table.
Differential Revision: https://reviews.freebsd.org/D6021
Rescanning a PCI bus uses the following steps:
- Fetch the current set of child devices and save it in the 'devlist'
array.
- Allocate a parallel array 'unchanged' initialized with NULL pointers.
- Scan the bus checking each slot (and each function on slots with a
multifunction device).
- If a valid function is found, look for a matching device in the 'devlist'
array. If a device is found, save the pointer in the 'unchanged' array.
If a device is not found, add a new device.
- After the scan has finished, walk the 'devlist' array deleting any
devices that do not have a matching pointer in the 'unchanged' array.
- Finally, fetch an updated set of child devices and explicitly attach any
devices that are not present in the 'unchanged' array.
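The steps above can be sketched as a small self-contained model. This is not the PCI bus code: the bus is reduced to one possible child per slot, "attach" is just a counter, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4                     /* hypothetical bus size */

struct dev { bool live; int attach_count; };
static struct dev children[NSLOTS];  /* one possible child per slot */

/* present[s] tells whether a valid function now responds at slot s. */
void
bus_rescan(const bool present[NSLOTS])
{
	struct dev *devlist[NSLOTS], *unchanged[NSLOTS];
	int i, s, ndev = 0;
	bool kept;

	/* Snapshot the current children; nothing is matched yet. */
	for (s = 0; s < NSLOTS; s++) {
		if (children[s].live) {
			devlist[ndev] = &children[s];
			unchanged[ndev++] = NULL;
		}
	}

	/* Scan every slot: match an existing child or add a new one. */
	for (s = 0; s < NSLOTS; s++) {
		if (!present[s])
			continue;
		for (i = 0; i < ndev && devlist[i] != &children[s]; i++)
			;
		if (i < ndev)
			unchanged[i] = devlist[i];
		else
			children[s].live = true;
	}

	/* Delete old children that the scan did not find again. */
	for (i = 0; i < ndev; i++)
		if (unchanged[i] == NULL)
			devlist[i]->live = false;

	/* Attach every current child that is not in 'unchanged'. */
	for (s = 0; s < NSLOTS; s++) {
		if (!children[s].live)
			continue;
		kept = false;
		for (i = 0; i < ndev; i++)
			if (unchanged[i] == &children[s])
				kept = true;
		if (!kept)
			children[s].attach_count++;
	}
}
```

Keeping 'unchanged' parallel to 'devlist' means devices that survive the rescan are never re-attached, while removed ones are deleted only after the whole scan has finished.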
This builds on the previous changes to move subclass data management into
pci_alloc_devinfo(), pci_child_added(), and bus_child_deleted().
Subclasses of the PCI bus that use custom rescan logic explicitly override the
rescan method to disable rescans.
Differential Revision: https://reviews.freebsd.org/D6018
The ACPI and OFW PCI bus drivers as well as CardBus override this to
allocate the larger ivars to hold additional info beyond the stock PCI ivars.
This removes the need to pass the size to functions like pci_add_iov_child()
and pci_read_device() simplifying IOV and bus rescanning implementations.
As a result of this and earlier changes, the ACPI PCI bus driver no longer
needs its own device_attach and pci_create_iov_child methods but can use
the methods in the stock PCI bus driver instead.
Differential Revision: https://reviews.freebsd.org/D5891
Instead of providing a wrapper around device_delete_child() that the PCI
bus and child bus drivers must call explicitly, move the bulk of the logic
from pci_delete_child() into a bus_child_deleted() method
(pci_child_deleted()). This allows PCI devices to be safely deleted via
device_delete_child().
- Add a bus_child_deleted method to the ACPI PCI bus which clears the
device_t associated with the corresponding ACPI handle in addition to
the normal PCI bus cleanup.
- Change cardbus_detach_card to call device_delete_children() and move
CardBus-specific delete logic into a new cardbus_child_deleted() method.
- Use device_delete_child() instead of pci_delete_child() in the SRIOV code.
- Add a bus_child_deleted method to the OpenFirmware PCI bus drivers which
frees the OpenFirmware device info for each PCI device.
Reviewed by: imp
Tested on: amd64 (CardBus and PCI-e hotplug)
Differential Revision: https://reviews.freebsd.org/D5831
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit. This extends rman's resources to uintmax_t. With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).
Why uintmax_t and not something machine dependent, or uint64_t? Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures. 64-bit architectures should have plenty of RAM to absorb
the increase on resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead. That being said, uintmax_t was chosen for source
clarity. If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros. Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros. Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.
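The readability argument can be seen side by side in a short sketch. The function names are hypothetical; the format specifiers are standard C99.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* With uintmax_t, the %j length modifier works directly. */
int
fmt_range_max(char *buf, size_t len, uintmax_t start, uintmax_t end)
{
	return (snprintf(buf, len, "%#jx-%#jx", start, end));
}

/* With uint64_t, portable code needs PRIx64 (or a cast to uintmax_t). */
int
fmt_range_u64(char *buf, size_t len, uint64_t start, uint64_t end)
{
	return (snprintf(buf, len, "0x%" PRIx64 "-0x%" PRIx64, start, end));
}
```

Both produce the same text for a 36-bit range, but the first format string stays readable at every call site.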
Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)
Tested PAE and devinfo on virtualbox (live CD)
Special thanks to bz for his testing on ARM.
Reviewed By: bz, jhb (previous)
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544