Commit Graph

2916 Commits

Author SHA1 Message Date
jhibbits
45450d16d7 Included VSX registers in powerpc core dumps
Summary: Included VSX registers in powerpc core dumps (both kernel and gcore)

Submitted by:	Luis Pires
Differential Revision: https://reviews.freebsd.org/D15512
2018-06-02 20:28:58 +00:00
jhibbits
729cc88801 Added ptrace support for reading/writing powerpc VSX registers
Summary:
Added ptrace support for getting/setting the remaining part of the VSX registers
(the part that's not already covered by FPR or VR registers).

This is necessary to add support for VSX registers in debuggers.

Submitted by:	Luis Pires
Differential Revision: https://reviews.freebsd.org/D15458
2018-06-02 19:17:11 +00:00
jhibbits
47bc3a3d85 Increase powerpc64 KVA from ~7.25GB to 32GB
This will let us use much more KVA for ZFS ARC where needed.  This may be
incresed in the future if memory requirements increase.

Discussed with:	nwhitehorn
2018-06-01 21:37:20 +00:00
jhibbits
e2d69f7ac9 Unbreak 32-bit binaries on powerpc64
Recently a change was made which broke loading 32-bit binaries on powerpc64,
with an assertion in ld-elf32.so.1:

ld-elf32.so.1: assert failed:
/usr/local/poudriere/jails/ppc64/usr/src/libexec/rtld-elf/rtld.c:390

It turns out Elf32_AuxInfo was broken for a very long time on powerpc64, as
it uses long and pointers, which are both 64 bits on powerpc64, and only
manifested with the recent work on auxargs.
2018-06-01 16:31:05 +00:00
leitao
bc8b83fde2 powerpc64: Avoid overwriting initrd area
Currently kexec loads an initrd file into the main memory but does not
mark that region as reserved, thus the area is not protected.

If any initrd/md file is loaded from kexec/petitboot, the region might become
corarupted/overwritten since FreeBSD does not know the region is 'reserved'.

This patch simply adds the initrd area as a reserved memory region.

Approved by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15610
2018-06-01 12:43:13 +00:00
jhibbits
595b3a9fc7 Remove a debug printf from opal_pci driver 2018-05-31 04:11:40 +00:00
jhibbits
6dcf88b73b Make opal_pci driver work with POWER9
Summary:
Coupled with r334365, this makes PCI work on POWER9.  There is still more to
do to fully exploit the hardware capabilities, but this is sufficient to
enable USB and ethernet controllers on a POWER9 Talos II system.

Reviewed by:	nwhitehorn, leitao
Differential Revision: https://reviews.freebsd.org/D15566
2018-05-30 03:00:57 +00:00
jhibbits
018b267a7c Cache the phandle of the PCI node in opal_pci_attach
Simple cleanup, no functional change.  This is related to the fixups needed
for POWER9 support.
2018-05-30 02:47:23 +00:00
jhibbits
b12054557d Make ALT_BREAK_TO_DEBUGGER work with OPAL console
Match other consoles by using the higher level cngetc() in the interrupt
handler, so that kdb_alt_break() can check for console break.
2018-05-28 01:59:48 +00:00
jhibbits
62df87828b Print the full-width pointer values in hex.
PRI0ptrX is used to print a zero-padded hex value of the architecture's bitness,
so on 64-bit architectures it'll print the full 64 bit address.
2018-05-28 00:19:08 +00:00
jhibbits
a71068739b Match style of the other prototypes, and don't name the argument. 2018-05-27 20:36:43 +00:00
jhibbits
4960bbbc3d Stop idle threads on power9 in the idle task until an interrupt.
This reduces the CPU cycle wastage on power9, which is SMT4.  Any idle
thread that's spinning is simply starving working threads on the same core
of valuable resources.

This can be reduced further by taking more advantage of the PSSCR supported
states, as well as permitting state loss, as is currently done for power8.
The currently implemented stop state is the lowest latency, which may still
consume resources.
2018-05-27 20:24:24 +00:00
jhibbits
c1d69fb7e4 On POWER9 clear the HID0_RADIX before enabling the page tables
POWER9 supports Radix page tables in addition to Hashed page tables.  When
Radix page tables are in use, the TLB is cut in half, so that half of the
TLB is used for the page walk cache.  This is the default behavior, however
FreeBSD currently does not support Radix tables.  Clear this bit so that we
can use the full TLB.  Do this in the MMU logic so that configuration can be
localized to the specific translation format.  Once we do support Radix
tables, the setup for that will be localized to the Radix MMU kobj.
2018-05-26 04:33:19 +00:00
jhibbits
7068814771 Fix a typo missed in r334232 2018-05-26 04:24:25 +00:00
jhibbits
96a9723c2a Correct a typo for opal temperature sensor type constant 2018-05-26 02:45:41 +00:00
jhibbits
d5eeddd249 Only crop the VPN on POWER4 and derivatives for TLBIE operations
Summary:
PowerISA 2.03 and later require bits 14:65 in the RB register argument,
which is the full value of the vpn argument post-shift.  Only POWER4, POWER4+,
and PPC970* need the upper 16 bits cropped.

With this change FreeBSD can boot to multi-user on POWER9.

Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15581
2018-05-26 00:41:50 +00:00
jhibbits
0712c8c20c Add an IPMI attachment for PowerNV systems
IPMI access on PowerNV systems is done through the OPAL firmware.  This adds a
simple attachment for communicating with the FSP/BMC on these machines.  This
has been tested on a Talos POWER9 workstation, only in the bootup phase, noting
the successful attachment messages:

...
ipmi0: IPMI device rev. 0, firmware rev. 2.00, version 2.0, device support mask 0
ipmi0: Number of channels 2
...

The ipmi device has not been added to GENERIC64, but may be after further
testing.  It may also eventually be added to the ipmi module at that point.
2018-05-22 03:57:32 +00:00
jhibbits
5c6533292f Add a comment explaining the need of a global temporary variable
cpu_xirr is used only as a temporary location for the OPAL call in
PIC_DISPATCH().

Requested by:	nwhitehorn
2018-05-22 03:24:16 +00:00
jhibbits
ae82fd52ad Basic OPAL sensor support for POWER9 platforms
Summary:
PowerNV architectures (in the test case POWER9) export sensors via the device
tree, which are accessed via OPAL calls.  This adds sysctl nodes for each
device in a generic fashion.  New sysctl nodes are:

dev.opal_sensor.N.sensor
dev.opal_sensor.N.sensor_min
dev.opal_sensor.N.sensor_max
dev.opal_sensor.N.type
dev.opal_sensor.N.label

These are rooted at a parent attachment under opal, called opalsens.  This does
not add support for the "sensor groups" defined in the device tree.

Reviewed by:	breno.leitao_gmail.com
Differential Revision: https://reviews.freebsd.org/D15362
2018-05-22 02:42:53 +00:00
nwhitehorn
851b9e4b05 Fix build with PSERIES but not POWERNV defined. 2018-05-20 18:26:09 +00:00
jhibbits
4a4c273c7a Add support for the XIVE XICS emulation mode for POWER9 systems
Summary:
POWER9 systems use a new interrupt controller, XIVE, managed through OPAL
firmware calls.  The OPAL firmware includes support for emulating the previous
generation XICS presentation layer in addition to a new "XIVE Exploitation"
mode.  As a stopgap until we have XIVE exploitation mode, enable XICS emulation
mode so that we at least have an interrupt controller.

Since the CPPR is local to the current CPU, it cannot be updated for APs when
initializing on the BSP.  This adds a new function, directly called by the
powernv platform code, to initialize the CPPR on AP bringup.

Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15492
2018-05-20 03:23:17 +00:00
markj
5594f37668 Enable kernel dump features in GENERIC for most platforms.
This turns on support for kernel dump encryption and compression, and
netdump. arm and mips platforms are omitted for now, since they are more
constrained and don't benefit as much from these features.

Reviewed by:	cem, manu, rgrimes
Tested by:	manu (arm64)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D15465
2018-05-19 19:53:23 +00:00
jhibbits
c6cca6917f Add SPR_HSRR0/SPR_HSRR1 definitions
Reported by:	Mark Millard
Pointy-hat to:	jhibbits
2018-05-19 04:56:10 +00:00
jhibbits
96915338d5 Add hypervisor trap handling, using HSRR0/HSRR1
Summary:
Some hypervisor exceptions on POWER architecture only save state to HSRR0/HSRR1.
Until we have bhyve on POWER, use a lightweight exception frontend which copies
HSRR0/HSRR1 into SRR0/SRR1, and run the normal trap handler.

The first user of this is the Hypervisor Virtualization Interrupt, which targets
the XIVE interrupt controller on POWER9.

Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15487
2018-05-19 04:21:50 +00:00
jhibbits
d02acb9927 powerpc64: Add OPAL definitions
Summary:
Add additional OPAL PCI definitions and expand the code to use them in order to
ease the OPAL interface process for new comers.

These definitions came directly from the OPAL code and they are the same for
both PHB3 (POWER8) and PHB4 (POWER9).

Submitted by:	Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15432
2018-05-19 04:01:15 +00:00
jhibbits
bf8692e041 Fix a manual copy from the original diff for r333825
The 'else' was in the original diff.

Submitted by:	Breno Leitao
2018-05-19 03:47:28 +00:00
jhibbits
895c0483ab Add yet another option for gathering available memory
On some POWER9 systems, 'reg' denotes the full memory in the system, while
'linux,usable-memory' denotes the usable memory.  Some memory is reserved for
NVLink usage, so is partitioned off.

Submitted by:	Breno Leitao
2018-05-19 03:45:38 +00:00
jhibbits
c804777c4e Add some Hypervisor interrupt definitions
This mostly completes the interrupt definitions.  There are still some left out,
less likely to be used in the near term.
2018-05-19 03:23:46 +00:00
mmacy
7aeac9ef18 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
nwhitehorn
ec32568d63 Final fix for alignment issues with the page table first patched with
r333273 and partially reverted with r333594.

Older CPUs implement addition of offsets into the page table by a
bitwise OR rather than actual addition, which only works if the table is
aligned at a multiple of its own size (they also require it to be aligned
at a multiple of 256KB). Newer ones do not have that requirement, but it
hardly matters to enforce it anyway.

The original code was failing on newer systems with huge amounts of RAM
(> 512 GB), in which the page table was 4 GB in size. Because the
bootstrap memory allocator took its alignment parameter as an int, this
turned into a 0, removing any alignment constraint at all and making
the MMU fail. The first round of this patch (r333273) fixed this case by
aligning it at 256 KB, which broke older CPUs. Fix this instead by widening
the alignment parameter.
2018-05-14 04:00:52 +00:00
nwhitehorn
5ee2c8220a Revert changes to hash table alignment in r333273, which booting on all G5
systems, pending further analysis.
2018-05-13 23:56:43 +00:00
jhibbits
095a58f17a No need to bzero splpar_vpa entries
splpar_vpa is in the BSS, so is already zeroed when the kernel starts up.

Tested by:	Leandro Lupori
2018-05-11 02:04:01 +00:00
jhibbits
560fc64981 Fix PPC symbol resolution
Summary:
There were 2 issues that were preventing correct symbol resolution
on PowerPC/pseries:

1- memory corruption at chrp_attach() - this caused the inital
   part of the symbol table to become zeroed, which would cause
   the kernel linker to fail to parse it.
   (this was probably zeroing out other memory parts as well)

2- DDB symbol resolution wasn't working because symtab contained
   not relocated addresses but it was given relocated offsets.
   Although relocating the symbol table fixed this, it broke the
   linker, that already handled this case.
   Thus, the fix for this consists in adding a new DDB macro:
   DB_STOFFS(offs) that converts a (potentially) relocated offset
   into one that can be compared with symbol table values.

PR:		227093
Submitted by:	Leandro Lupori <leandro.lupori_gmail.com>
Differential Revision: https://reviews.freebsd.org/D15372
2018-05-10 03:59:48 +00:00
imp
741ea3fea2 Move MI-ish bcopy routine to libkern
riscv and powerpc have nearly identical bcopy.c that's
supposed to be mostly MI. Move it to the MI libkern.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:38 +00:00
jhibbits
f7e8241ecd Fix wrong cpu0 identification
Summary:
chrp_cpuref_init() was relying on the boot strap processor to be
the first child of /cpus. That was not always the case, specially
on pseries with FDT.

This change uses the "reg" property of each CPU instead and also
adds several sanity checks to avoid unexpected behavior (maybe
too many panics?).

The main observed symptom was interrupts being missed by the main
processor, leading to timeouts and the kernel aborting the boot.

Submitted by:	Leandro Lupori
Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15174
2018-05-08 13:23:39 +00:00
jhibbits
a0a74f77de Add support for powernv POWER9 MMU initialization
The POWER9 MMU (PowerISA 3.0) is slightly different from current
configurations, using a partition table even for hypervisor mode, and
dropping the SDR1 register.  Key off the newly early-enabled CPU features
flags for the new architecture, and configure the MMU appropriately.

The POWER9 MMU ignores the "PSIZ" field in the PTCR, and expects a 64kB
table.  As we are enabled for powernv (hypervisor mode, no VMs), only
initialize partition table entry 0, and zero out the rest.  The actual
contents of the register are identical to SDR1 from previous architectures.

Along with this, fix a bug in the page table allocation with very large
memory.  The table can be allocated on any 256k boundary.  The
bootstrap_alloc alignment argument is an int, and with large amounts of
memory passing the size of the table as the alignment will overflow an
integer.  Hard-code the alignment at 256k as wider alignment is not
necessary.

Reviewed by:	nwhitehorn
Tested by:	Breno Leitao
Relnotes:	Yes
2018-05-05 16:00:02 +00:00
jhibbits
66b2a0cd2b Break out the cpu_features setup to its own function, to be run earlier
The new POWER9 MMU configuration is slightly different from current setups.
Rather than special-casing on POWER9, move the initialization of cpu_features
and cpu_features2 to as early as possible, so that platform and MMU
configuration can be based upon CPU features instead of specific CPUs if at all
possible.

Reviewed by:	nwhitehorn
2018-05-05 15:48:39 +00:00
jhibbits
66aefafff1 Add POWER9 to the POWER8 bootstrap case blocks
POWER8 and POWER9 have similar configuration requirements for hypervisor setup,
and in the cases here they're identical.  Add the POWER9 constant to the POWER8
list so it's initialized correctly.

Reviewed by:	nwhitehorn
2018-05-05 15:42:58 +00:00
mjg
9d78bcff00 Allow __builtin_memmove instead of bcopy for small buffers of known size
See r323329 for an explanation why this is a good idea.
2018-05-04 04:00:48 +00:00
jhibbits
95d12b7b4a Remove dead errata fixup code
This code caused more problems than it should have fixed (boot failures) on
the machines I tested, so has been commented out for a while now.  Remove
it, and assume the errata fixups were done by the bootloader where they
belong.
2018-05-01 04:31:17 +00:00
nwhitehorn
403afd3caa Fix null pointer dereference on nodes without a "compatible" property.
MFC after:	1 week
2018-04-30 19:37:32 +00:00
jhibbits
ee711c8a2d Increase the fdtmemreserv array limit to boot on POWER9
Discussing with others, this needs to be at least 20 to boot on some POWER9
nodes.  Linux made a similar change for the same reason, so increase to 32
to give us some extra breathing room as well.  The input and output arrays
are sized at 256, so much greater than the increase in the property array
size.
2018-04-25 02:42:11 +00:00
jhibbits
5d53704ae8 Fix the build post r332859
sysentvec::sv_hwcap/sv_hwcap2 are pointers to  u_long, so cpu_features* need
to be u_long to use the pointers.  This also requires a temporary cast in
printing the bitfields, which is fine because the feature flag fields are
only 32 bits anyway.
2018-04-22 03:58:04 +00:00
jhibbits
fb63c578b3 Export powerpc CPU features for auxvec
FreeBSD exports the AT_HWCAP* auxvec items if provided by the ELF sysentvec
structure.  Add the CPU features to be exported, so user space can more
easily check for them without using the hw.cpu_features and hw.cpu_features2
sysctls.
2018-04-21 15:15:47 +00:00
jhibbits
f139bcd72b Sync powerpc feature flags with Linux
Not all feature flags are synced.  Those for processors we don't currently
support are ignored currently.  Those that are supported are synced best I
can tell.  One flag was renamed to match the Linux flag name
(PPC_FEATURE2_VCRYPTO -> PPC_FEATURE2_VEC_CRYPTO).
2018-04-21 04:18:17 +00:00
jhibbits
53a9e83dbf powerpc64: Set n_slbs = 32 for POWER9
Summary:
POWER9 also contains 32 slbs entries as explained by the POWER9 User Manual:

 "For HPT translation, the POWER9 core contains a unified (combined for both
   instruction and data), 32-entry, fully-associative SLB per thread"

Submitted by:	Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15128
2018-04-20 03:23:19 +00:00
jhibbits
d6c2811df6 powerpc64: Add DSCR support
Summary:
Powerpc64 has support for a register called Data Stream Control Register
(DSCR), which basically controls how the hardware controls the caching and
prefetch for stream operations.

Since mfdscr and mtdscr are privileged instructions, we need to emulate them,
and
keep the custom DSCR configuration per thread.

The purpose of this feature is to change DSCR depending on the operation, set
to DSCR Default Prefetch Depth to deepest on string operations, as memcpy.

Submitted by:	Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15081
2018-04-20 03:19:44 +00:00
nwhitehorn
46c08115fd Fix detection of memory overlap with the kernel in the case where a memory
region marked "available" by firmware is contained entirely in the kernel.

This had a tendency to happen with FDTs passed by loader, though could for
other reasons as well, and would result in the kernel slowly cannibalizing
itself for other purposes, eventually resulting in a crash.

A similar fix is needed for mmu_oea.c and should probably just be rolled
at that point into some generic code in platform.c for taking a mem_region
list and removing chunks.

PR:		226974
Submitted by:	leandro.lupori@gmail.com
Reviewed by:	jhibbits
Differential Revision:	D15121
2018-04-19 18:34:38 +00:00
mav
e45770f056 Release memory resource on cuda driver attach failure.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
2018-04-19 15:29:10 +00:00
avg
cc87433116 set kdb_why to "trap" when calling kdb_trap from trap_fatal
This will allow to hook a ddb script to "kdb.enter.trap" event.
Previously there was no specific name for this event, so it could only
be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks.
Both are very unspecific.

Having a specific event is useful because the fatal trap condition is
very similar to panic but it has an additional property that the current
stack frame is the frame where the trap occurred.  So, both a register
dump and a stack bottom dump have additional information that can help
analyze the problem.

I have added the event only on architectures that have trap_fatal()
function defined.  I haven't looked at other architectures.  Their
maintainers can add support for the event later.

Sample script:
kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20

Reviewed by:	kib, jhb, markj
MFC after:	11 days
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D15093
2018-04-19 05:06:56 +00:00