78 Commits

Author SHA1 Message Date
jhibbits
5c6533292f Add a comment explaining the need for a global temporary variable
cpu_xirr is used only as a temporary location for the OPAL call in
PIC_DISPATCH().

Requested by:	nwhitehorn
2018-05-22 03:24:16 +00:00
nwhitehorn
851b9e4b05 Fix build with PSERIES but not POWERNV defined. 2018-05-20 18:26:09 +00:00
jhibbits
4a4c273c7a Add support for the XIVE XICS emulation mode for POWER9 systems
Summary:
POWER9 systems use a new interrupt controller, XIVE, managed through OPAL
firmware calls.  The OPAL firmware includes support for emulating the previous
generation XICS presentation layer in addition to a new "XIVE Exploitation"
mode.  As a stopgap until we have XIVE exploitation mode, enable XICS emulation
mode so that we at least have an interrupt controller.

Since the CPPR is local to the current CPU, it cannot be updated for APs when
initializing on the BSP.  This adds a new function, directly called by the
powernv platform code, to initialize the CPPR on AP bringup.

Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15492
2018-05-20 03:23:17 +00:00
mmacy
7aeac9ef18 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
jhibbits
095a58f17a No need to bzero splpar_vpa entries
splpar_vpa is in the BSS, so is already zeroed when the kernel starts up.

Tested by:	Leandro Lupori
2018-05-11 02:04:01 +00:00
jhibbits
560fc64981 Fix PPC symbol resolution
Summary:
There were 2 issues that were preventing correct symbol resolution
on PowerPC/pseries:

1- memory corruption at chrp_attach() - this caused the initial
   part of the symbol table to become zeroed, which would cause
   the kernel linker to fail to parse it.
   (this was probably zeroing out other memory parts as well)

2- DDB symbol resolution wasn't working because symtab contained
   unrelocated addresses but was given relocated offsets.
   Although relocating the symbol table fixed this, it broke the
   linker, which already handled this case.
   Thus, the fix for this consists in adding a new DDB macro:
   DB_STOFFS(offs) that converts a (potentially) relocated offset
   into one that can be compared with symbol table values.

PR:		227093
Submitted by:	Leandro Lupori <leandro.lupori_gmail.com>
Differential Revision: https://reviews.freebsd.org/D15372
2018-05-10 03:59:48 +00:00
jhibbits
f7e8241ecd Fix wrong cpu0 identification
Summary:
chrp_cpuref_init() was relying on the bootstrap processor to be
the first child of /cpus. That was not always the case, especially
on pseries with FDT.

This change uses the "reg" property of each CPU instead and also
adds several sanity checks to avoid unexpected behavior (maybe
too many panics?).

The main observed symptom was interrupts being missed by the main
processor, leading to timeouts and the kernel aborting the boot.

Submitted by:	Leandro Lupori
Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15174
2018-05-08 13:23:39 +00:00
jhibbits
a0a74f77de Add support for powernv POWER9 MMU initialization
The POWER9 MMU (PowerISA 3.0) is slightly different from current
configurations, using a partition table even for hypervisor mode, and
dropping the SDR1 register.  Key off the newly early-enabled CPU features
flags for the new architecture, and configure the MMU appropriately.

The POWER9 MMU ignores the "PSIZ" field in the PTCR, and expects a 64kB
table.  As we are enabled for powernv (hypervisor mode, no VMs), only
initialize partition table entry 0, and zero out the rest.  The actual
contents of the register are identical to SDR1 from previous architectures.

Along with this, fix a bug in the page table allocation with very large
memory.  The table can be allocated on any 256k boundary.  The
bootstrap_alloc alignment argument is an int, and with large amounts of
memory passing the size of the table as the alignment will overflow an
integer.  Hard-code the alignment at 256k as wider alignment is not
necessary.

Reviewed by:	nwhitehorn
Tested by:	Breno Leitao
Relnotes:	Yes
2018-05-05 16:00:02 +00:00
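
The alignment overflow described in a0a74f77de can be sketched as follows. This is a minimal illustration, not the kernel code: fits_in_int() and table_alignment() are hypothetical helpers, and the 4 GiB table size in the usage note is an assumed value chosen only because it cannot be represented in an int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed alignment named in the commit message: 256k. */
#define TABLE_ALIGN	(256 * 1024)

/* True if a 64-bit size can pass through an int parameter unchanged. */
static bool
fits_in_int(uint64_t value)
{
	return (value <= (uint64_t)INT_MAX);
}

/*
 * Hypothetical helper: choose the alignment to hand to an allocator whose
 * alignment parameter is an int. The table size is deliberately ignored,
 * so a very large table can no longer overflow the argument.
 */
static int
table_alignment(uint64_t table_size)
{
	(void)table_size;
	assert(fits_in_int(TABLE_ALIGN));
	return (TABLE_ALIGN);
}
```

With an assumed table size of 1ULL << 32, fits_in_int() reports that the size itself would not survive conversion to int, while table_alignment() still returns the fixed 256k value.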
gonzo
ee59b6a5e7 [ofw] fix erroneous checks for OF_finddevice(9) return value
OF_finddevice returns ((phandle_t)-1) in case of failure. Some code
in existing drivers checked whether the return value was equal to 0 or
less than or equal to 0, which is also wrong because phandle_t is an
unsigned type. Most of these checks were for negative cases that were
never triggered, so there was no impact on functionality.

Reviewed by:	nwhitehorn
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14645
2018-03-20 00:03:49 +00:00
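
The unsigned-comparison pitfall fixed in ee59b6a5e7 can be illustrated with a small userland sketch. The phandle_t typedef below is a stand-in assumed to be an unsigned 32-bit type, and fake_finddevice() is a hypothetical stub that only mimics the documented failure value; neither is the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel type; assumed unsigned for this sketch. */
typedef uint32_t phandle_t;

/* Hypothetical stub: only "/chosen" exists; anything else fails. */
static phandle_t
fake_finddevice(const char *path)
{
	assert(path != NULL);
	if (strcmp(path, "/chosen") == 0)
		return (0x10);
	return ((phandle_t)-1);
}

/* Broken check: an unsigned value is never negative, so this only sees 0. */
static bool
node_missing_broken(phandle_t node)
{
	return (node <= 0);
}

/* Correct check: compare against the documented failure value. */
static bool
node_missing_fixed(phandle_t node)
{
	return (node == (phandle_t)-1);
}
```

For a path that does not exist, node_missing_broken() returns false because the failure value converts to the largest unsigned number, while node_missing_fixed() returns true.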
jhibbits
0f5155e5e4 Merge AIM and Book-E PCPU fields
This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel.  As more work is done, the struct may be optimized further.

Reviewed by:	nwhitehorn
2018-02-17 20:59:12 +00:00
jhibbits
707bc0b7a7 PPC64: Get the timebase from the proper OF field
Summary:
After revision rS328534 ('PPC64: use hwref instead of cpuid'), FreeBSD on
a powerpc64 virtual machine panics because it is unable to read the
timebase, showing the following error:

     get-property for timebase-frequency on zero phandle

     panic: Unable to determine timebase frequency!

With the change above, cpuref->cr_hwref no longer contains the phandle,
so the proper CPU entry in OF is never read.

Submitted by:	Breno Leitao
Differential Revision:	https://reviews.freebsd.org/D14204
2018-02-14 02:51:28 +00:00
jhibbits
e2b4c362eb powerpc64/pseries: Define new hcalls
Summary:
Define new hcalls as in 'Linux on Power Architecture Platform Reference'
version 1.1 (24 March 2016) downloaded from:

        https://members.openpowerfoundation.org/document/dl/469

Submitted by:	Breno Leitao
Differential Revision:	https://reviews.freebsd.org/D14281
2018-02-14 02:48:27 +00:00
wma
bb9cfb4be4 PPC64: use hwref instead of cpuid
On CHRP and PowerNV, use the interrupt server number in the cpuref and pcpu
hwref field instead of the device-tree phandle and make the CPU IDs reported
to the scheduler dense and with the BSP at 0.

Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14011
2018-01-29 09:15:38 +00:00
pfg
ced875130d Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
wma
b76b1a3176 PowerNV: XICS support for PowerNV/OPAL
Make XICS OPAL-aware.

Created by:            Nathan Whitehorn <nwhitehorn@freebsd.org>
Submitted by:          Wojciech Macek <wma@semihalf.com>
Sponsored by:          FreeBSD Foundation
2018-01-16 06:24:19 +00:00
pfg
e1b1b7bd96 powerpc: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these are likely to overflow; however, the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the surrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:10:40 +00:00
nwhitehorn
c09107dfaf Revert r327360, which can cause boot problems on high-CPU-count (>60)
POWER8 and POWER9 systems, pending further analysis.

PR:		224841
2018-01-04 23:07:51 +00:00
nwhitehorn
9e4c7607a7 Enhance the CHRP/pSeries platform layer:
- Densely number CPUs to avoid systems with CPUs with very high ID numbers
- Always have the BSP be CPU 0 to avoid remnant brokenness with non-0 BSPs
  in other parts of the kernel.
- Improve parsing of the device tree CPU listings on SMT systems.
- Allow reboot via RTAS as well as OF for pSeries systems booted by FDT
  without functioning Open Firmware.

Obtained from:	projects/powernv
MFC after:	3 weeks
2017-12-29 21:09:17 +00:00
pfg
6f8905baf6 sys/powerpc: further adoption of SPDX licensing ID tags.
Mainly focus on files that use the BSD 2-Clause license; however, the tool I
was using misidentified many licenses, so this was mostly a manual,
error-prone task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 15:09:59 +00:00
nwhitehorn
580b4626ea Remove another extern int n_slbs made redundant by declaring this in
mmu_oea64.h.

MFC after:	3 weeks
2017-11-26 04:34:13 +00:00
asomers
2d89a50f96 Always null-terminate ccb_pathinq.(sim_vid|hba_vid|dev_name)
The sim_vid, hba_vid, and dev_name fields of struct ccb_pathinq are
fixed-length strings. AFAICT the only place they're read is in
sbin/camcontrol/camcontrol.c, which assumes they'll be null-terminated.
However, the kernel doesn't null-terminate them. A bunch of copy-pasted code
uses strncpy to write them, and doesn't guarantee null-termination. For at
least 4 drivers (mpr, mps, ciss, and hyperv), the hba_vid field actually
overflows. You can see the result by doing "camcontrol negotiate da0 -v".

This change null-terminates those fields everywhere they're set in the
kernel. It also shortens a few strings to ensure they'll fit within the
16-character field.

PR:		215474
Reported by:	Coverity
CID:		1009997 1010000 1010001 1010002 1010003 1010004 1010005
CID:		1331519 1010006 1215097 1010007 1288967 1010008 1306000
CID:		1211924 1010009 1010010 1010011 1010012 1010013 1010014
CID:		1147190 1010017 1010016 1010018 1216435 1010020 1010021
CID:		1010022 1009666 1018185 1010023 1010025 1010026 1010027
CID:		1010028 1010029 1010030 1010031 1010033 1018186 1018187
CID:		1010035 1010036 1010042 1010041 1010040 1010039
Reviewed by:	imp, sephe, slm
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D9037
Differential Revision:	https://reviews.freebsd.org/D9038
2017-01-04 20:26:42 +00:00
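
The termination problem described in 2d89a50f96 can be sketched in userland as follows. This is an assumption-laden illustration: set_fixed_field() is a hypothetical helper, VID_LEN reuses the 16-character width mentioned in the message, and the kernel change itself may use a different routine.

```c
#include <assert.h>
#include <string.h>

/* Field width mentioned in the commit message. */
#define VID_LEN	16

/*
 * Hypothetical helper: copy src into a fixed-width field and always
 * terminate it. strncpy() alone leaves the field unterminated when src
 * is at least dstlen bytes long, so the last byte is set explicitly.
 */
static void
set_fixed_field(char *dst, size_t dstlen, const char *src)
{
	assert(dst != NULL && src != NULL && dstlen > 0);
	strncpy(dst, src, dstlen - 1);
	dst[dstlen - 1] = '\0';
}
```

A source string longer than the field is truncated to VID_LEN - 1 characters, and a reader that assumes a null-terminated string stays within the field.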
nwhitehorn
dd78a731a2 Close a race when making the CPU idle under pHyp. If an interrupt occurs
between the beginning of the idle function and actually going idle, the
CPU could go to sleep with pending work.

MFC after:	1 month
2016-08-24 16:49:14 +00:00
pfg
15369e2805 Use our nitems() macro when param.h is available.
Replacements specific to arm, mips, pc98, powerpc and sparc64.

Discussed in:	freebsd-current
2016-04-20 15:45:55 +00:00
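
For readers unfamiliar with the macro adopted in 15369e2805, nitems() yields the number of elements in an array. The sketch below supplies a fallback definition so it builds outside the kernel tree; irq_table and irq_table_sum() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback for builds where <sys/param.h> does not provide the macro. */
#ifndef nitems
#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical table, present only to exercise the macro. */
static const int irq_table[] = { 3, 5, 7, 11 };

/* Sum the table without hard-coding its length. */
static int
irq_table_sum(void)
{
	int sum = 0;

	assert(nitems(irq_table) > 0);
	for (size_t i = 0; i < nitems(irq_table); i++)
		sum += irq_table[i];
	return (sum);
}
```

The macro only works on true arrays; applied to a pointer it silently computes the wrong count, which is why it replaces open-coded sizeof divisions rather than arbitrary length arguments.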
zbb
fc42f4865c Reduce OFW PCI code duplication - involves ARM, PPC and SPARC64
Import portions of the PowerPC OF PCI implementation into new file
"ofwpci.c", common for other platforms. The files ofw_pci.c and ofw_pci.h
from sys/powerpc/ofw no longer exist. All required declarations are moved
to sys/dev/ofw/ofwpci.h. This creates a new ofw_pci_write_ivar() function
and modifies some other methods. Most functions keep the existing ppc
implementations largely unchanged. Now there is no need to have
multiple identical copies of methods for various architectures.

Requested by:  jhibbits
Reviewed by:   jhibbits, marius
Submitted by:  Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D4879
2016-03-29 15:19:56 +00:00
jhibbits
c55aa7292d Fix the resource_list_print_type() calls to use uintmax_t.
Missed a bunch from r297000.
2016-03-22 22:25:08 +00:00
skra
f4b6499ab5 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
zbb
e9cf712fda Revert r295756:
Extract common code from PowerPC's ofw_pci

Import portions of the PowerPC OF PCI implementation into
new file "ofw_pci.c", common for other platforms. The files ofw_pci.c and
ofw_pci.h from sys/powerpc/ofw no longer exist. All required declarations
are moved to sys/dev/ofw/ofw_pci.h.

This creates a new ofw_pci_write_ivar() function and modifies
ofw_pci_nranges(), ofw_pci_read_ivar(), ofw_pci_route_interrupt()
methods.
Most functions keep the existing ppc implementations largely
unchanged. Now there is no need to have multiple identical copies
of methods for various architectures.

Submitted by:  Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Annapurna Labs
Reviewed by:   jhibbits, mmel
Differential Revision: https://reviews.freebsd.org/D4879

This needs to return to the drawing board as it breaks both the
PowerPC and Sparc64 builds.

Pointed out by: jhibbits
2016-02-20 12:28:20 +00:00
zbb
e22231d219 Extract common code from PowerPC's ofw_pci
Import portions of the PowerPC OF PCI implementation into
new file "ofw_pci.c", common for other platforms. The files ofw_pci.c and
ofw_pci.h from sys/powerpc/ofw no longer exist. All required declarations
are moved to sys/dev/ofw/ofw_pci.h.

This creates a new ofw_pci_write_ivar() function and modifies
ofw_pci_nranges(), ofw_pci_read_ivar(), ofw_pci_route_interrupt() methods.
Most functions keep the existing ppc implementations largely
unchanged. Now there is no need to have multiple identical copies
of methods for various architectures.

Submitted by:  Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Annapurna Labs
Reviewed by:   jhibbits, mmel
Differential Revision: https://reviews.freebsd.org/D4879
2016-02-18 13:07:21 +00:00
nwhitehorn
999db941f1 Move RTAS PCI-specific interpretation of the "reg" property of the PCI host
device to the RTAS driver, where it belongs.
2016-01-18 17:27:16 +00:00
nwhitehorn
cdb30b9b2d Provide link state reporting so that ifconfig_llan0="DHCP" works. The
reported link state is fictional (always up) since the hypervisor does
not provide this information.

MFC after:	1 week
2015-12-19 02:16:38 +00:00
nwhitehorn
96deb1fcf0 Where appropriate, use the endian-flipping OF_getencprop() instead of
OF_getprop() to get encode-int encoded values from the OF tree. This is
a no-op at present, since all existing PowerPC ports are big-endian, but
it is a correctness improvement and will be required if we have a
little-endian kernel at some future point.

Where it is totally impossible for the code ever to be used on a
little-endian system (much of powerpc/powermac, for instance), I have not
necessarily made the appropriate changes.

MFC after:	1 month
2015-11-17 16:07:43 +00:00
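
The endian-flipping behavior described in 96deb1fcf0 amounts to decoding each 32-bit cell from its big-endian encoding. The sketch below shows that idea in portable C; decode_cell() and decode_cells() are hypothetical names and are not the kernel's OF_getencprop() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian 32-bit cell, independent of host byte order. */
static uint32_t
decode_cell(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Decode ncells consecutive cells from raw property bytes into out. */
static void
decode_cells(const uint8_t *raw, uint32_t *out, size_t ncells)
{
	assert(raw != NULL && out != NULL);
	for (size_t i = 0; i < ncells; i++)
		out[i] = decode_cell(raw + 4 * i);
}
```

On a big-endian host this produces the same values as reading the cells directly, which matches the message's point that the conversion is currently a no-op but becomes necessary on a little-endian kernel.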
jkim
318c4f97e6 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
br
deecdc3d61 Report the number of interrupt resources added to the list
through an extra argument, so the caller knows how many were added.
2015-05-15 13:55:18 +00:00
nwhitehorn
224d995d2e Convert PTE eviction lock from an RW lock to a RM lock. It is held for
writing approximately never (< 0.00000001% under heavy VM load, and it can
go for months without ever being acquired in normal operation). This
provides a 10% (2-minute) improvement in wall clock time for make -j32
buildworld on a 4-core 32-thread POWER8.
2015-03-16 16:29:33 +00:00
nwhitehorn
2c2f3ffe1d Deallocate any leftover page table entries in the LPAR at boot. This
prevents contamination from a previous kernel (e.g. after shutdown -r).
2015-03-13 00:08:58 +00:00
nwhitehorn
431cf92001 The H_VIO_SIGNAL hypercall only enables interrupts for future received
packets and does not schedule interrupts for any packets currently
enqueued. Close two races where enqueued packets may not ever trigger
interrupts. The first of these, at adapter initialization time, was
especially severe since a rush of enqueued packets could actually fill
the receive buffer completely, stalling the interface forever.

MFC after:	2 weeks
2015-03-12 17:01:30 +00:00
nwhitehorn
8d11dd01a1 New pmap implementation for 64-bit PowerPC processors. The main focus of
this change is to improve concurrency:
- Drop global state stored in the shadow overflow page table (and all other
  global state)
- Remove all global locks
- Use per-PTE lock bits to allow parallel page insertion
- Reconstruct state when requested for evicted PTEs instead of buffering
  it during overflow

This drops total wall time for make buildworld on a 32-thread POWER8 system
by a factor of two and system time by a factor of three, providing performance
20% better than similarly clocked Core i7 Xeons per-core. Performance on
smaller SMP systems, where PMAP lock contention was not as much of an issue,
is nearly unchanged.

Tested on:	POWER8, POWER5+, G5 UP, G5 SMP (64-bit and 32-bit kernels)
Merged from:	user/nwhitehorn/ppc64-pmap-rework
Looked over by:	jhibbits, andreast
MFC after:	3 months
Relnotes:	yes
Sponsored by:	FreeBSD Foundation
2015-02-24 21:37:20 +00:00
nwhitehorn
cc0edb125a Fix race in interrupt handling that could cause IO to hang up under heavy
load.
2015-02-23 20:38:00 +00:00
nwhitehorn
5e79d149b1 Add error reporting to interrupt CPU binding. 2015-02-10 00:57:26 +00:00
nwhitehorn
2cc0c7af02 Distribute interrupts across multiple CPUs in SMP configurations instead of sending them
all to CPU 0.
2015-02-09 19:21:54 +00:00
nwhitehorn
83e35772fc Mark invalid page table entries correctly for PMAP as well as for the
hypervisor. This prevents an infinite loop where processes with evicted
pages would page fault forever when PMAP decided the evicted pages on
which the process was faulting were actually present and did not need to
be restored.

Found while building LLVM with make -j32.

Sponsored by:	FreeBSD Foundation
2015-02-09 15:58:27 +00:00
bz
4a78a9edfc Properly hide a variable under #ifdef, as it is only used inside the
specific #ifdef block; otherwise it is left as an unused variable and breaks
other kernel builds.
2015-02-09 11:34:45 +00:00
nwhitehorn
1e0c575751 Fix typo in PTE insertion overflow handling: use the page we're actually
returning, not the one we just looked at.
2015-02-09 07:08:54 +00:00
nwhitehorn
04847b0c51 Technically speaking, using one virtual processor area for all CPUs is a
violation of the spec. Make duplicate entries for each CPU.
2015-02-09 02:13:36 +00:00
nwhitehorn
02ff4e5e54 Add some error checking on the supplied page size list. This makes sure
that we (a) get the correct large page size to provide to pmap and (b)
we can alert the user if running under incorrectly-configured PowerKVM
on POWER7 and POWER8 systems.

MFC after:	1 week
2015-02-08 16:50:00 +00:00
nwhitehorn
aa04ebca04 Fix bug in mappings of multiple pages exposed by updates to the VSCSI
support in QEMU. Each page of a multi-page mapping was getting mapped to
the same physical address, which is not the desired behavior.

MFC after:	1 week
2015-01-27 07:20:00 +00:00
nwhitehorn
1be9b49684 Restore use of ofw_bus_intr_to_rl() in the pseries vdevice driver after fixing
ofw_bus_intr_to_rl() to match the spec for unspecified interrupt-parent
properties.
2015-01-05 21:39:35 +00:00
nwhitehorn
4b42f3d1c8 Revert r272109 locally, which is not quite equivalent in how it deals with
missing interrupt-parent properties. A better solution will come later,
but this restores pseries in QEMU for the time being.
2015-01-05 18:15:16 +00:00
nwhitehorn
96234b61c4 Allow booting with both a real Open Firmware tree and a flattened version of
the Open Firmware, as provided by petitboot, for example. Note that this is
not quite complete, since RTAS instantiation still depends on callable
firmware.

MFC after:	2 weeks
2015-01-01 22:26:12 +00:00
ian
e2b20df1df Replace multiple nearly-identical copies of code to walk through an FDT
node's interrupts=<...> property creating resource list entries with a
single common implementation.  This change makes ofw_bus_intr_to_rl() the
one true copy of that code and removes the copies of it from other places.

This also adds handling of the interrupts-extended property, which allows
specifying multiple interrupts for a node where each interrupt can have a
separate interrupt-parent.  The bindings for this state that the property
cells contain an xref phandle to the interrupt parent followed by whatever
interrupt info that parent normally expects.  This leads to having a
variable number of icells per interrupt in the property.  For example you
could have <&intc1 1 &intc2 26 9 0 &intc3 9 4>.

Differential Revision: https://reviews.freebsd.org/D803
2014-09-25 15:02:33 +00:00
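
The variable-width layout of interrupts-extended described in e2b20df1df can be sketched as a walk over the property cells. This is an illustration under stated assumptions, not ofw_bus_intr_to_rl() itself: the xref values 0x101 to 0x103 and their cell counts are hypothetical stand-ins for &intc1, &intc2, and &intc3 in the example from the message, and icells_for() replaces a real device-tree lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map: parent xref -> number of interrupt cells it expects. */
struct intc {
	uint32_t	xref;
	int		icells;
};

static const struct intc intcs[] = {
	{ 0x101, 1 },	/* stands in for &intc1 */
	{ 0x102, 3 },	/* stands in for &intc2 */
	{ 0x103, 2 },	/* stands in for &intc3 */
};

/* Return the cell count for a parent, or -1 if the xref is unknown. */
static int
icells_for(uint32_t xref)
{
	for (size_t i = 0; i < sizeof(intcs) / sizeof(intcs[0]); i++)
		if (intcs[i].xref == xref)
			return (intcs[i].icells);
	return (-1);
}

/* Count interrupts in the property; -1 means it is malformed. */
static int
count_extended_intrs(const uint32_t *cells, size_t ncells)
{
	size_t i = 0;
	int nintr = 0;

	assert(cells != NULL || ncells == 0);
	while (i < ncells) {
		int icells = icells_for(cells[i]);

		/* Unknown parent, or not enough cells left after the xref. */
		if (icells < 1 || (size_t)icells > ncells - i - 1)
			return (-1);
		i += 1 + (size_t)icells;	/* skip xref plus its specifier */
		nintr++;
	}
	return (nintr);
}
```

Fed the nine cells of the example, the walk consumes 2, 4, and 3 cells in turn and reports three interrupts; a property that ends partway through a specifier is rejected.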