Commit Graph

280 Commits

Author SHA1 Message Date
Zbigniew Bodek
017b6ebc0a Fix interrupts delivery on ThunderX for VF IDs beyond 8
SR-IOV devices usually use Alternative Routing ID (ARI).
In that case slot/device is always assumed to be 0 and
function/identifier is extended to 8 bits.

Fix interrupts delivery to VF IDs beyond 8 by using a correct
DevID if ARI is enabled.

Reviewed by:   jhb, wma
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5855
2016-04-07 10:36:50 +00:00
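A minimal sketch of the ARI point made in 017b6ebc0a, assuming the device ID is simply the 16-bit PCI requester ID. The helper name `pci_devid` and its parameters are hypothetical; the actual ThunderX DevID layout is not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Build a 16-bit requester ID from bus/slot/function.
 * Hypothetical helper for illustration, not the driver's function.
 */
static uint16_t
pci_devid(uint8_t bus, uint8_t slot, uint8_t func, bool ari_enabled)
{
	if (ari_enabled) {
		/* ARI: slot is assumed to be 0, function uses all 8 bits. */
		return ((uint16_t)((bus << 8) | func));
	}
	/* Conventional routing: 5-bit slot, 3-bit function. */
	return ((uint16_t)((bus << 8) | ((slot & 0x1f) << 3) | (func & 0x7)));
}
```

Without the ARI branch, a function number of 8 or more would be truncated to 3 bits, which matches the symptom described for VF IDs beyond 8.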
Andrew Turner
5d58666c93 Use PHYS_IN_DMAP to check if a physical address is within the DMAP region.
Approved by:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-04-06 14:16:37 +00:00
Andrew Turner
70e7278593 Clean up the early pagetable creation code in preparation for increasing
the size of the arm64 DMAP region.

Approved by:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-04-06 14:12:00 +00:00
Andrew Turner
c7d4b461b3 Allow vmparam.h to be included from assembly files on arm64.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-04-06 14:08:10 +00:00
Ed Schouten
ab83575070 Make CloudABI's way of doing TLS more friendly to userspace emulators.
We're currently seeing how hard it would be to run CloudABI binaries on
operating systems that cannot be modified easily (Windows, Mac OS X). The
idea is that we want to just run them without any sandboxing. Now
that CloudABI executables are PIE, this is already a bit easier, but TLS
is still problematic:

- CloudABI executables want to write to the %fs, which typically
  requires extra system calls by the emulator every time it needs to
  switch between CloudABI's and its own TLS.

- If CloudABI executables overwrite the %fs base unconditionally, it
  also becomes harder for the emulator to store a backup of the old
  value of %fs. To solve this, let's no longer overwrite %fs, but just
  %fs:0.

As CloudABI's C library does not use a TCB, this space can now be used
by an emulator to keep track of its internal state. The executable can
now safely overwrite %fs:0, as long as it makes sure that the TCB is
copied over to the new TLS area.

Ensure that there is an initial TLS area set up when the process starts,
only containing a bogus TCB. We don't really care about its contents on
FreeBSD.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D5836
2016-04-06 11:11:31 +00:00
Wojciech Macek
1c7c13aa0e Implement dtrace_getupcstack in ARM64
Allow using DTRACE for performance analysis of userspace
applications - the function call stack can be captured.
This is almost an exact copy of the AMD64 solution.

Obtained from:         Semihalf
Sponsored by:          Cavium
Reviewed by:           emaste, gnn, jhibbits
Differential Revision: https://reviews.freebsd.org/D5779
2016-04-06 05:13:36 +00:00
Andrew Turner
53b832b091 Add a table to map from the FreeBSD CPUID space to the GIC CPUID space. On
many SoCs these two are the same, however there is no requirement for this
to be the case, e.g. on the ARM Juno we boot on what the GIC thinks of as
CPU 2, but FreeBSD numbers it CPU 0.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-04-04 17:04:33 +00:00
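The mapping table described in 53b832b091 could look roughly like the sketch below. All names (`gic_cpuid_map`, `gic_map_cpu`, `gic_target_mask`) and the table size are assumptions for illustration; the commit's real data structure is not shown here.

```c
#include <stdint.h>

#define SKETCH_MAXCPU 8 /* assumed table size */

/* Indexed by FreeBSD CPUID, holds the GIC's idea of the CPU number. */
static uint8_t gic_cpuid_map[SKETCH_MAXCPU];

/* Record the GIC CPUID that corresponds to a FreeBSD CPUID. */
static void
gic_map_cpu(unsigned kernel_cpu, uint8_t gic_cpu)
{
	gic_cpuid_map[kernel_cpu] = gic_cpu;
}

/* Translate a FreeBSD CPUID into a one-hot GIC target mask. */
static uint8_t
gic_target_mask(unsigned kernel_cpu)
{
	return ((uint8_t)(1u << gic_cpuid_map[kernel_cpu]));
}
```

In the Juno case from the commit message, FreeBSD CPU 0 would be mapped to GIC CPU 2, so the target mask for CPU 0 has bit 2 set rather than bit 0.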
Andrew Turner
bc5a80161c Reduce the diff for when we switch to intrng. The IPI interrupts will be
split out to multiple handlers.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-04-04 15:13:17 +00:00
Wojciech Macek
4d1dd74a50 arm64: pagezero improvement
This change has been provided to improve pagezero call performance.

Submitted by:          Dominik Ermel <der@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          Cavium
Reviewed by:           kib
Differential Revision: https://reviews.freebsd.org/D5741
2016-04-04 07:16:43 +00:00
Wojciech Macek
73ffb5e8a4 Add bzero.S to ARM64 machdep
Add file missing from https://svnweb.freebsd.org/changeset/base/297536
2016-04-04 07:11:33 +00:00
Wojciech Macek
db27818234 arm64: bzero optimization
This optimization attempts to use the widest possible register store instructions to zero large buffers.
The implementation, if possible, will use 'dc zva' to zero buffer by cache lines.

Speedup: 60x faster memory zeroing

Submitted by:          Dominik Ermel <der@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          Cavium
Reviewed by:           kib
Differential Revision: https://reviews.freebsd.org/D5726
2016-04-04 07:06:20 +00:00
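The "if possible" in db27818234 presumably refers to whether 'dc zva' is permitted and how large its block is. On ARMv8 this is reported by the DCZID_EL0 register; the sketch below decodes a value of that register. The function and macro names are hypothetical, and how bzero.S actually performs this check is not shown in the log.

```c
#include <stdint.h>

#define SKETCH_DCZID_DZP     (1u << 4) /* set when DC ZVA is prohibited */
#define SKETCH_DCZID_BS_MASK 0xfu      /* log2 of block size in 4-byte words */

/*
 * Return the number of bytes zeroed by one 'dc zva', or 0 if the
 * instruction must not be used.
 */
static unsigned
dczva_block_bytes(uint32_t dczid)
{
	if (dczid & SKETCH_DCZID_DZP)
		return (0);
	return (4u << (dczid & SKETCH_DCZID_BS_MASK));
}
```

A caller would fall back to ordinary wide stores when the result is 0 or when the buffer is smaller than one block.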
Ed Schouten
4a8b3b18cc Make Position Independent Executables work for CloudABI.
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
  regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
  knows at which address it got loaded and can apply relocations.
2016-03-31 18:52:00 +00:00
Andrew Turner
f2f21faf62 Add support for 4 level pagetables. The userland address space has been
increased to 256TiB. The kernel address space can also be increased to be
the same size, but this will be performed in a later change.

To help work with an extra level of page tables two new functions have
been added, one to find the lowest level table entry, and one to find the
block/page level. Both of these find the entry for a given pmap and virtual
address.

This has been tested with a combination of buildworld, stress2 tests, and
by using sort to consume a large amount of memory by sorting /dev/zero. No
new issues are known to be present from this change.

Reviewed by:	kib
Obtained from:	ABT Systems Ltd
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5720
2016-03-31 11:07:24 +00:00
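The 256TiB figure in f2f21faf62 follows from the table geometry. The sketch below assumes 4 KiB pages and 9 index bits per level, which is consistent with that figure; the macro and function names are illustrative and are not the pmap functions the commit adds.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */
#define SKETCH_LEVEL_BITS 9  /* assumed 512 entries per table */
#define SKETCH_LEVELS     4

/* Index into the table at the given level (0 = top, 3 = page level). */
static unsigned
pt_index(uint64_t va, int level)
{
	int shift = SKETCH_PAGE_SHIFT +
	    SKETCH_LEVEL_BITS * (SKETCH_LEVELS - 1 - level);

	return ((unsigned)((va >> shift) & ((1u << SKETCH_LEVEL_BITS) - 1)));
}

/* Bytes of virtual address space covered by all four levels. */
static uint64_t
va_space_bytes(void)
{
	return (1ULL << (SKETCH_PAGE_SHIFT +
	    SKETCH_LEVEL_BITS * SKETCH_LEVELS));
}
```

With these assumptions, 12 + 4 * 9 = 48 address bits, and 2^48 bytes is 256TiB.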
Andrew Turner
6c5b1ed4b6 Read the CPU ID for the current CPU from the GIC. The GIC may have a
different ID space than the kernel. Because of this we need to read the
ID from the hardware. The hardware will provide this value to the CPU by
reading any of the first 8 Interrupt Processor Targets Registers.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5706
2016-03-29 13:51:26 +00:00
Wojciech Macek
f379b4636a arm64: Fixing user space boundary checking in copyinout.S
Big buffer size could cause integer overflow and as a result
attempt to copy beyond VM_USERMAX_ADDRESS.

Fixing copyinstr boundary checking where compared value has been
overwritten by accident when setting fault handler.

Submitted by:          Dominik Ermel <der@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          Cavium
Reviewed by:           kib
Differential Revision: https://reviews.freebsd.org/D5719
2016-03-24 13:28:33 +00:00
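The overflow described in f379b4636a is the classic "address plus length wraps around" problem. The real fix is in assembly; the C sketch below only illustrates the check, with a hypothetical helper name and an assumed value standing in for VM_USERMAX_ADDRESS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed user address limit, for illustration only. */
#define SKETCH_USERMAX 0x0001000000000000ULL

/* Return true if [uaddr, uaddr + len) lies entirely in user space. */
static bool
user_range_ok(uint64_t uaddr, uint64_t len)
{
	uint64_t end = uaddr + len;

	/* A wrapped sum means the buffer size overflowed the address. */
	if (end < uaddr)
		return (false);
	return (end <= SKETCH_USERMAX);
}
```

Comparing only `end` against the limit would accept a huge `len` whose sum wraps to a small value, which is the failure mode the commit message describes.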
Wojciech Macek
f3e730a1e5 ARM64 copyinout improvements
The first of a set of patches.
Use wider load/stores when aligned buffer is being copied.

In a simple test:
  dd if=/dev/zero of=/dev/null bs=1M count=1024
the performance jumped from 410MB/s up to 3.6GB/s.

TODO:
 - better handling of unaligned buffers (WiP)
 - implement similar mechanism to bzero

Submitted by:          Dominik Ermel <der@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          Cavium
Reviewed by:           kib, andrew, emaste
Differential Revision: https://reviews.freebsd.org/D5664
2016-03-23 13:29:52 +00:00
Andrew Turner
f6c7371c81 Use the saved program state register to detect when an exception frame is
from userspace. Previously we could have triggered a panic by trying to
jump to a kernel address from userland as the trap handling code thought we
received an ast in kernel mode.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-03-22 08:36:25 +00:00
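A sketch of the check in f6c7371c81, assuming the test looks at the mode field in the low four bits of the saved program state register, where the value 0 denotes EL0. The macro and function names are hypothetical; the kernel's own definitions are not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PSR_M_MASK 0xfu /* exception level and stack select */
#define SKETCH_PSR_M_EL0T 0x0u /* EL0, i.e. userspace */

/* Decide from the saved SPSR whether the exception came from userspace. */
static bool
frame_from_user(uint64_t spsr)
{
	return ((spsr & SKETCH_PSR_M_MASK) == SKETCH_PSR_M_EL0T);
}
```

The point of the change is that the decision is based on the state the hardware saved, not on the address being returned to, so a userland jump to a kernel address is still classified as a user frame.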
Andrew Turner
b011fce09f Move the opt_ files to be included first so their definitions can be used
from within all further included files.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-03-18 16:32:22 +00:00
Andrew Turner
a9056bbb93 Rename COUNT_IPI to INTR_IPI_COUNT to reduce the diff with intrng.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-03-18 16:29:58 +00:00
Andrew Turner
4d00d27b3a Reduce the diff with intrng by renaming similar functions. This is a noop,
but will help move to use the common interrupt handling code later.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-03-18 16:18:29 +00:00
Andrew Turner
7285efe8cd Remove the invalid L0_BLOCK definition. ARMv8 doesn't support block
translation in the level 0 descriptor.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-03-18 10:01:25 +00:00
Wojciech Macek
f54153bb08 pmap arm64: fixing pmap_invalidate_range
It seems that if a range within one page is given, this page will not be
invalidated at all. Clean it up.

Submitted by:          Dominik Ermel <der@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          Cavium
Reviewed by:           wma, zbb
Approved by:           cognet (mentor)
Differential Revision: https://reviews.freebsd.org/D5569
2016-03-14 07:26:38 +00:00
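The log does not show the exact bug fixed in f54153bb08, but the intended behaviour can be sketched as a loop that visits every page overlapping [sva, eva), including the case where both ends fall in the same page. The names and the 4 KiB page size are assumptions.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size */
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1)

/*
 * Count the pages an invalidation of [sva, eva) must touch.  A real
 * implementation would issue a TLB invalidate where the counter is.
 */
static unsigned
pages_to_invalidate(uint64_t sva, uint64_t eva)
{
	unsigned count = 0;
	uint64_t addr;

	/* Start at the page containing sva so a sub-page range still hits. */
	for (addr = sva & ~SKETCH_PAGE_MASK; addr < eva;
	    addr += SKETCH_PAGE_SIZE)
		count++;
	return (count);
}
```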
John Baldwin
6fc8053f1a Fix reporting of the CloudABI ABI in kdump.
- Advertise the word size for CloudABI ABIs via the SV_LP64 flag.  All of
  the other ABIs include either SV_ILP32 or SV_LP64.
- Fix kdump to not assume a 32-bit ABI if the ABI flags field is non-zero
  but SV_LP64 isn't set.  Instead, only assume a 32-bit ABI if SV_ILP32 is
  set and fallback to the unknown value of "00" if neither SV_LP64 nor
  SV_ILP32 is set.

Reviewed by:	kib, ed
Differential Revision:	https://reviews.freebsd.org/D5560
2016-03-09 18:38:30 +00:00
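The kdump logic described in 6fc8053f1a can be sketched as follows. The flag values and the function name are placeholders for illustration; only the decision order comes from the commit message.

```c
/* Placeholder flag values, not necessarily those in sys/sysent.h. */
#define SKETCH_SV_ILP32 0x100u
#define SKETCH_SV_LP64  0x200u

/* Return the word-size suffix to report for an ABI flags field. */
static const char *
abi_word_size(unsigned flags)
{
	if (flags & SKETCH_SV_LP64)
		return ("64");
	if (flags & SKETCH_SV_ILP32)
		return ("32");
	/* Neither flag set: report unknown rather than guessing 32-bit. */
	return ("00");
}
```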
Bjoern A. Zeeb
3d9ac4ecfc Force re-routing PCI interrupts (this is for legacy INTx not MSI).
Need this for gem5, but was not needed on real hardware (yet) as it
was always MSI.

Reviewed by:		andrew, jhb
Discovered by:		andrew
Sponsored by:		DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D5494
2016-03-02 15:20:42 +00:00
Wojciech Macek
52ea10af69 Improve ThunderX PEM driver to work on pass2 revision
Things changed:
    * do not allocate 4GB of SLI space, because it's a waste of
      system resources. Allocate only small portions when needed.
    * provide own implementation of activate_resource which performs
      address translation between PCI bus and host PA address space.
      This is a temporary solution and should be replaced by
      bus_map_resource once implemented.

Obtained from:         Semihalf
Sponsored by:          Cavium
Approved by:           cognet (mentor)
Reviewed by:           jhb
Differential revision: https://reviews.freebsd.org/D5294
2016-03-02 08:39:59 +00:00
Wojciech Macek
a5d461a398 Get memory ranges from FDT if no EFI API is available on ARM64
Obtained from:         Semihalf
Submitted by:          Michal Stanek <mst@semihalf.com>
Sponsored by:          Annapurna Labs
Approved by:           cognet (mentor)
Reviewed by:           andrew, wma
Differential revision: https://reviews.freebsd.org/D5408
2016-03-01 12:50:24 +00:00
Wojciech Macek
b2552c46b6 Enable SRE_EL2 on ARM64
Enable system register access for EL2. Alpine-V2 is
the first device requiring this to be enabled.
It is also in-sync with Linux initialization code,
and compatible with Alpine-V2 uboot requirements.

Obtained from:         Semihalf
Submitted by:          Michal Stanek <mst@semihalf.com>
Sponsored by:          Annapurna Labs
Approved by:           cognet (mentor)
Reviewed by:           wma
Differential revision: https://reviews.freebsd.org/D5394
2016-03-01 08:15:00 +00:00
Wojciech Macek
30d9287468 Add uart 8250 device to GENERIC arm64 configuration
Obtained from:         Semihalf
Submitted by:          Michal Stanek <mst@semihalf.com>
Sponsored by:          Annapurna Labs
Approved by:           cognet (mentor)
Reviewed by:           zbb, wma
Differential revision: https://reviews.freebsd.org/D5406
2016-03-01 07:06:36 +00:00
Justin Hibbits
e665eafb25 Correct the memory rman ranges to be to BUS_SPACE_MAXADDR
Summary:
As part of the migration of rman_res_t to be typed to uintmax_t, memory ranges
must be clamped appropriately for the bus, to prevent completely bogus addresses
from being used.

This is extracted from D4544.

Reviewed By: cem
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5134
2016-03-01 02:59:06 +00:00
Wojciech Macek
2445e7c84e Restore ThunderX Pass1.1 PCI changes removed by r295962
If Enhanced Allocation is not used, we can't allocate any random
range. All internal devices have a hardcoded place where they can
be located within PCI address space. Fortunately, we can read
this value from BAR.

Obtained from:         Semihalf
Sponsored by:          Cavium
Approved by:           cognet (mentor)
Reviewed by:           zbb
Differential revision: https://reviews.freebsd.org/D5455
2016-02-26 12:16:11 +00:00
Wojciech Macek
fb05500b24 Make pci_host_generic and thunderx_pci common
* provided OFW interface for pci_host_generic (for handling devices which are present in DTS under the PCI node)
* removed support for internal PCI from arm64/cavium
* cleaned up and made most of the code common

Obtained from:         Semihalf
Sponsored by:          Cavium
Approved by:           cognet (mentor)
Reviewed by:           zbb
Differential revision: https://reviews.freebsd.org/D5261
2016-02-24 06:05:30 +00:00
Wojciech Macek
0422bce8b2 Add Intel 10Gb support to ARM64 GENERIC kernel config
Obtained from:         Semihalf
Sponsored by:          Cavium
Approved by:           cognet (mentor)
Reviewed by:           zbb
Differential revision: https://reviews.freebsd.org/D5347
2016-02-22 13:34:43 +00:00
Svatopluk Kraus
35a0bc1260 As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no
need to include it explicitly when <vm/vm_param.h> is already included.

Suggested by:	alc
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D5379
2016-02-22 09:08:04 +00:00
Svatopluk Kraus
d6849317c5 As <machine/param.h> is included from <sys/param.h>, there is no need
to include it explicitly when <sys/param.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5378
2016-02-22 09:04:36 +00:00
Svatopluk Kraus
a1e1814d76 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
Justin Hibbits
7915adb560 Introduce a RMAN_IS_DEFAULT_RANGE() macro, and use it.
This simplifies checking for default resource range for bus_alloc_resource(),
and improves readability.

This is part of, and related to, the migration of rman_res_t from u_long to
uintmax_t.

Discussed with:	jhb
Suggested by:	marcel
2016-02-20 01:32:58 +00:00
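A sketch of what a default-range test like the one in 7915adb560 might look like once resource addresses are uintmax_t. The type and macro names below are stand-ins, and the actual definition of RMAN_IS_DEFAULT_RANGE() is not shown in this log.

```c
#include <stdint.h>

typedef uintmax_t sketch_res_t; /* stand-in for rman_res_t */

/* True when the caller asked for "anywhere": start 0, end all-ones. */
#define SKETCH_IS_DEFAULT_RANGE(s, e) \
	((s) == 0 && (e) == ~(sketch_res_t)0)
```

Centralizing the test in one macro means callers no longer compare against a hand-written `~0ul`, which stops matching once the type is wider than u_long.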
Zbigniew Bodek
b998c9656b Introduce bus_get_bus_tag() method
Provide bus_get_bus_tag() for sparc64, powerpc, arm, arm64 and mips
nexus and its children in order to return a platform specific default tag.

This is required to ensure generic correctness of the bus_space tag.
It is especially needed for arches where child bus tag does not match
the parent bus tag. This solves the problem with ppc architecture
where the PCI bus tag differs from parent bus tag which is big-endian.

This commit is a part of the following patch:
https://reviews.freebsd.org/D4879

Submitted by:  Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Annapurna Labs
Reviewed by:   jhibbits, mmel
Differential Revision: https://reviews.freebsd.org/D4879
2016-02-18 13:00:04 +00:00
Wojciech Macek
98bc9384c5 Fix ThunderX external PEM bus offset
Obtained from:         Semihalf
Sponsored by:          Cavium
Approved by:           cognet (mentor)
Reviewed by:           zbb
Differential revision: https://reviews.freebsd.org/D5293
2016-02-18 11:26:08 +00:00
Svatopluk Kraus
1413a3ab64 Remove pd_prot and pd_cache members from struct arm_devmap_entry.
The struct is used for definition of static device mappings which
should always have same protection and attributes.
2016-02-17 12:36:24 +00:00
Andrew Turner
45fd186285 Allow callers of OF_decode_addr to get the size of the found mapping. This
will allow for code that uses the old fdt_get_range and fdt_regsize
functions to find a range, map it, access, then unmap to replace this, up
to and including the map, with a call to OF_decode_addr.

As this function should only be used in the early boot code the unmap is
mostly to document we no longer need the mapping as it's a no-op, at least
on arm.

Reviewed by:	jhibbits
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D5258
2016-02-16 15:18:12 +00:00
Zbigniew Bodek
9ccaab6db5 Support PEM that is not a PCI endpoint on ThunderX
Some chip revisions don't have their external PCIe buses
behind the internal bridge. Add support for FDT-configurable
PEMs but keep ability for PCIe enumeration.

Reviewed by:   andrew, wma
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5285
2016-02-16 11:43:57 +00:00
Andrew Turner
770fd1c976 Only update curthread and curpcb after we have finished using the old
values.

If switching from a thread that used floating-point registers to a thread
that is still running, but holding the blocked_lock lock we would switch
the curthread to the new (running) thread, then call critical_enter. This
will non-atomically increment td_critnest, and later call critical_exit to
non-atomically decrement this value.

This can happen at the same time as the new thread is still running on the
old core, also calling these functions. In this case there will be a race
between these non-atomic operations. This can be an issue as we could lose
one of these operations, leading to the value not returning to zero.

If, later on, we then hit a data abort we check if the td_critnest is zero.
If this check fails we will panic the kernel.

This has been observed when running pmcstat on a Cavium ThunderX. The pmc
thread will use the blocked_lock lock and there is a high chance userspace
will use the floating-point registers. When, later on, pmcstat triggers a
data abort we will hit this panic.

The fix is to update these values after storing the floating-point state.
This means we use the correct curthread while storing the state so it will
not be an issue that the changes to td_critnest are non-atomic.

Sponsored by:	ABT Systems Ltd
2016-02-12 12:38:04 +00:00
Zbigniew Bodek
6cd36342c0 Support interrupts binding in GICv3 and ITS
- Add MOVI command and routine for the LPI migration
- Allow to search for the ITS device descriptor using
  not only devID but also LPI number.
- Bind SPIs in the Distributor
- Don't bind its_dev to collection. Keep track of the collection
  IDs for each LPI.

Reviewed by:   wma
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5231
2016-02-11 12:04:58 +00:00
Zbigniew Bodek
907a0579aa Implement finer locking in ITS
- Change locks' names to be more suitable
- Don't use blocking mutex. Lock only basic operations such
  as lists or bitmaps modifications.

Reviewed by:   wma
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5230
2016-02-11 12:03:11 +00:00
Zbigniew Bodek
47a1ff355e Initially bind all interrupts to the boot CPU when using GICv3
This should be done by routing all interrupts to CPU0; a
different assignment will be induced by either interrupt
shuffling or bus_bind_intr().

Reviewed by:   wma
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5229
2016-02-11 12:01:33 +00:00
Zbigniew Bodek
55bdcadded Call pmc_hook() correctly in the ARM64 interrupt handler
pmc_hook() was called only in case of the stray interrupt but should
rather be called on each interrupt. Move it to the arm_cpu_intr()
handler, out of the critical section too.

Reviewed by:   br
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5161
2016-02-11 11:59:32 +00:00
Zbigniew Bodek
be7aab76ec Introduce bus_bind_intr method for ARM64
It can be used to bind specific interrupt to a particular CPU.
Requires PIC support for interrupts binding.

Reviewed by:   wma
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5122
2016-02-11 11:58:27 +00:00
Zbigniew Bodek
513411c9f5 Fix bugs in interrupts allocation on ARM64
Separate interrupt descriptors lookup from allocation. It was possible
to perform config on non-existing interrupt simply by allocating spurious
descriptor.
Must lock the interrupt descriptors table lookup to avoid mismatches.
This ought to prevent trouble while setting up new interrupt
and dispatching existing one.
Use spin mutex rather than sleep mutex. This is mainly due to lock in
arm_dispatch_intr.
This should be eventually changed to a lock-less solution without
walking through a linked list on each interrupt.

Reviewed by:   andrew, wma
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5121
2016-02-11 11:57:13 +00:00
Zbigniew Bodek
8133eda921 Minor clean-ups for ARM64 GICv3 and GIC drivers
GICv3:
- move ICC_SGI1R_EL1 definitions to armreg.h and use proper system
  register's names
GIC:
- remove unused functions

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  Cavium
Differential Revision: https://reviews.freebsd.org/D5119
2016-02-11 11:55:37 +00:00
Wojciech Macek
c7fc655f3f ARM64 disassembler: support for LDR instructions
Implemented disassembly for a whole bunch of
various ldr instructions.

Obtained from:         Semihalf
Sponsored by:          Cavium
Approved by:           cognet (mentor)
Reviewed by:           zbb
Differential revision: https://reviews.freebsd.org/D5217
2016-02-11 06:50:11 +00:00