[PowerPC] Do not emit HW loop when TLS var accessed in PHI of loop
exit
If any PHI nodes in loop exit blocks have incoming values from the
loop that are accesses of TLS variables with local dynamic or general
dynamic TLS model, the address will be computed inside the loop.
Since this includes a call to __tls_get_addr, this will in turn cause
the CTR loops verifier to complain. Disable CTR loops in such cases.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48527
This should fix building ceph 12.2.12 on powerpc64, powerpc, powerpcspe
and powerpc64le.
Requested by: pkubaj
MFC after: 3 days
Implement computeHostNumHardwareThreads() for FreeBSD
This retrieves CPU affinity via FreeBSD's cpuset(2) API, and makes
LLVM respect affinity settings configured by the user via the
cpuset(1) command.
In particular, this makes it possible to reduce the number of threads
used on machines with high core counts, which can interact badly with
parallelized build systems. This is particularly noticeable with lld,
which spawns lots of threads even for linking e.g. hello_world!
This fix is related to PR48193, but does not address the more
fundamental problem, which is that LLVM by default grabs as many CPUs
and/or threads as possible.
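As a rough illustration of the idea only (the actual change is C++ inside
LLVM and calls cpuset_getaffinity(2)), the Python sketch below counts the
CPUs the current process is allowed to run on rather than all CPUs in the
machine. It assumes os.sched_getaffinity exists on the platform and falls
back to os.cpu_count() when it does not; the helper name usable_cpu_count
is made up for this example.

```python
import os


def usable_cpu_count():
    """Return how many CPUs this process may run on (hypothetical helper)."""
    getaffinity = getattr(os, "sched_getaffinity", None)
    if getaffinity is not None:
        try:
            # The affinity set honors restrictions placed on the process.
            return len(getaffinity(0))
        except OSError:
            pass
    # Fallback: total logical CPUs, or 1 if that cannot be determined.
    return os.cpu_count() or 1
```

A thread pool sized with usable_cpu_count() would shrink automatically
when the process is started under a restricted CPU set.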
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D92271
Originally by: mjg
MFC after: 1 week
landed upstream:
For llvm's internal function which retrieves the number of available
"hardware threads", use cpuset_getaffinity(2) on FreeBSD, so it will
honor processor sets configured by the cpuset(1) command.
This should make it possible to avoid e.g. lld creating a huge number of
threads on a machine with many cores, even for linking simple programs.
This will also be submitted upstream.
Submitted by: mjg
MFC after: 1 week
[PowerPC] Skip combining (uint_to_fp x) if x is not simple type
Current powerpc64le backend hits
```
Combining: t7: f64 = uint_to_fp t6
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291:
llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() &&
"Expected a SimpleValueType!"' failed.
```
This patch fixes it by skipping combination if `t6` is not simple
type.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47660.
Reviewed By: #powerpc, steven.zhang
Differential Revision: https://reviews.llvm.org/D88388
This should fix the llvm assertion mentioned above when building the
following ports for powerpc64le:
* audio/traverso
* databases/percona57-pam-for-mysql
* databases/percona57-server
* emulators/citra
* emulators/citra-qt5
* games/7kaa
* graphics/dia
* graphics/mandelbulber
* graphics/pcl-pointclouds
* net-p2p/libtorrent-rasterbar
* textproc/htmldoc
Requested by: pkubaj
MFC after: 3 days
[X86] Place new constant node in topological order in
X86DAGToDAGISel::matchBitExtract
Fixes PR47482
This should fix 'Assertion failed: (Op->getNodeId() != -1 && "Node has
already selected predecessor node"), function DoInstructionSelection,
file
/usr/src/contrib/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp,
line 1149' when compiling part of the project_painter project, while
targeting the bdver2 (or higher) CPU.
Reported by: jkim
MFC after: 6 weeks
X-MFC-With: r364284
[X86] SSE4_A should only imply SSE3 not SSSE3 in the frontend.
SSE4_1 and SSE4_2 do imply SSSE3. So I guess I got confused when
switching the code to being table based in D83273.
Fixes PR47464
This should fix builds with -march=amdfam10 emitting SSSE3 instructions
such as pshufb, which lead to programs crashing with SIGILL on such
processors.
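The effect of a wrong implication entry can be sketched with a small
table-driven model in Python. This is not clang's actual table; the
IMPLIES dictionary and the expand helper are assumptions for
illustration, with the implication edges taken from the description
above and from the usual SSE feature ordering.

```python
# Each feature maps to the features it directly implies (illustrative).
IMPLIES = {
    "sse": set(),
    "sse2": {"sse"},
    "sse3": {"sse2"},
    "ssse3": {"sse3"},
    "sse4.1": {"ssse3"},
    "sse4.2": {"sse4.1"},
    "sse4a": {"sse3"},  # the fix: SSE4A implies SSE3, not SSSE3
}


def expand(feature, table=IMPLIES):
    """Return the feature plus everything it transitively implies."""
    seen = set()
    stack = [feature]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(table[name])
    return seen


# With the corrected table, SSSE3 is not enabled by SSE4A.
print(sorted(expand("sse4a")))  # ['sse', 'sse2', 'sse3', 'sse4a']
```

Swapping the "sse4a" entry for {"ssse3"} reproduces the bug in this
model: expand("sse4a") then contains "ssse3", which is what allowed
pshufb to be emitted for amdfam10.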
Reported by: avg
MFC after: 6 weeks
X-MFC-With: r364284
Eliminate the sizing template parameter N from CoalescingBitVector
Since the parameter is not used anywhere, and the default size of 16
apparently causes PR47359, remove it. This ensures that IntervalMap
will automatically determine the optimal size, using its NodeSizer
struct.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D87044
This should fix 'Assertion failed: (Elements + Grow <= Nodes * Capacity
&& "Not enough room for elements"), function distribute, file
/usr/src/contrib/llvm-project/llvm/lib/Support/IntervalMap.cpp, line
123.' when building the x11-toolkits/py-wxPython40 port on an i386 host.
Reported by: zeising
MFC after: 6 weeks
X-MFC-With: r364284
[PowerPC] Fix a typo for InstAlias of mfsprg
D77531 has a typo for mfsprg, it should be mtsprg. This patch is to
fix this typo.
This should fix booting powerpc64 kernels, after LLVM 11 was imported.
PR: 248763
master 2e10b7a39b9, the last commit before the llvmorg-12-init tag, from
which release/11.x was branched.
Note that for now, I rolled back all our local changes to make merging
easier, and I will reapply the still-relevant ones after updating to
11.0.0-rc1.
llvmorg-10.0.1-rc2-0-g77d76b71d7d.
Also add a few more llvm utilities under WITH_CLANG_EXTRAS:
* llvm-dwp, a utility for merging DWARF 5 Split DWARF .dwo files into
.dwp (DWARF package files)
* llvm-size, a size(1) replacement
* llvm-strings, a strings(1) replacement
MFC after: 3 weeks
[BasicAA] Make BasicAA a cfg pass.
Summary:
Part of the changes in D44564 made BasicAA not CFG only due to it
using PhiAnalysisValues which may have values invalidated. Subsequent
patches (rL340613) appear to have addressed this limitation.
BasicAA should not be invalidated by non-CFG-altering passes. A
concrete example is MemCpyOpt, which preserves the CFG, but the tests
currently check that it invalidates BasicAA.
llvm-dev RFC:
https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM
Reviewers: john.brawn, sebpop, hfinkel, brzycki
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74353
This fixes an issue with clang's -fintegrated-cc1 feature, which could
make it output slightly different assembly code, depending on the way it
was invoked.
In r361755 we attempted to work around it by disabling the integrated
cc1 stage, but it did not solve the root cause for all situations.
Extensive testing and bisecting showed that the above change finally
makes the output deterministic, even if -fintegrated-cc1 is on.
Reported by: Fabian Keil <fk@fabiankeil.de>
PR: 246630
MFC after: 3 days
getMainExecutable: Fix hand-rolled AT_EXECPATH for older FreeBSD
Once we hit AT_NULL, we need to bail out of the loop; not just the
enclosing switch. This fixes basic usage (e.g. `cc --version`) when
AT_EXECPATH isn't present on older branches (e.g. under
emu-user-static, at the moment), where we would previously run off
the end of ::environ.
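The intended termination rule can be sketched in Python. Python has no
switch statement, so this cannot reproduce the original bug (a break
that only left the switch); it only shows the scan stopping at the
AT_NULL terminator instead of reading whatever follows. The auxv list
layout, the find_execpath name and the numeric value used for
AT_EXECPATH are assumptions for illustration.

```python
AT_NULL = 0       # terminator entry of the auxiliary vector
AT_EXECPATH = 15  # tag for the executable path (assumed value)


def find_execpath(auxv):
    """Scan (tag, value) pairs and stop for good at AT_NULL."""
    for tag, value in auxv:
        if tag == AT_NULL:
            break  # leave the whole loop: nothing valid follows
        if tag == AT_EXECPATH:
            return value
    return None
```

With a vector such as [(3, 1), (AT_NULL, 0), (AT_EXECPATH, "junk")] the
function returns None, because entries after the terminator are never
inspected.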
Patch By: kevans
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D79239
MFC after: 3 days
unsupported relocation on symbol" when assembling arm 'adr' pseudo
instructions. However, the upstream commit did not take big-endian arm
into account.
Applying the same changes to the big-endian handling is straightforward,
thanks to Andrew Turner and Peter Smith for the hint. This will also be
submitted upstream.
MFC after: immediately, since this fix is meant for stable/11
[ARM] Only produce qadd8b under hasV6Ops
When compiling for an arm5te cpu from clang, the +dsp attribute is
set. This meant we could try and generate qadd8 instructions where we
would end up having no pattern. I've changed the condition here to be
hasV6Ops && hasDSP, which is what other parts of ARMISelLowering seem
to use for similar instructions.
Fixed PR45677.
This fixes "fatal error: error in backend: Cannot select: t37: i32 =
ARMISD::QADD8b t43, t44" when compiling sys/dev/sound/pcm/feeder_mixer.c
for armv5. For some reason we do not encounter this on head, but this
error popped up while building universes for stable/12.
MFC after: 3 days
[PowerPC] Do not attempt to reuse load for 64-bit FP_TO_UINT without
FPCVT
We call the function that attempts to reuse the conversion without
checking whether the target matches the constraints that the callee
expects. This patch adds the check prior to the call.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=43976
Differential revision: https://reviews.llvm.org/D77564
This should fix 'Assertion failed: ((Op.getOpcode() == ISD::FP_TO_SINT
|| Subtarget.hasFPCVT()) && "i64 FP_TO_UINT is supported only with
FPCVT"), function LowerFP_TO_INTForReuse, file
/usr/src/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.cpp, line 7276'
when building the devel/libslang2 port (and a few others) for PowerPC64.
Requested by: pkubaj
MFC after: 6 weeks
X-MFC-With: 358851
Fix bots after a9ad65a2b34f
In the last commit, I neglected to initialize the new subtarget
feature I added which caused failures on a few bots. This should fix
that.
This unbreaks the build after r359981, which reverted upstream commit
a9ad65a2b34f.
Reported by: jhibbits (and jenkins :)
MFC after: 6 weeks
X-MFC-With: 358851
[PowerPC] Change default for unaligned FP access for older subtargets
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554
Some CPUs trap to the kernel on unaligned floating point access and
there are kernels that do not handle the interrupt. The program then
fails with a SIGBUS according to the PR. This just switches the
default for unaligned access to only allow it on recent server CPUs
that are known to allow this.
Differential revision: https://reviews.llvm.org/D71954
This upstream commit causes a compiler hang when building certain ports
(e.g. security/nss, multimedia/x264) for powerpc64. The hang has been
reported in https://bugs.llvm.org/show_bug.cgi?id=45186, but in the
meantime it is more convenient to revert the commit.
Requested by: jhibbits
MFC after: 6 weeks
X-MFC-With: 358851
[PowerPC]: Don't allow r0 as a target for LD_GOT_TPREL_L/32
Summary:
The linker is free to relax this (relocation R_PPC_GOT_TPREL16)
against R_PPC_TLS, if it sees fit (initial exec to local exec). If r0
is used, this can generate execution-invalid code (converts to 'addi
%rX, %r0, FOO', which translates in PPC-lingo to 'li %rX, FOO'). Forbid
this instead.
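The reason r0 is special can be shown with a tiny Python model of the
addi semantics: when the base register field is 0, the instruction adds
the immediate to the constant zero, not to the contents of r0. The
register file representation and the function name are made up for
this sketch, and a 32-bit register width is assumed.

```python
def addi(regs, rd, ra, imm):
    """Model of 'addi rD, rA, imm' on a 32-bit register file."""
    # A base register field of 0 means the literal value 0, not r0.
    base = 0 if ra == 0 else regs[ra]
    regs[rd] = (base + imm) & 0xFFFFFFFF


regs = [0] * 32
regs[0] = 0x1000  # pretend r0 holds a useful base address
regs[4] = 0x1000  # the same value in an ordinary register

addi(regs, 3, 0, 8)  # acts like 'li r3, 8': r0's contents are ignored
addi(regs, 5, 4, 8)  # real addition: r5 = r4 + 8
print(hex(regs[3]), hex(regs[5]))  # 0x8 0x1008
```

So if the relaxed instruction ends up with r0 as its base, the base
value is silently dropped, which is why the register is excluded.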
This fixes static binaries using locales on FreeBSD/powerpc (tested
on FreeBSD/powerpcspe).
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D76662
Requested by: jhibbits
MFC after: 6 weeks
X-MFC-With: 358851
[PowerPC]: e500 target can't use lwsync, use msync instead
The e500 core has a silicon bug that triggers an illegal instruction
program trap on any sync other than msync. Other cores will typically
ignore illegal sync types, and the documentation even implies that
the 'illegal' bits are ignored.
Address this hardware deficiency by only using msync, like the PPC440.
Differential Revision: https://reviews.llvm.org/D76614
Requested by: jhibbits
MFC after: 6 weeks
X-MFC-With: 358851
[EarlyCSE] avoid crashing when detecting min/max/abs patterns (PR41083)
As discussed in PR41083:
https://bugs.llvm.org/show_bug.cgi?id=41083
...we can assert/crash in EarlyCSE using the current hashing scheme
and instructions with flags.
ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc)
or other flags when detecting patterns such as min/max/abs composed
of compare+select. But the value numbering / hashing mechanism used
by EarlyCSE intersects those flags to allow more CSE.
Several alternatives to solve this are discussed in the bug report.
This patch avoids the issue by doing simple matching of min/max/abs
patterns that never requires instruction flags. We give up some CSE
power because of that, but that is not expected to result in much
actual performance difference because InstCombine will canonicalize
these patterns when possible. It even has this comment for abs/nabs:
/// Canonicalize all these variants to 1 pattern.
/// This makes CSE more likely.
(And this patch adds PhaseOrdering tests to verify that the expected
transforms are still happening in the standard optimization
pipelines.)
I left this code to use ValueTracking's "flavor" enum values, so we
don't have to change the callers' code. If we decide to go back to
using the ValueTracking call (by changing the hashing algorithm
instead), it should be obvious how to replace this chunk.
Differential Revision: https://reviews.llvm.org/D74285
This fixes an assertion when building the math/gsl port on PowerPC64.
Requested by: pkubaj
MFC after: 6 weeks
X-MFC-With: 358851
[MC][ARM] Resolve some pcrel fixups at assembly time (PR44929)
MC currently does not emit these relocation types, and lld does not
handle them. Add FKF_Constant as a work-around of some ARM code after
D72197. Eventually we probably should implement these relocation
types.
By Fangrui Song!
Differential revision: https://reviews.llvm.org/D72892
This re-enables using the arm 'adr' pseudo instruction on global symbols
again. It was broken as a side-effect of upstream commit 2bfee35cb,
which led to "error: unsupported relocation on symbol" when assembling
such constructs, which are used in e.g. sys/arm/arm/locore-v[46].S.
PR: 244251
Add 8548 CPU definition and attributes
8548 CPU is GCC's name for the e500v2, so accept this in clang. The
e500v2 doesn't support lwsync, so define __NO_LWSYNC__ for this as
well, as GCC does.
Differential Revision: https://reviews.llvm.org/D67787
Merge commit ff0311c4b from llvm git (by Justin Hibbits):
[PowerPC]: Add powerpcspe target triple subarch component
Summary:
This allows '-target powerpcspe-unknown-linux-gnu' or
'powerpcspe-unknown-freebsd' to be used, instead of '-target
powerpc-unknown-linux-gnu -mspe'.
Reviewed By: dim
Differential Revision: https://reviews.llvm.org/D72014
Merge commit ba91dffaf from llvm git (by Fangrui Song):
[Driver][PowerPC] Move powerpcspe logic from cc1 to Driver
Follow-up of D72014. It is more appropriate to use a target feature
instead of a SubTypeArch to express the difference.
Reviewed By: #powerpc, jhibbits
Differential Revision: https://reviews.llvm.org/D72433
Merge commit 36eedfcb3 from llvm git (by Justin Hibbits):
[PowerPC] Fix powerpcspe subtarget enablement in llvm backend
Summary:
As currently written, -target powerpcspe will enable SPE regardless
of disabling the feature later on in the command line. Instead,
change this to just set a default CPU to 'e500' instead of a generic
CPU.
As part of this, add FeatureSPE to the e500 definition.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D72673
These are needed to unbreak the build for powerpcspe.
Requested by: jhibbits
MFC after: 1 week
[MIPS][ELF] Use PC-relative relocations in .eh_frame when possible
When compiling position-independent executables, we now use
DW_EH_PE_pcrel | DW_EH_PE_sdata4. However, the MIPS ABI does not define a
64-bit PC-relative ELF relocation so we cannot use sdata8 for the large
code model case. When using the large code model, we fall back to the
previous behaviour of generating absolute relocations.
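A short Python sketch of what the DW_EH_PE_pcrel | DW_EH_PE_sdata4
encoding means in practice: the pointer is stored as a signed 32-bit
distance from the field's own address, which is also why it cannot
cover the large code model. The DW_EH_PE_* values are the standard
DWARF exception-handling constants; the helper names and the choice of
little-endian byte order are assumptions for this example.

```python
import struct

DW_EH_PE_absptr = 0x00  # absolute pointer
DW_EH_PE_sdata4 = 0x0B  # signed 4-byte value
DW_EH_PE_pcrel = 0x10   # relative to the address of the field itself

ENCODING = DW_EH_PE_pcrel | DW_EH_PE_sdata4  # 0x1B


def fits_sdata4(delta):
    """True if the distance is representable as a signed 32-bit value."""
    return -(1 << 31) <= delta < (1 << 31)


def encode_pcrel_sdata4(target, field_addr):
    """Encode 'target' relative to 'field_addr' (little-endian assumed)."""
    delta = target - field_addr
    if not fits_sdata4(delta):
        raise OverflowError("distance does not fit in sdata4")
    return struct.pack("<i", delta)
```

For example, encode_pcrel_sdata4(0x2000, 0x1000) stores the offset
0x1000, independent of where the image is loaded, so no absolute
(text) relocation is needed.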
With this change clang-generated .o files can be linked by LLD without
having to pass -Wl,-z,notext (which creates text relocations).
This is simpler than the approach used by ld.bfd, which rewrites the
.eh_frame section to convert absolute relocations into relative references.
I saw in D13104 that apparently ld.bfd did not accept pc-relative relocations
for MIPS output at some point. However, I also checked that recent ld.bfd
can process the clang-generated .o files so this no longer seems true.
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D72228
Merge commit 8e8ccf47 from llvm git (by me)
[MIPS] Don't emit R_(MICRO)MIPS_JALR relocations against data symbols
The R_(MICRO)MIPS_JALR optimization only works when used against functions.
Using the relocation against a data symbol (e.g. function pointer) will
cause some linkers that don't ignore the hint in this case (e.g. LLD prior
to commit 5bab291) to generate a relative branch to the data symbol
which crashes at run time. Before this patch, LLVM was erroneously emitting
these relocations against local-dynamic TLS function pointers and global
function pointers with internal visibility.
Reviewers: atanasyan, jrtc27, vstefanovic
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D72571
These two changes should allow using lld for MIPS64 (and maybe also MIPS32)
by default.
The second commit is not strictly necessary for clang+lld since LLD9 will
not perform the R_MIPS_JALR optimization (it was only added for 10) but it
is probably required in order to use recent ld.bfd.
Reviewed By: dim, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23203
[mips] Use less registers to load address of TargetExternalSymbol
There is no pattern matching `add hi, (MipsLo texternalsym)`. As a
result, loading the address of a 32-bit symbol requires two registers
and one additional instruction:
```
addiu $1, $zero, %lo(foo)
lui $2, %hi(foo)
addu $25, $2, $1
```
This patch adds the missing pattern and enables generation of a more
efficient set of instructions:
```
lui $1, %hi(foo)
addiu $25, $1, %lo(foo)
```
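To see why the two-instruction form is sufficient, here is a small
Python model of how %hi and %lo split a 32-bit address and how lui
followed by addiu reassembles it. The function names are made up for
this sketch; the +0x8000 adjustment in %hi compensates for addiu
sign-extending its 16-bit immediate.

```python
def sext16(value):
    """Sign-extend a 16-bit immediate, as addiu does."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def hi16(addr):
    """%hi: upper half, rounded so that adding sext(%lo) lands on addr."""
    return ((addr + 0x8000) >> 16) & 0xFFFF


def lo16(addr):
    """%lo: lower 16 bits of the address."""
    return addr & 0xFFFF


def lui_addiu(addr):
    """Model 'lui $1, %hi(addr)' followed by 'addiu $25, $1, %lo(addr)'."""
    reg = (hi16(addr) << 16) & 0xFFFFFFFF           # lui
    reg = (reg + sext16(lo16(addr))) & 0xFFFFFFFF   # addiu
    return reg


print(hex(hi16(0x12348000)), hex(lui_addiu(0x12348000)))  # 0x1235 0x12348000
```

Note that %hi of 0x12348000 is 0x1235, not 0x1234: the low half is
treated as a negative number, so the high half is rounded up.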
Differential Revision: https://reviews.llvm.org/D66771
llvm-svn: 370196
Merge commit 59bb3609f from llvm git (by Simon Atanasyan):
[mips] Fix 64-bit address loading in case of applying 32-bit mask to
the result
If the result of a 64-bit address load is combined with a 32-bit mask,
LLVM tries to optimize the code and remove the "redundant" loading of
the upper 32 bits of the address. This leads to incorrect code on
MIPS64 targets.
The MIPS backend creates the following chain of commands to load a
64-bit address in the `MipsTargetLowering::getAddrNonPICSym64` method:
```
(add (shl (add (shl (add %highest(sym), %higher(sym)),
16),
%hi(sym)),
16),
%lo(%sym))
```
If the mask is present, LLVM decides to optimize the chain of commands.
It really does not make sense to load the upper 32 bits because the
0x0fffffff mask clears them anyway. After removing the redundant
commands we get this chain:
```
(add (shl (%hi(sym), 16), %lo(%sym))
```
There are no patterns matching `(MipsHi (i64 symbol))`. Due to a bug in
the `SYM_32` predicate definition, the backend incorrectly selects a
pattern for 32-bit symbols and uses the `lui` instruction for loading
`%hi(sym)`.
As a result we get an incorrect set of instructions with unnecessary
16-bit left shifting:
```
lui at,0x0
R_MIPS_HI16 foo
dsll at,at,0x10
daddiu at,at,0
R_MIPS_LO16 foo
```
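The arithmetic behind this can be checked with a small Python model.
It is only a sketch: the helper names are made up, registers are
modelled as 64-bit integers, and the %highest/%higher/%hi/%lo
definitions follow the usual MIPS convention of adding a rounding
constant before shifting. load_full mirrors the full four-part chain,
reduced_chain the simplified DAG value, and load_buggy the bad
lui/dsll/daddiu sequence shown above.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low 'bits' bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lo(sym):
    return sym & 0xFFFF


def hi(sym):
    return ((sym + 0x8000) >> 16) & 0xFFFF


def higher(sym):
    return ((sym + 0x80008000) >> 32) & 0xFFFF


def highest(sym):
    return ((sym + 0x800080008000) >> 48) & 0xFFFF


def load_full(sym):
    """Full chain: highest, higher, hi, lo with two 16-bit shifts."""
    reg = sext(highest(sym) << 16, 32) & MASK64    # lui
    reg = (reg + sext(higher(sym), 16)) & MASK64   # daddiu
    reg = (reg << 16) & MASK64                     # dsll 16
    reg = (reg + sext(hi(sym), 16)) & MASK64       # daddiu
    reg = (reg << 16) & MASK64                     # dsll 16
    return (reg + sext(lo(sym), 16)) & MASK64      # daddiu


def reduced_chain(sym):
    """Value of the reduced DAG: (hi << 16) + lo."""
    return ((sext(hi(sym), 16) << 16) + sext(lo(sym), 16)) & MASK64


def load_buggy(sym):
    """lui already shifts by 16, so the extra dsll shifts %hi twice."""
    reg = sext(hi(sym) << 16, 32) & MASK64         # lui
    reg = (reg << 16) & MASK64                     # dsll 16 (unwanted)
    return (reg + sext(lo(sym), 16)) & MASK64      # daddiu


sym = 0x12345678
mask = 0x0FFFFFFF
print(hex(reduced_chain(sym) & mask))  # 0x2345678
print(hex(load_buggy(sym) & mask))     # 0x5678
```

In this model the reduced chain agrees with the original address under
the mask, while the buggy sequence loses the %hi part entirely.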
This patch resolves two problems:
- Fix `SYM_32/SYM_64` predicates to prevent selection of patterns
dedicated to 32-bit symbols in case of using N64 ABI.
- Add missed patterns for 64-bit symbols for `%hi/%lo`.
Fix PR42736.
Differential Revision: https://reviews.llvm.org/D66228
llvm-svn: 370268
These two commits fix a miscompilation of the kernel for mips64, and
should allow clang to be used as the default compiler for mips64.
Requested by: arichards
MFC after: 3 days
[RISCV] Handle fcopysign(f32, f64) and fcopysign(f64, f32)
Summary: Adds tablegen patterns to explicitly handle fcopysign where
the magnitude and sign arguments have different types, due to the
sign value casts being removed by the DAGCombiner. Support for RV32IF
follows in a separate commit. Adds tests for all relevant scenarios
except RV32IF.
Reviewers: lenary
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70678
This is a prerequisite for building and linking hard- and soft-float
riscv worlds with clang and lld.
Requested by: jhb
MFC after: 1 week
X-MFC-With: r353358