Vendor import of llvm release_39 branch r279477:
https://llvm.org/svn/llvm-project/llvm/branches/release_39@279477
This commit is contained in:
parent 4b931d8cdf
commit 2680a82a99
@@ -5,12 +5,6 @@ LLVM 3.9 Release Notes
 .. contents::
     :local:
 
-.. warning::
-   These are in-progress notes for the upcoming LLVM 3.9 release. You may
-   prefer the `LLVM 3.8 Release Notes <http://llvm.org/releases/3.8.0/docs
-   /ReleaseNotes.html>`_.
-
-
 Introduction
 ============
 
@@ -26,11 +20,6 @@ have questions or comments, the `LLVM Developer's Mailing List
 <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ is a good place to send
 them.
 
-Note that if you are reading this file from a Subversion checkout or the main
-LLVM web page, this document applies to the *next* release, not the current
-one. To see the release notes for a specific release, please see the `releases
-page <http://llvm.org/releases/>`_.
-
 Non-comprehensive list of changes in this release
 =================================================
 * The LLVMContext gains a new runtime check (see
@@ -45,10 +34,10 @@ Non-comprehensive list of changes in this release
 please see the documentation on :doc:`CMake`. For information about the CMake
 language there is also a :doc:`CMakePrimer` document available.
 
-* .. note about C API functions LLVMParseBitcode,
-  LLVMParseBitcodeInContext, LLVMGetBitcodeModuleInContext and
-  LLVMGetBitcodeModule having been removed. LLVMGetTargetMachineData has been
-  removed (use LLVMGetDataLayout instead).
+* C API functions LLVMParseBitcode,
+  LLVMParseBitcodeInContext, LLVMGetBitcodeModuleInContext and
+  LLVMGetBitcodeModule having been removed. LLVMGetTargetMachineData has been
+  removed (use LLVMGetDataLayout instead).
 
 * The C API function LLVMLinkModules has been removed.
 
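A migration note for the bullet above: in the 3.9 C API, LLVMGetDataLayout takes an LLVMModuleRef and returns the module's data layout string, so code that previously reached the data layout through LLVMGetTargetMachineData can query the module it is compiling instead.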
@@ -68,52 +57,35 @@ Non-comprehensive list of changes in this release
 iterator to the next instruction instead of ``void``. Targets that previously
 did ``MBB.erase(I); return;`` now probably want ``return MBB.erase(I);``.
 
-* ``SelectionDAGISel::Select`` now returns ``void``. Out of tree targets will
+* ``SelectionDAGISel::Select`` now returns ``void``. Out-of-tree targets will
   need to be updated to replace the argument node and remove any dead nodes in
   cases where they currently return an ``SDNode *`` from this interface.
 
 * Raised the minimum required CMake version to 3.4.3.
 
 * Added the MemorySSA analysis, which hopes to replace MemoryDependenceAnalysis.
   It should provide higher-quality results than MemDep, and be algorithmically
   faster than MemDep. Currently, GVNHoist (which is off by default) makes use of
   MemorySSA.
 
-.. NOTE
-   For small 1-3 sentence descriptions, just add an entry at the end of
-   this list. If your description won't fit comfortably in one bullet
-   point (e.g. maybe you would like to give an example of the
-   functionality, or simply have a lot to talk about), see the `NOTE` below
-   for adding a new subsection.
-
-* ... next change ...
-
-.. NOTE
-   If you would like to document a larger change, then you can add a
-   subsection about it right here. You can copy the following boilerplate
-   and un-indent it (the indentation causes it to be inside this comment).
-
-   Special New Feature
-   -------------------
-
-   Makes programs 10x faster by doing Special New Thing.
-
+* The minimum density for lowering switches with jump tables has been reduced
+  from 40% to 10% for functions which are not marked ``optsize`` (that is,
+  compiled with ``-Os``).
 
 GCC ABI Tag
 -----------
 
-Recently, many of the Linux distributions (ex. `Fedora <http://developerblog.redhat.com/2015/02/10/gcc-5-in-fedora/>`_,
+Recently, many of the Linux distributions (e.g. `Fedora <http://developerblog.redhat.com/2015/02/10/gcc-5-in-fedora/>`_,
 `Debian <https://wiki.debian.org/GCC5>`_, `Ubuntu <https://wiki.ubuntu.com/GCC5>`_)
 have moved on to use the new `GCC ABI <https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html>`_
 to work around `C++11 incompatibilities in libstdc++ <https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html>`_.
 This caused `incompatibility problems <https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00153.html>`_
-with other compilers (ex. Clang), which needed to be fixed, but due to the
+with other compilers (e.g. Clang), which needed to be fixed, but due to the
 experimental nature of GCC's own implementation, it took a long time for it to
-land in LLVM (`here <https://reviews.llvm.org/D18035>`_ and
-`here <https://reviews.llvm.org/D17567>`_), not in time for the 3.8 release.
+land in LLVM (`D18035 <https://reviews.llvm.org/D18035>`_ and
+`D17567 <https://reviews.llvm.org/D17567>`_), not in time for the 3.8 release.
 
-Those patches are now present in the 3.9.0 release and should be working on the
+Those patches are now present in the 3.9.0 release and should be working in the
 majority of cases, as they have been tested thoroughly. However, some bugs were
-`filled in GCC <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71712>`_ and have not
+`filed in GCC <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71712>`_ and have not
 yet been fixed, so there may be corner cases not covered by either GCC or Clang.
 Bug fixes to those problems should be reported in Bugzilla (either LLVM or GCC),
 and patches to LLVM's trunk are very likely to be back-ported to future 3.9.x
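To make the new jump-table threshold concrete (an illustrative sketch, not from this commit; the function name is hypothetical, and the final lowering still depends on the target and other heuristics): four cases spread over the value range 0 to 30 give a case density of roughly 13%, which now clears the 10% bar at -O2 but would have failed the old 40% requirement.

    define i32 @density_example(i32 %x) {
    entry:
      ; 4 cases over a range of 31 values: density is about 13%, so a jump
      ; table is now considered at -O2 (it was not under the 40% threshold).
      switch i32 %x, label %default [
        i32 0, label %bb0
        i32 10, label %bb1
        i32 20, label %bb2
        i32 30, label %bb3
      ]
    bb0:
      ret i32 1
    bb1:
      ret i32 2
    bb2:
      ret i32 3
    bb3:
      ret i32 4
    default:
      ret i32 0
    }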
@@ -131,6 +103,10 @@ Changes to the LLVM IR
 ``llvm.masked.gather`` and ``llvm.masked.scatter`` were introduced to the
 LLVM IR to allow selective memory access for vector data types.
 
+* The new ``notail`` attribute prevents optimization passes from adding ``tail``
+  or ``musttail`` markers to a call. It is used to prevent tail call
+  optimization from being performed on the call.
+
 Changes to LLVM's IPO model
 ---------------------------
 
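For reference, the two IR constructs described in the hunk above look roughly like this in 3.9 IR syntax (a minimal sketch, not part of this commit; function and value names are hypothetical):

    ; Masked gather: only lanes whose mask bit is set are loaded; masked-off
    ; lanes take their value from the pass-through operand.
    declare <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*>, i32, <4 x i1>, <4 x i32>)

    define <4 x i32> @gather_example(<4 x i32*> %ptrs, <4 x i1> %mask, <4 x i32> %pass) {
      %v = call <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*> %ptrs, i32 4, <4 x i1> %mask, <4 x i32> %pass)
      ret <4 x i32> %v
    }

    ; notail: the marked call site must not be turned into a tail call.
    declare i32 @callee(i32)
    define i32 @notail_example(i32 %x) {
      %r = notail call i32 @callee(i32 %x)
      ret i32 %r
    }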
@@ -145,7 +121,7 @@ Support for ThinLTO
 -------------------
 
 LLVM now supports ThinLTO compilation, which can be invoked by compiling
-and linking with -flto=thin. The gold linker plugin, as well as linkers
+and linking with ``-flto=thin``. The gold linker plugin, as well as linkers
 that use the new ThinLTO API in libLTO (like ld64), will transparently
 execute the ThinLTO backends in parallel threads.
 For more information on ThinLTO and the LLVM implementation, see the
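As a quick illustration of the workflow described above (a sketch, not part of this commit; file names are hypothetical), both the compile and link steps pass -flto=thin:

    clang -flto=thin -O2 -c a.c b.c
    clang -flto=thin -O2 a.o b.o -o app   # the gold plugin or ld64 runs the ThinLTO backends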
@@ -238,7 +214,7 @@ fixes:**
 Changes to the PowerPC Target
 -----------------------------
 
-Moved some optimizations from O3 to O2 (D18562)
+* Moved some optimizations from O3 to O2 (D18562)
 
 * Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi
 
@@ -266,18 +242,6 @@ Changes to the AMDGPU Target
 * Mesa 11.0.x is no longer supported
 
 
-Changes to the OCaml bindings
------------------------------
-
-During this release ...
-
-Support for attribute 'notail' has been added
----------------------------------------------
-
-This marker prevents optimization passes from adding 'tail' or
-'musttail' markers to a call. It is used to prevent tail call
-optimization from being performed on the call.
-
 External Open Source Projects Using LLVM 3.9
 ============================================
 
@@ -285,8 +249,6 @@ An exciting aspect of LLVM is that it is used as an enabling technology for
 a lot of other language and tools projects. This section lists some of the
 projects that have already been updated to work with LLVM 3.9.
 
-* A project
-
 LDC - the LLVM-based D compiler
 -------------------------------
 
@@ -65,7 +65,7 @@ public:
   PreservedAnalyses run(Function &F, FunctionAnalysisManager &);
 
 private:
-  void BuildRankMap(Function &F, ReversePostOrderTraversal<Function *> &RPOT);
+  void BuildRankMap(Function &F);
   unsigned getRank(Value *V);
   void canonicalizeOperands(Instruction *I);
   void ReassociateExpression(BinaryOperator *I);
@@ -4822,6 +4822,10 @@ bool ScalarEvolution::isSCEVExprNeverPoison(const Instruction *I) {
   // from different loops, so that we know which loop to prove that I is
   // executed in.
   for (unsigned OpIndex = 0; OpIndex < I->getNumOperands(); ++OpIndex) {
+    // I could be an extractvalue from a call to an overflow intrinsic.
+    // TODO: We can do better here in some cases.
+    if (!isSCEVable(I->getOperand(OpIndex)->getType()))
+      return false;
     const SCEV *Op = getSCEV(I->getOperand(OpIndex));
     if (auto *AddRec = dyn_cast<SCEVAddRecExpr>(Op)) {
       bool AllOtherOpsLoopInvariant = true;
@@ -1258,8 +1258,11 @@ AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
     if (MIIsUnscaled) {
       // If the unscaled offset isn't a multiple of the MemSize, we can't
       // pair the operations together: bail and keep looking.
-      if (MIOffset % MemSize)
+      if (MIOffset % MemSize) {
+        trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);
+        MemInsns.push_back(&MI);
         continue;
+      }
       MIOffset /= MemSize;
     } else {
       MIOffset *= MemSize;
@@ -1424,9 +1427,6 @@ bool AArch64LoadStoreOpt::isMatchingUpdateInsn(MachineInstr &MemMI,
   default:
     break;
   case AArch64::SUBXri:
-    // Negate the offset for a SUB instruction.
-    Offset *= -1;
-    // FALLTHROUGH
   case AArch64::ADDXri:
     // Make sure it's a vanilla immediate operand, not a relocation or
     // anything else we can't handle.
@@ -1444,6 +1444,9 @@ bool AArch64LoadStoreOpt::isMatchingUpdateInsn(MachineInstr &MemMI,
 
   bool IsPairedInsn = isPairedLdSt(MemMI);
+  int UpdateOffset = MI.getOperand(2).getImm();
+  if (MI.getOpcode() == AArch64::SUBXri)
+    UpdateOffset = -UpdateOffset;
 
   // For non-paired load/store instructions, the immediate must fit in a
   // signed 9-bit integer.
   if (!IsPairedInsn && (UpdateOffset > 255 || UpdateOffset < -256))
@@ -1458,13 +1461,13 @@ bool AArch64LoadStoreOpt::isMatchingUpdateInsn(MachineInstr &MemMI,
       break;
 
     int ScaledOffset = UpdateOffset / Scale;
-    if (ScaledOffset > 64 || ScaledOffset < -64)
+    if (ScaledOffset > 63 || ScaledOffset < -64)
       break;
   }
 
   // If we have a non-zero Offset, we check that it matches the amount
   // we're adding to the register.
-  if (!Offset || Offset == MI.getOperand(2).getImm())
+  if (!Offset || Offset == UpdateOffset)
     return true;
   break;
 }
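For context on the bounds change above: the writeback offset of a paired load/store update form is encoded as a signed 7-bit immediate in units of the access size, so the representable scaled range is -64 to 63; the old upper bound of 64 was off by one.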
@@ -4033,11 +4033,18 @@ PPCTargetLowering::IsEligibleForTailCallOptimization_64SVR4(
   if (CalleeCC != CallingConv::Fast && CalleeCC != CallingConv::C)
     return false;
 
-  // Functions containing by val parameters are not supported.
+  // Caller contains any byval parameter is not supported.
   if (std::any_of(Ins.begin(), Ins.end(),
                   [](const ISD::InputArg& IA) { return IA.Flags.isByVal(); }))
     return false;
 
+  // Callee contains any byval parameter is not supported, too.
+  // Note: This is a quick work around, because in some cases, e.g.
+  // caller's stack size > callee's stack size, we are still able to apply
+  // sibling call optimization. See: https://reviews.llvm.org/D23441#513574
+  if (any_of(Outs, [](const ISD::OutputArg& OA) { return OA.Flags.isByVal(); }))
+    return false;
+
   // No TCO/SCO on indirect call because Caller have to restore its TOC
   if (!isFunctionGlobalAddress(Callee) &&
       !isa<ExternalSymbolSDNode>(Callee))
@@ -145,8 +145,7 @@ static BinaryOperator *isReassociableOp(Value *V, unsigned Opcode1,
   return nullptr;
 }
 
-void ReassociatePass::BuildRankMap(
-    Function &F, ReversePostOrderTraversal<Function *> &RPOT) {
+void ReassociatePass::BuildRankMap(Function &F) {
   unsigned i = 2;
 
   // Assign distinct ranks to function arguments.
@@ -155,6 +154,7 @@ void ReassociatePass::BuildRankMap(
     DEBUG(dbgs() << "Calculated Rank[" << I->getName() << "] = " << i << "\n");
   }
 
+  ReversePostOrderTraversal<Function *> RPOT(&F);
   for (BasicBlock *BB : RPOT) {
     unsigned BBRank = RankMap[BB] = ++i << 16;
 
@@ -2172,28 +2172,13 @@ void ReassociatePass::ReassociateExpression(BinaryOperator *I) {
 }
 
 PreservedAnalyses ReassociatePass::run(Function &F, FunctionAnalysisManager &) {
-  // Reassociate needs for each instruction to have its operands already
-  // processed, so we first perform a RPOT of the basic blocks so that
-  // when we process a basic block, all its dominators have been processed
-  // before.
-  ReversePostOrderTraversal<Function *> RPOT(&F);
-  BuildRankMap(F, RPOT);
+  // Calculate the rank map for F.
+  BuildRankMap(F);
 
   MadeChange = false;
-  for (BasicBlock *BI : RPOT) {
-    // Use a worklist to keep track of which instructions have been processed
-    // (and which insts won't be optimized again) so when redoing insts,
-    // optimize insts rightaway which won't be processed later.
-    SmallSet<Instruction *, 8> Worklist;
-
-    // Insert all instructions in the BB
-    for (Instruction &I : *BI)
-      Worklist.insert(&I);
-
+  for (Function::iterator BI = F.begin(), BE = F.end(); BI != BE; ++BI) {
     // Optimize every instruction in the basic block.
-    for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;) {
-      // This instruction has been processed.
-      Worklist.erase(&*II);
+    for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;)
       if (isInstructionTriviallyDead(&*II)) {
         EraseInst(&*II++);
       } else {
@@ -2202,22 +2187,27 @@ PreservedAnalyses ReassociatePass::run(Function &F, FunctionAnalysisManager &) {
         ++II;
       }
 
-    // If the above optimizations produced new instructions to optimize or
-    // made modifications which need to be redone, do them now if they won't
-    // be handled later.
-    while (!RedoInsts.empty()) {
-      Instruction *I = RedoInsts.pop_back_val();
-      // Process instructions that won't be processed later, either
-      // inside the block itself or in another basic block (based on rank),
-      // since these will be processed later.
-      if ((I->getParent() != BI || !Worklist.count(I)) &&
-          RankMap[I->getParent()] <= RankMap[BI]) {
-        if (isInstructionTriviallyDead(I))
-          EraseInst(I);
-        else
-          OptimizeInst(I);
-      }
-    }
+  // Make a copy of all the instructions to be redone so we can remove dead
+  // instructions.
+  SetVector<AssertingVH<Instruction>> ToRedo(RedoInsts);
+  // Iterate over all instructions to be reevaluated and remove trivially dead
+  // instructions. If any operand of the trivially dead instruction becomes
+  // dead mark it for deletion as well. Continue this process until all
+  // trivially dead instructions have been removed.
+  while (!ToRedo.empty()) {
+    Instruction *I = ToRedo.pop_back_val();
+    if (isInstructionTriviallyDead(I))
+      RecursivelyEraseDeadInsts(I, ToRedo);
+  }
+
+  // Now that we have removed dead instructions, we can reoptimize the
+  // remaining instructions.
+  while (!RedoInsts.empty()) {
+    Instruction *I = RedoInsts.pop_back_val();
+    if (isInstructionTriviallyDead(I))
+      EraseInst(I);
+    else
+      OptimizeInst(I);
+  }
 }
@@ -566,6 +566,12 @@ void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
     if (!I)
       continue;
 
+    // Skip over non-intrinsic callsites, we don't want to remove any nodes from
+    // the CGSCC.
+    CallSite CS = CallSite(I);
+    if (CS && CS.getCalledFunction() && !CS.getCalledFunction()->isIntrinsic())
+      continue;
+
     // See if this instruction simplifies.
     Value *SimpleV = SimplifyInstruction(I, DL);
     if (!SimpleV)
@@ -82,8 +82,13 @@ static cl::opt<int> MinVectorRegSizeOption(
     "slp-min-reg-size", cl::init(128), cl::Hidden,
     cl::desc("Attempt to vectorize for this register size in bits"));
 
-// FIXME: Set this via cl::opt to allow overriding.
-static const unsigned RecursionMaxDepth = 12;
+static cl::opt<unsigned> RecursionMaxDepth(
+    "slp-recursion-max-depth", cl::init(12), cl::Hidden,
+    cl::desc("Limit the recursion depth when building a vectorizable tree"));
+
+static cl::opt<unsigned> MinTreeSize(
+    "slp-min-tree-size", cl::init(3), cl::Hidden,
+    cl::desc("Only vectorize small trees if they are fully vectorizable"));
 
 // Limit the number of alias checks. The limit is chosen so that
 // it has no negative effect on the llvm benchmarks.
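Both limits above are now tunable from the opt command line, for example (an illustrative invocation using the flags defined in this hunk, shown with their default values):

    opt -slp-vectorizer -slp-recursion-max-depth=12 -slp-min-tree-size=3 -S input.ll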
@@ -1842,7 +1847,7 @@ int BoUpSLP::getTreeCost() {
                        VectorizableTree.size() << ".\n");
 
   // We only vectorize tiny trees if it is fully vectorizable.
-  if (VectorizableTree.size() < 3 && !isFullyVectorizableTinyTree()) {
+  if (VectorizableTree.size() < MinTreeSize && !isFullyVectorizableTinyTree()) {
     if (VectorizableTree.empty()) {
       assert(!ExternalUses.size() && "We should not have any external users");
     }
@@ -2124,11 +2129,61 @@ void BoUpSLP::reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,
 }
 
 void BoUpSLP::setInsertPointAfterBundle(ArrayRef<Value *> VL) {
-  Instruction *VL0 = cast<Instruction>(VL[0]);
-  BasicBlock::iterator NextInst(VL0);
-  ++NextInst;
-  Builder.SetInsertPoint(VL0->getParent(), NextInst);
-  Builder.SetCurrentDebugLocation(VL0->getDebugLoc());
+
+  // Get the basic block this bundle is in. All instructions in the bundle
+  // should be in this block.
+  auto *Front = cast<Instruction>(VL.front());
+  auto *BB = Front->getParent();
+  assert(all_of(make_range(VL.begin(), VL.end()), [&](Value *V) -> bool {
+    return cast<Instruction>(V)->getParent() == BB;
+  }));
+
+  // The last instruction in the bundle in program order.
+  Instruction *LastInst = nullptr;
+
+  // Find the last instruction. The common case should be that BB has been
+  // scheduled, and the last instruction is VL.back(). So we start with
+  // VL.back() and iterate over schedule data until we reach the end of the
+  // bundle. The end of the bundle is marked by null ScheduleData.
+  if (BlocksSchedules.count(BB)) {
+    auto *Bundle = BlocksSchedules[BB]->getScheduleData(VL.back());
+    if (Bundle && Bundle->isPartOfBundle())
+      for (; Bundle; Bundle = Bundle->NextInBundle)
+        LastInst = Bundle->Inst;
+  }
+
+  // LastInst can still be null at this point if there's either not an entry
+  // for BB in BlocksSchedules or there's no ScheduleData available for
+  // VL.back(). This can be the case if buildTree_rec aborts for various
+  // reasons (e.g., the maximum recursion depth is reached, the maximum region
+  // size is reached, etc.). ScheduleData is initialized in the scheduling
+  // "dry-run".
+  //
+  // If this happens, we can still find the last instruction by brute force. We
+  // iterate forwards from Front (inclusive) until we either see all
+  // instructions in the bundle or reach the end of the block. If Front is the
+  // last instruction in program order, LastInst will be set to Front, and we
+  // will visit all the remaining instructions in the block.
+  //
+  // One of the reasons we exit early from buildTree_rec is to place an upper
+  // bound on compile-time. Thus, taking an additional compile-time hit here is
+  // not ideal. However, this should be exceedingly rare since it requires that
+  // we both exit early from buildTree_rec and that the bundle be out-of-order
+  // (causing us to iterate all the way to the end of the block).
+  if (!LastInst) {
+    SmallPtrSet<Value *, 16> Bundle(VL.begin(), VL.end());
+    for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {
+      if (Bundle.erase(&I))
+        LastInst = &I;
+      if (Bundle.empty())
+        break;
+    }
+  }
+
+  // Set the insertion point after the last instruction in the bundle. Set the
+  // debug location to Front.
+  Builder.SetInsertPoint(BB, next(BasicBlock::iterator(LastInst)));
+  Builder.SetCurrentDebugLocation(Front->getDebugLoc());
 }
 
 Value *BoUpSLP::Gather(ArrayRef<Value *> VL, VectorType *Ty) {
@@ -2206,7 +2261,9 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
 
   if (E->NeedToGather) {
     setInsertPointAfterBundle(E->Scalars);
-    return Gather(E->Scalars, VecTy);
+    auto *V = Gather(E->Scalars, VecTy);
+    E->VectorizedValue = V;
+    return V;
   }
 
   unsigned Opcode = getSameOpcode(E->Scalars);
@@ -2253,7 +2310,10 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
         E->VectorizedValue = V;
         return V;
       }
-      return Gather(E->Scalars, VecTy);
+      setInsertPointAfterBundle(E->Scalars);
+      auto *V = Gather(E->Scalars, VecTy);
+      E->VectorizedValue = V;
+      return V;
     }
     case Instruction::ExtractValue: {
       if (canReuseExtract(E->Scalars, Instruction::ExtractValue)) {
@@ -2265,7 +2325,10 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
         E->VectorizedValue = V;
         return propagateMetadata(V, E->Scalars);
       }
-      return Gather(E->Scalars, VecTy);
+      setInsertPointAfterBundle(E->Scalars);
+      auto *V = Gather(E->Scalars, VecTy);
+      E->VectorizedValue = V;
+      return V;
     }
     case Instruction::ZExt:
     case Instruction::SExt:
@@ -688,3 +688,52 @@ outer.be:
 exit:
   ret void
 }
+
+
+; PR28932: Don't assert on non-SCEV-able value %2.
+%struct.anon = type { i8* }
+@a = common global %struct.anon* null, align 8
+@b = common global i32 0, align 4
+declare { i32, i1 } @llvm.ssub.with.overflow.i32(i32, i32)
+declare void @llvm.trap()
+define i32 @pr28932() {
+entry:
+  %.pre = load %struct.anon*, %struct.anon** @a, align 8
+  %.pre7 = load i32, i32* @b, align 4
+  br label %for.cond
+
+for.cond: ; preds = %cont6, %entry
+  %0 = phi i32 [ %3, %cont6 ], [ %.pre7, %entry ]
+  %1 = phi %struct.anon* [ %.ph, %cont6 ], [ %.pre, %entry ]
+  %tobool = icmp eq %struct.anon* %1, null
+  %2 = tail call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %0, i32 1)
+  %3 = extractvalue { i32, i1 } %2, 0
+  %4 = extractvalue { i32, i1 } %2, 1
+  %idxprom = sext i32 %3 to i64
+  %5 = getelementptr inbounds %struct.anon, %struct.anon* %1, i64 0, i32 0
+  %6 = load i8*, i8** %5, align 8
+  %7 = getelementptr inbounds i8, i8* %6, i64 %idxprom
+  %8 = load i8, i8* %7, align 1
+  br i1 %tobool, label %if.else, label %if.then
+
+if.then: ; preds = %for.cond
+  br i1 %4, label %trap, label %cont6
+
+trap: ; preds = %if.else, %if.then
+  tail call void @llvm.trap()
+  unreachable
+
+if.else: ; preds = %for.cond
+  br i1 %4, label %trap, label %cont1
+
+cont1: ; preds = %if.else
+  %conv5 = sext i8 %8 to i64
+  %9 = inttoptr i64 %conv5 to %struct.anon*
+  store %struct.anon* %9, %struct.anon** @a, align 8
+  br label %cont6
+
+cont6: ; preds = %cont1, %if.then
+  %.ph = phi %struct.anon* [ %9, %cont1 ], [ %1, %if.then ]
+  store i32 %3, i32* @b, align 4
+  br label %for.cond
+}
@@ -1,4 +1,4 @@
-; RUN: llc -mtriple=aarch64-linux-gnu -aarch64-atomic-cfg-tidy=0 -verify-machineinstrs -o - %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -aarch64-atomic-cfg-tidy=0 -disable-lsr -verify-machineinstrs -o - %s | FileCheck %s
 
 ; This file contains tests for the AArch64 load/store optimizer.
 
@@ -1232,3 +1232,104 @@ for.body:
 end:
   ret void
 }
+
+define void @post-indexed-sub-doubleword-offset-min(i64* %a, i64* %b, i64 %count) nounwind {
+; CHECK-LABEL: post-indexed-sub-doubleword-offset-min
+; CHECK: ldr x{{[0-9]+}}, [x{{[0-9]+}}], #-256
+; CHECK: str x{{[0-9]+}}, [x{{[0-9]+}}], #-256
+  br label %for.body
+for.body:
+  %phi1 = phi i64* [ %gep4, %for.body ], [ %b, %0 ]
+  %phi2 = phi i64* [ %gep3, %for.body ], [ %a, %0 ]
+  %i = phi i64 [ %dec.i, %for.body], [ %count, %0 ]
+  %gep1 = getelementptr i64, i64* %phi1, i64 1
+  %load1 = load i64, i64* %gep1
+  %gep2 = getelementptr i64, i64* %phi2, i64 1
+  store i64 %load1, i64* %gep2
+  %load2 = load i64, i64* %phi1
+  store i64 %load2, i64* %phi2
+  %dec.i = add nsw i64 %i, -1
+  %gep3 = getelementptr i64, i64* %phi2, i64 -32
+  %gep4 = getelementptr i64, i64* %phi1, i64 -32
+  %cond = icmp sgt i64 %dec.i, 0
+  br i1 %cond, label %for.body, label %end
+end:
+  ret void
+}
+
+define void @post-indexed-doubleword-offset-out-of-range(i64* %a, i64* %b, i64 %count) nounwind {
+; CHECK-LABEL: post-indexed-doubleword-offset-out-of-range
+; CHECK: ldr x{{[0-9]+}}, [x{{[0-9]+}}]
+; CHECK: add x{{[0-9]+}}, x{{[0-9]+}}, #256
+; CHECK: str x{{[0-9]+}}, [x{{[0-9]+}}]
+; CHECK: add x{{[0-9]+}}, x{{[0-9]+}}, #256
+
+  br label %for.body
+for.body:
+  %phi1 = phi i64* [ %gep4, %for.body ], [ %b, %0 ]
+  %phi2 = phi i64* [ %gep3, %for.body ], [ %a, %0 ]
+  %i = phi i64 [ %dec.i, %for.body], [ %count, %0 ]
+  %gep1 = getelementptr i64, i64* %phi1, i64 1
+  %load1 = load i64, i64* %gep1
+  %gep2 = getelementptr i64, i64* %phi2, i64 1
+  store i64 %load1, i64* %gep2
+  %load2 = load i64, i64* %phi1
+  store i64 %load2, i64* %phi2
+  %dec.i = add nsw i64 %i, -1
+  %gep3 = getelementptr i64, i64* %phi2, i64 32
+  %gep4 = getelementptr i64, i64* %phi1, i64 32
+  %cond = icmp sgt i64 %dec.i, 0
+  br i1 %cond, label %for.body, label %end
+end:
+  ret void
+}
+
+define void @post-indexed-paired-min-offset(i64* %a, i64* %b, i64 %count) nounwind {
+; CHECK-LABEL: post-indexed-paired-min-offset
+; CHECK: ldp x{{[0-9]+}}, x{{[0-9]+}}, [x{{[0-9]+}}], #-512
+; CHECK: stp x{{[0-9]+}}, x{{[0-9]+}}, [x{{[0-9]+}}], #-512
+  br label %for.body
+for.body:
+  %phi1 = phi i64* [ %gep4, %for.body ], [ %b, %0 ]
+  %phi2 = phi i64* [ %gep3, %for.body ], [ %a, %0 ]
+  %i = phi i64 [ %dec.i, %for.body], [ %count, %0 ]
+  %gep1 = getelementptr i64, i64* %phi1, i64 1
+  %load1 = load i64, i64* %gep1
+  %gep2 = getelementptr i64, i64* %phi2, i64 1
+  %load2 = load i64, i64* %phi1
+  store i64 %load1, i64* %gep2
+  store i64 %load2, i64* %phi2
+  %dec.i = add nsw i64 %i, -1
+  %gep3 = getelementptr i64, i64* %phi2, i64 -64
+  %gep4 = getelementptr i64, i64* %phi1, i64 -64
+  %cond = icmp sgt i64 %dec.i, 0
+  br i1 %cond, label %for.body, label %end
+end:
+  ret void
+}
+
+define void @post-indexed-paired-offset-out-of-range(i64* %a, i64* %b, i64 %count) nounwind {
+; CHECK-LABEL: post-indexed-paired-offset-out-of-range
+; CHECK: ldp x{{[0-9]+}}, x{{[0-9]+}}, [x{{[0-9]+}}]
+; CHECK: add x{{[0-9]+}}, x{{[0-9]+}}, #512
+; CHECK: stp x{{[0-9]+}}, x{{[0-9]+}}, [x{{[0-9]+}}]
+; CHECK: add x{{[0-9]+}}, x{{[0-9]+}}, #512
+  br label %for.body
+for.body:
+  %phi1 = phi i64* [ %gep4, %for.body ], [ %b, %0 ]
+  %phi2 = phi i64* [ %gep3, %for.body ], [ %a, %0 ]
+  %i = phi i64 [ %dec.i, %for.body], [ %count, %0 ]
+  %gep1 = getelementptr i64, i64* %phi1, i64 1
+  %load1 = load i64, i64* %phi1
+  %gep2 = getelementptr i64, i64* %phi2, i64 1
+  %load2 = load i64, i64* %gep1
+  store i64 %load1, i64* %gep2
+  store i64 %load2, i64* %phi2
+  %dec.i = add nsw i64 %i, -1
+  %gep3 = getelementptr i64, i64* %phi2, i64 64
+  %gep4 = getelementptr i64, i64* %phi1, i64 64
+  %cond = icmp sgt i64 %dec.i, 0
+  br i1 %cond, label %for.body, label %end
+end:
+  ret void
+}
test/CodeGen/AArch64/ldst-paired-aliasing.ll (new file, 47 lines)
@@ -0,0 +1,47 @@
+; RUN: llc -mcpu cortex-a53 < %s | FileCheck %s
+target datalayout = "e-m:e-i64:64-i128:128-n8:16:32:64-S128"
+target triple = "aarch64--linux-gnu"
+
+declare void @f(i8*, i8*)
+declare void @f2(i8*, i8*)
+declare void @_Z5setupv()
+declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) #3
+
+define i32 @main() local_unnamed_addr #1 {
+; Make sure the stores happen in the correct order (the exact instructions could change).
+; CHECK-LABEL: main:
+; CHECK: str q0, [sp, #48]
+; CHECK: ldr w8, [sp, #48]
+; CHECK: stur q1, [sp, #72]
+; CHECK: str q0, [sp, #64]
+; CHECK: str w9, [sp, #80]
+
+for.body.lr.ph.i.i.i.i.i.i63:
+  %b1 = alloca [10 x i32], align 16
+  %x0 = bitcast [10 x i32]* %b1 to i8*
+  %b2 = alloca [10 x i32], align 16
+  %x1 = bitcast [10 x i32]* %b2 to i8*
+  tail call void @_Z5setupv()
+  %x2 = getelementptr inbounds [10 x i32], [10 x i32]* %b1, i64 0, i64 6
+  %x3 = bitcast i32* %x2 to i8*
+  call void @llvm.memset.p0i8.i64(i8* %x3, i8 0, i64 16, i32 8, i1 false)
+  %arraydecay2 = getelementptr inbounds [10 x i32], [10 x i32]* %b1, i64 0, i64 0
+  %x4 = bitcast [10 x i32]* %b1 to <4 x i32>*
+  store <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32>* %x4, align 16
+  %incdec.ptr.i7.i.i.i.i.i.i64.3 = getelementptr inbounds [10 x i32], [10 x i32]* %b1, i64 0, i64 4
+  %x5 = bitcast i32* %incdec.ptr.i7.i.i.i.i.i.i64.3 to <4 x i32>*
+  store <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32>* %x5, align 16
+  %incdec.ptr.i7.i.i.i.i.i.i64.7 = getelementptr inbounds [10 x i32], [10 x i32]* %b1, i64 0, i64 8
+  store i32 1, i32* %incdec.ptr.i7.i.i.i.i.i.i64.7, align 16
+  %x6 = load i32, i32* %arraydecay2, align 16
+  %cmp6 = icmp eq i32 %x6, 1
+  br i1 %cmp6, label %for.inc, label %if.then
+
+for.inc:
+  call void @f(i8* %x0, i8* %x1)
+  ret i32 0
+
+if.then:
+  call void @f2(i8* %x0, i8* %x1)
+  ret i32 0
+}
@@ -189,3 +189,15 @@ define void @w_caller(i8* %ptr) {
 ; CHECK-SCO-LABEL: w_caller:
 ; CHECK-SCO: bl w_callee
 }
+
+%struct.byvalTest = type { [8 x i8] }
+@byval = common global %struct.byvalTest zeroinitializer
+
+define void @byval_callee(%struct.byvalTest* byval %ptr) { ret void }
+define void @byval_caller() {
+  tail call void @byval_callee(%struct.byvalTest* byval @byval)
+  ret void
+
+; CHECK-SCO-LABEL: bl byval_callee
+; CHECK-SCO: bl byval_callee
+}
@@ -299,8 +299,8 @@ entry:
 }
 
 ; CHECK-LABEL: define i32 @PR28802(
-; CHECK: call i32 @PR28802.external(i32 0)
-; CHECK: ret i32 0
+; CHECK: %[[call:.*]] = call i32 @PR28802.external(i32 0)
+; CHECK: ret i32 %[[call]]
 
 define internal i32 @PR28848.callee(i32 %p2, i1 %c) {
 entry:
@@ -322,3 +322,25 @@ entry:
 }
 ; CHECK-LABEL: define i32 @PR28848(
 ; CHECK: ret i32 0
+
+define internal void @callee7(i16 %param1, i16 %param2) {
+entry:
+  br label %bb
+
+bb:
+  %phi = phi i16 [ %param2, %entry ]
+  %add = add i16 %phi, %param1
+  ret void
+}
+
+declare i16 @caller7.external(i16 returned)
+
+define void @caller7() {
+bb1:
+  %call = call i16 @caller7.external(i16 1)
+  call void @callee7(i16 0, i16 %call)
+  ret void
+}
+; CHECK-LABEL: define void @caller7(
+; CHECK: %call = call i16 @caller7.external(i16 1)
+; CHECK-NEXT: ret void
@@ -1,57 +0,0 @@
-; RUN: opt < %s -reassociate -S | FileCheck %s
-
-; These tests make sure that before processing insts
-; any previous instructions are already canonicalized.
-define i32 @foo(i32 %in) {
-; CHECK-LABEL: @foo
-; CHECK-NEXT: %factor = mul i32 %in, -4
-; CHECK-NEXT: %factor1 = mul i32 %in, 2
-; CHECK-NEXT: %_3 = add i32 %factor, 1
-; CHECK-NEXT: %_5 = add i32 %_3, %factor1
-; CHECK-NEXT: ret i32 %_5
-  %_0 = add i32 %in, 1
-  %_1 = mul i32 %in, -2
-  %_2 = add i32 %_0, %_1
-  %_3 = add i32 %_1, %_2
-  %_4 = add i32 %_3, 1
-  %_5 = add i32 %in, %_3
-  ret i32 %_5
-}
-
-; CHECK-LABEL: @foo1
-define void @foo1(float %in, i1 %cmp) {
-wrapper_entry:
-  br label %foo1
-
-for.body:
-  %0 = fadd float %in1, %in1
-  br label %foo1
-
-foo1:
-  %_0 = fmul fast float %in, -3.000000e+00
-  %_1 = fmul fast float %_0, 3.000000e+00
-  %in1 = fadd fast float -3.000000e+00, %_1
-  %in1use = fadd fast float %in1, %in1
-  br label %for.body
-
-
-}
-
-; CHECK-LABEL: @foo2
-define void @foo2(float %in, i1 %cmp) {
-wrapper_entry:
-  br label %for.body
-
-for.body:
-; If the operands of the phi are sheduled for processing before
-; foo1 is processed, the invariant of reassociate are not preserved
-  %unused = phi float [%in1, %foo1], [undef, %wrapper_entry]
-  br label %foo1
-
-foo1:
-  %_0 = fmul fast float %in, -3.000000e+00
-  %_1 = fmul fast float %_0, 3.000000e+00
-  %in1 = fadd fast float -3.000000e+00, %_1
-  %in1use = fadd fast float %in1, %in1
-  br label %for.body
-}
@@ -1,8 +1,8 @@
 ; RUN: opt < %s -reassociate -S | FileCheck %s
 ; CHECK-LABEL: faddsubAssoc1
-; CHECK: [[TMP1:%.*]] = fsub fast half 0xH8000, %a
-; CHECK: [[TMP2:%.*]] = fadd fast half %b, [[TMP1]]
-; CHECK: fmul fast half [[TMP2]], 0xH4500
+; CHECK: [[TMP1:%tmp.*]] = fmul fast half %a, 0xH4500
+; CHECK: [[TMP2:%tmp.*]] = fmul fast half %b, 0xH4500
+; CHECK: fsub fast half [[TMP2]], [[TMP1]]
 ; CHECK: ret
 ; Input is A op (B op C)
 define half @faddsubAssoc1(half %a, half %b) {
@@ -88,8 +88,8 @@ define i32 @xor_special2(i32 %x, i32 %y) {
   %xor1 = xor i32 %xor, %and
   ret i32 %xor1
 ; CHECK-LABEL: @xor_special2(
-; CHECK: %xor = xor i32 %y, 123
-; CHECK: %xor1 = xor i32 %xor, %x
+; CHECK: %xor = xor i32 %x, 123
+; CHECK: %xor1 = xor i32 %xor, %y
 ; CHECK: ret i32 %xor1
 }
test/Transforms/SLPVectorizer/AArch64/gather-root.ll (new file, 87 lines)
@@ -0,0 +1,87 @@
+; RUN: opt < %s -slp-vectorizer -S | FileCheck %s --check-prefix=DEFAULT
+; RUN: opt < %s -slp-schedule-budget=0 -slp-min-tree-size=0 -slp-threshold=-30 -slp-vectorizer -S | FileCheck %s --check-prefix=GATHER
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64--linux-gnu"
+
+@a = common global [80 x i8] zeroinitializer, align 16
+
+; DEFAULT-LABEL: @PR28330(
+; DEFAULT: %tmp17 = phi i32 [ %tmp34, %for.body ], [ 0, %entry ]
+; DEFAULT: %[[S0:.+]] = select <8 x i1> %1, <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
+; DEFAULT: %[[R0:.+]] = shufflevector <8 x i32> %[[S0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
+; DEFAULT: %[[R1:.+]] = add <8 x i32> %[[S0]], %[[R0]]
+; DEFAULT: %[[R2:.+]] = shufflevector <8 x i32> %[[R1]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
+; DEFAULT: %[[R3:.+]] = add <8 x i32> %[[R1]], %[[R2]]
+; DEFAULT: %[[R4:.+]] = shufflevector <8 x i32> %[[R3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
+; DEFAULT: %[[R5:.+]] = add <8 x i32> %[[R3]], %[[R4]]
+; DEFAULT: %[[R6:.+]] = extractelement <8 x i32> %[[R5]], i32 0
+; DEFAULT: %tmp34 = add i32 %[[R6]], %tmp17
+;
+; GATHER-LABEL: @PR28330(
+; GATHER: %tmp17 = phi i32 [ %tmp34, %for.body ], [ 0, %entry ]
+; GATHER: %tmp19 = select i1 %tmp1, i32 -720, i32 -80
+; GATHER: %tmp21 = select i1 %tmp3, i32 -720, i32 -80
+; GATHER: %tmp23 = select i1 %tmp5, i32 -720, i32 -80
+; GATHER: %tmp25 = select i1 %tmp7, i32 -720, i32 -80
+; GATHER: %tmp27 = select i1 %tmp9, i32 -720, i32 -80
+; GATHER: %tmp29 = select i1 %tmp11, i32 -720, i32 -80
+; GATHER: %tmp31 = select i1 %tmp13, i32 -720, i32 -80
+; GATHER: %tmp33 = select i1 %tmp15, i32 -720, i32 -80
+; GATHER: %[[I0:.+]] = insertelement <8 x i32> undef, i32 %tmp19, i32 0
+; GATHER: %[[I1:.+]] = insertelement <8 x i32> %[[I0]], i32 %tmp21, i32 1
+; GATHER: %[[I2:.+]] = insertelement <8 x i32> %[[I1]], i32 %tmp23, i32 2
+; GATHER: %[[I3:.+]] = insertelement <8 x i32> %[[I2]], i32 %tmp25, i32 3
+; GATHER: %[[I4:.+]] = insertelement <8 x i32> %[[I3]], i32 %tmp27, i32 4
+; GATHER: %[[I5:.+]] = insertelement <8 x i32> %[[I4]], i32 %tmp29, i32 5
+; GATHER: %[[I6:.+]] = insertelement <8 x i32> %[[I5]], i32 %tmp31, i32 6
+; GATHER: %[[I7:.+]] = insertelement <8 x i32> %[[I6]], i32 %tmp33, i32 7
+; GATHER: %[[R0:.+]] = shufflevector <8 x i32> %[[I7]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
+; GATHER: %[[R1:.+]] = add <8 x i32> %[[I7]], %[[R0]]
+; GATHER: %[[R2:.+]] = shufflevector <8 x i32> %[[R1]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
+; GATHER: %[[R3:.+]] = add <8 x i32> %[[R1]], %[[R2]]
+; GATHER: %[[R4:.+]] = shufflevector <8 x i32> %[[R3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
+; GATHER: %[[R5:.+]] = add <8 x i32> %[[R3]], %[[R4]]
+; GATHER: %[[R6:.+]] = extractelement <8 x i32> %[[R5]], i32 0
+; GATHER: %tmp34 = add i32 %[[R6]], %tmp17
+
+define void @PR28330(i32 %n) {
+entry:
+  %tmp0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
+  %tmp1 = icmp eq i8 %tmp0, 0
+  %tmp2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
+  %tmp3 = icmp eq i8 %tmp2, 0
+  %tmp4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
+  %tmp5 = icmp eq i8 %tmp4, 0
+  %tmp6 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4
+  %tmp7 = icmp eq i8 %tmp6, 0
+  %tmp8 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
+  %tmp9 = icmp eq i8 %tmp8, 0
+  %tmp10 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
+  %tmp11 = icmp eq i8 %tmp10, 0
+  %tmp12 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
+  %tmp13 = icmp eq i8 %tmp12, 0
+  %tmp14 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
+  %tmp15 = icmp eq i8 %tmp14, 0
+  br label %for.body
+
+for.body:
+  %tmp17 = phi i32 [ %tmp34, %for.body ], [ 0, %entry ]
+  %tmp19 = select i1 %tmp1, i32 -720, i32 -80
+  %tmp20 = add i32 %tmp17, %tmp19
+  %tmp21 = select i1 %tmp3, i32 -720, i32 -80
+  %tmp22 = add i32 %tmp20, %tmp21
+  %tmp23 = select i1 %tmp5, i32 -720, i32 -80
+  %tmp24 = add i32 %tmp22, %tmp23
+  %tmp25 = select i1 %tmp7, i32 -720, i32 -80
+  %tmp26 = add i32 %tmp24, %tmp25
+  %tmp27 = select i1 %tmp9, i32 -720, i32 -80
+  %tmp28 = add i32 %tmp26, %tmp27
+  %tmp29 = select i1 %tmp11, i32 -720, i32 -80
+  %tmp30 = add i32 %tmp28, %tmp29
+  %tmp31 = select i1 %tmp13, i32 -720, i32 -80
+  %tmp32 = add i32 %tmp30, %tmp31
+  %tmp33 = select i1 %tmp15, i32 -720, i32 -80
+  %tmp34 = add i32 %tmp32, %tmp33
+  br label %for.body
+}