Vendor import of llvm release_40 branch r296202:

https://llvm.org/svn/llvm-project/llvm/branches/release_40@296202
Dimitry Andric 2017-02-25 14:40:33 +00:00
parent 5a813558fc
commit 9c618dddcd
11 changed files with 297 additions and 380 deletions

@ -5,12 +5,6 @@ LLVM 4.0.0 Release Notes
.. contents:: .. contents::
:local: :local:
.. warning::
These are in-progress notes for the upcoming LLVM 4.0.0 release. You may
prefer the `LLVM 3.9 Release Notes <http://llvm.org/releases/3.9.0/docs
/ReleaseNotes.html>`_.
Introduction Introduction
============ ============
@ -28,74 +22,56 @@ them.
Non-comprehensive list of changes in this release Non-comprehensive list of changes in this release
================================================= =================================================
* The C API functions LLVMAddFunctionAttr, LLVMGetFunctionAttr,
LLVMRemoveFunctionAttr, LLVMAddAttribute, LLVMRemoveAttribute,
LLVMGetAttribute, LLVMAddInstrAttribute and
LLVMRemoveInstrAttribute have been removed.
* The C API enum LLVMAttribute has been deleted.
.. NOTE
For small 1-3 sentence descriptions, just add an entry at the end of
this list. If your description won't fit comfortably in one bullet
point (e.g. maybe you would like to give an example of the
functionality, or simply have a lot to talk about), see the `NOTE` below
for adding a new subsection.
* The definition and uses of LLVM_ATRIBUTE_UNUSED_RESULT in the LLVM source
were replaced with LLVM_NODISCARD, which matches the C++17 [[nodiscard]]
semantics rather than gcc's __attribute__((warn_unused_result)).
* Minimum compiler version to build has been raised to GCC 4.8 and VS 2015. * Minimum compiler version to build has been raised to GCC 4.8 and VS 2015.
* The C API functions ``LLVMAddFunctionAttr``, ``LLVMGetFunctionAttr``,
``LLVMRemoveFunctionAttr``, ``LLVMAddAttribute``, ``LLVMRemoveAttribute``,
``LLVMGetAttribute``, ``LLVMAddInstrAttribute`` and
``LLVMRemoveInstrAttribute`` have been removed.
* The C API enum ``LLVMAttribute`` has been deleted.
* The definition and uses of ``LLVM_ATRIBUTE_UNUSED_RESULT`` in the LLVM source
were replaced with ``LLVM_NODISCARD``, which matches the C++17 ``[[nodiscard]]``
semantics rather than gcc's ``__attribute__((warn_unused_result))``.
* The Timer related APIs now expect a Name and Description. When upgrading code * The Timer related APIs now expect a Name and Description. When upgrading code
the previously used names should become descriptions and a short name in the the previously used names should become descriptions and a short name in the
style of a programming language identifier should be added. style of a programming language identifier should be added.
* LLVM now handles invariant.group across different basic blocks, which makes * LLVM now handles ``invariant.group`` across different basic blocks, which makes
it possible to devirtualize virtual calls inside loops. it possible to devirtualize virtual calls inside loops.
* The aggressive dead code elimination phase ("adce") now remove * The aggressive dead code elimination phase ("adce") now removes
branches which do not effect program behavior. Loops are retained by branches which do not effect program behavior. Loops are retained by
default since they may be infinite but these can also be removed default since they may be infinite but these can also be removed
with LLVM option -adce-remove-loops when the loop body otherwise has with LLVM option ``-adce-remove-loops`` when the loop body otherwise has
no live operations. no live operations.
* The GVNHoist pass is now enabled by default. The new pass based on Global * The GVNHoist pass is now enabled by default. The new pass based on Global
Value Numbering detects similar computations in branch code and replaces Value Numbering detects similar computations in branch code and replaces
multiple instances of the same computation with a unique expression. The multiple instances of the same computation with a unique expression. The
transform benefits code size and generates better schedules. GVNHoist is transform benefits code size and generates better schedules. GVNHoist is
more aggressive at -Os and -Oz, hoisting more expressions at the expense of more aggressive at ``-Os`` and ``-Oz``, hoisting more expressions at the
execution time degradations. expense of execution time degradations.
* The llvm-cov tool can now export coverage data as json. Its html output mode * The llvm-cov tool can now export coverage data as json. Its html output mode
has also improved. has also improved.
* ... next change ... Improvements to ThinLTO (-flto=thin)
------------------------------------
Integration with profile data (PGO). When available, profile data
enables more accurate function importing decisions, as well as
cross-module indirect call promotion.
.. NOTE Significant build-time and binary-size improvements when compiling with
If you would like to document a larger change, then you can add a debug info (-g).
subsection about it right here. You can copy the following boilerplate
and un-indent it (the indentation causes it to be inside this comment).
Special New Feature
-------------------
Makes programs 10x faster by doing Special New Thing.
Improvements to ThinLTO (-flto=thin)
------------------------------------
* Integration with profile data (PGO). When available, profile data
enables more accurate function importing decisions, as well as
cross-module indirect call promotion.
* Significant build-time and binary-size improvements when compiling with
debug info (-g).
LLVM Coroutines LLVM Coroutines
--------------- ---------------
Experimental support for :doc:`Coroutines` was added, which can be enabled Experimental support for :doc:`Coroutines` was added, which can be enabled
with ``-enable-coroutines`` in ``opt`` command tool or using with ``-enable-coroutines`` in ``opt`` the command tool or using the
``addCoroutinePassesToExtensionPoints`` API when building the optimization ``addCoroutinePassesToExtensionPoints`` API when building the optimization
pipeline. pipeline.
@ -106,18 +82,18 @@ For more information on LLVM Coroutines and the LLVM implementation, see
Regcall and Vectorcall Calling Conventions Regcall and Vectorcall Calling Conventions
-------------------------------------------------- --------------------------------------------------
Support was added for _regcall calling convention. Support was added for ``_regcall`` calling convention.
Existing __vectorcall calling convention support was extended to include Existing ``__vectorcall`` calling convention support was extended to include
correct handling of HVAs. correct handling of HVAs.
The __vectorcall calling convention was introduced by Microsoft to The ``__vectorcall`` calling convention was introduced by Microsoft to
enhance register usage when passing parameters. enhance register usage when passing parameters.
For more information please read `__vectorcall documentation For more information please read `__vectorcall documentation
<https://msdn.microsoft.com/en-us/library/dn375768.aspx>`_. <https://msdn.microsoft.com/en-us/library/dn375768.aspx>`_.
The __regcall calling convention was introduced by Intel to The ``__regcall`` calling convention was introduced by Intel to
optimize parameter transfer on function call. optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are This calling convention ensures that as many values as possible are
passed or returned in registers. passed or returned in registers.
For more information please read `__regcall documentation For more information please read `__regcall documentation
<https://software.intel.com/en-us/node/693069>`_. <https://software.intel.com/en-us/node/693069>`_.
@ -127,7 +103,7 @@ Code Generation Testing
Passes that work on the machine instruction representation can be tested with Passes that work on the machine instruction representation can be tested with
the .mir serialization format. ``llc`` supports the ``-run-pass``, the .mir serialization format. ``llc`` supports the ``-run-pass``,
``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to to ``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to
run a single pass of the code generation pipeline, or to stop or start the code run a single pass of the code generation pipeline, or to stop or start the code
generation pipeline at a given point. generation pipeline at a given point.
@ -211,9 +187,6 @@ changes landed in this release.
``&*I`` (if not ``end()``); alternatively, clients may refactor to use ``&*I`` (if not ``end()``); alternatively, clients may refactor to use
references for known-good nodes. references for known-good nodes.
Changes to the LLVM IR
----------------------
Changes to the ARM Targets Changes to the ARM Targets
-------------------------- --------------------------
@ -244,28 +217,6 @@ Changes to the ARM Targets
A lot of work has also been done in LLD for ARM, which now supports more A lot of work has also been done in LLD for ARM, which now supports more
relocations and TLS. relocations and TLS.
Changes to the MIPS Target
--------------------------
During this release ...
Changes to the PowerPC Target
-----------------------------
During this release ...
Changes to the X86 Target
-------------------------
During this release ...
Changes to the AMDGPU Target
-----------------------------
During this release ...
Changes to the AVR Target Changes to the AVR Target
----------------------------- -----------------------------
@ -297,8 +248,6 @@ Changes to the OCaml bindings
External Open Source Projects Using LLVM 4.0.0 External Open Source Projects Using LLVM 4.0.0
============================================== ==============================================
* A project...
LDC - the LLVM-based D compiler LDC - the LLVM-based D compiler
------------------------------- -------------------------------

@ -92,12 +92,6 @@ struct SLPVectorizerPass : public PassInfoMixin<SLPVectorizerPass> {
/// collected in GEPs. /// collected in GEPs.
bool vectorizeGEPIndices(BasicBlock *BB, slpvectorizer::BoUpSLP &R); bool vectorizeGEPIndices(BasicBlock *BB, slpvectorizer::BoUpSLP &R);
/// Try to find horizontal reduction or otherwise vectorize a chain of binary
/// operators.
bool vectorizeRootInstruction(PHINode *P, Value *V, BasicBlock *BB,
slpvectorizer::BoUpSLP &R,
TargetTransformInfo *TTI);
/// \brief Scan the basic block and look for patterns that are likely to start /// \brief Scan the basic block and look for patterns that are likely to start
/// a vectorization chain. /// a vectorization chain.
bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R); bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);

@ -996,6 +996,11 @@ def : Pat <
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1)) (V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
>; >;
def : Pat <
(i1 (trunc i16:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
>;
def : Pat < def : Pat <
(i1 (trunc i64:$a)), (i1 (trunc i64:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), (V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1),

@ -607,12 +607,6 @@ def : Pat<
(COPY $src) (COPY $src)
>; >;
def : Pat<
(i1 (trunc i16:$src)),
(COPY $src)
>;
def : Pat < def : Pat <
(i16 (trunc i64:$src)), (i16 (trunc i64:$src)),
(EXTRACT_SUBREG $src, sub0) (EXTRACT_SUBREG $src, sub0)
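The two TableGen hunks above change how AMDGPU selects `i1 (trunc i16:$x)`. The added pattern masks the source with `S_AND_B32 1` and compares the result against 1 with `V_CMP_EQ_U32_e64`, and the plain `COPY` pattern is removed. The following is a rough scalar model of what the new pattern computes. It is a sketch, not backend code, and the function names are hypothetical. Truncation to `i1` keeps only bit 0, which is a different operation from testing the value for non-zero:

```cpp
#include <cstdint>

// Hypothetical model of LLVM IR `trunc i16 %x to i1`: only the lowest bit
// survives. Mirrors the selected pattern (V_CMP_EQ_U32 (S_AND_B32 1, $x), 1).
bool truncI16ToI1(std::uint16_t x) {
  return (x & 1u) == 1u;
}

// For contrast: a "non-zero" test, which is a different operation.
bool isNonZero(std::uint16_t x) {
  return x != 0;
}
```

For `x = 2`, `truncI16ToI1` yields false and `isNonZero` yields true. The new `trunc_i8_to_i1` and `sgpr_trunc_i16_to_i1` tests later in this commit check for exactly that AND-with-1 followed by `v_cmp_eq_u32`.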

@ -41,6 +41,8 @@ STATISTIC(NumSDivs, "Number of sdiv converted to udiv");
STATISTIC(NumAShrs, "Number of ashr converted to lshr"); STATISTIC(NumAShrs, "Number of ashr converted to lshr");
STATISTIC(NumSRems, "Number of srem converted to urem"); STATISTIC(NumSRems, "Number of srem converted to urem");
static cl::opt<bool> DontProcessAdds("cvp-dont-process-adds", cl::init(true));
namespace { namespace {
class CorrelatedValuePropagation : public FunctionPass { class CorrelatedValuePropagation : public FunctionPass {
public: public:
@ -405,6 +407,9 @@ static bool processAShr(BinaryOperator *SDI, LazyValueInfo *LVI) {
static bool processAdd(BinaryOperator *AddOp, LazyValueInfo *LVI) { static bool processAdd(BinaryOperator *AddOp, LazyValueInfo *LVI) {
typedef OverflowingBinaryOperator OBO; typedef OverflowingBinaryOperator OBO;
if (DontProcessAdds)
return false;
if (AddOp->getType()->isVectorTy() || hasLocalDefs(AddOp)) if (AddOp->getType()->isVectorTy() || hasLocalDefs(AddOp))
return false; return false;

@ -1521,8 +1521,8 @@ Value *ReassociatePass::OptimizeAdd(Instruction *I,
if (ConstantInt *CI = dyn_cast<ConstantInt>(Factor)) { if (ConstantInt *CI = dyn_cast<ConstantInt>(Factor)) {
if (CI->isNegative() && !CI->isMinValue(true)) { if (CI->isNegative() && !CI->isMinValue(true)) {
Factor = ConstantInt::get(CI->getContext(), -CI->getValue()); Factor = ConstantInt::get(CI->getContext(), -CI->getValue());
assert(!Duplicates.count(Factor) && if (!Duplicates.insert(Factor).second)
"Shouldn't have two constant factors, missed a canonicalize"); continue;
unsigned Occ = ++FactorOccurrences[Factor]; unsigned Occ = ++FactorOccurrences[Factor];
if (Occ > MaxOcc) { if (Occ > MaxOcc) {
MaxOcc = Occ; MaxOcc = Occ;
@ -1534,8 +1534,8 @@ Value *ReassociatePass::OptimizeAdd(Instruction *I,
APFloat F(CF->getValueAPF()); APFloat F(CF->getValueAPF());
F.changeSign(); F.changeSign();
Factor = ConstantFP::get(CF->getContext(), F); Factor = ConstantFP::get(CF->getContext(), F);
assert(!Duplicates.count(Factor) && if (!Duplicates.insert(Factor).second)
"Shouldn't have two constant factors, missed a canonicalize"); continue;
unsigned Occ = ++FactorOccurrences[Factor]; unsigned Occ = ++FactorOccurrences[Factor];
if (Occ > MaxOcc) { if (Occ > MaxOcc) {
MaxOcc = Occ; MaxOcc = Occ;
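The `Reassociate` hunk above replaces an assertion with the `insert(...).second` idiom. Each distinct factor of an add operand is counted once, and a negative constant also counts its negation (except the minimum value, which has no positive counterpart). If that negation is already recorded in `Duplicates` for the current operand, the new code skips it instead of asserting. The PR30256 test added later in this commit exercises this case. The sketch below models the counting loop with plain integers, under the simplifying assumption that every factor is an `int64_t` constant. `countFactorOccurrences` is a hypothetical name, and `std::set`/`std::map` stand in for LLVM's own set and map types:

```cpp
#include <cstdint>
#include <limits>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: each operand of an add is a product of int64_t
// constant factors. Count in how many operands each factor appears.
std::map<std::int64_t, unsigned>
countFactorOccurrences(const std::vector<std::vector<std::int64_t>> &operands) {
  std::map<std::int64_t, unsigned> occurrences;
  for (const auto &factors : operands) {
    std::set<std::int64_t> seen; // plays the role of `Duplicates`
    for (std::int64_t f : factors) {
      // Count each distinct factor at most once per operand.
      if (!seen.insert(f).second)
        continue;
      ++occurrences[f];
      // A negative constant also contributes its negation, unless it is the
      // minimum value, whose negation is not representable.
      if (f < 0 && f != std::numeric_limits<std::int64_t>::min()) {
        std::int64_t neg = -f;
        // Skip a negation that is already present instead of asserting
        // that it cannot be.
        if (!seen.insert(neg).second)
          continue;
        ++occurrences[neg];
      }
    }
  }
  return occurrences;
}
```

For the operand list `{{2, -2}, {-2}}`, both `2` and `-2` end up counted twice. The first operand already contains `2`, so the negation of its `-2` is skipped rather than counted a second time for that operand.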

@ -4026,40 +4026,36 @@ bool SLPVectorizerPass::tryToVectorize(BinaryOperator *V, BoUpSLP &R) {
if (!V) if (!V)
return false; return false;
Value *P = V->getParent();
// Vectorize in current basic block only.
auto *Op0 = dyn_cast<Instruction>(V->getOperand(0));
auto *Op1 = dyn_cast<Instruction>(V->getOperand(1));
if (!Op0 || !Op1 || Op0->getParent() != P || Op1->getParent() != P)
return false;
// Try to vectorize V. // Try to vectorize V.
if (tryToVectorizePair(Op0, Op1, R)) if (tryToVectorizePair(V->getOperand(0), V->getOperand(1), R))
return true; return true;
auto *A = dyn_cast<BinaryOperator>(Op0); BinaryOperator *A = dyn_cast<BinaryOperator>(V->getOperand(0));
auto *B = dyn_cast<BinaryOperator>(Op1); BinaryOperator *B = dyn_cast<BinaryOperator>(V->getOperand(1));
// Try to skip B. // Try to skip B.
if (B && B->hasOneUse()) { if (B && B->hasOneUse()) {
auto *B0 = dyn_cast<BinaryOperator>(B->getOperand(0)); BinaryOperator *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));
auto *B1 = dyn_cast<BinaryOperator>(B->getOperand(1)); BinaryOperator *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));
if (B0 && B0->getParent() == P && tryToVectorizePair(A, B0, R)) if (tryToVectorizePair(A, B0, R)) {
return true; return true;
if (B1 && B1->getParent() == P && tryToVectorizePair(A, B1, R)) }
if (tryToVectorizePair(A, B1, R)) {
return true; return true;
}
} }
// Try to skip A. // Try to skip A.
if (A && A->hasOneUse()) { if (A && A->hasOneUse()) {
auto *A0 = dyn_cast<BinaryOperator>(A->getOperand(0)); BinaryOperator *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));
auto *A1 = dyn_cast<BinaryOperator>(A->getOperand(1)); BinaryOperator *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));
if (A0 && A0->getParent() == P && tryToVectorizePair(A0, B, R)) if (tryToVectorizePair(A0, B, R)) {
return true; return true;
if (A1 && A1->getParent() == P && tryToVectorizePair(A1, B, R)) }
if (tryToVectorizePair(A1, B, R)) {
return true; return true;
}
} }
return false; return 0;
} }
/// \brief Generate a shuffle mask to be used in a reduction tree. /// \brief Generate a shuffle mask to be used in a reduction tree.
@ -4511,143 +4507,29 @@ static Value *getReductionValue(const DominatorTree *DT, PHINode *P,
return nullptr; return nullptr;
} }
namespace {
/// Tracks instructons and its children.
class WeakVHWithLevel final : public CallbackVH {
/// Operand index of the instruction currently beeing analized.
unsigned Level = 0;
/// Is this the instruction that should be vectorized, or are we now
/// processing children (i.e. operands of this instruction) for potential
/// vectorization?
bool IsInitial = true;
public:
explicit WeakVHWithLevel() = default;
WeakVHWithLevel(Value *V) : CallbackVH(V){};
/// Restart children analysis each time it is repaced by the new instruction.
void allUsesReplacedWith(Value *New) override {
setValPtr(New);
Level = 0;
IsInitial = true;
}
/// Check if the instruction was not deleted during vectorization.
bool isValid() const { return !getValPtr(); }
/// Is the istruction itself must be vectorized?
bool isInitial() const { return IsInitial; }
/// Try to vectorize children.
void clearInitial() { IsInitial = false; }
/// Are all children processed already?
bool isFinal() const {
assert(getValPtr() &&
(isa<Instruction>(getValPtr()) &&
cast<Instruction>(getValPtr())->getNumOperands() >= Level));
return getValPtr() &&
cast<Instruction>(getValPtr())->getNumOperands() == Level;
}
/// Get next child operation.
Value *nextOperand() {
assert(getValPtr() && isa<Instruction>(getValPtr()) &&
cast<Instruction>(getValPtr())->getNumOperands() > Level);
return cast<Instruction>(getValPtr())->getOperand(Level++);
}
virtual ~WeakVHWithLevel() = default;
};
} // namespace
/// \brief Attempt to reduce a horizontal reduction. /// \brief Attempt to reduce a horizontal reduction.
/// If it is legal to match a horizontal reduction feeding /// If it is legal to match a horizontal reduction feeding
/// the phi node P with reduction operators Root in a basic block BB, then check /// the phi node P with reduction operators BI, then check if it
/// if it can be done. /// can be done.
/// \returns true if a horizontal reduction was matched and reduced. /// \returns true if a horizontal reduction was matched and reduced.
/// \returns false if a horizontal reduction was not matched. /// \returns false if a horizontal reduction was not matched.
static bool canBeVectorized( static bool canMatchHorizontalReduction(PHINode *P, BinaryOperator *BI,
PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R, BoUpSLP &R, TargetTransformInfo *TTI,
TargetTransformInfo *TTI, unsigned MinRegSize) {
const function_ref<bool(BinaryOperator *, BoUpSLP &)> Vectorize) {
if (!ShouldVectorizeHor) if (!ShouldVectorizeHor)
return false; return false;
if (!Root) HorizontalReduction HorRdx(MinRegSize);
if (!HorRdx.matchAssociativeReduction(P, BI))
return false; return false;
if (Root->getParent() != BB) // If there is a sufficient number of reduction values, reduce
return false; // to a nearby power-of-2. Can safely generate oversized
SmallVector<WeakVHWithLevel, 8> Stack(1, Root); // vectors and rely on the backend to split them to legal sizes.
SmallSet<Value *, 8> VisitedInstrs; HorRdx.ReduxWidth =
bool Res = false; std::max((uint64_t)4, PowerOf2Floor(HorRdx.numReductionValues()));
while (!Stack.empty()) {
Value *V = Stack.back();
if (!V) {
Stack.pop_back();
continue;
}
auto *Inst = dyn_cast<Instruction>(V);
if (!Inst || isa<PHINode>(Inst)) {
Stack.pop_back();
continue;
}
if (Stack.back().isInitial()) {
Stack.back().clearInitial();
if (auto *BI = dyn_cast<BinaryOperator>(Inst)) {
HorizontalReduction HorRdx(R.getMinVecRegSize());
if (HorRdx.matchAssociativeReduction(P, BI)) {
// If there is a sufficient number of reduction values, reduce
// to a nearby power-of-2. Can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.
HorRdx.ReduxWidth =
std::max((uint64_t)4, PowerOf2Floor(HorRdx.numReductionValues()));
if (HorRdx.tryToReduce(R, TTI)) { return HorRdx.tryToReduce(R, TTI);
Res = true;
P = nullptr;
continue;
}
}
if (P) {
Inst = dyn_cast<Instruction>(BI->getOperand(0));
if (Inst == P)
Inst = dyn_cast<Instruction>(BI->getOperand(1));
if (!Inst) {
P = nullptr;
continue;
}
}
}
P = nullptr;
if (Vectorize(dyn_cast<BinaryOperator>(Inst), R)) {
Res = true;
continue;
}
}
if (Stack.back().isFinal()) {
Stack.pop_back();
continue;
}
if (auto *NextV = dyn_cast<Instruction>(Stack.back().nextOperand()))
if (NextV->getParent() == BB && VisitedInstrs.insert(NextV).second &&
Stack.size() < RecursionMaxDepth)
Stack.push_back(NextV);
}
return Res;
}
bool SLPVectorizerPass::vectorizeRootInstruction(PHINode *P, Value *V,
BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI) {
if (!V)
return false;
auto *I = dyn_cast<Instruction>(V);
if (!I)
return false;
if (!isa<BinaryOperator>(I))
P = nullptr;
// Try to match and vectorize a horizontal reduction.
return canBeVectorized(P, I, BB, R, TTI,
[this](BinaryOperator *BI, BoUpSLP &R) -> bool {
return tryToVectorize(BI, R);
});
} }
bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) { bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
@ -4717,42 +4599,67 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
if (P->getNumIncomingValues() != 2) if (P->getNumIncomingValues() != 2)
return Changed; return Changed;
Value *Rdx = getReductionValue(DT, P, BB, LI);
// Check if this is a Binary Operator.
BinaryOperator *BI = dyn_cast_or_null<BinaryOperator>(Rdx);
if (!BI)
continue;
// Try to match and vectorize a horizontal reduction. // Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(P, getReductionValue(DT, P, BB, LI), BB, R, if (canMatchHorizontalReduction(P, BI, R, TTI, R.getMinVecRegSize())) {
TTI)) {
Changed = true; Changed = true;
it = BB->begin(); it = BB->begin();
e = BB->end(); e = BB->end();
continue; continue;
} }
Value *Inst = BI->getOperand(0);
if (Inst == P)
Inst = BI->getOperand(1);
if (tryToVectorize(dyn_cast<BinaryOperator>(Inst), R)) {
// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
continue; continue;
} }
if (ShouldStartVectorizeHorAtStore) { if (ShouldStartVectorizeHorAtStore)
if (StoreInst *SI = dyn_cast<StoreInst>(it)) { if (StoreInst *SI = dyn_cast<StoreInst>(it))
// Try to match and vectorize a horizontal reduction. if (BinaryOperator *BinOp =
if (vectorizeRootInstruction(nullptr, SI->getValueOperand(), BB, R, dyn_cast<BinaryOperator>(SI->getValueOperand())) {
TTI)) { if (canMatchHorizontalReduction(nullptr, BinOp, R, TTI,
Changed = true; R.getMinVecRegSize()) ||
it = BB->begin(); tryToVectorize(BinOp, R)) {
e = BB->end(); Changed = true;
continue; it = BB->begin();
e = BB->end();
continue;
}
} }
}
}
// Try to vectorize horizontal reductions feeding into a return. // Try to vectorize horizontal reductions feeding into a return.
if (ReturnInst *RI = dyn_cast<ReturnInst>(it)) { if (ReturnInst *RI = dyn_cast<ReturnInst>(it))
if (RI->getNumOperands() != 0) { if (RI->getNumOperands() != 0)
// Try to match and vectorize a horizontal reduction. if (BinaryOperator *BinOp =
if (vectorizeRootInstruction(nullptr, RI->getOperand(0), BB, R, TTI)) { dyn_cast<BinaryOperator>(RI->getOperand(0))) {
Changed = true; DEBUG(dbgs() << "SLP: Found a return to vectorize.\n");
it = BB->begin(); if (canMatchHorizontalReduction(nullptr, BinOp, R, TTI,
e = BB->end(); R.getMinVecRegSize()) ||
continue; tryToVectorizePair(BinOp->getOperand(0), BinOp->getOperand(1),
R)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
} }
}
}
// Try to vectorize trees that start at compare instructions. // Try to vectorize trees that start at compare instructions.
if (CmpInst *CI = dyn_cast<CmpInst>(it)) { if (CmpInst *CI = dyn_cast<CmpInst>(it)) {
@ -4765,14 +4672,16 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
continue; continue;
} }
for (int I = 0; I < 2; ++I) { for (int i = 0; i < 2; ++i) {
if (vectorizeRootInstruction(nullptr, CI->getOperand(I), BB, R, TTI)) { if (BinaryOperator *BI = dyn_cast<BinaryOperator>(CI->getOperand(i))) {
Changed = true; if (tryToVectorizePair(BI->getOperand(0), BI->getOperand(1), R)) {
// We would like to start over since some instructions are deleted Changed = true;
// and the iterator may become invalid value. // We would like to start over since some instructions are deleted
it = BB->begin(); // and the iterator may become invalid value.
e = BB->end(); it = BB->begin();
break; e = BB->end();
break;
}
} }
} }
continue; continue;

@ -1,13 +1,15 @@
; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=SI %s ; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=GCN -check-prefix=SI %s
; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs< %s | FileCheck -check-prefix=GCN -check-prefix=VI %s
; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG %s ; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG %s
declare i32 @llvm.r600.read.tidig.x() nounwind readnone declare i32 @llvm.r600.read.tidig.x() nounwind readnone
define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) { define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) {
; SI-LABEL: {{^}}trunc_i64_to_i32_store: ; GCN-LABEL: {{^}}trunc_i64_to_i32_store:
; SI: s_load_dword [[SLOAD:s[0-9]+]], s[0:1], 0xb ; GCN: s_load_dword [[SLOAD:s[0-9]+]], s[0:1],
; SI: v_mov_b32_e32 [[VLOAD:v[0-9]+]], [[SLOAD]] ; GCN: v_mov_b32_e32 [[VLOAD:v[0-9]+]], [[SLOAD]]
; SI: buffer_store_dword [[VLOAD]] ; SI: buffer_store_dword [[VLOAD]]
; VI: flat_store_dword v[{{[0-9:]+}}], [[VLOAD]]
; EG-LABEL: {{^}}trunc_i64_to_i32_store: ; EG-LABEL: {{^}}trunc_i64_to_i32_store:
; EG: MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1 ; EG: MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1
@ -18,12 +20,14 @@ define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) {
ret void ret void
} }
; SI-LABEL: {{^}}trunc_load_shl_i64: ; GCN-LABEL: {{^}}trunc_load_shl_i64:
; SI-DAG: s_load_dwordx2 ; GCN-DAG: s_load_dwordx2
; SI-DAG: s_load_dword [[SREG:s[0-9]+]], ; GCN-DAG: s_load_dword [[SREG:s[0-9]+]],
; SI: s_lshl_b32 [[SHL:s[0-9]+]], [[SREG]], 2 ; GCN: s_lshl_b32 [[SHL:s[0-9]+]], [[SREG]], 2
; SI: v_mov_b32_e32 [[VSHL:v[0-9]+]], [[SHL]] ; GCN: v_mov_b32_e32 [[VSHL:v[0-9]+]], [[SHL]]
; SI: buffer_store_dword [[VSHL]], ; SI: buffer_store_dword [[VSHL]]
; VI: flat_store_dword v[{{[0-9:]+}}], [[VSHL]]
define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) { define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) {
%b = shl i64 %a, 2 %b = shl i64 %a, 2
%result = trunc i64 %b to i32 %result = trunc i64 %b to i32
@ -31,15 +35,17 @@ define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) {
ret void ret void
} }
; SI-LABEL: {{^}}trunc_shl_i64: ; GCN-LABEL: {{^}}trunc_shl_i64:
; SI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd ; SI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd
; SI: s_lshl_b64 s{{\[}}[[LO_SHL:[0-9]+]]:{{[0-9]+\]}}, s{{\[}}[[LO_SREG]]:{{[0-9]+\]}}, 2 ; VI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x34
; SI: s_add_u32 s[[LO_SREG2:[0-9]+]], s[[LO_SHL]], ; GCN: s_lshl_b64 s{{\[}}[[LO_SHL:[0-9]+]]:{{[0-9]+\]}}, s{{\[}}[[LO_SREG]]:{{[0-9]+\]}}, 2
; SI: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG2]] ; GCN: s_add_u32 s[[LO_SREG2:[0-9]+]], s[[LO_SHL]],
; SI: s_addc_u32 ; GCN: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG2]]
; GCN: s_addc_u32
; SI: buffer_store_dword v[[LO_VREG]], ; SI: buffer_store_dword v[[LO_VREG]],
; SI: v_mov_b32_e32 ; VI: flat_store_dword v[{{[0-9:]+}}], v[[LO_VREG]]
; SI: v_mov_b32_e32 ; GCN: v_mov_b32_e32
; GCN: v_mov_b32_e32
define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64 %a) { define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64 %a) {
%aa = add i64 %a, 234 ; Prevent shrinking store. %aa = add i64 %a, 234 ; Prevent shrinking store.
%b = shl i64 %aa, 2 %b = shl i64 %aa, 2
@ -49,9 +55,9 @@ define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64
ret void ret void
} }
; SI-LABEL: {{^}}trunc_i32_to_i1: ; GCN-LABEL: {{^}}trunc_i32_to_i1:
; SI: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}} ; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
; SI: v_cmp_eq_u32 ; GCN: v_cmp_eq_u32
define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) { define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) {
%a = load i32, i32 addrspace(1)* %ptr, align 4 %a = load i32, i32 addrspace(1)* %ptr, align 4
%trunc = trunc i32 %a to i1 %trunc = trunc i32 %a to i1
@ -60,9 +66,30 @@ define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) {
ret void ret void
} }
; SI-LABEL: {{^}}sgpr_trunc_i32_to_i1: ; GCN-LABEL: {{^}}trunc_i8_to_i1:
; SI: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}} ; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
; SI: v_cmp_eq_u32 ; GCN: v_cmp_eq_u32
define void @trunc_i8_to_i1(i8 addrspace(1)* %out, i8 addrspace(1)* %ptr) {
%a = load i8, i8 addrspace(1)* %ptr, align 4
%trunc = trunc i8 %a to i1
%result = select i1 %trunc, i8 1, i8 0
store i8 %result, i8 addrspace(1)* %out, align 4
ret void
}
; GCN-LABEL: {{^}}sgpr_trunc_i16_to_i1:
; GCN: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @sgpr_trunc_i16_to_i1(i16 addrspace(1)* %out, i16 %a) {
%trunc = trunc i16 %a to i1
%result = select i1 %trunc, i16 1, i16 0
store i16 %result, i16 addrspace(1)* %out, align 4
ret void
}
; GCN-LABEL: {{^}}sgpr_trunc_i32_to_i1:
; GCN: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) { define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
%trunc = trunc i32 %a to i1 %trunc = trunc i32 %a to i1
%result = select i1 %trunc, i32 1, i32 0 %result = select i1 %trunc, i32 1, i32 0
@ -70,11 +97,12 @@ define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
ret void ret void
} }
; SI-LABEL: {{^}}s_trunc_i64_to_i1: ; GCN-LABEL: {{^}}s_trunc_i64_to_i1:
; SI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0xb ; SI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0xb
; SI: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]] ; VI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0x2c
; SI: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}} ; GCN: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]]
; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]] ; GCN: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}}
; GCN: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]]
define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) { define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) {
%trunc = trunc i64 %x to i1 %trunc = trunc i64 %x to i1
%sel = select i1 %trunc, i32 63, i32 -12 %sel = select i1 %trunc, i32 63, i32 -12
@ -82,11 +110,12 @@ define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) {
ret void ret void
} }
; SI-LABEL: {{^}}v_trunc_i64_to_i1: ; GCN-LABEL: {{^}}v_trunc_i64_to_i1:
; SI: buffer_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}} ; SI: buffer_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}}
; SI: v_and_b32_e32 [[MASKED:v[0-9]+]], 1, v[[VLO]] ; VI: flat_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}}
; SI: v_cmp_eq_u32_e32 vcc, 1, [[MASKED]] ; GCN: v_and_b32_e32 [[MASKED:v[0-9]+]], 1, v[[VLO]]
; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, vcc ; GCN: v_cmp_eq_u32_e32 vcc, 1, [[MASKED]]
; GCN: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, vcc
define void @v_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 addrspace(1)* %in) { define void @v_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 addrspace(1)* %in) {
%tid = call i32 @llvm.r600.read.tidig.x() nounwind readnone %tid = call i32 @llvm.r600.read.tidig.x() nounwind readnone
%gep = getelementptr i64, i64 addrspace(1)* %in, i32 %tid %gep = getelementptr i64, i64 addrspace(1)* %in, i32 %tid

@ -1,4 +1,4 @@
; RUN: opt < %s -correlated-propagation -S | FileCheck %s ; RUN: opt < %s -correlated-propagation -cvp-dont-process-adds=false -S | FileCheck %s
; CHECK-LABEL: @test0( ; CHECK-LABEL: @test0(
define void @test0(i32 %a) { define void @test0(i32 %a) {

@ -222,3 +222,23 @@ define i32 @test15(i32 %X1, i32 %X2, i32 %X3) {
; CHECK-LABEL: @test15 ; CHECK-LABEL: @test15
; CHECK: and i1 %A, %B ; CHECK: and i1 %A, %B
} }
; PR30256 - previously this asserted.
; CHECK-LABEL: @test16
; CHECK: %[[FACTOR:.*]] = mul i64 %a, -4
; CHECK-NEXT: %[[RES:.*]] = add i64 %[[FACTOR]], %b
; CHECK-NEXT: ret i64 %[[RES]]
define i64 @test16(i1 %cmp, i64 %a, i64 %b) {
entry:
%shl = shl i64 %a, 1
%shl.neg = sub i64 0, %shl
br i1 %cmp, label %if.then, label %if.end
if.then: ; preds = %entry
%add1 = add i64 %shl.neg, %shl.neg
%add2 = add i64 %add1, %b
ret i64 %add2
if.end: ; preds = %entry
ret i64 0
}

@ -12,25 +12,26 @@ define float @baz() {
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4 ; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3 ; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float ; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16 ; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16 ; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]] ; CHECK-NEXT: [[MUL4:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0 ; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL4]], [[CONV]]
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]] ; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1 ; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]] ; CHECK-NEXT: [[MUL4_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8 ; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD]]
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8 ; CHECK-NEXT: [[TMP5:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]] ; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0 ; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP6]], [[TMP5]]
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]] ; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1 ; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP8]], [[ADD_1]]
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]] ; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP7]], i32 1
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP9]], [[ADD_2]]
; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV]] ; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV]]
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[TMP4]], [[ADD7]] ; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[MUL4]], [[ADD7]]
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[TMP5]], [[ADD19]] ; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD19]]
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP9]], [[ADD19_1]] ; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP8]], [[ADD19_1]]
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP10]], [[ADD19_2]] ; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP9]], [[ADD19_2]]
; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4 ; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4
; CHECK-NEXT: ret float [[ADD19_3]] ; CHECK-NEXT: ret float [[ADD19_3]]
; ;
@ -69,37 +70,40 @@ define float @bazz() {
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4 ; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3 ; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float ; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16 ; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16 ; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]] ; CHECK-NEXT: [[MUL4:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0 ; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL4]], [[CONV]]
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]] ; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1 ; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]] ; CHECK-NEXT: [[MUL4_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8 ; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD]]
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8 ; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]] ; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0 ; CHECK-NEXT: [[MUL4_2:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]] ; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[MUL4_2]], [[ADD_1]]
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1 ; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]] ; CHECK-NEXT: [[TMP8:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: [[MUL4_3:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[MUL4_3]], [[ADD_2]]
; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2 ; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float ; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV6]] ; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV6]]
; CHECK-NEXT: [[TMP11:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 4) to <2 x float>*), align 16 ; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 4), align 16
; CHECK-NEXT: [[TMP12:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 4) to <2 x float>*), align 16 ; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 4), align 16
; CHECK-NEXT: [[TMP13:%.*]] = fmul fast <2 x float> [[TMP12]], [[TMP11]] ; CHECK-NEXT: [[MUL18:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x float> [[TMP13]], i32 0 ; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[MUL18]], [[ADD7]]
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[TMP14]], [[ADD7]] ; CHECK-NEXT: [[TMP11:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 5), align 4
; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP13]], i32 1 ; CHECK-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 5), align 4
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[TMP15]], [[ADD19]] ; CHECK-NEXT: [[MUL18_1:%.*]] = fmul fast float [[TMP12]], [[TMP11]]
; CHECK-NEXT: [[TMP16:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 6) to <2 x float>*), align 8 ; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[MUL18_1]], [[ADD19]]
; CHECK-NEXT: [[TMP17:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 6) to <2 x float>*), align 8 ; CHECK-NEXT: [[TMP13:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP18:%.*]] = fmul fast <2 x float> [[TMP17]], [[TMP16]] ; CHECK-NEXT: [[TMP14:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x float> [[TMP18]], i32 0 ; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x float> [[TMP14]], [[TMP13]]
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP19]], [[ADD19_1]] ; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0
; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x float> [[TMP18]], i32 1 ; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP16]], [[ADD19_1]]
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP20]], [[ADD19_2]] ; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP17]], [[ADD19_2]]
; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4 ; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4
; CHECK-NEXT: ret float [[ADD19_3]] ; CHECK-NEXT: ret float [[ADD19_3]]
; ;
@ -151,20 +155,24 @@ define float @bazzz() {
; CHECK-NEXT: entry: ; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4 ; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float ; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16 ; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16 ; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]] ; CHECK-NEXT: [[MUL:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef ; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]] ; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef> ; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]] ; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float [[MUL_1]], [[MUL]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> ; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]] ; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0 ; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]] ; CHECK-NEXT: [[TMP8:%.*]] = fadd fast float [[MUL_2]], [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]] ; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: store float [[TMP8]], float* @res, align 4 ; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: ret float [[TMP8]] ; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[MUL_3]], [[TMP8]]
; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[CONV]], [[TMP11]]
; CHECK-NEXT: store float [[TMP12]], float* @res, align 4
; CHECK-NEXT: ret float [[TMP12]]
; ;
entry: entry:
%0 = load i32, i32* @n, align 4 %0 = load i32, i32* @n, align 4
@ -194,19 +202,23 @@ define i32 @foo() {
; CHECK-NEXT: entry: ; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4 ; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float ; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16 ; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16 ; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]] ; CHECK-NEXT: [[MUL:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef ; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]] ; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef> ; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]] ; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float [[MUL_1]], [[MUL]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> ; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]] ; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0 ; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]] ; CHECK-NEXT: [[TMP8:%.*]] = fadd fast float [[MUL_2]], [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]] ; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP8]] to i32 ; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[MUL_3]], [[TMP8]]
; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[CONV]], [[TMP11]]
; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP12]] to i32
; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4 ; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4
; CHECK-NEXT: ret i32 [[CONV4]] ; CHECK-NEXT: ret i32 [[CONV4]]
; ;