Vendor import of llvm release_40 branch r296202:

https://llvm.org/svn/llvm-project/llvm/branches/release_40@296202
dim 2017-02-25 14:40:33 +00:00
parent d8599d273b
commit d42bfc97e3
11 changed files with 297 additions and 380 deletions

@ -5,12 +5,6 @@ LLVM 4.0.0 Release Notes
.. contents::
:local:
.. warning::
These are in-progress notes for the upcoming LLVM 4.0.0 release. You may
prefer the `LLVM 3.9 Release Notes <http://llvm.org/releases/3.9.0/docs
/ReleaseNotes.html>`_.
Introduction
============
@ -28,74 +22,56 @@ them.
Non-comprehensive list of changes in this release
=================================================
* The C API functions LLVMAddFunctionAttr, LLVMGetFunctionAttr,
LLVMRemoveFunctionAttr, LLVMAddAttribute, LLVMRemoveAttribute,
LLVMGetAttribute, LLVMAddInstrAttribute and
LLVMRemoveInstrAttribute have been removed.
* The C API enum LLVMAttribute has been deleted.
.. NOTE
For small 1-3 sentence descriptions, just add an entry at the end of
this list. If your description won't fit comfortably in one bullet
point (e.g. maybe you would like to give an example of the
functionality, or simply have a lot to talk about), see the `NOTE` below
for adding a new subsection.
* The definition and uses of LLVM_ATTRIBUTE_UNUSED_RESULT in the LLVM source
were replaced with LLVM_NODISCARD, which matches the C++17 [[nodiscard]]
semantics rather than gcc's __attribute__((warn_unused_result)).
* Minimum compiler version to build has been raised to GCC 4.8 and VS 2015.
* The C API functions ``LLVMAddFunctionAttr``, ``LLVMGetFunctionAttr``,
``LLVMRemoveFunctionAttr``, ``LLVMAddAttribute``, ``LLVMRemoveAttribute``,
``LLVMGetAttribute``, ``LLVMAddInstrAttribute`` and
``LLVMRemoveInstrAttribute`` have been removed.
* The C API enum ``LLVMAttribute`` has been deleted.
* The definition and uses of ``LLVM_ATTRIBUTE_UNUSED_RESULT`` in the LLVM source
were replaced with ``LLVM_NODISCARD``, which matches the C++17 ``[[nodiscard]]``
semantics rather than gcc's ``__attribute__((warn_unused_result))``.
* The Timer related APIs now expect a Name and Description. When upgrading code
the previously used names should become descriptions and a short name in the
style of a programming language identifier should be added.
* LLVM now handles invariant.group across different basic blocks, which makes
* LLVM now handles ``invariant.group`` across different basic blocks, which makes
it possible to devirtualize virtual calls inside loops.
* The aggressive dead code elimination phase ("adce") now remove
* The aggressive dead code elimination phase ("adce") now removes
branches which do not affect program behavior. Loops are retained by
default since they may be infinite but these can also be removed
with LLVM option -adce-remove-loops when the loop body otherwise has
with LLVM option ``-adce-remove-loops`` when the loop body otherwise has
no live operations.
* The GVNHoist pass is now enabled by default. The new pass based on Global
Value Numbering detects similar computations in branch code and replaces
multiple instances of the same computation with a unique expression. The
transform benefits code size and generates better schedules. GVNHoist is
more aggressive at -Os and -Oz, hoisting more expressions at the expense of
execution time degradations.
more aggressive at ``-Os`` and ``-Oz``, hoisting more expressions at the
expense of execution time degradations.
* The llvm-cov tool can now export coverage data as json. Its html output mode
has also improved.
* ... next change ...
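As a rough illustration of the ``[[nodiscard]]`` semantics that the ``LLVM_NODISCARD`` entry above refers to, here is a minimal standalone sketch. It deliberately avoids LLVM headers and uses the plain C++17 attribute. The names `Status`, `parseConfig`, and `addValues` are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical status type. Marking the type itself [[nodiscard]] asks the
// compiler to warn whenever a function returning it by value has its result
// ignored.
struct [[nodiscard]] Status {
  bool Ok;
};

// Hypothetical function (not an LLVM API) that returns the nodiscard type.
Status parseConfig(const std::string &Text) {
  return Status{!Text.empty()};
}

// The attribute can also be placed on an individual function.
[[nodiscard]] int addValues(int A, int B) { return A + B; }
```

With a C++17 compiler, a bare statement such as `parseConfig("x");` would normally draw a warning, while `(void)parseConfig("x");` is the usual way to discard the result on purpose.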
Improvements to ThinLTO (-flto=thin)
------------------------------------
Integration with profile data (PGO). When available, profile data
enables more accurate function importing decisions, as well as
cross-module indirect call promotion.
.. NOTE
If you would like to document a larger change, then you can add a
subsection about it right here. You can copy the following boilerplate
and un-indent it (the indentation causes it to be inside this comment).
Special New Feature
-------------------
Makes programs 10x faster by doing Special New Thing.
Improvements to ThinLTO (-flto=thin)
------------------------------------
* Integration with profile data (PGO). When available, profile data
enables more accurate function importing decisions, as well as
cross-module indirect call promotion.
* Significant build-time and binary-size improvements when compiling with
debug info (-g).
Significant build-time and binary-size improvements when compiling with
debug info (-g).
LLVM Coroutines
---------------
Experimental support for :doc:`Coroutines` was added, which can be enabled
with ``-enable-coroutines`` in ``opt`` command tool or using
with ``-enable-coroutines`` in the ``opt`` command-line tool or using the
``addCoroutinePassesToExtensionPoints`` API when building the optimization
pipeline.
@ -106,18 +82,18 @@ For more information on LLVM Coroutines and the LLVM implementation, see
Regcall and Vectorcall Calling Conventions
--------------------------------------------------
Support was added for the __regcall calling convention.
Existing __vectorcall calling convention support was extended to include
Support was added for the ``__regcall`` calling convention.
Existing ``__vectorcall`` calling convention support was extended to include
correct handling of HVAs.
The __vectorcall calling convention was introduced by Microsoft to
The ``__vectorcall`` calling convention was introduced by Microsoft to
enhance register usage when passing parameters.
For more information please read `__vectorcall documentation
<https://msdn.microsoft.com/en-us/library/dn375768.aspx>`_.
The __regcall calling convention was introduced by Intel to
The ``__regcall`` calling convention was introduced by Intel to
optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are
This calling convention ensures that as many values as possible are
passed or returned in registers.
For more information please read `__regcall documentation
<https://software.intel.com/en-us/node/693069>`_.
@ -127,7 +103,7 @@ Code Generation Testing
Passes that work on the machine instruction representation can be tested with
the .mir serialization format. ``llc`` supports the ``-run-pass``,
``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to to
``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to
run a single pass of the code generation pipeline, or to stop or start the code
generation pipeline at a given point.
@ -211,9 +187,6 @@ changes landed in this release.
``&*I`` (if not ``end()``); alternatively, clients may refactor to use
references for known-good nodes.
Changes to the LLVM IR
----------------------
Changes to the ARM Targets
--------------------------
@ -244,28 +217,6 @@ Changes to the ARM Targets
A lot of work has also been done in LLD for ARM, which now supports more
relocations and TLS.
Changes to the MIPS Target
--------------------------
During this release ...
Changes to the PowerPC Target
-----------------------------
During this release ...
Changes to the X86 Target
-------------------------
During this release ...
Changes to the AMDGPU Target
-----------------------------
During this release ...
Changes to the AVR Target
-----------------------------
@ -297,8 +248,6 @@ Changes to the OCaml bindings
External Open Source Projects Using LLVM 4.0.0
==============================================
* A project...
LDC - the LLVM-based D compiler
-------------------------------

@ -92,12 +92,6 @@ private:
/// collected in GEPs.
bool vectorizeGEPIndices(BasicBlock *BB, slpvectorizer::BoUpSLP &R);
/// Try to find horizontal reduction or otherwise vectorize a chain of binary
/// operators.
bool vectorizeRootInstruction(PHINode *P, Value *V, BasicBlock *BB,
slpvectorizer::BoUpSLP &R,
TargetTransformInfo *TTI);
/// \brief Scan the basic block and look for patterns that are likely to start
/// a vectorization chain.
bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);

@ -996,6 +996,11 @@ def : Pat <
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
>;
def : Pat <
(i1 (trunc i16:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
>;
def : Pat <
(i1 (trunc i64:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1),

@ -607,12 +607,6 @@ def : Pat<
(COPY $src)
>;
def : Pat<
(i1 (trunc i16:$src)),
(COPY $src)
>;
def : Pat <
(i16 (trunc i64:$src)),
(EXTRACT_SUBREG $src, sub0)

@ -41,6 +41,8 @@ STATISTIC(NumSDivs, "Number of sdiv converted to udiv");
STATISTIC(NumAShrs, "Number of ashr converted to lshr");
STATISTIC(NumSRems, "Number of srem converted to urem");
static cl::opt<bool> DontProcessAdds("cvp-dont-process-adds", cl::init(true));
namespace {
class CorrelatedValuePropagation : public FunctionPass {
public:
@ -405,6 +407,9 @@ static bool processAShr(BinaryOperator *SDI, LazyValueInfo *LVI) {
static bool processAdd(BinaryOperator *AddOp, LazyValueInfo *LVI) {
typedef OverflowingBinaryOperator OBO;
if (DontProcessAdds)
return false;
if (AddOp->getType()->isVectorTy() || hasLocalDefs(AddOp))
return false;

@ -1521,8 +1521,8 @@ Value *ReassociatePass::OptimizeAdd(Instruction *I,
if (ConstantInt *CI = dyn_cast<ConstantInt>(Factor)) {
if (CI->isNegative() && !CI->isMinValue(true)) {
Factor = ConstantInt::get(CI->getContext(), -CI->getValue());
assert(!Duplicates.count(Factor) &&
"Shouldn't have two constant factors, missed a canonicalize");
if (!Duplicates.insert(Factor).second)
continue;
unsigned Occ = ++FactorOccurrences[Factor];
if (Occ > MaxOcc) {
MaxOcc = Occ;
@ -1534,8 +1534,8 @@ Value *ReassociatePass::OptimizeAdd(Instruction *I,
APFloat F(CF->getValueAPF());
F.changeSign();
Factor = ConstantFP::get(CF->getContext(), F);
assert(!Duplicates.count(Factor) &&
"Shouldn't have two constant factors, missed a canonicalize");
if (!Duplicates.insert(Factor).second)
continue;
unsigned Occ = ++FactorOccurrences[Factor];
if (Occ > MaxOcc) {
MaxOcc = Occ;
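The change from an assertion to `if (!Duplicates.insert(Factor).second) continue;` relies on set insertion reporting whether the element was newly added. Below is a minimal sketch of that idiom, assuming `std::set` as a stand-in for LLVM's set type. The function `countFactorOccurrences` and its integer "factors" are hypothetical simplifications, not the pass's actual data structures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// For each term, count every distinct factor once; repeats of a factor
// inside the same term are skipped rather than treated as an error.
std::map<int, unsigned>
countFactorOccurrences(const std::vector<std::vector<int>> &Terms) {
  std::map<int, unsigned> Occurrences;
  for (const std::vector<int> &Factors : Terms) {
    std::set<int> Duplicates; // factors already seen in this term
    for (int Factor : Factors) {
      // insert() returns {iterator, bool}; the bool is false when the
      // value was already present, so the repeat is skipped.
      if (!Duplicates.insert(Factor).second)
        continue;
      ++Occurrences[Factor];
    }
  }
  return Occurrences;
}
```

Skipping a repeated factor keeps the occurrence count per term at one, which is what the old assertion assumed would always hold.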

@ -4026,40 +4026,36 @@ bool SLPVectorizerPass::tryToVectorize(BinaryOperator *V, BoUpSLP &R) {
if (!V)
return false;
Value *P = V->getParent();
// Vectorize in current basic block only.
auto *Op0 = dyn_cast<Instruction>(V->getOperand(0));
auto *Op1 = dyn_cast<Instruction>(V->getOperand(1));
if (!Op0 || !Op1 || Op0->getParent() != P || Op1->getParent() != P)
return false;
// Try to vectorize V.
if (tryToVectorizePair(Op0, Op1, R))
if (tryToVectorizePair(V->getOperand(0), V->getOperand(1), R))
return true;
auto *A = dyn_cast<BinaryOperator>(Op0);
auto *B = dyn_cast<BinaryOperator>(Op1);
BinaryOperator *A = dyn_cast<BinaryOperator>(V->getOperand(0));
BinaryOperator *B = dyn_cast<BinaryOperator>(V->getOperand(1));
// Try to skip B.
if (B && B->hasOneUse()) {
auto *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));
auto *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));
if (B0 && B0->getParent() == P && tryToVectorizePair(A, B0, R))
BinaryOperator *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));
BinaryOperator *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));
if (tryToVectorizePair(A, B0, R)) {
return true;
if (B1 && B1->getParent() == P && tryToVectorizePair(A, B1, R))
}
if (tryToVectorizePair(A, B1, R)) {
return true;
}
}
// Try to skip A.
if (A && A->hasOneUse()) {
auto *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));
auto *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));
if (A0 && A0->getParent() == P && tryToVectorizePair(A0, B, R))
BinaryOperator *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));
BinaryOperator *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));
if (tryToVectorizePair(A0, B, R)) {
return true;
if (A1 && A1->getParent() == P && tryToVectorizePair(A1, B, R))
}
if (tryToVectorizePair(A1, B, R)) {
return true;
}
}
return false;
return 0;
}
/// \brief Generate a shuffle mask to be used in a reduction tree.
@ -4511,143 +4507,29 @@ static Value *getReductionValue(const DominatorTree *DT, PHINode *P,
return nullptr;
}
namespace {
/// Tracks an instruction and its children.
class WeakVHWithLevel final : public CallbackVH {
/// Operand index of the instruction currently being analyzed.
unsigned Level = 0;
/// Is this the instruction that should be vectorized, or are we now
/// processing children (i.e. operands of this instruction) for potential
/// vectorization?
bool IsInitial = true;
public:
explicit WeakVHWithLevel() = default;
WeakVHWithLevel(Value *V) : CallbackVH(V){};
/// Restart children analysis each time it is replaced by a new instruction.
void allUsesReplacedWith(Value *New) override {
setValPtr(New);
Level = 0;
IsInitial = true;
}
/// Check if the instruction was not deleted during vectorization.
bool isValid() const { return !getValPtr(); }
/// Must the instruction itself be vectorized?
bool isInitial() const { return IsInitial; }
/// Try to vectorize children.
void clearInitial() { IsInitial = false; }
/// Are all children processed already?
bool isFinal() const {
assert(getValPtr() &&
(isa<Instruction>(getValPtr()) &&
cast<Instruction>(getValPtr())->getNumOperands() >= Level));
return getValPtr() &&
cast<Instruction>(getValPtr())->getNumOperands() == Level;
}
/// Get next child operation.
Value *nextOperand() {
assert(getValPtr() && isa<Instruction>(getValPtr()) &&
cast<Instruction>(getValPtr())->getNumOperands() > Level);
return cast<Instruction>(getValPtr())->getOperand(Level++);
}
virtual ~WeakVHWithLevel() = default;
};
} // namespace
/// \brief Attempt to reduce a horizontal reduction.
/// If it is legal to match a horizontal reduction feeding
/// the phi node P with reduction operators Root in a basic block BB, then check
/// if it can be done.
/// the phi node P with reduction operators BI, then check if it
/// can be done.
/// \returns true if a horizontal reduction was matched and reduced.
/// \returns false if a horizontal reduction was not matched.
static bool canBeVectorized(
PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI,
const function_ref<bool(BinaryOperator *, BoUpSLP &)> Vectorize) {
static bool canMatchHorizontalReduction(PHINode *P, BinaryOperator *BI,
BoUpSLP &R, TargetTransformInfo *TTI,
unsigned MinRegSize) {
if (!ShouldVectorizeHor)
return false;
if (!Root)
HorizontalReduction HorRdx(MinRegSize);
if (!HorRdx.matchAssociativeReduction(P, BI))
return false;
if (Root->getParent() != BB)
return false;
SmallVector<WeakVHWithLevel, 8> Stack(1, Root);
SmallSet<Value *, 8> VisitedInstrs;
bool Res = false;
while (!Stack.empty()) {
Value *V = Stack.back();
if (!V) {
Stack.pop_back();
continue;
}
auto *Inst = dyn_cast<Instruction>(V);
if (!Inst || isa<PHINode>(Inst)) {
Stack.pop_back();
continue;
}
if (Stack.back().isInitial()) {
Stack.back().clearInitial();
if (auto *BI = dyn_cast<BinaryOperator>(Inst)) {
HorizontalReduction HorRdx(R.getMinVecRegSize());
if (HorRdx.matchAssociativeReduction(P, BI)) {
// If there is a sufficient number of reduction values, reduce
// to a nearby power-of-2. Can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.
HorRdx.ReduxWidth =
std::max((uint64_t)4, PowerOf2Floor(HorRdx.numReductionValues()));
// If there is a sufficient number of reduction values, reduce
// to a nearby power-of-2. Can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.
HorRdx.ReduxWidth =
std::max((uint64_t)4, PowerOf2Floor(HorRdx.numReductionValues()));
if (HorRdx.tryToReduce(R, TTI)) {
Res = true;
P = nullptr;
continue;
}
}
if (P) {
Inst = dyn_cast<Instruction>(BI->getOperand(0));
if (Inst == P)
Inst = dyn_cast<Instruction>(BI->getOperand(1));
if (!Inst) {
P = nullptr;
continue;
}
}
}
P = nullptr;
if (Vectorize(dyn_cast<BinaryOperator>(Inst), R)) {
Res = true;
continue;
}
}
if (Stack.back().isFinal()) {
Stack.pop_back();
continue;
}
if (auto *NextV = dyn_cast<Instruction>(Stack.back().nextOperand()))
if (NextV->getParent() == BB && VisitedInstrs.insert(NextV).second &&
Stack.size() < RecursionMaxDepth)
Stack.push_back(NextV);
}
return Res;
}
bool SLPVectorizerPass::vectorizeRootInstruction(PHINode *P, Value *V,
BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI) {
if (!V)
return false;
auto *I = dyn_cast<Instruction>(V);
if (!I)
return false;
if (!isa<BinaryOperator>(I))
P = nullptr;
// Try to match and vectorize a horizontal reduction.
return canBeVectorized(P, I, BB, R, TTI,
[this](BinaryOperator *BI, BoUpSLP &R) -> bool {
return tryToVectorize(BI, R);
});
return HorRdx.tryToReduce(R, TTI);
}
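The `ReduxWidth` computation above rounds the number of reduction values down to a power of two, with a floor of 4. A small standalone sketch of that arithmetic follows. `powerOf2Floor` here is a hand-written stand-in for `PowerOf2Floor` (assumed to return 0 for an input of 0), and `reductionWidth` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Largest power of two that is <= N, or 0 when N is 0.
// Stand-in for the PowerOf2Floor helper used in the pass.
uint64_t powerOf2Floor(uint64_t N) {
  uint64_t P = 0;
  // Bit wraps to 0 after the top bit, which ends the loop for huge N.
  for (uint64_t Bit = 1; Bit != 0 && Bit <= N; Bit <<= 1)
    P = Bit;
  return P;
}

// Hypothetical helper mirroring: max(4, PowerOf2Floor(numReductionValues)).
uint64_t reductionWidth(uint64_t NumReductionValues) {
  return std::max<uint64_t>(4, powerOf2Floor(NumReductionValues));
}
```

For example, 13 reduction values give a width of 8, while 3 values still give 4, in which case the code comment says the backend is relied on to split oversized vectors into legal sizes.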
bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
@ -4717,42 +4599,67 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
if (P->getNumIncomingValues() != 2)
return Changed;
Value *Rdx = getReductionValue(DT, P, BB, LI);
// Check if this is a Binary Operator.
BinaryOperator *BI = dyn_cast_or_null<BinaryOperator>(Rdx);
if (!BI)
continue;
// Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(P, getReductionValue(DT, P, BB, LI), BB, R,
TTI)) {
if (canMatchHorizontalReduction(P, BI, R, TTI, R.getMinVecRegSize())) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
Value *Inst = BI->getOperand(0);
if (Inst == P)
Inst = BI->getOperand(1);
if (tryToVectorize(dyn_cast<BinaryOperator>(Inst), R)) {
// We would like to start over since some instructions are deleted
// and the iterator may become invalid.
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
continue;
}
if (ShouldStartVectorizeHorAtStore) {
if (StoreInst *SI = dyn_cast<StoreInst>(it)) {
// Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(nullptr, SI->getValueOperand(), BB, R,
TTI)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
if (ShouldStartVectorizeHorAtStore)
if (StoreInst *SI = dyn_cast<StoreInst>(it))
if (BinaryOperator *BinOp =
dyn_cast<BinaryOperator>(SI->getValueOperand())) {
if (canMatchHorizontalReduction(nullptr, BinOp, R, TTI,
R.getMinVecRegSize()) ||
tryToVectorize(BinOp, R)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
}
}
}
// Try to vectorize horizontal reductions feeding into a return.
if (ReturnInst *RI = dyn_cast<ReturnInst>(it)) {
if (RI->getNumOperands() != 0) {
// Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(nullptr, RI->getOperand(0), BB, R, TTI)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
if (ReturnInst *RI = dyn_cast<ReturnInst>(it))
if (RI->getNumOperands() != 0)
if (BinaryOperator *BinOp =
dyn_cast<BinaryOperator>(RI->getOperand(0))) {
DEBUG(dbgs() << "SLP: Found a return to vectorize.\n");
if (canMatchHorizontalReduction(nullptr, BinOp, R, TTI,
R.getMinVecRegSize()) ||
tryToVectorizePair(BinOp->getOperand(0), BinOp->getOperand(1),
R)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
}
}
}
// Try to vectorize trees that start at compare instructions.
if (CmpInst *CI = dyn_cast<CmpInst>(it)) {
@ -4765,14 +4672,16 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
continue;
}
for (int I = 0; I < 2; ++I) {
if (vectorizeRootInstruction(nullptr, CI->getOperand(I), BB, R, TTI)) {
Changed = true;
// We would like to start over since some instructions are deleted
// and the iterator may become invalid.
it = BB->begin();
e = BB->end();
break;
for (int i = 0; i < 2; ++i) {
if (BinaryOperator *BI = dyn_cast<BinaryOperator>(CI->getOperand(i))) {
if (tryToVectorizePair(BI->getOperand(0), BI->getOperand(1), R)) {
Changed = true;
// We would like to start over since some instructions are deleted
// and the iterator may become invalid.
it = BB->begin();
e = BB->end();
break;
}
}
}
continue;

@ -1,13 +1,15 @@
; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=SI %s
; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=GCN -check-prefix=SI %s
; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs< %s | FileCheck -check-prefix=GCN -check-prefix=VI %s
; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG %s
declare i32 @llvm.r600.read.tidig.x() nounwind readnone
define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) {
; SI-LABEL: {{^}}trunc_i64_to_i32_store:
; SI: s_load_dword [[SLOAD:s[0-9]+]], s[0:1], 0xb
; SI: v_mov_b32_e32 [[VLOAD:v[0-9]+]], [[SLOAD]]
; GCN-LABEL: {{^}}trunc_i64_to_i32_store:
; GCN: s_load_dword [[SLOAD:s[0-9]+]], s[0:1],
; GCN: v_mov_b32_e32 [[VLOAD:v[0-9]+]], [[SLOAD]]
; SI: buffer_store_dword [[VLOAD]]
; VI: flat_store_dword v[{{[0-9:]+}}], [[VLOAD]]
; EG-LABEL: {{^}}trunc_i64_to_i32_store:
; EG: MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1
@ -18,12 +20,14 @@ define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) {
ret void
}
; SI-LABEL: {{^}}trunc_load_shl_i64:
; SI-DAG: s_load_dwordx2
; SI-DAG: s_load_dword [[SREG:s[0-9]+]],
; SI: s_lshl_b32 [[SHL:s[0-9]+]], [[SREG]], 2
; SI: v_mov_b32_e32 [[VSHL:v[0-9]+]], [[SHL]]
; SI: buffer_store_dword [[VSHL]],
; GCN-LABEL: {{^}}trunc_load_shl_i64:
; GCN-DAG: s_load_dwordx2
; GCN-DAG: s_load_dword [[SREG:s[0-9]+]],
; GCN: s_lshl_b32 [[SHL:s[0-9]+]], [[SREG]], 2
; GCN: v_mov_b32_e32 [[VSHL:v[0-9]+]], [[SHL]]
; SI: buffer_store_dword [[VSHL]]
; VI: flat_store_dword v[{{[0-9:]+}}], [[VSHL]]
define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) {
%b = shl i64 %a, 2
%result = trunc i64 %b to i32
@ -31,15 +35,17 @@ define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) {
ret void
}
; SI-LABEL: {{^}}trunc_shl_i64:
; GCN-LABEL: {{^}}trunc_shl_i64:
; SI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd
; SI: s_lshl_b64 s{{\[}}[[LO_SHL:[0-9]+]]:{{[0-9]+\]}}, s{{\[}}[[LO_SREG]]:{{[0-9]+\]}}, 2
; SI: s_add_u32 s[[LO_SREG2:[0-9]+]], s[[LO_SHL]],
; SI: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG2]]
; SI: s_addc_u32
; VI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x34
; GCN: s_lshl_b64 s{{\[}}[[LO_SHL:[0-9]+]]:{{[0-9]+\]}}, s{{\[}}[[LO_SREG]]:{{[0-9]+\]}}, 2
; GCN: s_add_u32 s[[LO_SREG2:[0-9]+]], s[[LO_SHL]],
; GCN: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG2]]
; GCN: s_addc_u32
; SI: buffer_store_dword v[[LO_VREG]],
; SI: v_mov_b32_e32
; SI: v_mov_b32_e32
; VI: flat_store_dword v[{{[0-9:]+}}], v[[LO_VREG]]
; GCN: v_mov_b32_e32
; GCN: v_mov_b32_e32
define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64 %a) {
%aa = add i64 %a, 234 ; Prevent shrinking store.
%b = shl i64 %aa, 2
@ -49,9 +55,9 @@ define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64
ret void
}
; SI-LABEL: {{^}}trunc_i32_to_i1:
; SI: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
; SI: v_cmp_eq_u32
; GCN-LABEL: {{^}}trunc_i32_to_i1:
; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) {
%a = load i32, i32 addrspace(1)* %ptr, align 4
%trunc = trunc i32 %a to i1
@ -60,9 +66,30 @@ define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) {
ret void
}
; SI-LABEL: {{^}}sgpr_trunc_i32_to_i1:
; SI: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
; SI: v_cmp_eq_u32
; GCN-LABEL: {{^}}trunc_i8_to_i1:
; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @trunc_i8_to_i1(i8 addrspace(1)* %out, i8 addrspace(1)* %ptr) {
%a = load i8, i8 addrspace(1)* %ptr, align 4
%trunc = trunc i8 %a to i1
%result = select i1 %trunc, i8 1, i8 0
store i8 %result, i8 addrspace(1)* %out, align 4
ret void
}
; GCN-LABEL: {{^}}sgpr_trunc_i16_to_i1:
; GCN: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @sgpr_trunc_i16_to_i1(i16 addrspace(1)* %out, i16 %a) {
%trunc = trunc i16 %a to i1
%result = select i1 %trunc, i16 1, i16 0
store i16 %result, i16 addrspace(1)* %out, align 4
ret void
}
; GCN-LABEL: {{^}}sgpr_trunc_i32_to_i1:
; GCN: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
%trunc = trunc i32 %a to i1
%result = select i1 %trunc, i32 1, i32 0
@ -70,11 +97,12 @@ define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
ret void
}
; SI-LABEL: {{^}}s_trunc_i64_to_i1:
; GCN-LABEL: {{^}}s_trunc_i64_to_i1:
; SI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0xb
; SI: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]]
; SI: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}}
; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]]
; VI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0x2c
; GCN: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]]
; GCN: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}}
; GCN: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]]
define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) {
%trunc = trunc i64 %x to i1
%sel = select i1 %trunc, i32 63, i32 -12
@ -82,11 +110,12 @@ define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) {
ret void
}
; SI-LABEL: {{^}}v_trunc_i64_to_i1:
; GCN-LABEL: {{^}}v_trunc_i64_to_i1:
; SI: buffer_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}}
; SI: v_and_b32_e32 [[MASKED:v[0-9]+]], 1, v[[VLO]]
; SI: v_cmp_eq_u32_e32 vcc, 1, [[MASKED]]
; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, vcc
; VI: flat_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}}
; GCN: v_and_b32_e32 [[MASKED:v[0-9]+]], 1, v[[VLO]]
; GCN: v_cmp_eq_u32_e32 vcc, 1, [[MASKED]]
; GCN: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, vcc
define void @v_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 addrspace(1)* %in) {
%tid = call i32 @llvm.r600.read.tidig.x() nounwind readnone
%gep = getelementptr i64, i64 addrspace(1)* %in, i32 %tid

@ -1,4 +1,4 @@
; RUN: opt < %s -correlated-propagation -S | FileCheck %s
; RUN: opt < %s -correlated-propagation -cvp-dont-process-adds=false -S | FileCheck %s
; CHECK-LABEL: @test0(
define void @test0(i32 %a) {

@ -222,3 +222,23 @@ define i32 @test15(i32 %X1, i32 %X2, i32 %X3) {
; CHECK-LABEL: @test15
; CHECK: and i1 %A, %B
}
; PR30256 - previously this asserted.
; CHECK-LABEL: @test16
; CHECK: %[[FACTOR:.*]] = mul i64 %a, -4
; CHECK-NEXT: %[[RES:.*]] = add i64 %[[FACTOR]], %b
; CHECK-NEXT: ret i64 %[[RES]]
define i64 @test16(i1 %cmp, i64 %a, i64 %b) {
entry:
%shl = shl i64 %a, 1
%shl.neg = sub i64 0, %shl
br i1 %cmp, label %if.then, label %if.end
if.then: ; preds = %entry
%add1 = add i64 %shl.neg, %shl.neg
%add2 = add i64 %add1, %b
ret i64 %add2
if.end: ; preds = %entry
ret i64 0
}
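The CHECK lines for `@test16` follow from simple arithmetic: `%shl` is `2a`, `%shl.neg` is `-2a`, `%add1` is `-4a`, and `%add2` is `-4a + b`. The sketch below checks that identity with wrapping 64-bit arithmetic. The function names are hypothetical, and unsigned integers are used only so that overflow is well defined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the if.then path of @test16 step by step.
uint64_t test16Original(uint64_t A, uint64_t B) {
  uint64_t Shl = A << 1;           // %shl     = shl i64 %a, 1
  uint64_t ShlNeg = 0 - Shl;       // %shl.neg = sub i64 0, %shl
  uint64_t Add1 = ShlNeg + ShlNeg; // %add1    = add i64 %shl.neg, %shl.neg
  return Add1 + B;                 // %add2    = add i64 %add1, %b
}

// The reassociated form expected by the CHECK lines: mul %a, -4 then add %b.
uint64_t test16Reassociated(uint64_t A, uint64_t B) {
  return A * static_cast<uint64_t>(-4) + B;
}
```

Both functions agree for every input because the operations are all modulo 2^64.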

@ -12,25 +12,26 @@ define float @baz() {
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]]
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]]
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]]
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]]
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[MUL4:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL4]], [[CONV]]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[MUL4_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD]]
; CHECK-NEXT: [[TMP5:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP6]], [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP8]], [[ADD_1]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP7]], i32 1
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP9]], [[ADD_2]]
; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV]]
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[TMP4]], [[ADD7]]
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[TMP5]], [[ADD19]]
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP9]], [[ADD19_1]]
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP10]], [[ADD19_2]]
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[MUL4]], [[ADD7]]
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD19]]
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP8]], [[ADD19_1]]
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP9]], [[ADD19_2]]
; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4
; CHECK-NEXT: ret float [[ADD19_3]]
;
@ -69,37 +70,40 @@ define float @bazz() {
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]]
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]]
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]]
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]]
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[MUL4:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL4]], [[CONV]]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[MUL4_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD]]
; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[MUL4_2:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[MUL4_2]], [[ADD_1]]
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: [[TMP8:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: [[MUL4_3:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[MUL4_3]], [[ADD_2]]
; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV6]]
; CHECK-NEXT: [[TMP11:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 4) to <2 x float>*), align 16
; CHECK-NEXT: [[TMP12:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 4) to <2 x float>*), align 16
; CHECK-NEXT: [[TMP13:%.*]] = fmul fast <2 x float> [[TMP12]], [[TMP11]]
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x float> [[TMP13]], i32 0
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[TMP14]], [[ADD7]]
; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP13]], i32 1
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[TMP15]], [[ADD19]]
; CHECK-NEXT: [[TMP16:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP17:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP18:%.*]] = fmul fast <2 x float> [[TMP17]], [[TMP16]]
; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x float> [[TMP18]], i32 0
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP19]], [[ADD19_1]]
; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x float> [[TMP18]], i32 1
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP20]], [[ADD19_2]]
; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 4), align 16
; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 4), align 16
; CHECK-NEXT: [[MUL18:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[MUL18]], [[ADD7]]
; CHECK-NEXT: [[TMP11:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 5), align 4
; CHECK-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 5), align 4
; CHECK-NEXT: [[MUL18_1:%.*]] = fmul fast float [[TMP12]], [[TMP11]]
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[MUL18_1]], [[ADD19]]
; CHECK-NEXT: [[TMP13:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP14:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x float> [[TMP14]], [[TMP13]]
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP16]], [[ADD19_1]]
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP17]], [[ADD19_2]]
; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4
; CHECK-NEXT: ret float [[ADD19_3]]
;
@ -151,20 +155,24 @@ define float @bazzz() {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]
; CHECK-NEXT: store float [[TMP8]], float* @res, align 4
; CHECK-NEXT: ret float [[TMP8]]
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[MUL:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float [[MUL_1]], [[MUL]]
; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP8:%.*]] = fadd fast float [[MUL_2]], [[TMP5]]
; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[MUL_3]], [[TMP8]]
; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[CONV]], [[TMP11]]
; CHECK-NEXT: store float [[TMP12]], float* @res, align 4
; CHECK-NEXT: ret float [[TMP12]]
;
entry:
%0 = load i32, i32* @n, align 4
@ -194,19 +202,23 @@ define i32 @foo() {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]
; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP8]] to i32
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[MUL:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float [[MUL_1]], [[MUL]]
; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP8:%.*]] = fadd fast float [[MUL_2]], [[TMP5]]
; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[MUL_3]], [[TMP8]]
; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[CONV]], [[TMP11]]
; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP12]] to i32
; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4
; CHECK-NEXT: ret i32 [[CONV4]]
;