Vendor import of llvm release_40 branch r296202:
https://llvm.org/svn/llvm-project/llvm/branches/release_40@296202
parent 5a813558fc
commit 9c618dddcd
Notes (svn2git, 2020-12-20 02:59:44 +00:00):
svn path=/vendor/llvm/dist/; revision=314258
svn path=/vendor/llvm/llvm-release_40-r296202/; revision=314259; tag=vendor/llvm/llvm-release_40-r296202
@@ -5,12 +5,6 @@ LLVM 4.0.0 Release Notes
.. contents::
:local:

.. warning::
These are in-progress notes for the upcoming LLVM 4.0.0 release. You may
prefer the `LLVM 3.9 Release Notes <http://llvm.org/releases/3.9.0/docs
/ReleaseNotes.html>`_.


Introduction
============

@@ -28,74 +22,56 @@ them.

Non-comprehensive list of changes in this release
=================================================
* The C API functions LLVMAddFunctionAttr, LLVMGetFunctionAttr,
LLVMRemoveFunctionAttr, LLVMAddAttribute, LLVMRemoveAttribute,
LLVMGetAttribute, LLVMAddInstrAttribute and
LLVMRemoveInstrAttribute have been removed.

* The C API enum LLVMAttribute has been deleted.

.. NOTE
For small 1-3 sentence descriptions, just add an entry at the end of
this list. If your description won't fit comfortably in one bullet
point (e.g. maybe you would like to give an example of the
functionality, or simply have a lot to talk about), see the `NOTE` below
for adding a new subsection.

* The definition and uses of LLVM_ATRIBUTE_UNUSED_RESULT in the LLVM source
were replaced with LLVM_NODISCARD, which matches the C++17 [[nodiscard]]
semantics rather than gcc's __attribute__((warn_unused_result)).

* Minimum compiler version to build has been raised to GCC 4.8 and VS 2015.

* The C API functions ``LLVMAddFunctionAttr``, ``LLVMGetFunctionAttr``,
``LLVMRemoveFunctionAttr``, ``LLVMAddAttribute``, ``LLVMRemoveAttribute``,
``LLVMGetAttribute``, ``LLVMAddInstrAttribute`` and
``LLVMRemoveInstrAttribute`` have been removed.

* The C API enum ``LLVMAttribute`` has been deleted.

* The definition and uses of ``LLVM_ATRIBUTE_UNUSED_RESULT`` in the LLVM source
were replaced with ``LLVM_NODISCARD``, which matches the C++17 ``[[nodiscard]]``
semantics rather than gcc's ``__attribute__((warn_unused_result))``.

* The Timer related APIs now expect a Name and Description. When upgrading code
the previously used names should become descriptions and a short name in the
style of a programming language identifier should be added.

* LLVM now handles invariant.group across different basic blocks, which makes
* LLVM now handles ``invariant.group`` across different basic blocks, which makes
it possible to devirtualize virtual calls inside loops.

* The aggressive dead code elimination phase ("adce") now remove
* The aggressive dead code elimination phase ("adce") now removes
branches which do not effect program behavior. Loops are retained by
default since they may be infinite but these can also be removed
with LLVM option -adce-remove-loops when the loop body otherwise has
with LLVM option ``-adce-remove-loops`` when the loop body otherwise has
no live operations.

* The GVNHoist pass is now enabled by default. The new pass based on Global
Value Numbering detects similar computations in branch code and replaces
multiple instances of the same computation with a unique expression. The
transform benefits code size and generates better schedules. GVNHoist is
more aggressive at -Os and -Oz, hoisting more expressions at the expense of
execution time degradations.
more aggressive at ``-Os`` and ``-Oz``, hoisting more expressions at the
expense of execution time degradations.

* The llvm-cov tool can now export coverage data as json. Its html output mode
has also improved.

* ... next change ...
Improvements to ThinLTO (-flto=thin)
------------------------------------
Integration with profile data (PGO). When available, profile data
enables more accurate function importing decisions, as well as
cross-module indirect call promotion.

.. NOTE
If you would like to document a larger change, then you can add a
subsection about it right here. You can copy the following boilerplate
and un-indent it (the indentation causes it to be inside this comment).

Special New Feature
-------------------

Makes programs 10x faster by doing Special New Thing.

Improvements to ThinLTO (-flto=thin)
------------------------------------
* Integration with profile data (PGO). When available, profile data
enables more accurate function importing decisions, as well as
cross-module indirect call promotion.
* Significant build-time and binary-size improvements when compiling with
debug info (-g).
Significant build-time and binary-size improvements when compiling with
debug info (-g).

LLVM Coroutines
---------------

Experimental support for :doc:`Coroutines` was added, which can be enabled
with ``-enable-coroutines`` in ``opt`` command tool or using
with ``-enable-coroutines`` in ``opt`` the command tool or using the
``addCoroutinePassesToExtensionPoints`` API when building the optimization
pipeline.

@@ -106,18 +82,18 @@ For more information on LLVM Coroutines and the LLVM implementation, see
Regcall and Vectorcall Calling Conventions
--------------------------------------------------

Support was added for _regcall calling convention.
Existing __vectorcall calling convention support was extended to include
Support was added for ``_regcall`` calling convention.
Existing ``__vectorcall`` calling convention support was extended to include
correct handling of HVAs.

The __vectorcall calling convention was introduced by Microsoft to
The ``__vectorcall`` calling convention was introduced by Microsoft to
enhance register usage when passing parameters.
For more information please read `__vectorcall documentation
<https://msdn.microsoft.com/en-us/library/dn375768.aspx>`_.

The __regcall calling convention was introduced by Intel to
The ``__regcall`` calling convention was introduced by Intel to
optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are
This calling convention ensures that as many values as possible are
passed or returned in registers.
For more information please read `__regcall documentation
<https://software.intel.com/en-us/node/693069>`_.
@@ -127,7 +103,7 @@ Code Generation Testing

Passes that work on the machine instruction representation can be tested with
the .mir serialization format. ``llc`` supports the ``-run-pass``,
``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to to
``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to
run a single pass of the code generation pipeline, or to stop or start the code
generation pipeline at a given point.

@@ -211,9 +187,6 @@ changes landed in this release.
``&*I`` (if not ``end()``); alternatively, clients may refactor to use
references for known-good nodes.

Changes to the LLVM IR
----------------------

Changes to the ARM Targets
--------------------------

@@ -244,28 +217,6 @@ Changes to the ARM Targets
A lot of work has also been done in LLD for ARM, which now supports more
relocations and TLS.


Changes to the MIPS Target
--------------------------

During this release ...


Changes to the PowerPC Target
-----------------------------

During this release ...

Changes to the X86 Target
-------------------------

During this release ...

Changes to the AMDGPU Target
-----------------------------

During this release ...

Changes to the AVR Target
-----------------------------

@@ -297,8 +248,6 @@ Changes to the OCaml bindings
External Open Source Projects Using LLVM 4.0.0
==============================================

* A project...

LDC - the LLVM-based D compiler
-------------------------------

@@ -92,12 +92,6 @@ struct SLPVectorizerPass : public PassInfoMixin<SLPVectorizerPass> {
/// collected in GEPs.
bool vectorizeGEPIndices(BasicBlock *BB, slpvectorizer::BoUpSLP &R);

/// Try to find horizontal reduction or otherwise vectorize a chain of binary
/// operators.
bool vectorizeRootInstruction(PHINode *P, Value *V, BasicBlock *BB,
slpvectorizer::BoUpSLP &R,
TargetTransformInfo *TTI);

/// \brief Scan the basic block and look for patterns that are likely to start
/// a vectorization chain.
bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);
@@ -996,6 +996,11 @@ def : Pat <
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
>;

def : Pat <
(i1 (trunc i16:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
>;

def : Pat <
(i1 (trunc i64:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1),
@@ -607,12 +607,6 @@ def : Pat<
(COPY $src)
>;

def : Pat<
(i1 (trunc i16:$src)),
(COPY $src)
>;


def : Pat <
(i16 (trunc i64:$src)),
(EXTRACT_SUBREG $src, sub0)
@@ -41,6 +41,8 @@ STATISTIC(NumSDivs, "Number of sdiv converted to udiv");
STATISTIC(NumAShrs, "Number of ashr converted to lshr");
STATISTIC(NumSRems, "Number of srem converted to urem");

static cl::opt<bool> DontProcessAdds("cvp-dont-process-adds", cl::init(true));

namespace {
class CorrelatedValuePropagation : public FunctionPass {
public:
@@ -405,6 +407,9 @@ static bool processAShr(BinaryOperator *SDI, LazyValueInfo *LVI) {
static bool processAdd(BinaryOperator *AddOp, LazyValueInfo *LVI) {
typedef OverflowingBinaryOperator OBO;

if (DontProcessAdds)
return false;

if (AddOp->getType()->isVectorTy() || hasLocalDefs(AddOp))
return false;

@@ -1521,8 +1521,8 @@ Value *ReassociatePass::OptimizeAdd(Instruction *I,
if (ConstantInt *CI = dyn_cast<ConstantInt>(Factor)) {
if (CI->isNegative() && !CI->isMinValue(true)) {
Factor = ConstantInt::get(CI->getContext(), -CI->getValue());
assert(!Duplicates.count(Factor) &&
"Shouldn't have two constant factors, missed a canonicalize");
if (!Duplicates.insert(Factor).second)
continue;
unsigned Occ = ++FactorOccurrences[Factor];
if (Occ > MaxOcc) {
MaxOcc = Occ;
@@ -1534,8 +1534,8 @@ Value *ReassociatePass::OptimizeAdd(Instruction *I,
APFloat F(CF->getValueAPF());
F.changeSign();
Factor = ConstantFP::get(CF->getContext(), F);
assert(!Duplicates.count(Factor) &&
"Shouldn't have two constant factors, missed a canonicalize");
if (!Duplicates.insert(Factor).second)
continue;
unsigned Occ = ++FactorOccurrences[Factor];
if (Occ > MaxOcc) {
MaxOcc = Occ;
@@ -4026,40 +4026,36 @@ bool SLPVectorizerPass::tryToVectorize(BinaryOperator *V, BoUpSLP &R) {
if (!V)
return false;

Value *P = V->getParent();

// Vectorize in current basic block only.
auto *Op0 = dyn_cast<Instruction>(V->getOperand(0));
auto *Op1 = dyn_cast<Instruction>(V->getOperand(1));
if (!Op0 || !Op1 || Op0->getParent() != P || Op1->getParent() != P)
return false;

// Try to vectorize V.
if (tryToVectorizePair(Op0, Op1, R))
if (tryToVectorizePair(V->getOperand(0), V->getOperand(1), R))
return true;

auto *A = dyn_cast<BinaryOperator>(Op0);
auto *B = dyn_cast<BinaryOperator>(Op1);
BinaryOperator *A = dyn_cast<BinaryOperator>(V->getOperand(0));
BinaryOperator *B = dyn_cast<BinaryOperator>(V->getOperand(1));
// Try to skip B.
if (B && B->hasOneUse()) {
auto *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));
auto *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));
if (B0 && B0->getParent() == P && tryToVectorizePair(A, B0, R))
BinaryOperator *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));
BinaryOperator *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));
if (tryToVectorizePair(A, B0, R)) {
return true;
if (B1 && B1->getParent() == P && tryToVectorizePair(A, B1, R))
}
if (tryToVectorizePair(A, B1, R)) {
return true;
}
}

// Try to skip A.
if (A && A->hasOneUse()) {
auto *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));
auto *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));
if (A0 && A0->getParent() == P && tryToVectorizePair(A0, B, R))
BinaryOperator *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));
BinaryOperator *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));
if (tryToVectorizePair(A0, B, R)) {
return true;
if (A1 && A1->getParent() == P && tryToVectorizePair(A1, B, R))
}
if (tryToVectorizePair(A1, B, R)) {
return true;
}
}
return false;
return 0;
}

/// \brief Generate a shuffle mask to be used in a reduction tree.
@@ -4511,143 +4507,29 @@ static Value *getReductionValue(const DominatorTree *DT, PHINode *P,
return nullptr;
}

namespace {
/// Tracks instructons and its children.
class WeakVHWithLevel final : public CallbackVH {
/// Operand index of the instruction currently beeing analized.
unsigned Level = 0;
/// Is this the instruction that should be vectorized, or are we now
/// processing children (i.e. operands of this instruction) for potential
/// vectorization?
bool IsInitial = true;

public:
explicit WeakVHWithLevel() = default;
WeakVHWithLevel(Value *V) : CallbackVH(V){};
/// Restart children analysis each time it is repaced by the new instruction.
void allUsesReplacedWith(Value *New) override {
setValPtr(New);
Level = 0;
IsInitial = true;
}
/// Check if the instruction was not deleted during vectorization.
bool isValid() const { return !getValPtr(); }
/// Is the istruction itself must be vectorized?
bool isInitial() const { return IsInitial; }
/// Try to vectorize children.
void clearInitial() { IsInitial = false; }
/// Are all children processed already?
bool isFinal() const {
assert(getValPtr() &&
(isa<Instruction>(getValPtr()) &&
cast<Instruction>(getValPtr())->getNumOperands() >= Level));
return getValPtr() &&
cast<Instruction>(getValPtr())->getNumOperands() == Level;
}
/// Get next child operation.
Value *nextOperand() {
assert(getValPtr() && isa<Instruction>(getValPtr()) &&
cast<Instruction>(getValPtr())->getNumOperands() > Level);
return cast<Instruction>(getValPtr())->getOperand(Level++);
}
virtual ~WeakVHWithLevel() = default;
};
} // namespace

/// \brief Attempt to reduce a horizontal reduction.
/// If it is legal to match a horizontal reduction feeding
/// the phi node P with reduction operators Root in a basic block BB, then check
/// if it can be done.
/// the phi node P with reduction operators BI, then check if it
/// can be done.
/// \returns true if a horizontal reduction was matched and reduced.
/// \returns false if a horizontal reduction was not matched.
static bool canBeVectorized(
PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI,
const function_ref<bool(BinaryOperator *, BoUpSLP &)> Vectorize) {
static bool canMatchHorizontalReduction(PHINode *P, BinaryOperator *BI,
BoUpSLP &R, TargetTransformInfo *TTI,
unsigned MinRegSize) {
if (!ShouldVectorizeHor)
return false;

if (!Root)
HorizontalReduction HorRdx(MinRegSize);
if (!HorRdx.matchAssociativeReduction(P, BI))
return false;

if (Root->getParent() != BB)
return false;
SmallVector<WeakVHWithLevel, 8> Stack(1, Root);
SmallSet<Value *, 8> VisitedInstrs;
bool Res = false;
while (!Stack.empty()) {
Value *V = Stack.back();
if (!V) {
Stack.pop_back();
continue;
}
auto *Inst = dyn_cast<Instruction>(V);
if (!Inst || isa<PHINode>(Inst)) {
Stack.pop_back();
continue;
}
if (Stack.back().isInitial()) {
Stack.back().clearInitial();
if (auto *BI = dyn_cast<BinaryOperator>(Inst)) {
HorizontalReduction HorRdx(R.getMinVecRegSize());
if (HorRdx.matchAssociativeReduction(P, BI)) {
// If there is a sufficient number of reduction values, reduce
// to a nearby power-of-2. Can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.
HorRdx.ReduxWidth =
std::max((uint64_t)4, PowerOf2Floor(HorRdx.numReductionValues()));
// If there is a sufficient number of reduction values, reduce
// to a nearby power-of-2. Can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.
HorRdx.ReduxWidth =
std::max((uint64_t)4, PowerOf2Floor(HorRdx.numReductionValues()));

if (HorRdx.tryToReduce(R, TTI)) {
Res = true;
P = nullptr;
continue;
}
}
if (P) {
Inst = dyn_cast<Instruction>(BI->getOperand(0));
if (Inst == P)
Inst = dyn_cast<Instruction>(BI->getOperand(1));
if (!Inst) {
P = nullptr;
continue;
}
}
}
P = nullptr;
if (Vectorize(dyn_cast<BinaryOperator>(Inst), R)) {
Res = true;
continue;
}
}
if (Stack.back().isFinal()) {
Stack.pop_back();
continue;
}

if (auto *NextV = dyn_cast<Instruction>(Stack.back().nextOperand()))
if (NextV->getParent() == BB && VisitedInstrs.insert(NextV).second &&
Stack.size() < RecursionMaxDepth)
Stack.push_back(NextV);
}
return Res;
}

bool SLPVectorizerPass::vectorizeRootInstruction(PHINode *P, Value *V,
BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI) {
if (!V)
return false;
auto *I = dyn_cast<Instruction>(V);
if (!I)
return false;

if (!isa<BinaryOperator>(I))
P = nullptr;
// Try to match and vectorize a horizontal reduction.
return canBeVectorized(P, I, BB, R, TTI,
[this](BinaryOperator *BI, BoUpSLP &R) -> bool {
return tryToVectorize(BI, R);
});
return HorRdx.tryToReduce(R, TTI);
}

bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
@@ -4717,42 +4599,67 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
if (P->getNumIncomingValues() != 2)
return Changed;

Value *Rdx = getReductionValue(DT, P, BB, LI);

// Check if this is a Binary Operator.
BinaryOperator *BI = dyn_cast_or_null<BinaryOperator>(Rdx);
if (!BI)
continue;

// Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(P, getReductionValue(DT, P, BB, LI), BB, R,
TTI)) {
if (canMatchHorizontalReduction(P, BI, R, TTI, R.getMinVecRegSize())) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}

Value *Inst = BI->getOperand(0);
if (Inst == P)
Inst = BI->getOperand(1);

if (tryToVectorize(dyn_cast<BinaryOperator>(Inst), R)) {
// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}

continue;
}

if (ShouldStartVectorizeHorAtStore) {
if (StoreInst *SI = dyn_cast<StoreInst>(it)) {
// Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(nullptr, SI->getValueOperand(), BB, R,
TTI)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
if (ShouldStartVectorizeHorAtStore)
if (StoreInst *SI = dyn_cast<StoreInst>(it))
if (BinaryOperator *BinOp =
dyn_cast<BinaryOperator>(SI->getValueOperand())) {
if (canMatchHorizontalReduction(nullptr, BinOp, R, TTI,
R.getMinVecRegSize()) ||
tryToVectorize(BinOp, R)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
}
}
}

// Try to vectorize horizontal reductions feeding into a return.
if (ReturnInst *RI = dyn_cast<ReturnInst>(it)) {
if (RI->getNumOperands() != 0) {
// Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(nullptr, RI->getOperand(0), BB, R, TTI)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
if (ReturnInst *RI = dyn_cast<ReturnInst>(it))
if (RI->getNumOperands() != 0)
if (BinaryOperator *BinOp =
dyn_cast<BinaryOperator>(RI->getOperand(0))) {
DEBUG(dbgs() << "SLP: Found a return to vectorize.\n");
if (canMatchHorizontalReduction(nullptr, BinOp, R, TTI,
R.getMinVecRegSize()) ||
tryToVectorizePair(BinOp->getOperand(0), BinOp->getOperand(1),
R)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
}
}
}

// Try to vectorize trees that start at compare instructions.
if (CmpInst *CI = dyn_cast<CmpInst>(it)) {
@@ -4765,14 +4672,16 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
continue;
}

for (int I = 0; I < 2; ++I) {
if (vectorizeRootInstruction(nullptr, CI->getOperand(I), BB, R, TTI)) {
Changed = true;
// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.
it = BB->begin();
e = BB->end();
break;
for (int i = 0; i < 2; ++i) {
if (BinaryOperator *BI = dyn_cast<BinaryOperator>(CI->getOperand(i))) {
if (tryToVectorizePair(BI->getOperand(0), BI->getOperand(1), R)) {
Changed = true;
// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.
it = BB->begin();
e = BB->end();
break;
}
}
}
continue;
@@ -1,13 +1,15 @@
; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=SI %s
; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=GCN -check-prefix=SI %s
; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs< %s | FileCheck -check-prefix=GCN -check-prefix=VI %s
; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG %s

declare i32 @llvm.r600.read.tidig.x() nounwind readnone

define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) {
; SI-LABEL: {{^}}trunc_i64_to_i32_store:
; SI: s_load_dword [[SLOAD:s[0-9]+]], s[0:1], 0xb
; SI: v_mov_b32_e32 [[VLOAD:v[0-9]+]], [[SLOAD]]
; GCN-LABEL: {{^}}trunc_i64_to_i32_store:
; GCN: s_load_dword [[SLOAD:s[0-9]+]], s[0:1],
; GCN: v_mov_b32_e32 [[VLOAD:v[0-9]+]], [[SLOAD]]
; SI: buffer_store_dword [[VLOAD]]
; VI: flat_store_dword v[{{[0-9:]+}}], [[VLOAD]]

; EG-LABEL: {{^}}trunc_i64_to_i32_store:
; EG: MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1
@@ -18,12 +20,14 @@ define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) {
ret void
}

; SI-LABEL: {{^}}trunc_load_shl_i64:
; SI-DAG: s_load_dwordx2
; SI-DAG: s_load_dword [[SREG:s[0-9]+]],
; SI: s_lshl_b32 [[SHL:s[0-9]+]], [[SREG]], 2
; SI: v_mov_b32_e32 [[VSHL:v[0-9]+]], [[SHL]]
; SI: buffer_store_dword [[VSHL]],
; GCN-LABEL: {{^}}trunc_load_shl_i64:
; GCN-DAG: s_load_dwordx2
; GCN-DAG: s_load_dword [[SREG:s[0-9]+]],
; GCN: s_lshl_b32 [[SHL:s[0-9]+]], [[SREG]], 2
; GCN: v_mov_b32_e32 [[VSHL:v[0-9]+]], [[SHL]]
; SI: buffer_store_dword [[VSHL]]
; VI: flat_store_dword v[{{[0-9:]+}}], [[VSHL]]

define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) {
%b = shl i64 %a, 2
%result = trunc i64 %b to i32
@@ -31,15 +35,17 @@ define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) {
ret void
}

; SI-LABEL: {{^}}trunc_shl_i64:
; GCN-LABEL: {{^}}trunc_shl_i64:
; SI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd
; SI: s_lshl_b64 s{{\[}}[[LO_SHL:[0-9]+]]:{{[0-9]+\]}}, s{{\[}}[[LO_SREG]]:{{[0-9]+\]}}, 2
; SI: s_add_u32 s[[LO_SREG2:[0-9]+]], s[[LO_SHL]],
; SI: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG2]]
; SI: s_addc_u32
; VI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x34
; GCN: s_lshl_b64 s{{\[}}[[LO_SHL:[0-9]+]]:{{[0-9]+\]}}, s{{\[}}[[LO_SREG]]:{{[0-9]+\]}}, 2
; GCN: s_add_u32 s[[LO_SREG2:[0-9]+]], s[[LO_SHL]],
; GCN: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG2]]
; GCN: s_addc_u32
; SI: buffer_store_dword v[[LO_VREG]],
; SI: v_mov_b32_e32
; SI: v_mov_b32_e32
; VI: flat_store_dword v[{{[0-9:]+}}], v[[LO_VREG]]
; GCN: v_mov_b32_e32
; GCN: v_mov_b32_e32
define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64 %a) {
%aa = add i64 %a, 234 ; Prevent shrinking store.
%b = shl i64 %aa, 2
@@ -49,9 +55,9 @@ define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64
ret void
}

; SI-LABEL: {{^}}trunc_i32_to_i1:
; SI: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
; SI: v_cmp_eq_u32
; GCN-LABEL: {{^}}trunc_i32_to_i1:
; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) {
%a = load i32, i32 addrspace(1)* %ptr, align 4
%trunc = trunc i32 %a to i1
@@ -60,9 +66,30 @@ define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) {
ret void
}

; SI-LABEL: {{^}}sgpr_trunc_i32_to_i1:
; SI: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
; SI: v_cmp_eq_u32
; GCN-LABEL: {{^}}trunc_i8_to_i1:
; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @trunc_i8_to_i1(i8 addrspace(1)* %out, i8 addrspace(1)* %ptr) {
%a = load i8, i8 addrspace(1)* %ptr, align 4
%trunc = trunc i8 %a to i1
%result = select i1 %trunc, i8 1, i8 0
store i8 %result, i8 addrspace(1)* %out, align 4
ret void
}

; GCN-LABEL: {{^}}sgpr_trunc_i16_to_i1:
; GCN: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @sgpr_trunc_i16_to_i1(i16 addrspace(1)* %out, i16 %a) {
%trunc = trunc i16 %a to i1
%result = select i1 %trunc, i16 1, i16 0
store i16 %result, i16 addrspace(1)* %out, align 4
ret void
}

; GCN-LABEL: {{^}}sgpr_trunc_i32_to_i1:
; GCN: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
; GCN: v_cmp_eq_u32
define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
%trunc = trunc i32 %a to i1
%result = select i1 %trunc, i32 1, i32 0
@@ -70,11 +97,12 @@ define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
ret void
}

; SI-LABEL: {{^}}s_trunc_i64_to_i1:
; GCN-LABEL: {{^}}s_trunc_i64_to_i1:
; SI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0xb
; SI: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]]
; SI: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}}
; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]]
; VI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0x2c
; GCN: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]]
; GCN: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}}
; GCN: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]]
define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) {
%trunc = trunc i64 %x to i1
%sel = select i1 %trunc, i32 63, i32 -12
@@ -82,11 +110,12 @@ define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) {
ret void
}

; SI-LABEL: {{^}}v_trunc_i64_to_i1:
; GCN-LABEL: {{^}}v_trunc_i64_to_i1:
; SI: buffer_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}}
; SI: v_and_b32_e32 [[MASKED:v[0-9]+]], 1, v[[VLO]]
; SI: v_cmp_eq_u32_e32 vcc, 1, [[MASKED]]
; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, vcc
; VI: flat_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}}
; GCN: v_and_b32_e32 [[MASKED:v[0-9]+]], 1, v[[VLO]]
; GCN: v_cmp_eq_u32_e32 vcc, 1, [[MASKED]]
; GCN: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, vcc
define void @v_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 addrspace(1)* %in) {
%tid = call i32 @llvm.r600.read.tidig.x() nounwind readnone
%gep = getelementptr i64, i64 addrspace(1)* %in, i32 %tid
@@ -1,4 +1,4 @@
; RUN: opt < %s -correlated-propagation -S | FileCheck %s
; RUN: opt < %s -correlated-propagation -cvp-dont-process-adds=false -S | FileCheck %s

; CHECK-LABEL: @test0(
define void @test0(i32 %a) {
@@ -222,3 +222,23 @@ define i32 @test15(i32 %X1, i32 %X2, i32 %X3) {
; CHECK-LABEL: @test15
; CHECK: and i1 %A, %B
}

; PR30256 - previously this asserted.
; CHECK-LABEL: @test16
; CHECK: %[[FACTOR:.*]] = mul i64 %a, -4
; CHECK-NEXT: %[[RES:.*]] = add i64 %[[FACTOR]], %b
; CHECK-NEXT: ret i64 %[[RES]]
define i64 @test16(i1 %cmp, i64 %a, i64 %b) {
entry:
%shl = shl i64 %a, 1
%shl.neg = sub i64 0, %shl
br i1 %cmp, label %if.then, label %if.end

if.then: ; preds = %entry
%add1 = add i64 %shl.neg, %shl.neg
%add2 = add i64 %add1, %b
ret i64 %add2

if.end: ; preds = %entry
ret i64 0
}
@ -12,25 +12,26 @@ define float @baz() {
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]]
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]]
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]]
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]]
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[MUL4:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL4]], [[CONV]]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[MUL4_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD]]
; CHECK-NEXT: [[TMP5:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP6]], [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP8]], [[ADD_1]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP7]], i32 1
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP9]], [[ADD_2]]
; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV]]
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[TMP4]], [[ADD7]]
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[TMP5]], [[ADD19]]
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP9]], [[ADD19_1]]
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP10]], [[ADD19_2]]
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[MUL4]], [[ADD7]]
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD19]]
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP8]], [[ADD19_1]]
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP9]], [[ADD19_2]]
; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4
; CHECK-NEXT: ret float [[ADD19_3]]
;
@ -69,37 +70,40 @@ define float @bazz() {
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]]
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]]
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]]
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]]
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[MUL4:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL4]], [[CONV]]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[MUL4_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD]]
; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[MUL4_2:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[MUL4_2]], [[ADD_1]]
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: [[TMP8:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: [[MUL4_3:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[MUL4_3]], [[ADD_2]]
; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV6]]
; CHECK-NEXT: [[TMP11:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 4) to <2 x float>*), align 16
; CHECK-NEXT: [[TMP12:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 4) to <2 x float>*), align 16
; CHECK-NEXT: [[TMP13:%.*]] = fmul fast <2 x float> [[TMP12]], [[TMP11]]
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x float> [[TMP13]], i32 0
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[TMP14]], [[ADD7]]
; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP13]], i32 1
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[TMP15]], [[ADD19]]
; CHECK-NEXT: [[TMP16:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP17:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP18:%.*]] = fmul fast <2 x float> [[TMP17]], [[TMP16]]
; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x float> [[TMP18]], i32 0
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP19]], [[ADD19_1]]
; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x float> [[TMP18]], i32 1
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP20]], [[ADD19_2]]
; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 4), align 16
; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 4), align 16
; CHECK-NEXT: [[MUL18:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[MUL18]], [[ADD7]]
; CHECK-NEXT: [[TMP11:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 5), align 4
; CHECK-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 5), align 4
; CHECK-NEXT: [[MUL18_1:%.*]] = fmul fast float [[TMP12]], [[TMP11]]
; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[MUL18_1]], [[ADD19]]
; CHECK-NEXT: [[TMP13:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP14:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 6) to <2 x float>*), align 8
; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x float> [[TMP14]], [[TMP13]]
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0
; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP16]], [[ADD19_1]]
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP17]], [[ADD19_2]]
; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4
; CHECK-NEXT: ret float [[ADD19_3]]
;
@ -151,20 +155,24 @@ define float @bazzz() {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]
; CHECK-NEXT: store float [[TMP8]], float* @res, align 4
; CHECK-NEXT: ret float [[TMP8]]
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[MUL:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float [[MUL_1]], [[MUL]]
; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP8:%.*]] = fadd fast float [[MUL_2]], [[TMP5]]
; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[MUL_3]], [[TMP8]]
; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[CONV]], [[TMP11]]
; CHECK-NEXT: store float [[TMP12]], float* @res, align 4
; CHECK-NEXT: ret float [[TMP12]]
;
entry:
%0 = load i32, i32* @n, align 4
@ -194,19 +202,23 @@ define i32 @foo() {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]
; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP8]] to i32
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
; CHECK-NEXT: [[MUL:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float [[MUL_1]], [[MUL]]
; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP8:%.*]] = fadd fast float [[MUL_2]], [[TMP5]]
; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[MUL_3]], [[TMP8]]
; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[CONV]], [[TMP11]]
; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP12]] to i32
; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4
; CHECK-NEXT: ret i32 [[CONV4]]
;