Add llvm patch corresponding to r286007.
This commit is contained in:
parent
bd32a5b2ac
commit
0320ef6854
@ -0,0 +1,214 @@
|
||||
Pull in r219009 from upstream llvm trunk (by Adam Nemet):
|
||||
|
||||
[ISel] Keep matching state consistent when folding during X86 address match
|
||||
|
||||
In the X86 backend, matching an address is initiated by the 'addr' complex
|
||||
pattern and its friends. During this process we may reassociate and-of-shift
|
||||
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
|
||||
shift into the scale of the address.
|
||||
|
||||
However as demonstrated by the testcase, this can trigger CSE of not only the
|
||||
shift and the AND which the code is prepared for but also the underlying load
|
||||
node. In the testcase this node is sitting in the RecordedNode and MatchScope
|
||||
data structures of the matcher and becomes a deleted node upon CSE. Returning
|
||||
from the complex pattern function, we try to access it again hitting an assert
|
||||
because the node is no longer a load even though this was checked before.
|
||||
|
||||
Now obviously changing the DAG this late is bending the rules but I think it
|
||||
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
|
||||
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
|
||||
example because it create a non-canonical node). We currently don't recognize
|
||||
addresses during DAGCombiner where arguably this canonicalization should be
|
||||
performed. On the other hand, having this in the matcher allows us to cover
|
||||
all the cases where an address can be used in an instruction.
|
||||
|
||||
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
|
||||
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
|
||||
for initiating the recursive CSE on users
|
||||
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
|
||||
is not strictly necessary since the shift is hooked into the visited user. Of
|
||||
course it's safer to keep the DAG consistent at all times (e.g. for accurate
|
||||
number of uses, etc.).
|
||||
|
||||
So rather than changing the fundamentals, I've decided to continue along the
|
||||
previous patches and detect the CSE. This patch installs a very targeted
|
||||
DAGUpdateListener for the duration of a complex-pattern match and updates the
|
||||
matching state accordingly. (Previous patches used HandleSDNode to detect the
|
||||
CSE but that's not practical here). The listener is only installed on X86.
|
||||
|
||||
I tested that there is no measurable overhead due to this while running
|
||||
through the spec2k BC files with llc. The only thing we pay for is the
|
||||
creation of the listener. The callback never ever triggers in spec2k since
|
||||
this is a corner case.
|
||||
|
||||
Fixes rdar://problem/18206171
|
||||
|
||||
This fixes a possible crash in x86 code generation when compiling recent
|
||||
llvm/clang trunk sources.
|
||||
|
||||
Introduced here: http://svnweb.freebsd.org/changeset/base/286007
|
||||
|
||||
Index: include/llvm/CodeGen/SelectionDAGISel.h
|
||||
===================================================================
|
||||
--- include/llvm/CodeGen/SelectionDAGISel.h
|
||||
+++ include/llvm/CodeGen/SelectionDAGISel.h
|
||||
@@ -238,6 +238,12 @@ class SelectionDAGISel : public MachineFunctionPas
|
||||
const unsigned char *MatcherTable,
|
||||
unsigned TableSize);
|
||||
|
||||
+ /// \brief Return true if complex patterns for this target can mutate the
|
||||
+ /// DAG.
|
||||
+ virtual bool ComplexPatternFuncMutatesDAG() const {
|
||||
+ return false;
|
||||
+ }
|
||||
+
|
||||
private:
|
||||
|
||||
// Calls to these functions are generated by tblgen.
|
||||
Index: lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
|
||||
===================================================================
|
||||
--- lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
|
||||
+++ lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
|
||||
@@ -2345,6 +2345,42 @@ struct MatchScope {
|
||||
bool HasChainNodesMatched, HasGlueResultNodesMatched;
|
||||
};
|
||||
|
||||
+/// \\brief A DAG update listener to keep the matching state
|
||||
+/// (i.e. RecordedNodes and MatchScope) uptodate if the target is allowed to
|
||||
+/// change the DAG while matching. X86 addressing mode matcher is an example
|
||||
+/// for this.
|
||||
+class MatchStateUpdater : public SelectionDAG::DAGUpdateListener
|
||||
+{
|
||||
+ SmallVectorImpl<std::pair<SDValue, SDNode*> > &RecordedNodes;
|
||||
+ SmallVectorImpl<MatchScope> &MatchScopes;
|
||||
+public:
|
||||
+ MatchStateUpdater(SelectionDAG &DAG,
|
||||
+ SmallVectorImpl<std::pair<SDValue, SDNode*> > &RN,
|
||||
+ SmallVectorImpl<MatchScope> &MS) :
|
||||
+ SelectionDAG::DAGUpdateListener(DAG),
|
||||
+ RecordedNodes(RN), MatchScopes(MS) { }
|
||||
+
|
||||
+ void NodeDeleted(SDNode *N, SDNode *E) {
|
||||
+ // Some early-returns here to avoid the search if we deleted the node or
|
||||
+ // if the update comes from MorphNodeTo (MorphNodeTo is the last thing we
|
||||
+ // do, so it's unnecessary to update matching state at that point).
|
||||
+ // Neither of these can occur currently because we only install this
|
||||
+ // update listener during matching a complex patterns.
|
||||
+ if (!E || E->isMachineOpcode())
|
||||
+ return;
|
||||
+ // Performing linear search here does not matter because we almost never
|
||||
+ // run this code. You'd have to have a CSE during complex pattern
|
||||
+ // matching.
|
||||
+ for (auto &I : RecordedNodes)
|
||||
+ if (I.first.getNode() == N)
|
||||
+ I.first.setNode(E);
|
||||
+
|
||||
+ for (auto &I : MatchScopes)
|
||||
+ for (auto &J : I.NodeStack)
|
||||
+ if (J.getNode() == N)
|
||||
+ J.setNode(E);
|
||||
+ }
|
||||
+};
|
||||
}
|
||||
|
||||
SDNode *SelectionDAGISel::
|
||||
@@ -2599,6 +2635,14 @@ SelectCodeCommon(SDNode *NodeToMatch, const unsign
|
||||
unsigned CPNum = MatcherTable[MatcherIndex++];
|
||||
unsigned RecNo = MatcherTable[MatcherIndex++];
|
||||
assert(RecNo < RecordedNodes.size() && "Invalid CheckComplexPat");
|
||||
+
|
||||
+ // If target can modify DAG during matching, keep the matching state
|
||||
+ // consistent.
|
||||
+ std::unique_ptr<MatchStateUpdater> MSU;
|
||||
+ if (ComplexPatternFuncMutatesDAG())
|
||||
+ MSU.reset(new MatchStateUpdater(*CurDAG, RecordedNodes,
|
||||
+ MatchScopes));
|
||||
+
|
||||
if (!CheckComplexPattern(NodeToMatch, RecordedNodes[RecNo].second,
|
||||
RecordedNodes[RecNo].first, CPNum,
|
||||
RecordedNodes))
|
||||
Index: lib/Target/X86/X86ISelDAGToDAG.cpp
|
||||
===================================================================
|
||||
--- lib/Target/X86/X86ISelDAGToDAG.cpp
|
||||
+++ lib/Target/X86/X86ISelDAGToDAG.cpp
|
||||
@@ -290,6 +290,13 @@ namespace {
|
||||
const X86InstrInfo *getInstrInfo() const {
|
||||
return getTargetMachine().getInstrInfo();
|
||||
}
|
||||
+
|
||||
+ /// \brief Address-mode matching performs shift-of-and to and-of-shift
|
||||
+ /// reassociation in order to expose more scaled addressing
|
||||
+ /// opportunities.
|
||||
+ bool ComplexPatternFuncMutatesDAG() const override {
|
||||
+ return true;
|
||||
+ }
|
||||
};
|
||||
}
|
||||
|
||||
Index: test/CodeGen/X86/addr-mode-matcher.ll
|
||||
===================================================================
|
||||
--- test/CodeGen/X86/addr-mode-matcher.ll
|
||||
+++ test/CodeGen/X86/addr-mode-matcher.ll
|
||||
@@ -0,0 +1,62 @@
|
||||
+; RUN: llc < %s | FileCheck %s
|
||||
+
|
||||
+; This testcase used to hit an assert during ISel. For details, see the big
|
||||
+; comment inside the function.
|
||||
+
|
||||
+; CHECK-LABEL: foo:
|
||||
+; The AND should be turned into a subreg access.
|
||||
+; CHECK-NOT: and
|
||||
+; The shift (leal) should be folded into the scale of the address in the load.
|
||||
+; CHECK-NOT: leal
|
||||
+; CHECK: movl {{.*}},4),
|
||||
+
|
||||
+target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
|
||||
+target triple = "i386-apple-macosx10.6.0"
|
||||
+
|
||||
+define void @foo(i32 %a) {
|
||||
+bb:
|
||||
+ br label %bb1692
|
||||
+
|
||||
+bb1692:
|
||||
+ %tmp1694 = phi i32 [ 0, %bb ], [ %tmp1745, %bb1692 ]
|
||||
+ %xor = xor i32 0, %tmp1694
|
||||
+
|
||||
+; %load1 = (load (and (shl %xor, 2), 1020))
|
||||
+ %tmp1701 = shl i32 %xor, 2
|
||||
+ %tmp1702 = and i32 %tmp1701, 1020
|
||||
+ %tmp1703 = getelementptr inbounds [1028 x i8]* null, i32 0, i32 %tmp1702
|
||||
+ %tmp1704 = bitcast i8* %tmp1703 to i32*
|
||||
+ %load1 = load i32* %tmp1704, align 4
|
||||
+
|
||||
+; %load2 = (load (shl (and %xor, 255), 2))
|
||||
+ %tmp1698 = and i32 %xor, 255
|
||||
+ %tmp1706 = shl i32 %tmp1698, 2
|
||||
+ %tmp1707 = getelementptr inbounds [1028 x i8]* null, i32 0, i32 %tmp1706
|
||||
+ %tmp1708 = bitcast i8* %tmp1707 to i32*
|
||||
+ %load2 = load i32* %tmp1708, align 4
|
||||
+
|
||||
+ %tmp1710 = or i32 %load2, %a
|
||||
+
|
||||
+; While matching xor we address-match %load1. The and-of-shift reassocication
|
||||
+; in address matching transform this into into a shift-of-and and the resuting
|
||||
+; node becomes identical to %load2. CSE replaces %load1 which leaves its
|
||||
+; references in MatchScope and RecordedNodes stale.
|
||||
+ %tmp1711 = xor i32 %load1, %tmp1710
|
||||
+
|
||||
+ %tmp1744 = getelementptr inbounds [256 x i32]* null, i32 0, i32 %tmp1711
|
||||
+ store i32 0, i32* %tmp1744, align 4
|
||||
+ %tmp1745 = add i32 %tmp1694, 1
|
||||
+ indirectbr i8* undef, [label %bb1756, label %bb1692]
|
||||
+
|
||||
+bb1756:
|
||||
+ br label %bb2705
|
||||
+
|
||||
+bb2705:
|
||||
+ indirectbr i8* undef, [label %bb5721, label %bb5736]
|
||||
+
|
||||
+bb5721:
|
||||
+ br label %bb2705
|
||||
+
|
||||
+bb5736:
|
||||
+ ret void
|
||||
+}
|
Loading…
Reference in New Issue
Block a user