Vendor import of llvm trunk r338536:
https://llvm.org/svn/llvm-project/llvm/trunk@338536
This commit is contained in:
parent eb11fae6d0
commit b7eb8e35e4
Notes:
svn2git
2020-12-20 02:59:44 +00:00
svn path=/vendor/llvm/dist/; revision=337137
svn path=/vendor/llvm/llvm-trunk-r338536/; revision=337138; tag=vendor/llvm/llvm-trunk-r338536
@@ -867,6 +867,7 @@ if(NOT LLVM_TOOLCHAIN_TOOLS)
     llvm-ranlib
+    llvm-lib
     llvm-objdump
     llvm-rc
   )
 endif()
@@ -114,8 +114,8 @@ option specifies "``-``", then the output will also be sent to standard output.

 .. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits how
-many temporary registers are available for register renaming purposes. A value
-of zero for this flag means "unlimited number of temporary registers".
+many physical registers are available for register renaming purposes. A value
+of zero for this flag means "unlimited number of physical registers".

 .. option:: -iterations=<number of iterations>
@@ -207,23 +207,23 @@ EXIT STATUS
 :program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
 to standard error, and the tool returns 1.

-HOW MCA WORKS
--------------
+HOW LLVM-MCA WORKS
+------------------

-MCA takes assembly code as input. The assembly code is parsed into a sequence
-of MCInst with the help of the existing LLVM target assembly parsers. The
-parsed sequence of MCInst is then analyzed by a ``Pipeline`` module to generate
-a performance report.
+:program:`llvm-mca` takes assembly code as input. The assembly code is parsed
+into a sequence of MCInst with the help of the existing LLVM target assembly
+parsers. The parsed sequence of MCInst is then analyzed by a ``Pipeline`` module
+to generate a performance report.

 The Pipeline module simulates the execution of the machine code sequence in a
 loop of iterations (default is 100). During this process, the pipeline collects
 a number of execution related statistics. At the end of this process, the
 pipeline generates and prints a report from the collected statistics.

-Here is an example of a performance report generated by MCA for a dot-product
-of two packed float vectors of four elements. The analysis is conducted for
-target x86, cpu btver2. The following result can be produced via the following
-command using the example located at
+Here is an example of a performance report generated by the tool for a
+dot-product of two packed float vectors of four elements. The analysis is
+conducted for target x86, cpu btver2. The following result can be produced via
+the following command using the example located at
 ``test/tools/llvm-mca/X86/BtVer2/dot-product.s``:

 .. code-block:: bash
@@ -287,10 +287,30 @@ for a total of 900 dynamically executed instructions.
 The report is structured in three main sections. The first section collects a
 few performance numbers; the goal of this section is to give a very quick
 overview of the performance throughput. In this example, the two important
-performance indicators are the predicted total number of cycles, and the IPC.
-IPC is probably the most important throughput indicator. A big delta between
-the Dispatch Width and the computed IPC is an indicator of potential
-performance issues.
+performance indicators are **IPC** and **Block RThroughput** (Block Reciprocal
+Throughput).
+
+IPC is computed dividing the total number of simulated instructions by the total
+number of cycles. A delta between Dispatch Width and IPC is an indicator of a
+performance issue. In the absence of loop-carried data dependencies, the
+observed IPC tends to a theoretical maximum which can be computed by dividing
+the number of instructions of a single iteration by the *Block RThroughput*.
+
+IPC is bounded from above by the dispatch width. That is because the dispatch
+width limits the maximum size of a dispatch group. IPC is also limited by the
+amount of hardware parallelism. The availability of hardware resources affects
+the resource pressure distribution, and it limits the number of instructions
+that can be executed in parallel every cycle. A delta between Dispatch
+Width and the theoretical maximum IPC is an indicator of a performance
+bottleneck caused by the lack of hardware resources. In general, the lower the
+Block RThroughput, the better.
+
+In this example, ``Instructions per iteration/Block RThroughput`` is 1.50. Since
+there are no loop-carried dependencies, the observed IPC is expected to approach
+1.50 when the number of iterations tends to infinity. The delta between the
+Dispatch Width (2.00), and the theoretical maximum IPC (1.50) is an indicator of
+a performance bottleneck caused by the lack of hardware resources, and the
+*Resource pressure view* can help to identify the problematic resource usage.

 The second section of the report shows the latency and reciprocal
 throughput of every instruction in the sequence. That section also reports
@@ -316,7 +336,7 @@ pressure should be uniformly distributed between multiple resources.

 Timeline View
 ^^^^^^^^^^^^^
-MCA's timeline view produces a detailed report of each instruction's state
+The timeline view produces a detailed report of each instruction's state
 transitions through an instruction pipeline. This view is enabled by the
 command line option ``-timeline``. As instructions transition through the
 various stages of the pipeline, their states are depicted in the view report.
@@ -331,7 +351,7 @@ These states are represented by the following characters:

 Below is the timeline view for a subset of the dot-product example located in
 ``test/tools/llvm-mca/X86/BtVer2/dot-product.s`` and processed by
-MCA using the following command:
+:program:`llvm-mca` using the following command:

 .. code-block:: bash

@@ -366,7 +386,7 @@ MCA using the following command:
 2.     3      5.7    0.0    0.0    vhaddps %xmm3, %xmm3, %xmm4

 The timeline view is interesting because it shows instruction state changes
-during execution. It also gives an idea of how MCA processes instructions
+during execution. It also gives an idea of how the tool processes instructions
 executed on the target, and how their timing information might be calculated.

 The timeline view is structured in two tables. The first table shows
@@ -411,12 +431,12 @@ Parallelism).
 In the dot-product example, there are anti-dependencies introduced by
 instructions from different iterations. However, those dependencies can be
 removed at register renaming stage (at the cost of allocating register aliases,
-and therefore consuming temporary registers).
+and therefore consuming physical registers).

 Table *Average Wait times* helps diagnose performance issues that are caused by
 the presence of long latency instructions and potentially long data dependencies
-which may limit the ILP. Note that MCA, by default, assumes at least 1cy
-between the dispatch event and the issue event.
+which may limit the ILP. Note that :program:`llvm-mca`, by default, assumes at
+least 1cy between the dispatch event and the issue event.

 When the performance is limited by data dependencies and/or long latency
 instructions, the number of cycles spent while in the *ready* state is expected
@@ -549,3 +569,177 @@ statistics are displayed by using the command option ``-all-stats`` or

 In this example, we can conclude that the IPC is mostly limited by data
 dependencies, and not by resource pressure.
+
+Instruction Flow
+^^^^^^^^^^^^^^^^
+This section describes the instruction flow through MCA's default out-of-order
+pipeline, as well as the functional units involved in the process.
+
+The default pipeline implements the following sequence of stages used to
+process instructions.
+
+* Dispatch (Instruction is dispatched to the schedulers).
+* Issue (Instruction is issued to the processor pipelines).
+* Write Back (Instruction is executed, and results are written back).
+* Retire (Instruction is retired; writes are architecturally committed).
+
+The default pipeline only models the out-of-order portion of a processor.
+Therefore, the instruction fetch and decode stages are not modeled. Performance
+bottlenecks in the frontend are not diagnosed. MCA assumes that instructions
+have all been decoded and placed into a queue. Also, MCA does not model branch
+prediction.
+
+Instruction Dispatch
+""""""""""""""""""""
+During the dispatch stage, instructions are picked in program order from a
+queue of already decoded instructions, and dispatched in groups to the
+simulated hardware schedulers.
+
+The size of a dispatch group depends on the availability of the simulated
+hardware resources. The processor dispatch width defaults to the value
+of the ``IssueWidth`` in LLVM's scheduling model.
+
+An instruction can be dispatched if:
+
+* The size of the dispatch group is smaller than the processor's dispatch width.
+* There are enough entries in the reorder buffer.
+* There are enough physical registers to do register renaming.
+* The schedulers are not full.
+
+Scheduling models can optionally specify which register files are available on
+the processor. MCA uses that information to initialize register file
+descriptors. Users can limit the number of physical registers that are
+globally available for register renaming by using the command option
+``-register-file-size``. A value of zero for this option means *unbounded*.
+By knowing how many registers are available for renaming, MCA can predict
+dispatch stalls caused by the lack of registers.
+
+The number of reorder buffer entries consumed by an instruction depends on the
+number of micro-opcodes specified by the target scheduling model. MCA's
+reorder buffer's purpose is to track the progress of instructions that are
+"in-flight," and to retire instructions in program order. The number of
+entries in the reorder buffer defaults to the ``MicroOpBufferSize`` provided by
+the target scheduling model.
+
+Instructions that are dispatched to the schedulers consume scheduler buffer
+entries. :program:`llvm-mca` queries the scheduling model to determine the set
+of buffered resources consumed by an instruction. Buffered resources are
+treated like scheduler resources.
+
+Instruction Issue
+"""""""""""""""""
+Each processor scheduler implements a buffer of instructions. An instruction
+has to wait in the scheduler's buffer until input register operands become
+available. Only at that point does the instruction become eligible for
+execution and may be issued (potentially out-of-order) for execution.
+Instruction latencies are computed by :program:`llvm-mca` with the help of the
+scheduling model.
+
+:program:`llvm-mca`'s scheduler is designed to simulate multiple processor
+schedulers. The scheduler is responsible for tracking data dependencies, and
+dynamically selecting which processor resources are consumed by instructions.
+It delegates the management of processor resource units and resource groups to a
+resource manager. The resource manager is responsible for selecting resource
+units that are consumed by instructions. For example, if an instruction
+consumes 1cy of a resource group, the resource manager selects one of the
+available units from the group; by default, the resource manager uses a
+round-robin selector to guarantee that resource usage is uniformly distributed
+between all units of a group.
+
+:program:`llvm-mca`'s scheduler implements three instruction queues:
+
+* WaitQueue: a queue of instructions whose operands are not ready.
+* ReadyQueue: a queue of instructions ready to execute.
+* IssuedQueue: a queue of instructions executing.
+
+Depending on the operand availability, instructions that are dispatched to the
+scheduler are either placed into the WaitQueue or into the ReadyQueue.
+
+Every cycle, the scheduler checks if instructions can be moved from the
+WaitQueue to the ReadyQueue, and if instructions from the ReadyQueue can be
+issued to the underlying pipelines. The algorithm prioritizes older instructions
+over younger instructions.
+
+Write-Back and Retire Stage
+"""""""""""""""""""""""""""
+Issued instructions are moved from the ReadyQueue to the IssuedQueue. There,
+instructions wait until they reach the write-back stage. At that point, they
+get removed from the queue and the retire control unit is notified.
+
+When instructions are executed, the retire control unit flags the
+instruction as "ready to retire."
+
+Instructions are retired in program order. The register file is notified of
+the retirement so that it can free the physical registers that were allocated
+for the instruction during the register renaming stage.
+
+Load/Store Unit and Memory Consistency Model
+""""""""""""""""""""""""""""""""""""""""""""
+To simulate an out-of-order execution of memory operations, :program:`llvm-mca`
+utilizes a simulated load/store unit (LSUnit) to simulate the speculative
+execution of loads and stores.
+
+Each load (or store) consumes an entry in the load (or store) queue. Users can
+specify flags ``-lqueue`` and ``-squeue`` to limit the number of entries in the
+load and store queues respectively. The queues are unbounded by default.
+
+The LSUnit implements a relaxed consistency model for memory loads and stores.
+The rules are:
+
+1. A younger load is allowed to pass an older load only if there are no
+   intervening stores or barriers between the two loads.
+2. A younger load is allowed to pass an older store provided that the load does
+   not alias with the store.
+3. A younger store is not allowed to pass an older store.
+4. A younger store is not allowed to pass an older load.
+
+By default, the LSUnit optimistically assumes that loads do not alias
+(``-noalias=true``) store operations. Under this assumption, younger loads are
+always allowed to pass older stores. Essentially, the LSUnit does not attempt
+to run any alias analysis to predict when loads and stores do not alias with
+each other.
+
+Note that, in the case of write-combining memory, rule 3 could be relaxed to
+allow reordering of non-aliasing store operations. That being said, at the
+moment, there is no way to further relax the memory model (``-noalias`` is the
+only option). Essentially, there is no option to specify a different memory
+type (e.g., write-back, write-combining, write-through; etc.) and consequently
+to weaken, or strengthen, the memory model.
+
+Other limitations are:
+
+* The LSUnit does not know when store-to-load forwarding may occur.
+* The LSUnit does not know anything about cache hierarchy and memory types.
+* The LSUnit does not know how to identify serializing operations and memory
+  fences.
+
+The LSUnit does not attempt to predict if a load or store hits or misses the L1
+cache. It only knows if an instruction "MayLoad" and/or "MayStore." For
+loads, the scheduling model provides an "optimistic" load-to-use latency (which
+usually matches the load-to-use latency for when there is a hit in the L1D).
+
+:program:`llvm-mca` does not know about serializing operations or memory-barrier
+like instructions. The LSUnit conservatively assumes that an instruction which
+has both "MayLoad" and unmodeled side effects behaves like a "soft"
+load-barrier. That means, it serializes loads without forcing a flush of the
+load queue. Similarly, instructions that "MayStore" and have unmodeled side
+effects are treated like store barriers. A full memory barrier is a "MayLoad"
+and "MayStore" instruction with unmodeled side effects. This is inaccurate, but
+it is the best that we can do at the moment with the current information
+available in LLVM.
+
+A load/store barrier consumes one entry of the load/store queue. A load/store
+barrier enforces ordering of loads/stores. A younger load cannot pass a load
+barrier. Also, a younger store cannot pass a store barrier. A younger load
+has to wait for the memory/load barrier to execute. A load/store barrier is
+"executed" when it becomes the oldest entry in the load/store queue(s). That
+also means, by construction, all of the older loads/stores have been executed.
+
+In conclusion, the full set of load/store consistency rules are:
+
+#. A store may not pass a previous store.
+#. A store may not pass a previous load (regardless of ``-noalias``).
+#. A store has to wait until an older store barrier is fully executed.
+#. A load may pass a previous load.
+#. A load may not pass a previous store unless ``-noalias`` is set.
+#. A load has to wait until an older load barrier is fully executed.

@@ -838,7 +838,7 @@ To configure LLVM, follow these steps:

 .. code-block:: console

-  % cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=prefix=/install/path
+  % cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/install/path
     [other options] SRC_ROOT

 Compiling the LLVM Suite Source Code
@@ -4588,9 +4588,12 @@ DIExpression
 ``DIExpression`` nodes represent expressions that are inspired by the DWARF
 expression language. They are used in :ref:`debug intrinsics<dbg_intrinsics>`
 (such as ``llvm.dbg.declare`` and ``llvm.dbg.value``) to describe how the
-referenced LLVM variable relates to the source language variable.
+referenced LLVM variable relates to the source language variable. Debug
+intrinsics are interpreted left-to-right: start by pushing the value/address
+operand of the intrinsic onto a stack, then repeatedly push and evaluate
+opcodes from the DIExpression until the final variable description is produced.

-The current supported vocabulary is limited:
+The current supported opcode vocabulary is limited:

 - ``DW_OP_deref`` dereferences the top of the expression stack.
 - ``DW_OP_plus`` pops the last two entries from the expression stack, adds
@@ -4610,12 +4613,30 @@ The current supported vocabulary is limited:
 - ``DW_OP_stack_value`` marks a constant value.

 DWARF specifies three kinds of simple location descriptions: Register, memory,
-and implicit location descriptions. Register and memory location descriptions
-describe the *location* of a source variable (in the sense that a debugger might
-modify its value), whereas implicit locations describe merely the *value* of a
-source variable. DIExpressions also follow this model: A DIExpression that
-doesn't have a trailing ``DW_OP_stack_value`` will describe an *address* when
-combined with a concrete location.
+and implicit location descriptions. Note that a location description is
+defined over certain ranges of a program, i.e. the location of a variable may
+change over the course of the program. Register and memory location
+descriptions describe the *concrete location* of a source variable (in the
+sense that a debugger might modify its value), whereas *implicit locations*
+describe merely the actual *value* of a source variable which might not exist
+in registers or in memory (see ``DW_OP_stack_value``).
+
+A ``llvm.dbg.addr`` or ``llvm.dbg.declare`` intrinsic describes an indirect
+value (the address) of a source variable. The first operand of the intrinsic
+must be an address of some kind. A DIExpression attached to the intrinsic
+refines this address to produce a concrete location for the source variable.
+
+A ``llvm.dbg.value`` intrinsic describes the direct value of a source variable.
+The first operand of the intrinsic may be a direct or indirect value. A
+DIExpression attached to the intrinsic refines the first operand to produce a
+direct value. For example, if the first operand is an indirect value, it may be
+necessary to insert ``DW_OP_deref`` into the DIExpression in order to produce a
+valid debug intrinsic.
+
+.. note::
+
+   A DIExpression is interpreted in the same way regardless of which kind of
+   debug intrinsic it's attached to.

 .. code-block:: text

@@ -244,6 +244,11 @@ argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a
 description of the variable. The third argument is a `complex expression
 <LangRef.html#diexpression>`_.

+An `llvm.dbg.value` intrinsic describes the *value* of a source variable
+directly, not its address. Note that the value operand of this intrinsic may
+be indirect (i.e., a pointer to the source variable), provided that interpreting
+the complex expression derives the direct value.
+
 Object lifetimes and scoping
 ============================

@@ -17,7 +17,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Support/type_traits.h"
 #include <algorithm>
 #include <cstddef>
 #include <initializer_list>
 #include <iterator>
@@ -43,6 +43,7 @@ class LoopInfo;
 class PHINode;
 class SelectInst;
 class TargetLibraryInfo;
+class PhiValues;
 class Value;

 /// This is the AA result object for the basic, local, and stateless alias
@@ -60,19 +61,22 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
   AssumptionCache &AC;
   DominatorTree *DT;
   LoopInfo *LI;
+  PhiValues *PV;

 public:
   BasicAAResult(const DataLayout &DL, const Function &F,
                 const TargetLibraryInfo &TLI, AssumptionCache &AC,
-                DominatorTree *DT = nullptr, LoopInfo *LI = nullptr)
-      : AAResultBase(), DL(DL), F(F), TLI(TLI), AC(AC), DT(DT), LI(LI) {}
+                DominatorTree *DT = nullptr, LoopInfo *LI = nullptr,
+                PhiValues *PV = nullptr)
+      : AAResultBase(), DL(DL), F(F), TLI(TLI), AC(AC), DT(DT), LI(LI), PV(PV)
+        {}

   BasicAAResult(const BasicAAResult &Arg)
       : AAResultBase(Arg), DL(Arg.DL), F(Arg.F), TLI(Arg.TLI), AC(Arg.AC),
-        DT(Arg.DT), LI(Arg.LI) {}
+        DT(Arg.DT), LI(Arg.LI), PV(Arg.PV) {}
   BasicAAResult(BasicAAResult &&Arg)
       : AAResultBase(std::move(Arg)), DL(Arg.DL), F(Arg.F), TLI(Arg.TLI),
-        AC(Arg.AC), DT(Arg.DT), LI(Arg.LI) {}
+        AC(Arg.AC), DT(Arg.DT), LI(Arg.LI), PV(Arg.PV) {}

   /// Handle invalidation events in the new pass manager.
   bool invalidate(Function &Fn, const PreservedAnalyses &PA,
@@ -682,7 +682,7 @@ bool sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
                      SmallVectorImpl<unsigned> &SortedIndices);

 /// Returns true if the memory operations \p A and \p B are consecutive.
 /// This is a simple API that does not depend on the analysis pass.
 bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
                          ScalarEvolution &SE, bool CheckType = true);

@@ -734,7 +734,7 @@ class LoopAccessLegacyAnalysis : public FunctionPass {
/// accesses of a loop.
///
/// It runs the analysis for a loop on demand. This can be initiated by
/// querying the loop access info via AM.getResult<LoopAccessAnalysis>.
/// getResult return a LoopAccessInfo object. See this class for the
/// specifics of what information is provided.
class LoopAccessAnalysis
@@ -44,6 +44,7 @@ class Instruction;
 class LoadInst;
 class PHITransAddr;
 class TargetLibraryInfo;
+class PhiValues;
 class Value;

 /// A memory dependence query can return one of three different answers.
@@ -360,13 +361,14 @@ class MemoryDependenceResults {
   AssumptionCache &AC;
   const TargetLibraryInfo &TLI;
   DominatorTree &DT;
+  PhiValues &PV;
   PredIteratorCache PredCache;

 public:
   MemoryDependenceResults(AliasAnalysis &AA, AssumptionCache &AC,
                           const TargetLibraryInfo &TLI,
-                          DominatorTree &DT)
-      : AA(AA), AC(AC), TLI(TLI), DT(DT) {}
+                          DominatorTree &DT, PhiValues &PV)
+      : AA(AA), AC(AC), TLI(TLI), DT(DT), PV(PV) {}

   /// Handle invalidation in the new PM.
   bool invalidate(Function &F, const PreservedAnalyses &PA,
@@ -10,7 +10,7 @@
/// Contains a collection of routines for determining if a given instruction is
/// guaranteed to execute if a given point in control flow is reached. The most
/// common example is an instruction within a loop being provably executed if we
/// branch to the header of its containing loop.
///
//===----------------------------------------------------------------------===//

@@ -58,7 +58,7 @@ void computeLoopSafetyInfo(LoopSafetyInfo *, Loop *);
bool isGuaranteedToExecute(const Instruction &Inst, const DominatorTree *DT,
                           const Loop *CurLoop,
                           const LoopSafetyInfo *SafetyInfo);

}

#endif
@@ -326,7 +326,7 @@ class TargetTransformInfoImplBase {
  bool haveFastSqrt(Type *Ty) { return false; }

  bool isFCmpOrdCheaperThanFCmpZero(Type *Ty) { return true; }

  unsigned getFPOpCost(Type *Ty) { return TargetTransformInfo::TCC_Basic; }

  int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
@@ -464,7 +464,7 @@ class Value;
/// This is equivalent to saying that all instructions within the basic block
/// are guaranteed to transfer execution to their successor within the basic
/// block. This has the same assumptions w.r.t. undefined behavior as the
/// instruction variant of this function.
bool isGuaranteedToTransferExecutionToSuccessor(const BasicBlock *BB);

/// Return true if this function can prove that the instruction I
@@ -856,6 +856,7 @@ HANDLE_DW_UT(0x06, split_type)
 // TODO: Add Mach-O and COFF names.
 // Official DWARF sections.
 HANDLE_DWARF_SECTION(DebugAbbrev, ".debug_abbrev", "debug-abbrev")
+HANDLE_DWARF_SECTION(DebugAddr, ".debug_addr", "debug-addr")
 HANDLE_DWARF_SECTION(DebugAranges, ".debug_aranges", "debug-aranges")
 HANDLE_DWARF_SECTION(DebugInfo, ".debug_info", "debug-info")
 HANDLE_DWARF_SECTION(DebugTypes, ".debug_types", "debug-types")
@@ -413,8 +413,10 @@ enum {

 // ARM Specific e_flags
 enum : unsigned {
-  EF_ARM_SOFT_FLOAT = 0x00000200U,
-  EF_ARM_VFP_FLOAT = 0x00000400U,
+  EF_ARM_SOFT_FLOAT = 0x00000200U,     // Legacy pre EABI_VER5
+  EF_ARM_ABI_FLOAT_SOFT = 0x00000200U, // EABI_VER5
+  EF_ARM_VFP_FLOAT = 0x00000400U,      // Legacy pre EABI_VER5
+  EF_ARM_ABI_FLOAT_HARD = 0x00000400U, // EABI_VER5
   EF_ARM_EABI_UNKNOWN = 0x00000000U,
   EF_ARM_EABI_VER1 = 0x01000000U,
   EF_ARM_EABI_VER2 = 0x02000000U,
@@ -104,12 +104,12 @@ class GCStrategy {
  const std::string &getName() const { return Name; }

  /// By default, write barriers are replaced with simple store
  /// instructions. If true, you must provide a custom pass to lower
  /// calls to \@llvm.gcwrite.
  bool customWriteBarrier() const { return CustomWriteBarriers; }

  /// By default, read barriers are replaced with simple load
  /// instructions. If true, you must provide a custom pass to lower
  /// calls to \@llvm.gcread.
  bool customReadBarrier() const { return CustomReadBarriers; }

@@ -146,7 +146,7 @@ class GCStrategy {
  }

  /// By default, roots are left for the code generator so it can generate a
  /// stack map. If true, you must provide a custom pass to lower
  /// calls to \@llvm.gcroot.
  bool customRoots() const { return CustomRoots; }

@@ -786,7 +786,7 @@ class LegalizerInfo {
  ///   setAction({G_ADD, 0, LLT::scalar(32)}, Legal);
  ///   setLegalizeScalarToDifferentSizeStrategy(
  ///     G_ADD, 0, widenToLargerTypesAndNarrowToLargest);
  /// will end up defining getAction({G_ADD, 0, T}) to return the following
  /// actions for different scalar types T:
  ///   LLT::scalar(1)..LLT::scalar(31): {WidenScalar, 0, LLT::scalar(32)}
  ///   LLT::scalar(32):                 {Legal, 0, LLT::scalar(32)}
@@ -814,7 +814,7 @@ class LegalizerInfo {
    VectorElementSizeChangeStrategies[OpcodeIdx][TypeIdx] = S;
  }

  /// A SizeChangeStrategy for the common case where legalization for a
  /// particular operation consists of only supporting a specific set of type
  /// sizes. E.g.
  ///   setAction({G_DIV, 0, LLT::scalar(32)}, Legal);
@@ -942,6 +942,16 @@ class MachineIRBuilderBase {
   /// \return a MachineInstrBuilder for the newly created instruction.
   MachineInstrBuilder buildAtomicRMWUmin(unsigned OldValRes, unsigned Addr,
                                          unsigned Val, MachineMemOperand &MMO);

+  /// Build and insert \p Res = G_BLOCK_ADDR \p BA
+  ///
+  /// G_BLOCK_ADDR computes the address of a basic block.
+  ///
+  /// \pre setBasicBlock or setMI must have been called.
+  /// \pre \p Res must be a generic virtual register of a pointer type.
+  ///
+  /// \return The newly created instruction.
+  MachineInstrBuilder buildBlockAddress(unsigned Res, const BlockAddress *BA);
 };

 /// A CRTP class that contains methods for building instructions that can
@@ -27,15 +27,15 @@ namespace llvm {
  uint32_t r_symbolnum; // symbol index if r_extern == 1 else section index
  bool     r_pcrel;     // was relocated pc-relative already
  uint8_t  r_length;    // length = 2 ^ r_length
  bool     r_extern;    //
  uint8_t  r_type;      // if not 0, machine-specific relocation type.
  bool     r_scattered; // 1 = scattered, 0 = non-scattered
  int32_t  r_value;     // the value the item to be relocated is referring
                        // to.
public:
  uint32_t getPackedFields() const {
    if (r_scattered)
      return (1 << 31) | (r_pcrel << 30) | ((r_length & 3) << 28) |
             ((r_type & 15) << 24) | (r_address & 0x00FFFFFF);
    else
      return (r_symbolnum << 8) | (r_pcrel << 7) | ((r_length & 3) << 5) |
@ -45,8 +45,8 @@ namespace llvm {
|
||||
uint32_t getRawAddress() const { return r_address; }
|
||||
|
||||
MachORelocation(uint32_t addr, uint32_t index, bool pcrel, uint8_t len,
|
||||
bool ext, uint8_t type, bool scattered = false,
|
||||
int32_t value = 0) :
|
||||
bool ext, uint8_t type, bool scattered = false,
|
||||
int32_t value = 0) :
|
||||
r_address(addr), r_symbolnum(index), r_pcrel(pcrel), r_length(len),
|
||||
r_extern(ext), r_type(type), r_scattered(scattered), r_value(value) {}
|
||||
};
|
||||
|
@@ -105,7 +105,7 @@ class MachineModuleInfo : public ImmutablePass {
   /// basic block's address of label.
   MMIAddrLabelMap *AddrLabelSymbols;

   // TODO: Ideally, what we'd like is to have a switch that allows emitting
   // synchronous (precise at call-sites only) CFA into .eh_frame. However,
   // even under this switch, we'd like .debug_frame to be precise when using
   // -g. At this moment, there's no way to specify that some CFI directives
@@ -19,6 +19,7 @@
 #include "llvm/CodeGen/LiveRegUnits.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/CodeGen/LivePhysRegs.h"

 namespace llvm {
 namespace outliner {
@@ -74,6 +75,13 @@ struct Candidate {
   /// cost model information.
   LiveRegUnits LRU;

+  /// Contains the accumulated register liveness information for the
+  /// instructions in this \p Candidate.
+  ///
+  /// This is optionally used by the target to determine which registers have
+  /// been used across the sequence.
+  LiveRegUnits UsedInSequence;
+
   /// Return the number of instructions in this Candidate.
   unsigned getLength() const { return Len; }

@@ -137,6 +145,12 @@ struct Candidate {
     // outlining candidate.
     std::for_each(MBB->rbegin(), (MachineBasicBlock::reverse_iterator)front(),
                   [this](MachineInstr &MI) { LRU.stepBackward(MI); });
+
+    // Walk over the sequence itself and figure out which registers were used
+    // in the sequence.
+    UsedInSequence.init(TRI);
+    std::for_each(front(), std::next(back()),
+                  [this](MachineInstr &MI) { UsedInSequence.accumulate(MI); });
   }
 };

@@ -252,7 +252,7 @@ class TargetRegisterInfo;
     MachineInstr *Instr = nullptr; ///< Alternatively, a MachineInstr.

   public:
     SUnit *OrigNode = nullptr; ///< If not this, the node from which this node
                                /// was cloned. (SD scheduling only)

     const MCSchedClassDesc *SchedClass =
@@ -156,7 +156,7 @@ class StatepointOpers {
   // TODO:: we should change the STATEPOINT representation so that CC and
   // Flags should be part of meta operands, with args and deopt operands, and
   // gc operands all prefixed by their length and a type code. This would be
   // much more consistent.
 public:
   // These values are aboolute offsets into the operands of the statepoint
   // instruction.
@@ -718,7 +718,7 @@ class TargetLoweringBase {
   /// always broken down into scalars in some contexts. This occurs even if the
   /// vector type is legal.
   virtual unsigned getVectorTypeBreakdownForCallingConv(
-      LLVMContext &Context, EVT VT, EVT &IntermediateVT,
+      LLVMContext &Context, CallingConv::ID CC, EVT VT, EVT &IntermediateVT,
       unsigned &NumIntermediates, MVT &RegisterVT) const {
     return getVectorTypeBreakdown(Context, VT, IntermediateVT, NumIntermediates,
                                   RegisterVT);
@@ -1174,7 +1174,7 @@ class TargetLoweringBase {
   /// are legal for some operations and not for other operations.
   /// For MIPS all vector types must be passed through the integer register set.
   virtual MVT getRegisterTypeForCallingConv(LLVMContext &Context,
-                                            EVT VT) const {
+                                            CallingConv::ID CC, EVT VT) const {
     return getRegisterType(Context, VT);
   }

@@ -1182,6 +1182,7 @@ class TargetLoweringBase {
   /// this occurs when a vector type is used, as vector are passed through the
   /// integer register set.
   virtual unsigned getNumRegistersForCallingConv(LLVMContext &Context,
+                                                 CallingConv::ID CC,
                                                  EVT VT) const {
     return getNumRegisters(Context, VT);
   }
@@ -3489,10 +3490,10 @@ class TargetLowering : public TargetLoweringBase {
   //
   SDValue BuildSDIV(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
                     bool IsAfterLegalization,
-                    std::vector<SDNode *> *Created) const;
+                    SmallVectorImpl<SDNode *> &Created) const;
   SDValue BuildUDIV(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
                     bool IsAfterLegalization,
-                    std::vector<SDNode *> *Created) const;
+                    SmallVectorImpl<SDNode *> &Created) const;

   /// Targets may override this function to provide custom SDIV lowering for
   /// power-of-2 denominators. If the target returns an empty SDValue, LLVM
@@ -3500,7 +3501,7 @@ class TargetLowering : public TargetLoweringBase {
   /// operations.
   virtual SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor,
                                 SelectionDAG &DAG,
-                                std::vector<SDNode *> *Created) const;
+                                SmallVectorImpl<SDNode *> &Created) const;

   /// Indicate whether this target prefers to combine FDIVs with the same
   /// divisor. If the transform should never be done, return zero. If the
@@ -3690,7 +3691,7 @@ class TargetLowering : public TargetLoweringBase {
 /// Given an LLVM IR type and return type attributes, compute the return value
 /// EVTs and flags, and optionally also the offsets, if the return value is
 /// being lowered to memory.
-void GetReturnInfo(Type *ReturnType, AttributeList attr,
+void GetReturnInfo(CallingConv::ID CC, Type *ReturnType, AttributeList attr,
                    SmallVectorImpl<ISD::OutputArg> &Outs,
                    const TargetLowering &TLI, const DataLayout &DL);

@@ -16,7 +16,7 @@

 #include "llvm/Pass.h"
 #include "llvm/Support/CodeGen.h"
 #include <cassert>
 #include <string>

 namespace llvm {
@@ -456,7 +456,7 @@ class TargetRegisterInfo : public MCRegisterInfo {
   /// stack frame offset. The first register is closest to the incoming stack
   /// pointer if stack grows down, and vice versa.
   /// Notice: This function does not take into account disabled CSRs.
   ///         In most cases you will want to use instead the function
   ///         getCalleeSavedRegs that is implemented in MachineRegisterInfo.
   virtual const MCPhysReg*
   getCalleeSavedRegs(const MachineFunction *MF) const = 0;
@@ -518,7 +518,7 @@ class TargetRegisterInfo : public MCRegisterInfo {
   /// guaranteed to be restored before any uses. This is useful for targets that
   /// have call sequences where a GOT register may be updated by the caller
   /// prior to a call and is guaranteed to be restored (also by the caller)
   /// after the call.
   virtual bool isCallerPreservedPhysReg(unsigned PhysReg,
                                         const MachineFunction &MF) const {
     return false;
@@ -143,7 +143,6 @@ CV_SYMBOL(S_MANSLOT            , 0x1120)
 CV_SYMBOL(S_MANMANYREG         , 0x1121)
 CV_SYMBOL(S_MANREGREL          , 0x1122)
 CV_SYMBOL(S_MANMANYREG2       , 0x1123)
-CV_SYMBOL(S_UNAMESPACE         , 0x1124)
 CV_SYMBOL(S_DATAREF            , 0x1126)
 CV_SYMBOL(S_ANNOTATIONREF      , 0x1128)
 CV_SYMBOL(S_TOKENREF           , 0x1129)
@@ -255,6 +254,7 @@ SYMBOL_RECORD_ALIAS(S_GMANDATA , 0x111d, ManagedGlobalData, DataSym)
 SYMBOL_RECORD(S_LTHREAD32      , 0x1112, ThreadLocalDataSym)
 SYMBOL_RECORD_ALIAS(S_GTHREAD32, 0x1113, GlobalTLS, ThreadLocalDataSym)

+SYMBOL_RECORD(S_UNAMESPACE     , 0x1124, UsingNamespaceSym)

 #undef CV_SYMBOL
 #undef SYMBOL_RECORD
@@ -942,6 +942,19 @@ class ThreadLocalDataSym : public SymbolRecord {
   uint32_t RecordOffset;
 };

+// S_UNAMESPACE
+class UsingNamespaceSym : public SymbolRecord {
+public:
+  explicit UsingNamespaceSym(SymbolRecordKind Kind) : SymbolRecord(Kind) {}
+  explicit UsingNamespaceSym(uint32_t RecordOffset)
+      : SymbolRecord(SymbolRecordKind::RegRelativeSym),
+        RecordOffset(RecordOffset) {}
+
+  StringRef Name;
+
+  uint32_t RecordOffset;
+};
+
 // S_ANNOTATION

 using CVSymbol = CVRecord<SymbolKind>;
@@ -154,6 +154,8 @@ enum DIDumpType : unsigned {
 struct DIDumpOptions {
   unsigned DumpType = DIDT_All;
   unsigned RecurseDepth = -1U;
+  uint16_t Version = 0; // DWARF version to assume when extracting.
+  uint8_t AddrSize = 4; // Address byte size to assume when extracting.
   bool ShowAddresses = true;
   bool ShowChildren = false;
   bool ShowParents = false;
@@ -323,6 +323,10 @@ class DWARFContext : public DIContext {
   /// have initialized the relevant target descriptions.
   Error loadRegisterInfo(const object::ObjectFile &Obj);

+  /// Get address size from CUs.
+  /// TODO: refactor compile_units() to make this const.
+  uint8_t getCUAddrSize();
+
 private:
   /// Return the compile unit which contains instruction with provided
   /// address.
@@ -51,6 +51,8 @@ class DWARFDataExtractor : public DataExtractor {
   /// reflect the absolute address of this pointer.
   Optional<uint64_t> getEncodedPointer(uint32_t *Offset, uint8_t Encoding,
                                        uint64_t AbsPosOffset = 0) const;
+
+  size_t size() const { return Section == nullptr ? 0 : Section->Data.size(); }
 };

 } // end namespace llvm
include/llvm/DebugInfo/DWARF/DWARFDebugAddr.h (new file, 98 lines)
@@ -0,0 +1,98 @@
+//===- DWARFDebugAddr.h -------------------------------------*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===------------------------------------------------------------------===//
+
+#ifndef LLVM_DEBUGINFO_DWARFDEBUGADDR_H
+#define LLVM_DEBUGINFO_DWARFDEBUGADDR_H
+
+#include "llvm/BinaryFormat/Dwarf.h"
+#include "llvm/DebugInfo/DIContext.h"
+#include "llvm/DebugInfo/DWARF/DWARFDataExtractor.h"
+#include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include <cstdint>
+#include <map>
+#include <vector>
+
+namespace llvm {
+
+class Error;
+class raw_ostream;
+
+/// A class representing an address table as specified in DWARF v5.
+/// The table consists of a header followed by an array of address values from
+/// .debug_addr section.
+class DWARFDebugAddrTable {
+public:
+  struct Header {
+    /// The total length of the entries for this table, not including the length
+    /// field itself.
+    uint32_t Length = 0;
+    /// The DWARF version number.
+    uint16_t Version = 5;
+    /// The size in bytes of an address on the target architecture. For
+    /// segmented addressing, this is the size of the offset portion of the
+    /// address.
+    uint8_t AddrSize;
+    /// The size in bytes of a segment selector on the target architecture.
+    /// If the target system uses a flat address space, this value is 0.
+    uint8_t SegSize = 0;
+  };
+
+private:
+  dwarf::DwarfFormat Format;
+  uint32_t HeaderOffset;
+  Header HeaderData;
+  uint32_t DataSize = 0;
+  std::vector<uint64_t> Addrs;
+
+public:
+  void clear();
+
+  /// Extract an entire table, including all addresses.
+  Error extract(DWARFDataExtractor Data, uint32_t *OffsetPtr,
+                uint16_t Version, uint8_t AddrSize,
+                std::function<void(Error)> WarnCallback);
+
+  uint32_t getHeaderOffset() const { return HeaderOffset; }
+  uint8_t getAddrSize() const { return HeaderData.AddrSize; }
+  void dump(raw_ostream &OS, DIDumpOptions DumpOpts = {}) const;
+
+  /// Return the address based on a given index.
+  Expected<uint64_t> getAddrEntry(uint32_t Index) const;
+
+  /// Return the size of the table header including the length
+  /// but not including the addresses.
+  uint8_t getHeaderSize() const {
+    switch (Format) {
+    case dwarf::DwarfFormat::DWARF32:
+      return 8; // 4 + 2 + 1 + 1
+    case dwarf::DwarfFormat::DWARF64:
+      return 16; // 12 + 2 + 1 + 1
+    }
+    llvm_unreachable("Invalid DWARF format (expected DWARF32 or DWARF64)");
+  }
+
+  /// Returns the length of this table, including the length field, or 0 if the
+  /// length has not been determined (e.g. because the table has not yet been
+  /// parsed, or there was a problem in parsing).
+  uint32_t getLength() const;
+
+  /// Verify that the given length is valid for this table.
+  bool hasValidLength() const { return getLength() != 0; }
+
+  /// Invalidate Length field to stop further processing.
+  void invalidateLength() { HeaderData.Length = 0; }
+
+  /// Returns the length of the array of addresses.
+  uint32_t getDataSize() const;
+};
+
+} // end namespace llvm
+
+#endif // LLVM_DEBUGINFO_DWARFDEBUGADDR_H
@@ -46,7 +46,7 @@ class DWARFDie {

 public:
   DWARFDie() = default;
-  DWARFDie(DWARFUnit *Unit, const DWARFDebugInfoEntry * D) : U(Unit), Die(D) {}
+  DWARFDie(DWARFUnit *Unit, const DWARFDebugInfoEntry *D) : U(Unit), Die(D) {}

   bool isValid() const { return U && Die; }
   explicit operator bool() const { return isValid(); }
@@ -82,9 +82,7 @@ class DWARFDie {
   }

   /// Returns true for a valid DIE that terminates a sibling chain.
-  bool isNULL() const {
-    return getAbbreviationDeclarationPtr() == nullptr;
-  }
+  bool isNULL() const { return getAbbreviationDeclarationPtr() == nullptr; }

   /// Returns true if DIE represents a subprogram (not inlined).
   bool isSubprogramDIE() const;
@@ -129,7 +127,6 @@ class DWARFDie {
   void dump(raw_ostream &OS, unsigned indent = 0,
             DIDumpOptions DumpOpts = DIDumpOptions()) const;
-

   /// Convenience zero-argument overload for debugging.
   LLVM_DUMP_METHOD void dump() const;

@@ -275,12 +272,16 @@ class DWARFDie {

   iterator begin() const;
   iterator end() const;
+
+  std::reverse_iterator<iterator> rbegin() const;
+  std::reverse_iterator<iterator> rend() const;
+
   iterator_range<iterator> children() const;
 };

-class DWARFDie::attribute_iterator :
-    public iterator_facade_base<attribute_iterator, std::forward_iterator_tag,
-                                const DWARFAttribute> {
+class DWARFDie::attribute_iterator
+    : public iterator_facade_base<attribute_iterator, std::forward_iterator_tag,
+                                  const DWARFAttribute> {
   /// The DWARF DIE we are extracting attributes from.
   DWARFDie Die;
   /// The value vended to clients via the operator*() or operator->().
@@ -288,6 +289,9 @@ class DWARFDie::attribute_iterator :
   /// The attribute index within the abbreviation declaration in Die.
   uint32_t Index;

+  friend bool operator==(const attribute_iterator &LHS,
+                         const attribute_iterator &RHS);
+
   /// Update the attribute index and attempt to read the attribute value. If the
   /// attribute is able to be read, update AttrValue and the Index member
   /// variable. If the attribute value is not able to be read, an appropriate
@@ -303,12 +307,21 @@ class DWARFDie::attribute_iterator :
   attribute_iterator &operator--();
   explicit operator bool() const { return AttrValue.isValid(); }
   const DWARFAttribute &operator*() const { return AttrValue; }
-  bool operator==(const attribute_iterator &X) const { return Index == X.Index; }
 };

+inline bool operator==(const DWARFDie::attribute_iterator &LHS,
+                       const DWARFDie::attribute_iterator &RHS) {
+  return LHS.Index == RHS.Index;
+}
+
+inline bool operator!=(const DWARFDie::attribute_iterator &LHS,
+                       const DWARFDie::attribute_iterator &RHS) {
+  return !(LHS == RHS);
+}
+
 inline bool operator==(const DWARFDie &LHS, const DWARFDie &RHS) {
   return LHS.getDebugInfoEntry() == RHS.getDebugInfoEntry() &&
          LHS.getDwarfUnit() == RHS.getDwarfUnit();
 }

 inline bool operator!=(const DWARFDie &LHS, const DWARFDie &RHS) {
@@ -323,11 +336,15 @@ class DWARFDie::iterator
     : public iterator_facade_base<iterator, std::bidirectional_iterator_tag,
                                   const DWARFDie> {
   DWARFDie Die;

+  friend std::reverse_iterator<llvm::DWARFDie::iterator>;
+  friend bool operator==(const DWARFDie::iterator &LHS,
+                         const DWARFDie::iterator &RHS);
+
 public:
   iterator() = default;

-  explicit iterator(DWARFDie D) : Die(D) {
-  }
+  explicit iterator(DWARFDie D) : Die(D) {}

   iterator &operator++() {
     Die = Die.getSibling();
@@ -339,11 +356,19 @@ class DWARFDie::iterator
     return *this;
   }

   explicit operator bool() const { return Die.isValid(); }
   const DWARFDie &operator*() const { return Die; }
-  bool operator==(const iterator &X) const { return Die == X.Die; }
 };

+inline bool operator==(const DWARFDie::iterator &LHS,
+                       const DWARFDie::iterator &RHS) {
+  return LHS.Die == RHS.Die;
+}
+
+inline bool operator!=(const DWARFDie::iterator &LHS,
+                       const DWARFDie::iterator &RHS) {
+  return !(LHS == RHS);
+}
+
 // These inline functions must follow the DWARFDie::iterator definition above
 // as they use functions from that class.
 inline DWARFDie::iterator DWARFDie::begin() const {
@@ -360,4 +385,80 @@ inline iterator_range<DWARFDie::iterator> DWARFDie::children() const {

 } // end namespace llvm

+namespace std {
+
+template <>
+class reverse_iterator<llvm::DWARFDie::iterator>
+    : public llvm::iterator_facade_base<
+          reverse_iterator<llvm::DWARFDie::iterator>,
+          bidirectional_iterator_tag, const llvm::DWARFDie> {
+
+private:
+  llvm::DWARFDie Die;
+  bool AtEnd;
+
+public:
+  reverse_iterator(llvm::DWARFDie::iterator It)
+      : Die(It.Die), AtEnd(!It.Die.getPreviousSibling()) {
+    if (!AtEnd)
+      Die = Die.getPreviousSibling();
+  }
+
+  reverse_iterator<llvm::DWARFDie::iterator> &operator++() {
+    assert(!AtEnd && "Incrementing rend");
+    llvm::DWARFDie D = Die.getPreviousSibling();
+    if (D)
+      Die = D;
+    else
+      AtEnd = true;
+    return *this;
+  }
+
+  reverse_iterator<llvm::DWARFDie::iterator> &operator--() {
+    if (AtEnd) {
+      AtEnd = false;
+      return *this;
+    }
+    Die = Die.getSibling();
+    assert(!Die.isNULL() && "Decrementing rbegin");
+    return *this;
+  }
+
+  const llvm::DWARFDie &operator*() const {
+    assert(Die.isValid());
+    return Die;
+  }
+
+  // FIXME: We should be able to specify the equals operator as a friend, but
+  // that causes the compiler to think the operator overload is ambiguous
+  // with the friend declaration and the actual definition as candidates.
+  bool equals(const reverse_iterator<llvm::DWARFDie::iterator> &RHS) const {
+    return Die == RHS.Die && AtEnd == RHS.AtEnd;
+  }
+};
+
+} // namespace std
+
+namespace llvm {
+
+inline bool operator==(const std::reverse_iterator<DWARFDie::iterator> &LHS,
+                       const std::reverse_iterator<DWARFDie::iterator> &RHS) {
+  return LHS.equals(RHS);
+}
+
+inline bool operator!=(const std::reverse_iterator<DWARFDie::iterator> &LHS,
+                       const std::reverse_iterator<DWARFDie::iterator> &RHS) {
+  return !(LHS == RHS);
+}
+
+inline std::reverse_iterator<DWARFDie::iterator> DWARFDie::rbegin() const {
+  return llvm::make_reverse_iterator(end());
+}
+
+inline std::reverse_iterator<DWARFDie::iterator> DWARFDie::rend() const {
+  return llvm::make_reverse_iterator(begin());
+}
+
+} // end namespace llvm
+
 #endif // LLVM_DEBUGINFO_DWARFDIE_H
@@ -14,7 +14,10 @@
 #include "llvm/Support/thread.h"
+#include <map>
 #include <mutex>
+#include <set>
 #include <sstream>
 #include <string>
 #include <vector>

 namespace llvm {
 namespace orc {
@@ -205,6 +208,42 @@ std::mutex RPCTypeName<std::vector<T>>::NameMutex;
 template <typename T>
 std::string RPCTypeName<std::vector<T>>::Name;

+template <typename T> class RPCTypeName<std::set<T>> {
+public:
+  static const char *getName() {
+    std::lock_guard<std::mutex> Lock(NameMutex);
+    if (Name.empty())
+      raw_string_ostream(Name)
+          << "std::set<" << RPCTypeName<T>::getName() << ">";
+    return Name.data();
+  }
+
+private:
+  static std::mutex NameMutex;
+  static std::string Name;
+};
+
+template <typename T> std::mutex RPCTypeName<std::set<T>>::NameMutex;
+template <typename T> std::string RPCTypeName<std::set<T>>::Name;
+
+template <typename K, typename V> class RPCTypeName<std::map<K, V>> {
+public:
+  static const char *getName() {
+    std::lock_guard<std::mutex> Lock(NameMutex);
+    if (Name.empty())
+      raw_string_ostream(Name)
+          << "std::map<" << RPCTypeNameSequence<K, V>() << ">";
+    return Name.data();
+  }
+
+private:
+  static std::mutex NameMutex;
+  static std::string Name;
+};
+
+template <typename K, typename V>
+std::mutex RPCTypeName<std::map<K, V>>::NameMutex;
+template <typename K, typename V> std::string RPCTypeName<std::map<K, V>>::Name;
+
 /// The SerializationTraits<ChannelT, T> class describes how to serialize and
 /// deserialize an instance of type T to/from an abstract channel of type
@@ -527,15 +566,20 @@ class SerializationTraits<ChannelT, Expected<T>, Error> {
 };

 /// SerializationTraits default specialization for std::pair.
-template <typename ChannelT, typename T1, typename T2>
-class SerializationTraits<ChannelT, std::pair<T1, T2>> {
+template <typename ChannelT, typename T1, typename T2, typename T3, typename T4>
+class SerializationTraits<ChannelT, std::pair<T1, T2>, std::pair<T3, T4>> {
 public:
-  static Error serialize(ChannelT &C, const std::pair<T1, T2> &V) {
-    return serializeSeq(C, V.first, V.second);
+  static Error serialize(ChannelT &C, const std::pair<T3, T4> &V) {
+    if (auto Err = SerializationTraits<ChannelT, T1, T3>::serialize(C, V.first))
+      return Err;
+    return SerializationTraits<ChannelT, T2, T4>::serialize(C, V.second);
   }

-  static Error deserialize(ChannelT &C, std::pair<T1, T2> &V) {
-    return deserializeSeq(C, V.first, V.second);
+  static Error deserialize(ChannelT &C, std::pair<T3, T4> &V) {
+    if (auto Err =
+            SerializationTraits<ChannelT, T1, T3>::deserialize(C, V.first))
+      return Err;
+    return SerializationTraits<ChannelT, T2, T4>::deserialize(C, V.second);
   }
 };

@@ -589,6 +633,9 @@ class SerializationTraits<ChannelT, std::vector<T>> {

   /// Deserialize a std::vector<T> to a std::vector<T>.
   static Error deserialize(ChannelT &C, std::vector<T> &V) {
+    assert(V.empty() &&
+           "Expected default-constructed vector to deserialize into");
+
     uint64_t Count = 0;
     if (auto Err = deserializeSeq(C, Count))
       return Err;
@@ -602,6 +649,92 @@ class SerializationTraits<ChannelT, std::vector<T>> {
   }
 };

+template <typename ChannelT, typename T, typename T2>
+class SerializationTraits<ChannelT, std::set<T>, std::set<T2>> {
+public:
+  /// Serialize a std::set<T> from std::set<T2>.
+  static Error serialize(ChannelT &C, const std::set<T2> &S) {
+    if (auto Err = serializeSeq(C, static_cast<uint64_t>(S.size())))
+      return Err;
+
+    for (const auto &E : S)
+      if (auto Err = SerializationTraits<ChannelT, T, T2>::serialize(C, E))
+        return Err;
+
+    return Error::success();
+  }
+
+  /// Deserialize a std::set<T> to a std::set<T>.
+  static Error deserialize(ChannelT &C, std::set<T2> &S) {
+    assert(S.empty() && "Expected default-constructed set to deserialize into");
+
+    uint64_t Count = 0;
+    if (auto Err = deserializeSeq(C, Count))
+      return Err;
+
+    while (Count-- != 0) {
+      T2 Val;
+      if (auto Err = SerializationTraits<ChannelT, T, T2>::deserialize(C, Val))
+        return Err;
+
+      auto Added = S.insert(Val).second;
+      if (!Added)
+        return make_error<StringError>("Duplicate element in deserialized set",
+                                       orcError(OrcErrorCode::UnknownORCError));
+    }
+
+    return Error::success();
+  }
+};
+
+template <typename ChannelT, typename K, typename V, typename K2, typename V2>
+class SerializationTraits<ChannelT, std::map<K, V>, std::map<K2, V2>> {
+public:
+  /// Serialize a std::map<K, V> from std::map<K2, V2>.
+  static Error serialize(ChannelT &C, const std::map<K2, V2> &M) {
+    if (auto Err = serializeSeq(C, static_cast<uint64_t>(M.size())))
+      return Err;
+
+    for (const auto &E : M) {
+      if (auto Err =
+              SerializationTraits<ChannelT, K, K2>::serialize(C, E.first))
+        return Err;
+      if (auto Err =
+              SerializationTraits<ChannelT, V, V2>::serialize(C, E.second))
+        return Err;
+    }
+
+    return Error::success();
+  }
+
+  /// Deserialize a std::map<K, V> to a std::map<K, V>.
+  static Error deserialize(ChannelT &C, std::map<K2, V2> &M) {
+    assert(M.empty() && "Expected default-constructed map to deserialize into");
+
+    uint64_t Count = 0;
+    if (auto Err = deserializeSeq(C, Count))
+      return Err;
+
+    while (Count-- != 0) {
+      std::pair<K2, V2> Val;
+      if (auto Err =
+              SerializationTraits<ChannelT, K, K2>::deserialize(C, Val.first))
+        return Err;
+
+      if (auto Err =
+              SerializationTraits<ChannelT, V, V2>::deserialize(C, Val.second))
+        return Err;
+
+      auto Added = M.insert(Val).second;
+      if (!Added)
+        return make_error<StringError>("Duplicate element in deserialized map",
+                                       orcError(OrcErrorCode::UnknownORCError));
+    }
+
+    return Error::success();
+  }
+};
+
 } // end namespace rpc
 } // end namespace orc
 } // end namespace llvm

@@ -236,3 +236,4 @@ def : MergeRule<"adjustCallerSSPLevel">;
 def : MergeRule<"adjustCallerStackProbes">;
 def : MergeRule<"adjustCallerStackProbeSize">;
 def : MergeRule<"adjustMinLegalVectorWidth">;
+def : MergeRule<"adjustNullPointerValidAttr">;
@@ -547,7 +547,7 @@ class Instruction : public User,
   /// may have side effects cannot be removed without semantically changing the
   /// generated program.
   bool isSafeToRemove() const;

   /// Return true if the instruction is a variety of EH-block.
   bool isEHPad() const {
     switch (getOpcode()) {
@@ -4016,7 +4016,7 @@ class InvokeInst : public CallBase<InvokeInst> {
   void setDoesNotThrow() {
     addAttribute(AttributeList::FunctionIndex, Attribute::NoUnwind);
   }

   /// Return the function called, or null if this is an
   /// indirect function invocation.
   ///
@@ -541,7 +541,7 @@ let IntrProperties = [IntrInaccessibleMemOnly] in {
                                                     [ LLVMMatchType<0>,
                                                       llvm_metadata_ty,
                                                       llvm_metadata_ty ]>;
   def int_experimental_constrained_exp : Intrinsic<[ llvm_anyfloat_ty ],
                                                    [ LLVMMatchType<0>,
                                                      llvm_metadata_ty,
                                                      llvm_metadata_ty ]>;
@@ -1191,7 +1191,7 @@ def int_amdgcn_ds_bpermute :
 // Deep learning intrinsics.
 //===----------------------------------------------------------------------===//

-// f32 %r = llvm.amdgcn.fdot2(v2f16 %a, v2f16 %b, f32 %c)
+// f32 %r = llvm.amdgcn.fdot2(v2f16 %a, v2f16 %b, f32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %c
 def int_amdgcn_fdot2 :
   GCCBuiltin<"__builtin_amdgcn_fdot2">,
@@ -1200,12 +1200,13 @@ def int_amdgcn_fdot2 :
     [
       llvm_v2f16_ty, // %a
       llvm_v2f16_ty, // %b
-      llvm_float_ty  // %c
+      llvm_float_ty, // %c
+      llvm_i1_ty     // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// i32 %r = llvm.amdgcn.sdot2(v2i16 %a, v2i16 %b, i32 %c)
+// i32 %r = llvm.amdgcn.sdot2(v2i16 %a, v2i16 %b, i32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %c
 def int_amdgcn_sdot2 :
   GCCBuiltin<"__builtin_amdgcn_sdot2">,
@@ -1214,12 +1215,13 @@ def int_amdgcn_sdot2 :
     [
       llvm_v2i16_ty, // %a
       llvm_v2i16_ty, // %b
-      llvm_i32_ty    // %c
+      llvm_i32_ty,   // %c
+      llvm_i1_ty     // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// u32 %r = llvm.amdgcn.udot2(v2u16 %a, v2u16 %b, u32 %c)
+// u32 %r = llvm.amdgcn.udot2(v2u16 %a, v2u16 %b, u32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %c
 def int_amdgcn_udot2 :
   GCCBuiltin<"__builtin_amdgcn_udot2">,
@@ -1228,12 +1230,13 @@ def int_amdgcn_udot2 :
     [
       llvm_v2i16_ty, // %a
       llvm_v2i16_ty, // %b
-      llvm_i32_ty    // %c
+      llvm_i32_ty,   // %c
+      llvm_i1_ty     // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// i32 %r = llvm.amdgcn.sdot4(v4i8 (as i32) %a, v4i8 (as i32) %b, i32 %c)
+// i32 %r = llvm.amdgcn.sdot4(v4i8 (as i32) %a, v4i8 (as i32) %b, i32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] + %c
 def int_amdgcn_sdot4 :
   GCCBuiltin<"__builtin_amdgcn_sdot4">,
@@ -1242,12 +1245,13 @@ def int_amdgcn_sdot4 :
     [
       llvm_i32_ty, // %a
      llvm_i32_ty, // %b
-      llvm_i32_ty  // %c
+      llvm_i32_ty, // %c
+      llvm_i1_ty   // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// u32 %r = llvm.amdgcn.udot4(v4u8 (as u32) %a, v4u8 (as u32) %b, u32 %c)
+// u32 %r = llvm.amdgcn.udot4(v4u8 (as u32) %a, v4u8 (as u32) %b, u32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] + %c
 def int_amdgcn_udot4 :
   GCCBuiltin<"__builtin_amdgcn_udot4">,
@@ -1256,12 +1260,13 @@ def int_amdgcn_udot4 :
     [
       llvm_i32_ty, // %a
       llvm_i32_ty, // %b
-      llvm_i32_ty  // %c
+      llvm_i32_ty, // %c
+      llvm_i1_ty   // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// i32 %r = llvm.amdgcn.sdot8(v8i4 (as i32) %a, v8i4 (as i32) %b, i32 %c)
+// i32 %r = llvm.amdgcn.sdot8(v8i4 (as i32) %a, v8i4 (as i32) %b, i32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] +
 //      %a[4] * %b[4] + %a[5] * %b[5] + %a[6] * %b[6] + %a[7] * %b[7] + %c
 def int_amdgcn_sdot8 :
@@ -1271,12 +1276,13 @@ def int_amdgcn_sdot8 :
     [
       llvm_i32_ty, // %a
       llvm_i32_ty, // %b
-      llvm_i32_ty  // %c
+      llvm_i32_ty, // %c
+      llvm_i1_ty   // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// u32 %r = llvm.amdgcn.udot8(v8u4 (as u32) %a, v8u4 (as u32) %b, u32 %c)
+// u32 %r = llvm.amdgcn.udot8(v8u4 (as u32) %a, v8u4 (as u32) %b, u32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] +
 //      %a[4] * %b[4] + %a[5] * %b[5] + %a[6] * %b[6] + %a[7] * %b[7] + %c
 def int_amdgcn_udot8 :
@@ -1286,7 +1292,8 @@ def int_amdgcn_udot8 :
     [
       llvm_i32_ty, // %a
       llvm_i32_ty, // %b
-      llvm_i32_ty  // %c
+      llvm_i32_ty, // %c
+      llvm_i1_ty   // %clamp
|
||||
],
|
||||
[IntrNoMem, IntrSpeculatable]
|
||||
>;
|
||||
|
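The dot-product formula spelled out in the intrinsic comments above (sum of lane products plus an accumulator) can be sketched as a host-side reference model. This is only an illustration of the documented formula, not the GPU implementation; the name `sdot2_ref` and the saturating interpretation of the `i1 %clamp` operand are assumptions.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical reference model for the llvm.amdgcn.sdot2 formula above:
//   %r = %a[0] * %b[0] + %a[1] * %b[1] + %c
// The i1 %clamp operand is assumed to saturate the result to the i32
// range rather than letting it wrap (an assumption from the name).
int32_t sdot2_ref(int16_t a0, int16_t a1, int16_t b0, int16_t b1,
                  int32_t c, bool clamp) {
  // Widen to 64 bits so the intermediate sum cannot overflow.
  int64_t r = int64_t(a0) * b0 + int64_t(a1) * b1 + c;
  if (clamp)
    r = std::min<int64_t>(std::max<int64_t>(r, INT32_MIN), INT32_MAX);
  return static_cast<int32_t>(r);
}
```

The sdot4/sdot8 variants follow the same shape with more lanes packed into an i32 operand.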
@ -275,7 +275,7 @@ def int_arm_stc : GCCBuiltin<"__builtin_arm_stc">,
   Intrinsic<[], [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty], []>;
def int_arm_stcl : GCCBuiltin<"__builtin_arm_stcl">,
   Intrinsic<[], [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty], []>;
def int_arm_stc2 : GCCBuiltin<"__builtin_arm_stc2">,
   Intrinsic<[], [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty], []>;
def int_arm_stc2l : GCCBuiltin<"__builtin_arm_stc2l">,
   Intrinsic<[], [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty], []>;
@ -1,10 +1,10 @@
//===- IntrinsicsPowerPC.td - Defines PowerPC intrinsics ---*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file defines all of the PowerPC-specific intrinsics.

@ -122,21 +122,21 @@ class PowerPC_Vec_FFF_Intrinsic<string GCCIntSuffix>

/// PowerPC_Vec_BBB_Intrinsic - A PowerPC intrinsic that takes two v16i8
/// vectors and returns one.  These intrinsics have no side effects.
class PowerPC_Vec_BBB_Intrinsic<string GCCIntSuffix>
  : PowerPC_Vec_Intrinsic<GCCIntSuffix,
                          [llvm_v16i8_ty], [llvm_v16i8_ty, llvm_v16i8_ty],
                          [IntrNoMem]>;

/// PowerPC_Vec_HHH_Intrinsic - A PowerPC intrinsic that takes two v8i16
/// vectors and returns one.  These intrinsics have no side effects.
class PowerPC_Vec_HHH_Intrinsic<string GCCIntSuffix>
  : PowerPC_Vec_Intrinsic<GCCIntSuffix,
                          [llvm_v8i16_ty], [llvm_v8i16_ty, llvm_v8i16_ty],
                          [IntrNoMem]>;

/// PowerPC_Vec_WWW_Intrinsic - A PowerPC intrinsic that takes two v4i32
/// vectors and returns one.  These intrinsics have no side effects.
class PowerPC_Vec_WWW_Intrinsic<string GCCIntSuffix>
  : PowerPC_Vec_Intrinsic<GCCIntSuffix,
                          [llvm_v4i32_ty], [llvm_v4i32_ty, llvm_v4i32_ty],
                          [IntrNoMem]>;

@ -267,7 +267,7 @@ let TargetPrefix = "ppc" in {  // All intrinsics start with "llvm.ppc.".
  def int_ppc_altivec_vcmpgtud : GCCBuiltin<"__builtin_altivec_vcmpgtud">,
              Intrinsic<[llvm_v2i64_ty], [llvm_v2i64_ty, llvm_v2i64_ty],
                        [IntrNoMem]>;

  def int_ppc_altivec_vcmpequw : GCCBuiltin<"__builtin_altivec_vcmpequw">,
              Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, llvm_v4i32_ty],
                        [IntrNoMem]>;

@ -283,7 +283,7 @@ let TargetPrefix = "ppc" in {  // All intrinsics start with "llvm.ppc.".
  def int_ppc_altivec_vcmpnezw : GCCBuiltin<"__builtin_altivec_vcmpnezw">,
              Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, llvm_v4i32_ty],
                        [IntrNoMem]>;

  def int_ppc_altivec_vcmpequh : GCCBuiltin<"__builtin_altivec_vcmpequh">,
              Intrinsic<[llvm_v8i16_ty], [llvm_v8i16_ty, llvm_v8i16_ty],
                        [IntrNoMem]>;

@ -355,7 +355,7 @@ let TargetPrefix = "ppc" in {  // All intrinsics start with "llvm.ppc.".
  def int_ppc_altivec_vcmpnezw_p : GCCBuiltin<"__builtin_altivec_vcmpnezw_p">,
              Intrinsic<[llvm_i32_ty],[llvm_i32_ty,llvm_v4i32_ty,llvm_v4i32_ty],
                        [IntrNoMem]>;

  def int_ppc_altivec_vcmpequh_p : GCCBuiltin<"__builtin_altivec_vcmpequh_p">,
              Intrinsic<[llvm_i32_ty],[llvm_i32_ty,llvm_v8i16_ty,llvm_v8i16_ty],
                        [IntrNoMem]>;

@ -474,10 +474,10 @@ let TargetPrefix = "ppc" in {  // All PPC intrinsics start with "llvm.ppc.".
            Intrinsic<[llvm_v4i32_ty], [llvm_v8i16_ty, llvm_v8i16_ty,
                       llvm_v4i32_ty], [IntrNoMem]>;
  def int_ppc_altivec_vmsumshs : GCCBuiltin<"__builtin_altivec_vmsumshs">,
            Intrinsic<[llvm_v4i32_ty], [llvm_v8i16_ty, llvm_v8i16_ty,
                       llvm_v4i32_ty], [IntrNoMem]>;
  def int_ppc_altivec_vmsumubm : GCCBuiltin<"__builtin_altivec_vmsumubm">,
            Intrinsic<[llvm_v4i32_ty], [llvm_v16i8_ty, llvm_v16i8_ty,
                       llvm_v4i32_ty], [IntrNoMem]>;
  def int_ppc_altivec_vmsumuhm : GCCBuiltin<"__builtin_altivec_vmsumuhm">,
            Intrinsic<[llvm_v4i32_ty], [llvm_v8i16_ty, llvm_v8i16_ty,

@ -544,7 +544,7 @@ let TargetPrefix = "ppc" in {  // All PPC intrinsics start with "llvm.ppc.".

  // Other multiplies.
  def int_ppc_altivec_vmladduhm : GCCBuiltin<"__builtin_altivec_vmladduhm">,
            Intrinsic<[llvm_v8i16_ty], [llvm_v8i16_ty, llvm_v8i16_ty,
                       llvm_v8i16_ty], [IntrNoMem]>;

  // Packs.

@ -626,21 +626,21 @@ let TargetPrefix = "ppc" in {  // All PPC intrinsics start with "llvm.ppc.".

  // Add Extended Quadword
  def int_ppc_altivec_vaddeuqm : GCCBuiltin<"__builtin_altivec_vaddeuqm">,
          Intrinsic<[llvm_v1i128_ty],
                    [llvm_v1i128_ty, llvm_v1i128_ty, llvm_v1i128_ty],
                    [IntrNoMem]>;
  def int_ppc_altivec_vaddecuq : GCCBuiltin<"__builtin_altivec_vaddecuq">,
          Intrinsic<[llvm_v1i128_ty],
                    [llvm_v1i128_ty, llvm_v1i128_ty, llvm_v1i128_ty],
                    [IntrNoMem]>;

  // Sub Extended Quadword
  def int_ppc_altivec_vsubeuqm : GCCBuiltin<"__builtin_altivec_vsubeuqm">,
          Intrinsic<[llvm_v1i128_ty],
                    [llvm_v1i128_ty, llvm_v1i128_ty, llvm_v1i128_ty],
                    [IntrNoMem]>;
  def int_ppc_altivec_vsubecuq : GCCBuiltin<"__builtin_altivec_vsubecuq">,
          Intrinsic<[llvm_v1i128_ty],
                    [llvm_v1i128_ty, llvm_v1i128_ty, llvm_v1i128_ty],
                    [IntrNoMem]>;
}

@ -657,7 +657,7 @@ def int_ppc_altivec_vslw : PowerPC_Vec_WWW_Intrinsic<"vslw">;

// Right Shifts.
def int_ppc_altivec_vsr   : PowerPC_Vec_WWW_Intrinsic<"vsr">;
def int_ppc_altivec_vsro  : PowerPC_Vec_WWW_Intrinsic<"vsro">;

def int_ppc_altivec_vsrb  : PowerPC_Vec_BBB_Intrinsic<"vsrb">;
def int_ppc_altivec_vsrh  : PowerPC_Vec_HHH_Intrinsic<"vsrh">;
def int_ppc_altivec_vsrw  : PowerPC_Vec_WWW_Intrinsic<"vsrw">;

@ -679,10 +679,10 @@ let TargetPrefix = "ppc" in {  // All PPC intrinsics start with "llvm.ppc.".
              Intrinsic<[llvm_v16i8_ty], [llvm_ptr_ty], [IntrNoMem]>;

  def int_ppc_altivec_vperm : GCCBuiltin<"__builtin_altivec_vperm_4si">,
              Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty,
                         llvm_v4i32_ty, llvm_v16i8_ty], [IntrNoMem]>;
  def int_ppc_altivec_vsel : GCCBuiltin<"__builtin_altivec_vsel_4si">,
              Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty,
                         llvm_v4i32_ty, llvm_v4i32_ty], [IntrNoMem]>;
  def int_ppc_altivec_vgbbd : GCCBuiltin<"__builtin_altivec_vgbbd">,
              Intrinsic<[llvm_v16i8_ty], [llvm_v16i8_ty], [IntrNoMem]>;
@ -285,7 +285,7 @@ class PMTopLevelManager {
  SpecificBumpPtrAllocator<AUFoldingSetNode> AUFoldingSetNodeAllocator;

  // Maps from a pass to it's associated entry in UniqueAnalysisUsages.  Does
  // not own the storage associated with either key or value..
  DenseMap<Pass *, AnalysisUsage*> AnUsageMap;

  /// Collection of PassInfo objects found via analysis IDs and in this top
@ -325,7 +325,7 @@ class Statepoint
  explicit Statepoint(CallSite CS) : Base(CS) {}
};

/// Common base class for representing values projected from a statepoint.
/// Currently, the only projections available are gc.result and gc.relocate.
class GCProjectionInst : public IntrinsicInst {
public:
@ -101,10 +101,10 @@ class User : public Value {
  void operator delete(void *Usr);
  /// Placement delete - required by std, called if the ctor throws.
  void operator delete(void *Usr, unsigned) {
    // Note: If a subclass manipulates the information which is required to calculate the
    // Usr memory pointer, e.g. NumUserOperands, the operator delete of that subclass has
    // to restore the changed information to the original value, since the dtor of that class
    // is not called if the ctor fails.
    User::operator delete(Usr);

#ifndef LLVM_ENABLE_EXCEPTIONS
@ -113,10 +113,10 @@ class User : public Value {
  }
  /// Placement delete - required by std, called if the ctor throws.
  void operator delete(void *Usr, unsigned, bool) {
    // Note: If a subclass manipulates the information which is required to calculate the
    // Usr memory pointer, e.g. NumUserOperands, the operator delete of that subclass has
    // to restore the changed information to the original value, since the dtor of that class
    // is not called if the ctor fails.
    User::operator delete(Usr);

#ifndef LLVM_ENABLE_EXCEPTIONS
@ -44,7 +44,7 @@ namespace {
      llvm::LLVMContext Context;
      (void)new llvm::Module("", Context);
      (void)new llvm::UnreachableInst(Context);
      (void) llvm::createVerifierPass();
    }
  } ForceVMCoreLinking;
}
@ -362,6 +362,13 @@ class MCDwarfLineAddr {
  static void Encode(MCContext &Context, MCDwarfLineTableParams Params,
                     int64_t LineDelta, uint64_t AddrDelta, raw_ostream &OS);

  /// Utility function to encode a Dwarf pair of LineDelta and AddrDeltas using
  /// fixed length operands.
  static bool FixedEncode(MCContext &Context,
                          MCDwarfLineTableParams Params,
                          int64_t LineDelta, uint64_t AddrDelta,
                          raw_ostream &OS, uint32_t *Offset, uint32_t *Size);

  /// Utility function to emit the encoding to a streamer.
  static void Emit(MCStreamer *MCOS, MCDwarfLineTableParams Params,
                   int64_t LineDelta, uint64_t AddrDelta);
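The variable-length `Encode` path above is built around the standard DWARF line-program "special opcode" calculation. A rough sketch follows, using the common default parameters (`line_base` -5, `line_range` 14, `opcode_base` 13); the actual values come from `MCDwarfLineTableParams` and may differ.

```cpp
#include <cstdint>

// Sketch of the DWARF line-program "special opcode" computation
// (DWARF v4, section 6.2.5.1). LineBase/LineRange/OpcodeBase below are
// the common default parameters, assumed purely for illustration.
constexpr int64_t LineBase = -5;
constexpr uint64_t LineRange = 14;
constexpr uint64_t OpcodeBase = 13;

// Returns the special opcode encoding this (LineDelta, AddrDelta) pair,
// or 0 when the pair is out of range and needs standard opcodes instead.
unsigned specialOpcode(int64_t LineDelta, uint64_t AddrDelta) {
  if (LineDelta < LineBase || LineDelta >= LineBase + int64_t(LineRange))
    return 0;
  uint64_t Op =
      uint64_t(LineDelta - LineBase) + LineRange * AddrDelta + OpcodeBase;
  return Op <= 255 ? unsigned(Op) : 0;
}
```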
@ -149,6 +149,7 @@ class MCEncodedFragment : public MCFragment {
    case MCFragment::FT_Relaxable:
    case MCFragment::FT_CompactEncodedInst:
    case MCFragment::FT_Data:
    case MCFragment::FT_Dwarf:
      return true;
    }
  }
@ -232,7 +233,7 @@ class MCEncodedFragmentWithFixups :
  static bool classof(const MCFragment *F) {
    MCFragment::FragmentType Kind = F->getKind();
    return Kind == MCFragment::FT_Relaxable || Kind == MCFragment::FT_Data ||
           Kind == MCFragment::FT_CVDefRange;
           Kind == MCFragment::FT_CVDefRange || Kind == MCFragment::FT_Dwarf;;
  }
};

@ -514,7 +515,7 @@ class MCLEBFragment : public MCFragment {
  }
};

class MCDwarfLineAddrFragment : public MCFragment {
class MCDwarfLineAddrFragment : public MCEncodedFragmentWithFixups<8, 1> {
  /// LineDelta - the value of the difference between the two line numbers
  /// between two .loc dwarf directives.
  int64_t LineDelta;
@ -523,15 +524,11 @@ class MCDwarfLineAddrFragment : public MCFragment {
  /// make up the address delta between two .loc dwarf directives.
  const MCExpr *AddrDelta;

  SmallString<8> Contents;

public:
  MCDwarfLineAddrFragment(int64_t LineDelta, const MCExpr &AddrDelta,
                          MCSection *Sec = nullptr)
      : MCFragment(FT_Dwarf, false, Sec), LineDelta(LineDelta),
        AddrDelta(&AddrDelta) {
    Contents.push_back(0);
  }
      : MCEncodedFragmentWithFixups<8, 1>(FT_Dwarf, false, Sec),
        LineDelta(LineDelta), AddrDelta(&AddrDelta) {}

  /// \name Accessors
  /// @{
@ -540,9 +537,6 @@ class MCDwarfLineAddrFragment : public MCFragment {

  const MCExpr &getAddrDelta() const { return *AddrDelta; }

  SmallString<8> &getContents() { return Contents; }
  const SmallString<8> &getContents() const { return Contents; }

  /// @}

  static bool classof(const MCFragment *F) {
@ -64,7 +64,7 @@ class MCInstrAnalysis {

  /// Returns true if at least one of the register writes performed by
  /// \param Inst implicitly clears the upper portion of all super-registers.
  ///
  /// Example: on X86-64, a write to EAX implicitly clears the upper half of
  /// RAX. Also (still on x86) an XMM write perfomed by an AVX 128-bit
  /// instruction implicitly clears the upper portion of the correspondent
@ -87,6 +87,19 @@ class MCInstrAnalysis {
                                          const MCInst &Inst,
                                          APInt &Writes) const;

  /// Returns true if \param Inst is a dependency breaking instruction for the
  /// given subtarget.
  ///
  /// The value computed by a dependency breaking instruction is not dependent
  /// on the inputs. An example of dependency breaking instruction on X86 is
  /// `XOR %eax, %eax`.
  /// TODO: In future, we could implement an alternative approach where this
  /// method returns `true` if the input instruction is not dependent on
  /// some/all of its input operands. An APInt mask could then be used to
  /// identify independent operands.
  virtual bool isDependencyBreaking(const MCSubtargetInfo &STI,
                                    const MCInst &Inst) const;

  /// Given a branch instruction try to get the address the branch
  /// targets. Return true on success, and the address in Target.
  virtual bool
@ -15,7 +15,7 @@ namespace llvm {

/// AsmCond - Class to support conditional assembly
///
/// The conditional assembly feature (.if, .else, .elseif and .endif) is
/// implemented with AsmCond that tells us what we are in the middle of
/// processing.  Ignore can be either true or false.  When true we are ignoring
/// the block of code in the middle of a conditional.

@ -297,8 +297,8 @@ class MCStreamer {
  /// If the comment includes embedded \n's, they will each get the comment
  /// prefix as appropriate.  The added comment should not end with a \n.
  /// By default, each comment is terminated with an end of line, i.e. the
  /// EOL param is set to true by default. If one prefers not to end the
  /// comment with a new line then the EOL param should be passed
  /// with a false value.
  virtual void AddComment(const Twine &T, bool EOL = true) {}
@ -333,7 +333,7 @@ class MachOObjectFile : public ObjectFile {

  relocation_iterator locrel_begin() const;
  relocation_iterator locrel_end() const;

  void moveRelocationNext(DataRefImpl &Rel) const override;
  uint64_t getRelocationOffset(DataRefImpl Rel) const override;
  symbol_iterator getRelocationSymbol(DataRefImpl Rel) const override;
@ -231,7 +231,7 @@ AnalysisType &Pass::getAnalysisID(AnalysisID PI) const {
  // should be a small number, we just do a linear search over a (dense)
  // vector.
  Pass *ResultPass = Resolver->findImplPass(PI);
  assert(ResultPass &&
         "getAnalysis*() called on an analysis that was not "
         "'required' by pass!");

@ -9,7 +9,7 @@
//
// This file defines PassRegistry, a class that is used in the initialization
// and registration of passes. At application startup, passes are registered
// with the PassRegistry, which is later provided to the PassManager for
// dependency resolution and similar tasks.
//
//===----------------------------------------------------------------------===//
@ -207,7 +207,7 @@ struct CounterMappingRegion {
    /// A CodeRegion associates some code with a counter
    CodeRegion,

    /// An ExpansionRegion represents a file expansion region that associates
    /// a source range with the expansion of a virtual source file, such as
    /// for a macro instantiation or #include file.
    ExpansionRegion,

@ -213,6 +213,8 @@ enum {
  // Tag_ABI_VFP_args, (=28), uleb128
  BaseAAPCS = 0,
  HardFPAAPCS = 1,
  ToolChainFPPCS = 2,
  CompatibleFPAAPCS = 3,

  // Tag_FP_HP_extension, (=36), uleb128
  AllowHPFP = 1, // Allow use of Half Precision FP
@ -15,7 +15,7 @@

namespace llvm {

/// An auxiliary type to facilitate extraction of 3-byte entities.
struct Uint24 {
  uint8_t Bytes[3];
  Uint24(uint8_t U) {
@ -530,11 +530,10 @@ class DominatorTreeBase {
  /// CFG about its children and inverse children. This implies that deletions
  /// of CFG edges must not delete the CFG nodes before calling this function.
  ///
  /// Batch updates should be generally faster when performing longer sequences
  /// of updates than calling insertEdge/deleteEdge manually multiple times, as
  /// it can reorder the updates and remove redundant ones internally.
  /// The batch updater is also able to detect sequences of zero and exactly one
  /// update -- it's optimized to do less work in these cases.
  /// The applyUpdates function can reorder the updates and remove redundant
  /// ones internally. The batch updater is also able to detect sequences of
  /// zero and exactly one update -- it's optimized to do less work in these
  /// cases.
  ///
  /// Note that for postdominators it automatically takes care of applying
  /// updates on reverse edges internally (so there's no need to swap the
@ -854,10 +853,15 @@ class DominatorTreeBase {
    assert(isReachableFromEntry(B));
    assert(isReachableFromEntry(A));

    const unsigned ALevel = A->getLevel();
    const DomTreeNodeBase<NodeT> *IDom;
    while ((IDom = B->getIDom()) != nullptr && IDom != A && IDom != B)

    // Don't walk nodes above A's subtree. When we reach A's level, we must
    // either find A or be in some other subtree not dominated by A.
    while ((IDom = B->getIDom()) != nullptr && IDom->getLevel() >= ALevel)
      B = IDom;  // Walk up the tree
    return IDom != nullptr;

    return B == A;
  }

  /// Wipe this tree's state without releasing any resources.
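The new level-based dominance walk above can be sketched in isolation. `Node` here is a hypothetical stand-in for `DomTreeNodeBase`: climb B's immediate-dominator chain, but never above A's level; once at A's level, either the walk landed on A or B sits in a subtree not dominated by A.

```cpp
struct Node {
  const Node *IDom = nullptr; // Immediate dominator; null for the root.
  unsigned Level = 0;         // Depth in the dominator tree.
};

// Minimal sketch of the level-bounded dominance query shown in the
// diff above. Walking stops as soon as the idom would rise above A's
// level, so the loop does at most Level(B) - Level(A) + 1 steps.
bool dominates(const Node *A, const Node *B) {
  const unsigned ALevel = A->Level;
  const Node *IDom;
  while ((IDom = B->IDom) != nullptr && IDom->Level >= ALevel)
    B = IDom; // Walk up the tree.
  return B == A;
}
```

Compared with the old `IDom != A && IDom != B` loop, this bounds the walk by tree level instead of walking all the way to the root on a negative answer.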
@ -43,7 +43,6 @@ class MemoryBuffer {
  const char *BufferStart; // Start of the buffer.
  const char *BufferEnd;   // End of the buffer.

protected:
  MemoryBuffer() = default;

@ -148,9 +147,6 @@ class MemoryBuffer {
  virtual BufferKind getBufferKind() const = 0;

  MemoryBufferRef getMemBufferRef() const;

private:
  virtual void anchor();
};

/// This class is an extension of MemoryBuffer, which allows copy-on-write
@ -49,6 +49,9 @@ class SmallVectorMemoryBuffer : public MemoryBuffer {
    init(this->SV.begin(), this->SV.end(), false);
  }

  // Key function.
  ~SmallVectorMemoryBuffer() override;

  StringRef getBufferIdentifier() const override { return BufferName; }

  BufferKind getBufferKind() const override { return MemoryBuffer_Malloc; }
@ -56,7 +59,6 @@ class SmallVectorMemoryBuffer : public MemoryBuffer {
private:
  SmallVector<char, 0> SV;
  std::string BufferName;
  void anchor() override;
};

} // namespace llvm
@ -470,12 +470,15 @@ HANDLE_TARGET_OPCODE(G_BSWAP)
/// Generic AddressSpaceCast.
HANDLE_TARGET_OPCODE(G_ADDRSPACE_CAST)

/// Generic block address
HANDLE_TARGET_OPCODE(G_BLOCK_ADDR)

// TODO: Add more generic opcodes as we move along.

/// Marker for the end of the generic opcode.
/// This is used to check if an opcode is in the range of the
/// generic opcodes.
HANDLE_TARGET_OPCODE_MARKER(PRE_ISEL_GENERIC_OPCODE_END, G_ADDRSPACE_CAST)
HANDLE_TARGET_OPCODE_MARKER(PRE_ISEL_GENERIC_OPCODE_END, G_BLOCK_ADDR)

/// BUILTIN_OP_END - This must be the last enum value in this list.
/// The target-specific post-isel opcode values start here.

@ -38,10 +38,12 @@
#ifndef LLVM_SUPPORT_XXHASH_H
#define LLVM_SUPPORT_XXHASH_H

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"

namespace llvm {
uint64_t xxHash64(llvm::StringRef Data);
uint64_t xxHash64(llvm::ArrayRef<uint8_t> Data);
}

#endif
@ -131,6 +131,13 @@ def G_ADDRSPACE_CAST : GenericInstruction {
  let InOperandList = (ins type1:$src);
  let hasSideEffects = 0;
}

def G_BLOCK_ADDR : GenericInstruction {
  let OutOperandList = (outs type0:$dst);
  let InOperandList = (ins unknown:$ba);
  let hasSideEffects = 0;
}

//------------------------------------------------------------------------------
// Binary ops.
//------------------------------------------------------------------------------

@ -1,10 +1,10 @@
//===- TargetCallingConv.td - Target Calling Conventions ---*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file defines the target-independent interfaces with which targets
@ -13,7 +13,7 @@
// an instruction. Each MCInstPredicate class has a well-known semantic, and it
// is used by a PredicateExpander to generate code for MachineInstr and/or
// MCInst.
//
// MCInstPredicate definitions can be used to construct MCSchedPredicate
// definitions. An MCSchedPredicate can be used in place of a SchedPredicate
// when defining SchedReadVariant and SchedWriteVariant used by a processor

@ -63,7 +63,7 @@
//
// New MCInstPredicate classes must be added to this file. For each new class
// XYZ, an "expandXYZ" method must be added to the PredicateExpander.
//
//===----------------------------------------------------------------------===//

// Forward declarations.
@ -82,7 +82,7 @@ class SpeculativeExecutionPass
  bool considerHoistingFromTo(BasicBlock &FromBlock, BasicBlock &ToBlock);

  // If true, this pass is a nop unless the target architecture has branch
  // divergence.
  const bool OnlyIfDivergentTarget = false;

  TargetTransformInfo *TTI = nullptr;

@ -74,7 +74,7 @@ class Value;
  /// vararg functions can be extracted.  This is safe, if all vararg handling
  /// code is extracted, including vastart.  If AllowAlloca is true, then
  /// extraction of blocks containing alloca instructions would be possible,
  /// however code extractor won't validate whether extraction is legal.
  CodeExtractor(ArrayRef<BasicBlock *> BBs, DominatorTree *DT = nullptr,
                bool AggregateArgs = false, BlockFrequencyInfo *BFI = nullptr,
                BranchProbabilityInfo *BPI = nullptr,
@ -18,7 +18,7 @@
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/Support/AtomicOrdering.h"

@ -134,7 +134,7 @@ class RewriteSymbolPass : public PassInfoMixin<RewriteSymbolPass> {
private:
  void loadAndParseMapFiles();

  SymbolRewriter::RewriteDescriptorList Descriptors;
};

} // end namespace llvm
@ -142,7 +142,7 @@ void AliasSet::addPointer(AliasSetTracker &AST, PointerRec &Entry,
      Alias = SetMayAlias;
      AST.TotalMayAliasSetSize += size();
    } else {
      // First entry of must alias must have maximum size!
      P->updateSizeAndAAInfo(Size, AAInfo);
    }
    assert(Result != NoAlias && "Cannot be part of must set!");
@ -251,9 +251,9 @@ void AliasSetTracker::clear() {
  for (PointerMapType::iterator I = PointerMap.begin(), E = PointerMap.end();
       I != E; ++I)
    I->second->eraseFromList();

  PointerMap.clear();

  // The alias sets should all be clear now.
  AliasSets.clear();
}
@ -269,7 +269,7 @@ AliasSet *AliasSetTracker::mergeAliasSetsForPointer(const Value *Ptr,
  for (iterator I = begin(), E = end(); I != E;) {
    iterator Cur = I++;
    if (Cur->Forward || !Cur->aliasesPointer(Ptr, Size, AAInfo, AA)) continue;

    if (!FoundSet) {      // If this is the first alias set ptr can go into.
      FoundSet = &*Cur;   // Remember it.
    } else {              // Otherwise, we must merge the sets.
@ -336,13 +336,13 @@ AliasSet &AliasSetTracker::getAliasSetForPointer(Value *Pointer,
    // Return the set!
    return *Entry.getAliasSet(*this)->getForwardedTarget(*this);
  }

  if (AliasSet *AS = mergeAliasSetsForPointer(Pointer, Size, AAInfo)) {
    // Add it to the alias set it aliases.
    AS->addPointer(*this, Entry, Size, AAInfo);
    return *AS;
  }

  // Otherwise create a new alias set to hold the loaded pointer.
  AliasSets.push_back(new AliasSet());
  AliasSets.back().addPointer(*this, Entry, Size, AAInfo);
@ -526,10 +526,10 @@ void AliasSetTracker::deleteValue(Value *PtrVal) {
      AS->SetSize--;
      TotalMayAliasSetSize--;
    }

    // Stop using the alias set.
    AS->dropRef(*this);

    PointerMap.erase(I);
  }
@ -28,6 +28,7 @@
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/PhiValues.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/CallSite.h"
@ -93,7 +94,8 @@ bool BasicAAResult::invalidate(Function &Fn, const PreservedAnalyses &PA,
  // depend on them.
  if (Inv.invalidate<AssumptionAnalysis>(Fn, PA) ||
      (DT && Inv.invalidate<DominatorTreeAnalysis>(Fn, PA)) ||
      (LI && Inv.invalidate<LoopAnalysis>(Fn, PA)))
      (LI && Inv.invalidate<LoopAnalysis>(Fn, PA)) ||
      (PV && Inv.invalidate<PhiValuesAnalysis>(Fn, PA)))
    return true;

  // Otherwise this analysis result remains valid.
@ -1527,34 +1529,70 @@ AliasResult BasicAAResult::aliasPHI(const PHINode *PN, LocationSize PNSize,
      return Alias;
  }

  SmallPtrSet<Value *, 4> UniqueSrc;
  SmallVector<Value *, 4> V1Srcs;
  bool isRecursive = false;
  for (Value *PV1 : PN->incoming_values()) {
    if (isa<PHINode>(PV1))
      // If any of the source itself is a PHI, return MayAlias conservatively
      // to avoid compile time explosion. The worst possible case is if both
      // sides are PHI nodes. In which case, this is O(m x n) time where 'm'
      // and 'n' are the number of PHI sources.
  if (PV) {
    // If we have PhiValues then use it to get the underlying phi values.
    const PhiValues::ValueSet &PhiValueSet = PV->getValuesForPhi(PN);
    // If we have more phi values than the search depth then return MayAlias
    // conservatively to avoid compile time explosion. The worst possible case
    // is if both sides are PHI nodes. In which case, this is O(m x n) time
    // where 'm' and 'n' are the number of PHI sources.
    if (PhiValueSet.size() > MaxLookupSearchDepth)
      return MayAlias;

    if (EnableRecPhiAnalysis)
      if (GEPOperator *PV1GEP = dyn_cast<GEPOperator>(PV1)) {
        // Check whether the incoming value is a GEP that advances the pointer
        // result of this PHI node (e.g. in a loop). If this is the case, we
        // would recurse and always get a MayAlias. Handle this case specially
        // below.
        if (PV1GEP->getPointerOperand() == PN && PV1GEP->getNumIndices() == 1 &&
            isa<ConstantInt>(PV1GEP->idx_begin())) {
          isRecursive = true;
          continue;
    // Add the values to V1Srcs
    for (Value *PV1 : PhiValueSet) {
      if (EnableRecPhiAnalysis) {
        if (GEPOperator *PV1GEP = dyn_cast<GEPOperator>(PV1)) {
          // Check whether the incoming value is a GEP that advances the pointer
          // result of this PHI node (e.g. in a loop). If this is the case, we
          // would recurse and always get a MayAlias. Handle this case specially
          // below.
          if (PV1GEP->getPointerOperand() == PN && PV1GEP->getNumIndices() == 1 &&
              isa<ConstantInt>(PV1GEP->idx_begin())) {
            isRecursive = true;
            continue;
          }
        }
      }

      if (UniqueSrc.insert(PV1).second)
        V1Srcs.push_back(PV1);
    }
  } else {
    // If we don't have PhiInfo then just look at the operands of the phi itself
    // FIXME: Remove this once we can guarantee that we have PhiInfo always
    SmallPtrSet<Value *, 4> UniqueSrc;
    for (Value *PV1 : PN->incoming_values()) {
      if (isa<PHINode>(PV1))
        // If any of the source itself is a PHI, return MayAlias conservatively
        // to avoid compile time explosion. The worst possible case is if both
        // sides are PHI nodes. In which case, this is O(m x n) time where 'm'
        // and 'n' are the number of PHI sources.
        return MayAlias;

      if (EnableRecPhiAnalysis)
        if (GEPOperator *PV1GEP = dyn_cast<GEPOperator>(PV1)) {
          // Check whether the incoming value is a GEP that advances the pointer
          // result of this PHI node (e.g. in a loop). If this is the case, we
          // would recurse and always get a MayAlias. Handle this case specially
          // below.
          if (PV1GEP->getPointerOperand() == PN && PV1GEP->getNumIndices() == 1 &&
|
||||
isa<ConstantInt>(PV1GEP->idx_begin())) {
|
||||
isRecursive = true;
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
if (UniqueSrc.insert(PV1).second)
|
||||
V1Srcs.push_back(PV1);
|
||||
}
|
||||
}
|
||||
|
||||
// If V1Srcs is empty then that means that the phi has no underlying non-phi
|
||||
// value. This should only be possible in blocks unreachable from the entry
|
||||
// block, but return MayAlias just in case.
|
||||
if (V1Srcs.empty())
|
||||
return MayAlias;
|
||||
|
||||
// If this PHI node is recursive, set the size of the accessed memory to
|
||||
// unknown to represent all the possible values the GEP could advance the
|
||||
// pointer to.
|
||||
@ -1879,7 +1917,8 @@ BasicAAResult BasicAA::run(Function &F, FunctionAnalysisManager &AM) {
|
||||
AM.getResult<TargetLibraryAnalysis>(F),
|
||||
AM.getResult<AssumptionAnalysis>(F),
|
||||
&AM.getResult<DominatorTreeAnalysis>(F),
|
||||
AM.getCachedResult<LoopAnalysis>(F));
|
||||
AM.getCachedResult<LoopAnalysis>(F),
|
||||
AM.getCachedResult<PhiValuesAnalysis>(F));
|
||||
}
|
||||
|
||||
BasicAAWrapperPass::BasicAAWrapperPass() : FunctionPass(ID) {
|
||||
@ -1891,12 +1930,12 @@ char BasicAAWrapperPass::ID = 0;
|
||||
void BasicAAWrapperPass::anchor() {}
|
||||
|
||||
INITIALIZE_PASS_BEGIN(BasicAAWrapperPass, "basicaa",
|
||||
"Basic Alias Analysis (stateless AA impl)", true, true)
|
||||
"Basic Alias Analysis (stateless AA impl)", false, true)
|
||||
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
|
||||
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
|
||||
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
|
||||
INITIALIZE_PASS_END(BasicAAWrapperPass, "basicaa",
|
||||
"Basic Alias Analysis (stateless AA impl)", true, true)
|
||||
"Basic Alias Analysis (stateless AA impl)", false, true)
|
||||
|
||||
FunctionPass *llvm::createBasicAAWrapperPass() {
|
||||
return new BasicAAWrapperPass();
|
||||
@ -1907,10 +1946,12 @@ bool BasicAAWrapperPass::runOnFunction(Function &F) {
|
||||
auto &TLIWP = getAnalysis<TargetLibraryInfoWrapperPass>();
|
||||
auto &DTWP = getAnalysis<DominatorTreeWrapperPass>();
|
||||
auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();
|
||||
auto *PVWP = getAnalysisIfAvailable<PhiValuesWrapperPass>();
|
||||
|
||||
Result.reset(new BasicAAResult(F.getParent()->getDataLayout(), F, TLIWP.getTLI(),
|
||||
ACT.getAssumptionCache(F), &DTWP.getDomTree(),
|
||||
LIWP ? &LIWP->getLoopInfo() : nullptr));
|
||||
LIWP ? &LIWP->getLoopInfo() : nullptr,
|
||||
PVWP ? &PVWP->getResult() : nullptr));
|
||||
|
||||
return false;
|
||||
}
|
||||
@ -1920,6 +1961,7 @@ void BasicAAWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
|
||||
AU.addRequired<AssumptionCacheTracker>();
|
||||
AU.addRequired<DominatorTreeWrapperPass>();
|
||||
AU.addRequired<TargetLibraryInfoWrapperPass>();
|
||||
AU.addUsedIfAvailable<PhiValuesWrapperPass>();
|
||||
}
|
||||
|
||||
BasicAAResult llvm::createLegacyPMBasicAAResult(Pass &P, Function &F) {
|
||||
|
@@ -124,7 +124,7 @@ namespace {
}

char CFGPrinterLegacyPass::ID = 0;
INITIALIZE_PASS(CFGPrinterLegacyPass, "dot-cfg", "Print CFG of function to 'dot' file",
INITIALIZE_PASS(CFGPrinterLegacyPass, "dot-cfg", "Print CFG of function to 'dot' file",
                false, true)

PreservedAnalyses CFGPrinterPass::run(Function &F,

@@ -166,7 +166,7 @@ void CallGraphNode::print(raw_ostream &OS) const {
    OS << "Call graph node for function: '" << F->getName() << "'";
  else
    OS << "Call graph node <<null function>>";

  OS << "<<" << this << ">>  #uses=" << getNumReferences() << '\n';

  for (const auto &I : *this) {

@@ -41,7 +41,7 @@ using namespace llvm;

#define DEBUG_TYPE "cgscc-passmgr"

static cl::opt<unsigned>
static cl::opt<unsigned>
    MaxIterations("max-cg-scc-iterations", cl::ReallyHidden, cl::init(4));

STATISTIC(MaxSCCIterations, "Maximum CGSCCPassMgr iterations on one SCC");

@@ -97,13 +97,13 @@ class CGPassManager : public ModulePass, public PMDataManager {
  }

  PassManagerType getPassManagerType() const override {
    return PMT_CallGraphPassManager;
    return PMT_CallGraphPassManager;
  }

private:
  bool RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
                         bool &DevirtualizedCall);

  bool RunPassOnSCC(Pass *P, CallGraphSCC &CurSCC,
                    CallGraph &CG, bool &CallGraphUpToDate,
                    bool &DevirtualizedCall);

@@ -142,21 +142,21 @@ bool CGPassManager::RunPassOnSCC(Pass *P, CallGraphSCC &CurSCC,
      if (EmitICRemark)
        emitInstrCountChangedRemark(P, M, InstrCount);
    }

    // After the CGSCCPass is done, when assertions are enabled, use
    // RefreshCallGraph to verify that the callgraph was correctly updated.
#ifndef NDEBUG
    if (Changed)
      RefreshCallGraph(CurSCC, CG, true);
#endif

    return Changed;
  }

  assert(PM->getPassManagerType() == PMT_FunctionPassManager &&
         "Invalid CGPassManager member");
  FPPassManager *FPP = (FPPassManager*)P;

  // Run pass P on all functions in the current SCC.
  for (CallGraphNode *CGN : CurSCC) {
    if (Function *F = CGN->getFunction()) {

@@ -168,7 +168,7 @@ bool CGPassManager::RunPassOnSCC(Pass *P, CallGraphSCC &CurSCC,
      F->getContext().yield();
    }
  }

  // The function pass(es) modified the IR, they may have clobbered the
  // callgraph.
  if (Changed && CallGraphUpToDate) {

@@ -199,7 +199,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,

  bool MadeChange = false;
  bool DevirtualizedCall = false;

  // Scan all functions in the SCC.
  unsigned FunctionNo = 0;
  for (CallGraphSCC::iterator SCCIdx = CurSCC.begin(), E = CurSCC.end();

@@ -207,14 +207,14 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
    CallGraphNode *CGN = *SCCIdx;
    Function *F = CGN->getFunction();
    if (!F || F->isDeclaration()) continue;

    // Walk the function body looking for call sites.  Sync up the call sites in
    // CGN with those actually in the function.

    // Keep track of the number of direct and indirect calls that were
    // invalidated and removed.
    unsigned NumDirectRemoved = 0, NumIndirectRemoved = 0;

    // Get the set of call sites currently in the function.
    for (CallGraphNode::iterator I = CGN->begin(), E = CGN->end(); I != E; ) {
      // If this call site is null, then the function pass deleted the call

@@ -226,7 +226,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
          CallSites.count(I->first) ||

          // If the call edge is not from a call or invoke, or it is a
          // instrinsic call, then the function pass RAUW'd a call with
          // instrinsic call, then the function pass RAUW'd a call with
          // another value. This can happen when constant folding happens
          // of well known functions etc.
          !CallSite(I->first) ||

@@ -236,18 +236,18 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
            CallSite(I->first).getCalledFunction()->getIntrinsicID()))) {
        assert(!CheckingMode &&
               "CallGraphSCCPass did not update the CallGraph correctly!");

        // If this was an indirect call site, count it.
        if (!I->second->getFunction())
          ++NumIndirectRemoved;
        else
        else
          ++NumDirectRemoved;

        // Just remove the edge from the set of callees, keep track of whether
        // I points to the last element of the vector.
        bool WasLast = I + 1 == E;
        CGN->removeCallEdge(I);

        // If I pointed to the last element of the vector, we have to bail out:
        // iterator checking rejects comparisons of the resultant pointer with
        // end.

@@ -256,10 +256,10 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
        E = CGN->end();
        continue;
      }

      assert(!CallSites.count(I->first) &&
             "Call site occurs in node multiple times");

      CallSite CS(I->first);
      if (CS) {
        Function *Callee = CS.getCalledFunction();

@@ -269,7 +269,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
      }
      ++I;
    }

    // Loop over all of the instructions in the function, getting the callsites.
    // Keep track of the number of direct/indirect calls added.
    unsigned NumDirectAdded = 0, NumIndirectAdded = 0;

@@ -280,7 +280,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
      if (!CS) continue;
      Function *Callee = CS.getCalledFunction();
      if (Callee && Callee->isIntrinsic()) continue;

      // If this call site already existed in the callgraph, just verify it
      // matches up to expectations and remove it from CallSites.
      DenseMap<Value*, CallGraphNode*>::iterator ExistingIt =

@@ -290,11 +290,11 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,

        // Remove from CallSites since we have now seen it.
        CallSites.erase(ExistingIt);

        // Verify that the callee is right.
        if (ExistingNode->getFunction() == CS.getCalledFunction())
          continue;

        // If we are in checking mode, we are not allowed to actually mutate
        // the callgraph.  If this is a case where we can infer that the
        // callgraph is less precise than it could be (e.g. an indirect call

@@ -303,10 +303,10 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
        if (CheckingMode && CS.getCalledFunction() &&
            ExistingNode->getFunction() == nullptr)
          continue;

        assert(!CheckingMode &&
               "CallGraphSCCPass did not update the CallGraph correctly!");

        // If not, we either went from a direct call to indirect, indirect to
        // direct, or direct to different direct.
        CallGraphNode *CalleeNode;

@@ -328,7 +328,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
          MadeChange = true;
          continue;
        }

        assert(!CheckingMode &&
               "CallGraphSCCPass did not update the CallGraph correctly!");

@@ -341,11 +341,11 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
          CalleeNode = CG.getCallsExternalNode();
          ++NumIndirectAdded;
        }

        CGN->addCalledFunction(CS, CalleeNode);
        MadeChange = true;
      }

      // We scanned the old callgraph node, removing invalidated call sites and
      // then added back newly found call sites.  One thing that can happen is
      // that an old indirect call site was deleted and replaced with a new direct

@@ -359,13 +359,13 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
      if (NumIndirectRemoved > NumIndirectAdded &&
          NumDirectRemoved < NumDirectAdded)
        DevirtualizedCall = true;

      // After scanning this function, if we still have entries in callsites, then
      // they are dangling pointers.  WeakTrackingVH should save us for this, so
      // abort if
      // this happens.
      assert(CallSites.empty() && "Dangling pointers found in call sites map");

      // Periodically do an explicit clear to remove tombstones when processing
      // large scc's.
      if ((FunctionNo & 15) == 15)

@@ -392,7 +392,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
bool CGPassManager::RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
                                      bool &DevirtualizedCall) {
  bool Changed = false;

  // Keep track of whether the callgraph is known to be up-to-date or not.
  // The CGSSC pass manager runs two types of passes:
  // CallGraphSCC Passes and other random function passes.  Because other

@@ -406,7 +406,7 @@ bool CGPassManager::RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
  for (unsigned PassNo = 0, e = getNumContainedPasses();
       PassNo != e; ++PassNo) {
    Pass *P = getContainedPass(PassNo);

    // If we're in -debug-pass=Executions mode, construct the SCC node list,
    // otherwise avoid constructing this string as it is expensive.
    if (isPassDebuggingExecutionsOrMore()) {

@@ -423,23 +423,23 @@ bool CGPassManager::RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
      dumpPassInfo(P, EXECUTION_MSG, ON_CG_MSG, Functions);
    }
    dumpRequiredSet(P);

    initializeAnalysisImpl(P);

    // Actually run this pass on the current SCC.
    Changed |= RunPassOnSCC(P, CurSCC, CG,
                            CallGraphUpToDate, DevirtualizedCall);

    if (Changed)
      dumpPassInfo(P, MODIFICATION_MSG, ON_CG_MSG, "");
    dumpPreservedSet(P);

    verifyPreservedAnalysis(P);

    verifyPreservedAnalysis(P);
    removeNotPreservedAnalysis(P);
    recordAvailableAnalysis(P);
    removeDeadPasses(P, "", ON_CG_MSG);
  }

  // If the callgraph was left out of date (because the last pass run was a
  // functionpass), refresh it before we move on to the next SCC.
  if (!CallGraphUpToDate)

@@ -452,7 +452,7 @@ bool CGPassManager::RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
bool CGPassManager::runOnModule(Module &M) {
  CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
  bool Changed = doInitialization(CG);

  // Walk the callgraph in bottom-up SCC order.
  scc_iterator<CallGraph*> CGI = scc_begin(&CG);

@@ -485,7 +485,7 @@ bool CGPassManager::runOnModule(Module &M) {
    DevirtualizedCall = false;
    Changed |= RunAllPassesOnSCC(CurSCC, CG, DevirtualizedCall);
  } while (Iteration++ < MaxIterations && DevirtualizedCall);

  if (DevirtualizedCall)
    LLVM_DEBUG(dbgs() << "  CGSCCPASSMGR: Stopped iteration after "
                      << Iteration

@@ -500,7 +500,7 @@ bool CGPassManager::runOnModule(Module &M) {
/// Initialize CG
bool CGPassManager::doInitialization(CallGraph &CG) {
  bool Changed = false;
  for (unsigned i = 0, e = getNumContainedPasses(); i != e; ++i) {
  for (unsigned i = 0, e = getNumContainedPasses(); i != e; ++i) {
    if (PMDataManager *PM = getContainedPass(i)->getAsPMDataManager()) {
      assert(PM->getPassManagerType() == PMT_FunctionPassManager &&
             "Invalid CGPassManager member");

@@ -515,7 +515,7 @@ bool CGPassManager::doInitialization(CallGraph &CG) {
/// Finalize CG
bool CGPassManager::doFinalization(CallGraph &CG) {
  bool Changed = false;
  for (unsigned i = 0, e = getNumContainedPasses(); i != e; ++i) {
  for (unsigned i = 0, e = getNumContainedPasses(); i != e; ++i) {
    if (PMDataManager *PM = getContainedPass(i)->getAsPMDataManager()) {
      assert(PM->getPassManagerType() == PMT_FunctionPassManager &&
             "Invalid CGPassManager member");

@@ -541,7 +541,7 @@ void CallGraphSCC::ReplaceNode(CallGraphNode *Old, CallGraphNode *New) {
      Nodes[i] = New;
      break;
    }

  // Update the active scc_iterator so that it doesn't contain dangling
  // pointers to the old CallGraphNode.
  scc_iterator<CallGraph*> *CGI = (scc_iterator<CallGraph*>*)Context;

@@ -555,18 +555,18 @@ void CallGraphSCC::ReplaceNode(CallGraphNode *Old, CallGraphNode *New) {
/// Assign pass manager to manage this pass.
void CallGraphSCCPass::assignPassManager(PMStack &PMS,
                                         PassManagerType PreferredType) {
  // Find CGPassManager
  // Find CGPassManager
  while (!PMS.empty() &&
         PMS.top()->getPassManagerType() > PMT_CallGraphPassManager)
    PMS.pop();

  assert(!PMS.empty() && "Unable to handle Call Graph Pass");
  CGPassManager *CGP;

  if (PMS.top()->getPassManagerType() == PMT_CallGraphPassManager)
    CGP = (CGPassManager*)PMS.top();
  else {
    // Create new Call Graph SCC Pass Manager if it does not exist.
    // Create new Call Graph SCC Pass Manager if it does not exist.
    assert(!PMS.empty() && "Unable to create Call Graph Pass Manager");
    PMDataManager *PMD = PMS.top();

@@ -608,7 +608,7 @@ namespace {
  class PrintCallGraphPass : public CallGraphSCCPass {
    std::string Banner;
    raw_ostream &OS;       // raw_ostream to print on.

  public:
    static char ID;

@@ -640,10 +640,10 @@ namespace {
      }
      return false;
    }

    StringRef getPassName() const override { return "Print CallGraph IR"; }
  };

} // end anonymous namespace.

char PrintCallGraphPass::ID = 0;
@@ -272,7 +272,7 @@ void DemandedBits::performAnalysis() {
    // Analysis already completed for this function.
    return;
  Analyzed = true;

  Visited.clear();
  AliveBits.clear();

@@ -367,7 +367,7 @@ void DemandedBits::performAnalysis() {

APInt DemandedBits::getDemandedBits(Instruction *I) {
  performAnalysis();

  const DataLayout &DL = I->getModule()->getDataLayout();
  auto Found = AliveBits.find(I);
  if (Found != AliveBits.end())

@@ -409,7 +409,7 @@ bool GlobalsAAResult::AnalyzeIndirectGlobalMemory(GlobalVariable *GV) {
  if (Constant *C = GV->getInitializer())
    if (!C->isNullValue())
      return false;

  // Walk the user list of the global.  If we find anything other than a direct
  // load or store, bail out.
  for (User *U : GV->users()) {

@@ -464,7 +464,7 @@ bool GlobalsAAResult::AnalyzeIndirectGlobalMemory(GlobalVariable *GV) {
  return true;
}

void GlobalsAAResult::CollectSCCMembership(CallGraph &CG) {
void GlobalsAAResult::CollectSCCMembership(CallGraph &CG) {
  // We do a bottom-up SCC traversal of the call graph.  In other words, we
  // visit all callees before callers (leaf-first).
  unsigned SCCID = 0;

@@ -633,7 +633,7 @@ static bool isNonEscapingGlobalNoAliasWithLoad(const GlobalValue *GV,
  Inputs.push_back(V);
  do {
    const Value *Input = Inputs.pop_back_val();

    if (isa<GlobalValue>(Input) || isa<Argument>(Input) || isa<CallInst>(Input) ||
        isa<InvokeInst>(Input))
      // Arguments to functions or returns from functions are inherently

@@ -654,7 +654,7 @@ static bool isNonEscapingGlobalNoAliasWithLoad(const GlobalValue *GV,
    if (auto *LI = dyn_cast<LoadInst>(Input)) {
      Inputs.push_back(GetUnderlyingObject(LI->getPointerOperand(), DL));
      continue;
    }
    }
    if (auto *SI = dyn_cast<SelectInst>(Input)) {
      const Value *LHS = GetUnderlyingObject(SI->getTrueValue(), DL);
      const Value *RHS = GetUnderlyingObject(SI->getFalseValue(), DL);

@@ -672,7 +672,7 @@ static bool isNonEscapingGlobalNoAliasWithLoad(const GlobalValue *GV,
      }
      continue;
    }

    return false;
  } while (!Inputs.empty());

@@ -754,7 +754,7 @@ bool GlobalsAAResult::isNonEscapingGlobalNoAlias(const GlobalValue *GV,
      // non-addr-taken globals.
      continue;
    }

    // Recurse through a limited number of selects, loads and PHIs. This is an
    // arbitrary depth of 4, lower numbers could be used to fix compile time
    // issues if needed, but this is generally expected to be only be important
@@ -65,6 +65,48 @@ static Value *SimplifyCastInst(unsigned, Value *, Type *,
static Value *SimplifyGEPInst(Type *, ArrayRef<Value *>, const SimplifyQuery &,
                              unsigned);

static Value *foldSelectWithBinaryOp(Value *Cond, Value *TrueVal,
                                     Value *FalseVal) {
  BinaryOperator::BinaryOps BinOpCode;
  if (auto *BO = dyn_cast<BinaryOperator>(Cond))
    BinOpCode = BO->getOpcode();
  else
    return nullptr;

  CmpInst::Predicate ExpectedPred, Pred1, Pred2;
  if (BinOpCode == BinaryOperator::Or) {
    ExpectedPred = ICmpInst::ICMP_NE;
  } else if (BinOpCode == BinaryOperator::And) {
    ExpectedPred = ICmpInst::ICMP_EQ;
  } else
    return nullptr;

  // %A = icmp eq %TV, %FV
  // %B = icmp eq %X, %Y (and one of these is a select operand)
  // %C = and %A, %B
  // %D = select %C, %TV, %FV
  // -->
  // %FV

  // %A = icmp ne %TV, %FV
  // %B = icmp ne %X, %Y (and one of these is a select operand)
  // %C = or %A, %B
  // %D = select %C, %TV, %FV
  // -->
  // %TV
  Value *X, *Y;
  if (!match(Cond, m_c_BinOp(m_c_ICmp(Pred1, m_Specific(TrueVal),
                                      m_Specific(FalseVal)),
                             m_ICmp(Pred2, m_Value(X), m_Value(Y)))) ||
      Pred1 != Pred2 || Pred1 != ExpectedPred)
    return nullptr;

  if (X == TrueVal || X == FalseVal || Y == TrueVal || Y == FalseVal)
    return BinOpCode == BinaryOperator::Or ? TrueVal : FalseVal;

  return nullptr;
}
/// For a boolean type or a vector of boolean type, return false or a vector
/// with every element false.
static Constant *getFalse(Type *Ty) {
@@ -1283,6 +1325,23 @@ static Value *SimplifyLShrInst(Value *Op0, Value *Op1, bool isExact,
  if (match(Op0, m_NUWShl(m_Value(X), m_Specific(Op1))))
    return X;

  // ((X << A) | Y) >> A -> X  if effective width of Y is not larger than A.
  // We can return X as we do in the above case since OR alters no bits in X.
  // SimplifyDemandedBits in InstCombine can do more general optimization for
  // bit manipulation. This pattern aims to provide opportunities for other
  // optimizers by supporting a simple but common case in InstSimplify.
  Value *Y;
  const APInt *ShRAmt, *ShLAmt;
  if (match(Op1, m_APInt(ShRAmt)) &&
      match(Op0, m_c_Or(m_NUWShl(m_Value(X), m_APInt(ShLAmt)), m_Value(Y))) &&
      *ShRAmt == *ShLAmt) {
    const KnownBits YKnown = computeKnownBits(Y, Q.DL, 0, Q.AC, Q.CxtI, Q.DT);
    const unsigned Width = Op0->getType()->getScalarSizeInBits();
    const unsigned EffWidthY = Width - YKnown.countMinLeadingZeros();
    if (EffWidthY <= ShRAmt->getZExtValue())
      return X;
  }

  return nullptr;
}
@@ -3752,6 +3811,9 @@ static Value *SimplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal,
          simplifySelectWithICmpCond(Cond, TrueVal, FalseVal, Q, MaxRecurse))
    return V;

  if (Value *V = foldSelectWithBinaryOp(Cond, TrueVal, FalseVal))
    return V;

  return nullptr;
}

@@ -4604,149 +4666,131 @@ static bool maskIsAllZeroOrUndef(Value *Mask) {
  return true;
}

template <typename IterTy>
static Value *SimplifyIntrinsic(Function *F, IterTy ArgBegin, IterTy ArgEnd,
                                const SimplifyQuery &Q, unsigned MaxRecurse) {
static Value *simplifyUnaryIntrinsic(Function *F, Value *Op0,
                                     const SimplifyQuery &Q) {
  // Idempotent functions return the same result when called repeatedly.
  Intrinsic::ID IID = F->getIntrinsicID();
  unsigned NumOperands = std::distance(ArgBegin, ArgEnd);
  if (IsIdempotent(IID))
    if (auto *II = dyn_cast<IntrinsicInst>(Op0))
      if (II->getIntrinsicID() == IID)
        return II;

  // Unary Ops
  if (NumOperands == 1) {
    // Perform idempotent optimizations
    if (IsIdempotent(IID)) {
      if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(*ArgBegin)) {
        if (II->getIntrinsicID() == IID)
          return II;
      }
    }

    Value *IIOperand = *ArgBegin;
    Value *X;
    switch (IID) {
    case Intrinsic::fabs: {
      if (SignBitMustBeZero(IIOperand, Q.TLI))
        return IIOperand;
      return nullptr;
    }
    case Intrinsic::bswap: {
      // bswap(bswap(x)) -> x
      if (match(IIOperand, m_BSwap(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::bitreverse: {
      // bitreverse(bitreverse(x)) -> x
      if (match(IIOperand, m_BitReverse(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::exp: {
      // exp(log(x)) -> x
      if (Q.CxtI->hasAllowReassoc() &&
          match(IIOperand, m_Intrinsic<Intrinsic::log>(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::exp2: {
      // exp2(log2(x)) -> x
      if (Q.CxtI->hasAllowReassoc() &&
          match(IIOperand, m_Intrinsic<Intrinsic::log2>(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::log: {
      // log(exp(x)) -> x
      if (Q.CxtI->hasAllowReassoc() &&
          match(IIOperand, m_Intrinsic<Intrinsic::exp>(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::log2: {
      // log2(exp2(x)) -> x
      if (Q.CxtI->hasAllowReassoc() &&
          match(IIOperand, m_Intrinsic<Intrinsic::exp2>(m_Value(X)))) {
        return X;
      }
      return nullptr;
    }
    default:
      return nullptr;
    }
  Value *X;
  switch (IID) {
  case Intrinsic::fabs:
    if (SignBitMustBeZero(Op0, Q.TLI)) return Op0;
    break;
  case Intrinsic::bswap:
    // bswap(bswap(x)) -> x
    if (match(Op0, m_BSwap(m_Value(X)))) return X;
    break;
  case Intrinsic::bitreverse:
    // bitreverse(bitreverse(x)) -> x
    if (match(Op0, m_BitReverse(m_Value(X)))) return X;
    break;
  case Intrinsic::exp:
    // exp(log(x)) -> x
    if (Q.CxtI->hasAllowReassoc() &&
        match(Op0, m_Intrinsic<Intrinsic::log>(m_Value(X)))) return X;
    break;
  case Intrinsic::exp2:
    // exp2(log2(x)) -> x
    if (Q.CxtI->hasAllowReassoc() &&
        match(Op0, m_Intrinsic<Intrinsic::log2>(m_Value(X)))) return X;
    break;
  case Intrinsic::log:
    // log(exp(x)) -> x
    if (Q.CxtI->hasAllowReassoc() &&
        match(Op0, m_Intrinsic<Intrinsic::exp>(m_Value(X)))) return X;
    break;
  case Intrinsic::log2:
    // log2(exp2(x)) -> x
    if (Q.CxtI->hasAllowReassoc() &&
        match(Op0, m_Intrinsic<Intrinsic::exp2>(m_Value(X)))) return X;
    break;
  default:
    break;
  }

  // Binary Ops
  if (NumOperands == 2) {
    Value *LHS = *ArgBegin;
    Value *RHS = *(ArgBegin + 1);
    Type *ReturnType = F->getReturnType();
  return nullptr;
}

    switch (IID) {
    case Intrinsic::usub_with_overflow:
    case Intrinsic::ssub_with_overflow: {
      // X - X -> { 0, false }
      if (LHS == RHS)
        return Constant::getNullValue(ReturnType);

      // X - undef -> undef
      // undef - X -> undef
      if (isa<UndefValue>(LHS) || isa<UndefValue>(RHS))
        return UndefValue::get(ReturnType);

      return nullptr;
    }
    case Intrinsic::uadd_with_overflow:
    case Intrinsic::sadd_with_overflow: {
      // X + undef -> undef
      if (isa<UndefValue>(LHS) || isa<UndefValue>(RHS))
        return UndefValue::get(ReturnType);

      return nullptr;
    }
    case Intrinsic::umul_with_overflow:
    case Intrinsic::smul_with_overflow: {
      // 0 * X -> { 0, false }
      // X * 0 -> { 0, false }
      if (match(LHS, m_Zero()) || match(RHS, m_Zero()))
        return Constant::getNullValue(ReturnType);

      // undef * X -> { 0, false }
      // X * undef -> { 0, false }
      if (match(LHS, m_Undef()) || match(RHS, m_Undef()))
        return Constant::getNullValue(ReturnType);

      return nullptr;
    }
    case Intrinsic::load_relative: {
      Constant *C0 = dyn_cast<Constant>(LHS);
      Constant *C1 = dyn_cast<Constant>(RHS);
      if (C0 && C1)
static Value *simplifyBinaryIntrinsic(Function *F, Value *Op0, Value *Op1,
                                      const SimplifyQuery &Q) {
  Intrinsic::ID IID = F->getIntrinsicID();
  Type *ReturnType = F->getReturnType();
  switch (IID) {
  case Intrinsic::usub_with_overflow:
  case Intrinsic::ssub_with_overflow:
    // X - X -> { 0, false }
    if (Op0 == Op1)
      return Constant::getNullValue(ReturnType);
    // X - undef -> undef
    // undef - X -> undef
    if (isa<UndefValue>(Op0) || isa<UndefValue>(Op1))
      return UndefValue::get(ReturnType);
    break;
  case Intrinsic::uadd_with_overflow:
  case Intrinsic::sadd_with_overflow:
    // X + undef -> undef
    if (isa<UndefValue>(Op0) || isa<UndefValue>(Op1))
      return UndefValue::get(ReturnType);
    break;
  case Intrinsic::umul_with_overflow:
  case Intrinsic::smul_with_overflow:
    // 0 * X -> { 0, false }
    // X * 0 -> { 0, false }
    if (match(Op0, m_Zero()) || match(Op1, m_Zero()))
      return Constant::getNullValue(ReturnType);
    // undef * X -> { 0, false }
    // X * undef -> { 0, false }
    if (match(Op0, m_Undef()) || match(Op1, m_Undef()))
      return Constant::getNullValue(ReturnType);
    break;
  case Intrinsic::load_relative:
    if (auto *C0 = dyn_cast<Constant>(Op0))
      if (auto *C1 = dyn_cast<Constant>(Op1))
        return SimplifyRelativeLoad(C0, C1, Q.DL);
      return nullptr;
    }
    case Intrinsic::powi:
      if (ConstantInt *Power = dyn_cast<ConstantInt>(RHS)) {
        // powi(x, 0) -> 1.0
        if (Power->isZero())
          return ConstantFP::get(LHS->getType(), 1.0);
        // powi(x, 1) -> x
        if (Power->isOne())
          return LHS;
      }
      return nullptr;
    case Intrinsic::maxnum:
    case Intrinsic::minnum:
      // If one argument is NaN, return the other argument.
      if (match(LHS, m_NaN()))
        return RHS;
      if (match(RHS, m_NaN()))
        return LHS;
      return nullptr;
    default:
      return nullptr;
    break;
  case Intrinsic::powi:
    if (auto *Power = dyn_cast<ConstantInt>(Op1)) {
      // powi(x, 0) -> 1.0
      if (Power->isZero())
        return ConstantFP::get(Op0->getType(), 1.0);
      // powi(x, 1) -> x
      if (Power->isOne())
        return Op0;
    }
    break;
  case Intrinsic::maxnum:
  case Intrinsic::minnum:
    // If one argument is NaN, return the other argument.
    if (match(Op0, m_NaN())) return Op1;
    if (match(Op1, m_NaN())) return Op0;
    break;
  default:
    break;
  }

  // Simplify calls to llvm.masked.load.*
  return nullptr;
}

template <typename IterTy>
static Value *simplifyIntrinsic(Function *F, IterTy ArgBegin, IterTy ArgEnd,
                                const SimplifyQuery &Q) {
  // Intrinsics with no operands have some kind of side effect. Don't simplify.
  unsigned NumOperands = std::distance(ArgBegin, ArgEnd);
  if (NumOperands == 0)
    return nullptr;

  Intrinsic::ID IID = F->getIntrinsicID();
  if (NumOperands == 1)
    return simplifyUnaryIntrinsic(F, ArgBegin[0], Q);

  if (NumOperands == 2)
    return simplifyBinaryIntrinsic(F, ArgBegin[0], ArgBegin[1], Q);
|
||||
|
||||
// Handle intrinsics with 3 or more arguments.
|
||||
switch (IID) {
|
||||
case Intrinsic::masked_load: {
|
||||
Value *MaskArg = ArgBegin[2];
|
||||
@ -4756,6 +4800,19 @@ static Value *SimplifyIntrinsic(Function *F, IterTy ArgBegin, IterTy ArgEnd,
|
||||
return PassthruArg;
|
||||
return nullptr;
|
||||
}
|
||||
case Intrinsic::fshl:
|
||||
case Intrinsic::fshr: {
|
||||
Value *ShAmtArg = ArgBegin[2];
|
||||
const APInt *ShAmtC;
|
||||
if (match(ShAmtArg, m_APInt(ShAmtC))) {
|
||||
// If there's effectively no shift, return the 1st arg or 2nd arg.
|
||||
// TODO: For vectors, we could check each element of a non-splat constant.
|
||||
APInt BitWidth = APInt(ShAmtC->getBitWidth(), ShAmtC->getBitWidth());
|
||||
if (ShAmtC->urem(BitWidth).isNullValue())
|
||||
return ArgBegin[IID == Intrinsic::fshl ? 0 : 1];
|
||||
}
|
||||
return nullptr;
|
||||
}
|
||||
default:
|
||||
return nullptr;
|
||||
}
|
||||
@ -4780,7 +4837,7 @@ static Value *SimplifyCall(ImmutableCallSite CS, Value *V, IterTy ArgBegin,
|
||||
return nullptr;
|
||||
|
||||
if (F->isIntrinsic())
|
||||
if (Value *Ret = SimplifyIntrinsic(F, ArgBegin, ArgEnd, Q, MaxRecurse))
|
||||
if (Value *Ret = simplifyIntrinsic(F, ArgBegin, ArgEnd, Q))
|
||||
return Ret;
|
||||
|
||||
if (!canConstantFoldCallTo(CS, F))
|
||||
|
@ -725,7 +725,7 @@ bool LazyValueInfoImpl::solveBlockValueNonLocal(ValueLatticeElement &BBLV,
  // frequently arranged such that dominating ones come first and we quickly
  // find a path to function entry. TODO: We should consider explicitly
  // canonicalizing to make this true rather than relying on this happy
  // accident.
  for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
    ValueLatticeElement EdgeResult;
    if (!getEdgeValue(Val, *PI, BB, EdgeResult))
@ -176,8 +176,8 @@ const SCEV *llvm::replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE,

/// Calculate Start and End points of memory access.
/// Let's assume A is the first access and B is a memory access on N-th loop
/// iteration. Then B is calculated as:
///   B = A + Step*N .
/// Step value may be positive or negative.
/// N is a calculated back-edge taken count:
///     N = (TripCount > 0) ? RoundDown(TripCount -1 , VF) : 0
@ -1317,7 +1317,7 @@ bool MemoryDepChecker::couldPreventStoreLoadForward(uint64_t Distance,
  return false;
}

/// Given a non-constant (unknown) dependence-distance \p Dist between two
/// memory accesses, that have the same stride whose absolute value is given
/// in \p Stride, and that have the same type size \p TypeByteSize,
/// in a loop whose takenCount is \p BackedgeTakenCount, check if it is
@ -1336,19 +1336,19 @@ static bool isSafeDependenceDistance(const DataLayout &DL, ScalarEvolution &SE,

  // If we can prove that
  //      (**) |Dist| > BackedgeTakenCount * Step
  // where Step is the absolute stride of the memory accesses in bytes,
  // then there is no dependence.
  //
  // Rationale:
  // We basically want to check if the absolute distance (|Dist/Step|)
  // is >= the loop iteration count (or > BackedgeTakenCount).
  // This is equivalent to the Strong SIV Test (Practical Dependence Testing,
  // Section 4.2.1); Note, that for vectorization it is sufficient to prove
  // that the dependence distance is >= VF; This is checked elsewhere.
  // But in some cases we can prune unknown dependence distances early, and
  // even before selecting the VF, and without a runtime test, by comparing
  // the distance against the loop iteration count. Since the vectorized code
  // will be executed only if LoopCount >= VF, proving distance >= LoopCount
  // also guarantees that distance >= VF.
  //
  const uint64_t ByteStride = Stride * TypeByteSize;
@ -1360,8 +1360,8 @@ static bool isSafeDependenceDistance(const DataLayout &DL, ScalarEvolution &SE,
  uint64_t DistTypeSize = DL.getTypeAllocSize(Dist.getType());
  uint64_t ProductTypeSize = DL.getTypeAllocSize(Product->getType());

  // The dependence distance can be positive/negative, so we sign extend Dist;
  // The multiplication of the absolute stride in bytes and the
  // backedgeTakenCount is non-negative, so we zero extend Product.
  if (DistTypeSize > ProductTypeSize)
    CastedProduct = SE.getZeroExtendExpr(Product, Dist.getType());
@ -2212,24 +2212,24 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
             "versioning:");
  LLVM_DEBUG(dbgs() << " Ptr: " << *Ptr << " Stride: " << *Stride << "\n");

  // Avoid adding the "Stride == 1" predicate when we know that
  // Stride >= Trip-Count. Such a predicate will effectively optimize a single
  // or zero iteration loop, as Trip-Count <= Stride == 1.
  //
  // TODO: We are currently not making a very informed decision on when it is
  // beneficial to apply stride versioning. It might make more sense that the
  // users of this analysis (such as the vectorizer) will trigger it, based on
  // their specific cost considerations; For example, in cases where stride
  // versioning does not help resolving memory accesses/dependences, the
  // vectorizer should evaluate the cost of the runtime test, and the benefit
  // of various possible stride specializations, considering the alternatives
  // of using gather/scatters (if available).

  const SCEV *StrideExpr = PSE->getSCEV(Stride);
  const SCEV *BETakenCount = PSE->getBackedgeTakenCount();

  // Match the types so we can compare the stride and the BETakenCount.
  // The Stride can be positive/negative, so we sign extend Stride;
  // The backedgeTakenCount is non-negative, so we zero extend BETakenCount.
  const DataLayout &DL = TheLoop->getHeader()->getModule()->getDataLayout();
  uint64_t StrideTypeSize = DL.getTypeAllocSize(StrideExpr->getType());
@ -2243,7 +2243,7 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
    CastedBECount = SE->getZeroExtendExpr(BETakenCount, StrideExpr->getType());
  const SCEV *StrideMinusBETaken = SE->getMinusSCEV(CastedStride, CastedBECount);
  // Since TripCount == BackEdgeTakenCount + 1, checking:
  // "Stride >= TripCount" is equivalent to checking:
  // Stride - BETakenCount > 0
  if (SE->isKnownPositive(StrideMinusBETaken)) {
    LLVM_DEBUG(
@ -118,7 +118,7 @@ bool MemDepPrinter::runOnFunction(Function &F) {
    } else {
      SmallVector<NonLocalDepResult, 4> NLDI;
      assert( (isa<LoadInst>(Inst) || isa<StoreInst>(Inst) ||
               isa<VAArgInst>(Inst)) && "Unknown memory instruction!");
      MDA.getNonLocalPointerDependency(Inst, NLDI);

      DepSet &InstDeps = Deps[Inst];
@ -26,6 +26,7 @@
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/OrderedBasicBlock.h"
#include "llvm/Analysis/PHITransAddr.h"
#include "llvm/Analysis/PhiValues.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Attributes.h"
@ -1513,6 +1514,8 @@ void MemoryDependenceResults::invalidateCachedPointerInfo(Value *Ptr) {
  RemoveCachedNonLocalPointerDependencies(ValueIsLoadPair(Ptr, false));
  // Flush load info for the pointer.
  RemoveCachedNonLocalPointerDependencies(ValueIsLoadPair(Ptr, true));
  // Invalidate phis that use the pointer.
  PV.invalidateValue(Ptr);
}

void MemoryDependenceResults::invalidateCachedPredecessors() {
@ -1671,6 +1674,9 @@ void MemoryDependenceResults::removeInstruction(Instruction *RemInst) {
    }
  }

  // Invalidate phis that use the removed instruction.
  PV.invalidateValue(RemInst);

  assert(!NonLocalDeps.count(RemInst) && "RemInst got reinserted?");
  LLVM_DEBUG(verifyRemoved(RemInst));
}
@ -1730,7 +1736,8 @@ MemoryDependenceAnalysis::run(Function &F, FunctionAnalysisManager &AM) {
  auto &AC = AM.getResult<AssumptionAnalysis>(F);
  auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
  auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
  return MemoryDependenceResults(AA, AC, TLI, DT);
  auto &PV = AM.getResult<PhiValuesAnalysis>(F);
  return MemoryDependenceResults(AA, AC, TLI, DT, PV);
}

char MemoryDependenceWrapperPass::ID = 0;
@ -1741,6 +1748,7 @@ INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(PhiValuesWrapperPass)
INITIALIZE_PASS_END(MemoryDependenceWrapperPass, "memdep",
                    "Memory Dependence Analysis", false, true)

@ -1758,6 +1766,7 @@ void MemoryDependenceWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.setPreservesAll();
  AU.addRequired<AssumptionCacheTracker>();
  AU.addRequired<DominatorTreeWrapperPass>();
  AU.addRequired<PhiValuesWrapperPass>();
  AU.addRequiredTransitive<AAResultsWrapperPass>();
  AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();
}
@ -1773,7 +1782,8 @@ bool MemoryDependenceResults::invalidate(Function &F, const PreservedAnalyses &P
  // Check whether the analyses we depend on became invalid for any reason.
  if (Inv.invalidate<AAManager>(F, PA) ||
      Inv.invalidate<AssumptionAnalysis>(F, PA) ||
      Inv.invalidate<DominatorTreeAnalysis>(F, PA))
      Inv.invalidate<DominatorTreeAnalysis>(F, PA) ||
      Inv.invalidate<PhiValuesAnalysis>(F, PA))
    return true;

  // Otherwise this analysis result remains valid.
@ -1789,6 +1799,7 @@ bool MemoryDependenceWrapperPass::runOnFunction(Function &F) {
  auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
  auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
  auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
  MemDep.emplace(AA, AC, TLI, DT);
  auto &PV = getAnalysis<PhiValuesWrapperPass>().getResult();
  MemDep.emplace(AA, AC, TLI, DT, PV);
  return false;
}
@ -235,7 +235,7 @@ class MustExecuteAnnotatedWriter : public AssemblyAnnotationWriter {
  }

  void printInfoComment(const Value &V, formatted_raw_ostream &OS) override {
    if (!MustExec.count(&V))
      return;

@ -245,7 +245,7 @@ class MustExecuteAnnotatedWriter : public AssemblyAnnotationWriter {
      OS << " ; (mustexec in " << NumLoops << " loops: ";
    else
      OS << " ; (mustexec in: ";

    bool first = true;
    for (const Loop *L : Loops) {
      if (!first)
@ -264,6 +264,6 @@ bool MustExecutePrinter::runOnFunction(Function &F) {

  MustExecuteAnnotatedWriter Writer(F, DT, LI);
  F.print(dbgs(), &Writer);

  return false;
}
@ -4839,7 +4839,7 @@ ScalarEvolution::createAddRecFromPHIWithCastsImpl(const SCEVUnknown *SymbolicPHI

  // Construct the extended SCEV: (Ext ix (Trunc iy (Expr) to ix) to iy)
  // for each of StartVal and Accum
  auto getExtendedExpr = [&](const SCEV *Expr,
                             bool CreateSignExtend) -> const SCEV * {
    assert(isLoopInvariant(Expr, L) && "Expr is expected to be invariant");
    const SCEV *TruncatedExpr = getTruncateExpr(Expr, TruncTy);
@ -4935,11 +4935,11 @@ ScalarEvolution::createAddRecFromPHIWithCasts(const SCEVUnknown *SymbolicPHI) {
  return Rewrite;
}

// FIXME: This utility is currently required because the Rewriter currently
// does not rewrite this expression:
// {0, +, (sext ix (trunc iy to ix) to iy)}
// into {0, +, %step},
// even when the following Equal predicate exists:
// "%step == (sext ix (trunc iy to ix) to iy)".
bool PredicatedScalarEvolution::areAddRecsEqualWithPreds(
    const SCEVAddRecExpr *AR1, const SCEVAddRecExpr *AR2) const {
@ -721,7 +721,7 @@ struct ReductionData {
static Optional<ReductionData> getReductionData(Instruction *I) {
  Value *L, *R;
  if (m_BinOp(m_Value(L), m_Value(R)).match(I))
    return ReductionData(RK_Arithmetic, I->getOpcode(), L, R);
  if (auto *SI = dyn_cast<SelectInst>(I)) {
    if (m_SMin(m_Value(L), m_Value(R)).match(SI) ||
        m_SMax(m_Value(L), m_Value(R)).match(SI) ||
@ -730,8 +730,8 @@ static Optional<ReductionData> getReductionData(Instruction *I) {
        m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
        m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
      auto *CI = cast<CmpInst>(SI->getCondition());
      return ReductionData(RK_MinMax, CI->getOpcode(), L, R);
    }
    if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
        m_UMax(m_Value(L), m_Value(R)).match(SI)) {
      auto *CI = cast<CmpInst>(SI->getCondition());
@ -851,11 +851,11 @@ static ReductionKind matchPairwiseReduction(const ExtractElementInst *ReduxRoot,

  // We look for a sequence of shuffle,shuffle,add triples like the following
  // that builds a pairwise reduction tree.
  //
  // (X0, X1, X2, X3)
  // (X0 + X1, X2 + X3, undef, undef)
  // ((X0 + X1) + (X2 + X3), undef, undef, undef)
  //
  // %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
  //       <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
  // %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
@ -916,7 +916,7 @@ matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,

  // We look for a sequence of shuffles and adds like the following matching one
  // fadd, shuffle vector pair at a time.
  //
  // %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
  //             <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  // %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
@ -927,7 +927,7 @@ matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,

  unsigned MaskStart = 1;
  Instruction *RdxOp = RdxStart;
  SmallVector<int, 32> ShuffleMask(NumVecElems, 0);
  unsigned NumVecElemsRemain = NumVecElems;
  while (NumVecElemsRemain - 1) {
    // Check for the right reduction operation.
@ -1093,7 +1093,7 @@ int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
  case Instruction::InsertElement: {
    const InsertElementInst * IE = cast<InsertElementInst>(I);
    ConstantInt *CI = dyn_cast<ConstantInt>(IE->getOperand(2));
    unsigned Idx = -1;
    if (CI)
      Idx = CI->getZExtValue();
    return getVectorInstrCost(I->getOpcode(),
@ -1104,7 +1104,7 @@ int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
    // TODO: Identify and add costs for insert/extract subvector, etc.
    if (Shuffle->changesLength())
      return -1;

    if (Shuffle->isIdentity())
      return 0;

@ -71,7 +71,7 @@
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>

using namespace llvm;
using namespace llvm::PatternMatch;
@ -3828,7 +3828,7 @@ static bool checkRippleForSignedAdd(const KnownBits &LHSKnown,

  // If either of the values is known to be non-negative, adding them can only
  // overflow if the second is also non-negative, so we can assume that.
  // Two non-negative numbers will only overflow if there is a carry to the
  // sign bit, so we can check if even when the values are as big as possible
  // there is no overflow to the sign bit.
  if (LHSKnown.isNonNegative() || RHSKnown.isNonNegative()) {
@ -3855,7 +3855,7 @@ static bool checkRippleForSignedAdd(const KnownBits &LHSKnown,
  }

  // If we reached here it means that we know nothing about the sign bits.
  // In this case we can't know if there will be an overflow, since by
  // changing the sign bits any two values can be made to overflow.
  return false;
}
@ -3905,7 +3905,7 @@ static OverflowResult computeOverflowForSignedAdd(const Value *LHS,
  // operands.
  bool LHSOrRHSKnownNonNegative =
      (LHSKnown.isNonNegative() || RHSKnown.isNonNegative());
  bool LHSOrRHSKnownNegative =
      (LHSKnown.isNegative() || RHSKnown.isNegative());
  if (LHSOrRHSKnownNonNegative || LHSOrRHSKnownNegative) {
    KnownBits AddKnown = computeKnownBits(Add, DL, /*Depth=*/0, AC, CxtI, DT);
@ -4454,7 +4454,7 @@ static SelectPatternResult matchMinMax(CmpInst::Predicate Pred,
  SPR = matchMinMaxOfMinMax(Pred, CmpLHS, CmpRHS, TrueVal, FalseVal, Depth);
  if (SPR.Flavor != SelectPatternFlavor::SPF_UNKNOWN)
    return SPR;

  if (Pred != CmpInst::ICMP_SGT && Pred != CmpInst::ICMP_SLT)
    return {SPF_UNKNOWN, SPNB_NA, false};

@ -4630,7 +4630,7 @@ static SelectPatternResult matchSelectPattern(CmpInst::Predicate Pred,
    case FCmpInst::FCMP_OLE: return {SPF_FMINNUM, NaNBehavior, Ordered};
    }
  }

  if (isKnownNegation(TrueVal, FalseVal)) {
    // Sign-extending LHS does not change its sign, so TrueVal/FalseVal can
    // match against either LHS or sext(LHS).
@ -842,7 +842,7 @@ static void maybeSetDSOLocal(bool DSOLocal, GlobalValue &GV) {
}

/// parseIndirectSymbol:
///   ::= GlobalVar '=' OptionalLinkage OptionalPreemptionSpecifier
///                     OptionalVisibility OptionalDLLStorageClass
///                     OptionalThreadLocal OptionalUnnamedAddr
//                      'alias|ifunc' IndirectSymbol
@ -3935,7 +3935,7 @@ bool LLParser::ParseMDField(LocTy Loc, StringRef Name, EmissionKindField &Result
  Lex.Lex();
  return false;
}

template <>
bool LLParser::ParseMDField(LocTy Loc, StringRef Name,
                            DwarfAttEncodingField &Result) {
@ -3809,7 +3809,7 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
        continue;
      // The mapping from OriginalId to GUID may return a GUID
      // that corresponds to a static variable. Filter it out here.
      // This can happen when
      // 1) There is a call to a library function which does not have
      //    a CallValidId;
      // 2) There is a static variable with the OriginalGUID identical
@ -46,7 +46,7 @@ class LLVM_LIBRARY_VISIBILITY AntiDepBreaker {
                                   MachineBasicBlock::iterator End,
                                   unsigned InsertPosIndex,
                                   DbgValueVector &DbgValues) = 0;

  /// Update liveness information to account for the current
  /// instruction, which will not be scheduled.
  virtual void Observe(MachineInstr &MI, unsigned Count,
@ -24,8 +24,26 @@ unsigned AddressPool::getIndex(const MCSymbol *Sym, bool TLS) {
  return IterBool.first->second.Number;
}

void AddressPool::emitHeader(AsmPrinter &Asm, MCSection *Section) {
  static const uint8_t AddrSize = Asm.getDataLayout().getPointerSize();
  Asm.OutStreamer->SwitchSection(Section);

  uint64_t Length = sizeof(uint16_t) // version
                    + sizeof(uint8_t) // address_size
                    + sizeof(uint8_t) // segment_selector_size
                    + AddrSize * Pool.size(); // entries
  Asm.emitInt32(Length); // TODO: Support DWARF64 format.
  Asm.emitInt16(Asm.getDwarfVersion());
  Asm.emitInt8(AddrSize);
  Asm.emitInt8(0); // TODO: Support non-zero segment_selector_size.
}

// Emit addresses into the section given.
void AddressPool::emit(AsmPrinter &Asm, MCSection *AddrSection) {
  if (Asm.getDwarfVersion() >= 5)
    emitHeader(Asm, AddrSection);

  if (Pool.empty())
    return;

@ -50,6 +50,9 @@ class AddressPool {
  bool hasBeenUsed() const { return HasBeenUsed; }

  void resetUsedFlag() { HasBeenUsed = false; }

private:
  void emitHeader(AsmPrinter &Asm, MCSection *Section);
};

} // end namespace llvm
@ -364,7 +364,9 @@ DwarfDebug::DwarfDebug(AsmPrinter *A, Module *M)
  else
    UseSectionsAsReferences = DwarfSectionsAsReferences == Enable;

  GenerateTypeUnits = GenerateDwarfTypeUnits;
  // Don't generate type units for unsupported object file formats.
  GenerateTypeUnits =
      A->TM.getTargetTriple().isOSBinFormatELF() && GenerateDwarfTypeUnits;

  TheAccelTableKind = computeAccelTableKind(
      DwarfVersion, GenerateTypeUnits, DebuggerTuning, A->TM.getTargetTriple());
@ -886,8 +888,7 @@ void DwarfDebug::endModule() {
    emitDebugInfoDWO();
    emitDebugAbbrevDWO();
    emitDebugLineDWO();
    // Emit DWO addresses.
    AddrPool.emit(*Asm, Asm->getObjFileLowering().getDwarfAddrSection());
    emitDebugAddr();
  }

  // Emit info into the dwarf accelerator table sections.
@ -2136,7 +2137,7 @@ void DwarfDebug::emitDebugRanges() {
    return;
  }

  if (getDwarfVersion() >= 5 && NoRangesPresent())
  if (NoRangesPresent())
    return;

  // Start the dwarf ranges section.
@ -2297,6 +2298,12 @@ void DwarfDebug::emitDebugStrDWO() {
                         OffSec, /* UseRelativeOffsets = */ false);
}

// Emit DWO addresses.
void DwarfDebug::emitDebugAddr() {
  assert(useSplitDwarf() && "No split dwarf?");
  AddrPool.emit(*Asm, Asm->getObjFileLowering().getDwarfAddrSection());
}

MCDwarfDwoLineTable *DwarfDebug::getDwoLineTable(const DwarfCompileUnit &CU) {
  if (!useSplitDwarf())
    return nullptr;

@ -447,6 +447,9 @@ class DwarfDebug : public DebugHandlerBase {
  /// Emit the debug str dwo section.
  void emitDebugStrDWO();

  /// Emit DWO addresses.
  void emitDebugAddr();

  /// Flags to let the linker know we have emitted new style pubnames. Only
  /// emit it here if we don't have a skeleton CU for split dwarf.
  void addGnuPubAttributes(DwarfCompileUnit &U, DIE &D) const;
@ -112,7 +112,7 @@ class DwarfExpression {
  uint64_t OffsetInBits = 0;
  unsigned DwarfVersion;

  /// Sometimes we need to add a DW_OP_bit_piece to describe a subregister.
  unsigned SubRegisterSizeInBits = 0;
  unsigned SubRegisterOffsetInBits = 0;

@ -95,6 +95,6 @@ bool DwarfFile::addScopeVariable(LexicalScope *LS, DbgVariable *Var) {
    }
  } else {
    ScopeVars.Locals.push_back(Var);
  }
  return true;
}
@ -1182,7 +1182,7 @@ DIE *DwarfUnit::getOrCreateModule(const DIModule *M) {
    addString(MDie, dwarf::DW_AT_LLVM_include_path, M->getIncludePath());
  if (!M->getISysRoot().empty())
    addString(MDie, dwarf::DW_AT_LLVM_isysroot, M->getISysRoot());

  return &MDie;
}

@ -1691,7 +1691,7 @@ void DwarfUnit::emitCommonHeader(bool UseOffsets, dwarf::UnitType UT) {
}

void DwarfTypeUnit::emitHeader(bool UseOffsets) {
  DwarfUnit::emitCommonHeader(UseOffsets,
                              DD->useSplitDwarf() ? dwarf::DW_UT_split_type
                                                  : dwarf::DW_UT_type);
  Asm->OutStreamer->AddComment("Type Signature");
|
||||
|
||||
/// Convert an atomic load of a non-integral type to an integer load of the
|
||||
/// equivalent bitwidth. See the function comment on
|
||||
/// convertAtomicStoreToIntegerType for background.
|
||||
/// convertAtomicStoreToIntegerType for background.
|
||||
LoadInst *AtomicExpand::convertAtomicLoadToIntegerType(LoadInst *LI) {
|
||||
auto *M = LI->getModule();
|
||||
Type *NewTy = getCorrespondingIntegerType(LI->getType(),
|
||||
M->getDataLayout());
|
||||
|
||||
IRBuilder<> Builder(LI);
|
||||
|
||||
|
||||
Value *Addr = LI->getPointerOperand();
|
||||
Type *PT = PointerType::get(NewTy,
|
||||
Addr->getType()->getPointerAddressSpace());
|
||||
Value *NewAddr = Builder.CreateBitCast(Addr, PT);
|
||||
|
||||
|
||||
auto *NewLI = Builder.CreateLoad(NewAddr);
|
||||
NewLI->setAlignment(LI->getAlignment());
|
||||
NewLI->setVolatile(LI->isVolatile());
|
||||
@ -452,7 +452,7 @@ StoreInst *AtomicExpand::convertAtomicStoreToIntegerType(StoreInst *SI) {
|
||||
Type *NewTy = getCorrespondingIntegerType(SI->getValueOperand()->getType(),
|
||||
M->getDataLayout());
|
||||
Value *NewVal = Builder.CreateBitCast(SI->getValueOperand(), NewTy);
|
||||
|
||||
|
||||
Value *Addr = SI->getPointerOperand();
|
||||
Type *PT = PointerType::get(NewTy,
|
||||
Addr->getType()->getPointerAddressSpace());
|
||||
@ -920,14 +920,14 @@ Value *AtomicExpand::insertRMWLLSCLoop(
|
||||
/// the equivalent bitwidth. We used to not support pointer cmpxchg in the
|
||||
/// IR. As a migration step, we convert back to what use to be the standard
|
||||
/// way to represent a pointer cmpxchg so that we can update backends one by
|
||||
/// one.
|
||||
/// one.
|
||||
AtomicCmpXchgInst *AtomicExpand::convertCmpXchgToIntegerType(AtomicCmpXchgInst *CI) {
|
||||
auto *M = CI->getModule();
|
||||
Type *NewTy = getCorrespondingIntegerType(CI->getCompareOperand()->getType(),
|
||||
M->getDataLayout());
|
||||
|
||||
IRBuilder<> Builder(CI);
|
||||
|
||||
|
||||
Value *Addr = CI->getPointerOperand();
|
||||
Type *PT = PointerType::get(NewTy,
|
||||
Addr->getType()->getPointerAddressSpace());
|
||||
@ -935,8 +935,8 @@ AtomicCmpXchgInst *AtomicExpand::convertCmpXchgToIntegerType(AtomicCmpXchgInst *
|
||||
|
||||
Value *NewCmp = Builder.CreatePtrToInt(CI->getCompareOperand(), NewTy);
|
||||
Value *NewNewVal = Builder.CreatePtrToInt(CI->getNewValOperand(), NewTy);
|
||||
|
||||
|
||||
|
||||
|
||||
auto *NewCI = Builder.CreateAtomicCmpXchg(NewAddr, NewCmp, NewNewVal,
|
||||
CI->getSuccessOrdering(),
|
||||
CI->getFailureOrdering(),
|
||||
|
@ -8,7 +8,7 @@
|
||||
//===----------------------------------------------------------------------===//
|
||||
//
|
||||
// This file contains the boilerplate required to define our various built in
|
||||
// gc lowering strategies.
|
||||
// gc lowering strategies.
|
||||
//
|
||||
//===----------------------------------------------------------------------===//
|
||||
|
||||
|
@ -530,7 +530,7 @@ BreakAntiDependencies(const std::vector<SUnit> &SUnits,
    // Kill instructions can define registers but are really nops, and there
    // might be a real definition earlier that needs to be paired with uses
    // dominated by this kill.

    // FIXME: It may be possible to remove the isKill() restriction once PR18663
    // has been properly fixed. There can be value in processing kills as seen
    // in the AggressiveAntiDepBreaker class.
@ -159,7 +159,7 @@ GCStrategy *GCModuleInfo::getGCStrategy(const StringRef Name) {
  auto NMI = GCStrategyMap.find(Name);
  if (NMI != GCStrategyMap.end())
    return NMI->getValue();

  for (auto& Entry : GCRegistry::entries()) {
    if (Name == Entry.getName()) {
      std::unique_ptr<GCStrategy> S = Entry.instantiate();
|
||||
}
|
||||
|
||||
if (GCRegistry::begin() == GCRegistry::end()) {
|
||||
// In normal operation, the registry should not be empty. There should
|
||||
// In normal operation, the registry should not be empty. There should
|
||||
// be the builtin GCs if nothing else. The most likely scenario here is
|
||||
// that we got here without running the initializers used by the Registry
|
||||
// that we got here without running the initializers used by the Registry
|
||||
// itself and it's registration mechanism.
|
||||
const std::string error = ("unsupported GC: " + Name).str() +
|
||||
const std::string error = ("unsupported GC: " + Name).str() +
|
||||
" (did you remember to link and initialize the CodeGen library?)";
|
||||
report_fatal_error(error);
|
||||
} else
|
||||
|
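The lookup above follows a common registry pattern: consult a name-keyed table, instantiate on a hit, and treat an entirely empty registry as "the static initializers never ran". A minimal sketch of that pattern, assuming only the standard library (all names here are hypothetical, not LLVM's `GCRegistry` API):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a registered strategy and its factory table.
struct Strategy { std::string Name; };
using Factory = std::function<std::unique_ptr<Strategy>()>;

std::map<std::string, Factory> &registry() {
  static std::map<std::string, Factory> R; // filled by initializers
  return R;
}

std::unique_ptr<Strategy> getStrategy(const std::string &Name) {
  if (registry().empty())
    // An empty registry means the library's initializers never ran,
    // the same diagnosis the hunk above reports via report_fatal_error.
    throw std::runtime_error("unsupported GC: " + Name +
                             " (did you remember to link and initialize the "
                             "CodeGen library?)");
  auto It = registry().find(Name);
  if (It == registry().end())
    throw std::runtime_error("unsupported GC: " + Name);
  return It->second(); // instantiate, as Entry.instantiate() does
}
```

Distinguishing "registry empty" from "name not found" matters because the two failures have different fixes: a link/initialization problem versus a genuinely unknown strategy name.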
@ -11,6 +11,7 @@
//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/GlobalISel/IRTranslator.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallSet.h"
@ -33,6 +34,7 @@
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
@ -1503,6 +1505,8 @@ bool IRTranslator::translate(const Constant &C, unsigned Reg) {
      Ops.push_back(getOrCreateVReg(*CV->getOperand(i)));
    }
    EntryBuilder.buildMerge(Reg, Ops);
  } else if (auto *BA = dyn_cast<BlockAddress>(&C)) {
    EntryBuilder.buildBlockAddress(Reg, BA);
  } else
    return false;
@ -1611,19 +1615,20 @@ bool IRTranslator::runOnMachineFunction(MachineFunction &CurMF) {
    ArgIt++;
  }

  // And translate the function!
  for (const BasicBlock &BB : F) {
    MachineBasicBlock &MBB = getMBB(BB);
  // Need to visit defs before uses when translating instructions.
  ReversePostOrderTraversal<const Function *> RPOT(&F);
  for (const BasicBlock *BB : RPOT) {
    MachineBasicBlock &MBB = getMBB(*BB);
    // Set the insertion point of all the following translations to
    // the end of this basic block.
    CurBuilder.setMBB(MBB);

    for (const Instruction &Inst : BB) {
    for (const Instruction &Inst : *BB) {
      if (translate(Inst))
        continue;

      OptimizationRemarkMissed R("gisel-irtranslator", "GISelFailure",
                                 Inst.getDebugLoc(), &BB);
                                 Inst.getDebugLoc(), BB);
      R << "unable to translate instruction: " << ore::NV("Opcode", &Inst);

      if (ORE->allowExtraAnalysis("gisel-irtranslator")) {
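The hunk above switches block iteration from source order to a reverse post-order traversal so that, ignoring back edges, each block is translated before its successors, i.e. defs are seen before uses. A minimal sketch of reverse post-order on a small string-keyed graph (DFS post-order, then reversed; the graph type and names are illustrative assumptions, not LLVM's `ReversePostOrderTraversal`):

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical CFG representation: block name -> successor names.
using Graph = std::map<std::string, std::vector<std::string>>;

// DFS post-order: a node is emitted only after all of its successors.
void postOrder(const Graph &G, const std::string &N,
               std::set<std::string> &Seen, std::vector<std::string> &Out) {
  if (!Seen.insert(N).second)
    return;
  auto It = G.find(N);
  if (It != G.end())
    for (const auto &Succ : It->second)
      postOrder(G, Succ, Seen, Out);
  Out.push_back(N);
}

// Reversing post-order puts every node before its successors
// (except along back edges), which is the defs-before-uses property.
std::vector<std::string> reversePostOrder(const Graph &G,
                                          const std::string &Entry) {
  std::set<std::string> Seen;
  std::vector<std::string> Out;
  postOrder(G, Entry, Seen, Out);
  std::reverse(Out.begin(), Out.end());
  return Out;
}
```

For a diamond CFG (entry branching to two blocks that rejoin at exit), reverse post-order always yields entry first and exit last, regardless of the order the branches are stored in.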
@ -809,6 +809,15 @@ MachineIRBuilderBase::buildAtomicRMWUmin(unsigned OldValRes, unsigned Addr,
                   MMO);
}

MachineInstrBuilder
MachineIRBuilderBase::buildBlockAddress(unsigned Res, const BlockAddress *BA) {
#ifndef NDEBUG
  assert(getMRI()->getType(Res).isPointer() && "invalid res type");
#endif

  return buildInstr(TargetOpcode::G_BLOCK_ADDR).addDef(Res).addBlockAddress(BA);
}

void MachineIRBuilderBase::validateTruncExt(unsigned Dst, unsigned Src,
                                            bool IsExtend) {
#ifndef NDEBUG
@ -56,7 +56,7 @@
// - it makes linker optimizations less useful (order files, LOHs, ...)
// - it forces usage of indexed addressing (which isn't necessarily "free")
// - it can increase register pressure when the uses are disparate enough.
//
// We use heuristics to discover the best global grouping we can (cf cl::opts).
//
// ===---------------------------------------------------------------------===//