Vendor import of llvm trunk r338536:
https://llvm.org/svn/llvm-project/llvm/trunk@338536
This commit is contained in:
parent eb11fae6d0
commit b7eb8e35e4
Notes:
svn2git
2020-12-20 02:59:44 +00:00
svn path=/vendor/llvm/dist/; revision=337137
svn path=/vendor/llvm/llvm-trunk-r338536/; revision=337138; tag=vendor/llvm/llvm-trunk-r338536
@@ -867,6 +867,7 @@ if(NOT LLVM_TOOLCHAIN_TOOLS)
     llvm-ranlib
+    llvm-lib
     llvm-objdump
     llvm-rc
   )
 endif()
@@ -114,8 +114,8 @@ option specifies "``-``", then the output will also be sent to standard output.

 .. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits how
-many temporary registers are available for register renaming purposes. A value
-of zero for this flag means "unlimited number of temporary registers".
+many physical registers are available for register renaming purposes. A value
+of zero for this flag means "unlimited number of physical registers".

 .. option:: -iterations=<number of iterations>
@@ -207,23 +207,23 @@ EXIT STATUS
 :program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
 to standard error, and the tool returns 1.

-HOW MCA WORKS
--------------
+HOW LLVM-MCA WORKS
+------------------

-MCA takes assembly code as input. The assembly code is parsed into a sequence
-of MCInst with the help of the existing LLVM target assembly parsers. The
-parsed sequence of MCInst is then analyzed by a ``Pipeline`` module to generate
-a performance report.
+:program:`llvm-mca` takes assembly code as input. The assembly code is parsed
+into a sequence of MCInst with the help of the existing LLVM target assembly
+parsers. The parsed sequence of MCInst is then analyzed by a ``Pipeline`` module
+to generate a performance report.

 The Pipeline module simulates the execution of the machine code sequence in a
 loop of iterations (default is 100). During this process, the pipeline collects
 a number of execution related statistics. At the end of this process, the
 pipeline generates and prints a report from the collected statistics.

-Here is an example of a performance report generated by MCA for a dot-product
-of two packed float vectors of four elements. The analysis is conducted for
-target x86, cpu btver2. The following result can be produced via the following
-command using the example located at
+Here is an example of a performance report generated by the tool for a
+dot-product of two packed float vectors of four elements. The analysis is
+conducted for target x86, cpu btver2. The following result can be produced via
+the following command using the example located at
 ``test/tools/llvm-mca/X86/BtVer2/dot-product.s``:

 .. code-block:: bash
@@ -287,10 +287,30 @@ for a total of 900 dynamically executed instructions.
 The report is structured in three main sections. The first section collects a
 few performance numbers; the goal of this section is to give a very quick
 overview of the performance throughput. In this example, the two important
-performance indicators are the predicted total number of cycles, and the IPC.
-IPC is probably the most important throughput indicator. A big delta between
-the Dispatch Width and the computed IPC is an indicator of potential
-performance issues.
+performance indicators are **IPC** and **Block RThroughput** (Block Reciprocal
+Throughput).
+
+IPC is computed dividing the total number of simulated instructions by the total
+number of cycles. A delta between Dispatch Width and IPC is an indicator of a
+performance issue. In the absence of loop-carried data dependencies, the
+observed IPC tends to a theoretical maximum which can be computed by dividing
+the number of instructions of a single iteration by the *Block RThroughput*.
+
+IPC is bounded from above by the dispatch width. That is because the dispatch
+width limits the maximum size of a dispatch group. IPC is also limited by the
+amount of hardware parallelism. The availability of hardware resources affects
+the resource pressure distribution, and it limits the number of instructions
+that can be executed in parallel every cycle. A delta between Dispatch
+Width and the theoretical maximum IPC is an indicator of a performance
+bottleneck caused by the lack of hardware resources. In general, the lower the
+Block RThroughput, the better.
+
+In this example, ``Instructions per iteration/Block RThroughput`` is 1.50. Since
+there are no loop-carried dependencies, the observed IPC is expected to approach
+1.50 when the number of iterations tends to infinity. The delta between the
+Dispatch Width (2.00), and the theoretical maximum IPC (1.50) is an indicator of
+a performance bottleneck caused by the lack of hardware resources, and the
+*Resource pressure view* can help to identify the problematic resource usage.

 The second section of the report shows the latency and reciprocal
 throughput of every instruction in the sequence. That section also reports
@@ -316,7 +336,7 @@ pressure should be uniformly distributed between multiple resources.

 Timeline View
 ^^^^^^^^^^^^^
-MCA's timeline view produces a detailed report of each instruction's state
+The timeline view produces a detailed report of each instruction's state
 transitions through an instruction pipeline. This view is enabled by the
 command line option ``-timeline``. As instructions transition through the
 various stages of the pipeline, their states are depicted in the view report.
@@ -331,7 +351,7 @@ These states are represented by the following characters:

 Below is the timeline view for a subset of the dot-product example located in
 ``test/tools/llvm-mca/X86/BtVer2/dot-product.s`` and processed by
-MCA using the following command:
+:program:`llvm-mca` using the following command:

 .. code-block:: bash

@@ -366,7 +386,7 @@ MCA using the following command:
 2.     3      5.7    0.0    0.0    vhaddps %xmm3, %xmm3, %xmm4

 The timeline view is interesting because it shows instruction state changes
-during execution. It also gives an idea of how MCA processes instructions
+during execution. It also gives an idea of how the tool processes instructions
 executed on the target, and how their timing information might be calculated.

 The timeline view is structured in two tables. The first table shows
@@ -411,12 +431,12 @@ Parallelism).
 In the dot-product example, there are anti-dependencies introduced by
 instructions from different iterations. However, those dependencies can be
 removed at register renaming stage (at the cost of allocating register aliases,
-and therefore consuming temporary registers).
+and therefore consuming physical registers).

 Table *Average Wait times* helps diagnose performance issues that are caused by
 the presence of long latency instructions and potentially long data dependencies
-which may limit the ILP. Note that MCA, by default, assumes at least 1cy
-between the dispatch event and the issue event.
+which may limit the ILP. Note that :program:`llvm-mca`, by default, assumes at
+least 1cy between the dispatch event and the issue event.

 When the performance is limited by data dependencies and/or long latency
 instructions, the number of cycles spent while in the *ready* state is expected
@@ -549,3 +569,177 @@ statistics are displayed by using the command option ``-all-stats`` or

 In this example, we can conclude that the IPC is mostly limited by data
 dependencies, and not by resource pressure.
+
+Instruction Flow
+^^^^^^^^^^^^^^^^
+This section describes the instruction flow through MCA's default out-of-order
+pipeline, as well as the functional units involved in the process.
+
+The default pipeline implements the following sequence of stages used to
+process instructions.
+
+* Dispatch (Instruction is dispatched to the schedulers).
+* Issue (Instruction is issued to the processor pipelines).
+* Write Back (Instruction is executed, and results are written back).
+* Retire (Instruction is retired; writes are architecturally committed).
+
+The default pipeline only models the out-of-order portion of a processor.
+Therefore, the instruction fetch and decode stages are not modeled. Performance
+bottlenecks in the frontend are not diagnosed. MCA assumes that instructions
+have all been decoded and placed into a queue. Also, MCA does not model branch
+prediction.
+
+Instruction Dispatch
+""""""""""""""""""""
+During the dispatch stage, instructions are picked in program order from a
+queue of already decoded instructions, and dispatched in groups to the
+simulated hardware schedulers.
+
+The size of a dispatch group depends on the availability of the simulated
+hardware resources. The processor dispatch width defaults to the value
+of the ``IssueWidth`` in LLVM's scheduling model.
+
+An instruction can be dispatched if:
+
+* The size of the dispatch group is smaller than the processor's dispatch width.
+* There are enough entries in the reorder buffer.
+* There are enough physical registers to do register renaming.
+* The schedulers are not full.
+
+Scheduling models can optionally specify which register files are available on
+the processor. MCA uses that information to initialize register file
+descriptors. Users can limit the number of physical registers that are
+globally available for register renaming by using the command option
+``-register-file-size``. A value of zero for this option means *unbounded*.
+By knowing how many registers are available for renaming, MCA can predict
+dispatch stalls caused by the lack of registers.
+
+The number of reorder buffer entries consumed by an instruction depends on the
+number of micro-opcodes specified by the target scheduling model. MCA's
+reorder buffer's purpose is to track the progress of instructions that are
+"in-flight," and to retire instructions in program order. The number of
+entries in the reorder buffer defaults to the ``MicroOpBufferSize`` provided by
+the target scheduling model.
+
+Instructions that are dispatched to the schedulers consume scheduler buffer
+entries. :program:`llvm-mca` queries the scheduling model to determine the set
+of buffered resources consumed by an instruction. Buffered resources are
+treated like scheduler resources.
+
+Instruction Issue
+"""""""""""""""""
+Each processor scheduler implements a buffer of instructions. An instruction
+has to wait in the scheduler's buffer until input register operands become
+available. Only at that point does the instruction become eligible for
+execution and may be issued (potentially out-of-order) for execution.
+Instruction latencies are computed by :program:`llvm-mca` with the help of the
+scheduling model.
+
+:program:`llvm-mca`'s scheduler is designed to simulate multiple processor
+schedulers. The scheduler is responsible for tracking data dependencies, and
+dynamically selecting which processor resources are consumed by instructions.
+It delegates the management of processor resource units and resource groups to a
+resource manager. The resource manager is responsible for selecting resource
+units that are consumed by instructions. For example, if an instruction
+consumes 1cy of a resource group, the resource manager selects one of the
+available units from the group; by default, the resource manager uses a
+round-robin selector to guarantee that resource usage is uniformly distributed
+between all units of a group.
+
+:program:`llvm-mca`'s scheduler implements three instruction queues:
+
+* WaitQueue: a queue of instructions whose operands are not ready.
+* ReadyQueue: a queue of instructions ready to execute.
+* IssuedQueue: a queue of instructions executing.
+
+Depending on the operand availability, instructions that are dispatched to the
+scheduler are either placed into the WaitQueue or into the ReadyQueue.
+
+Every cycle, the scheduler checks if instructions can be moved from the
+WaitQueue to the ReadyQueue, and if instructions from the ReadyQueue can be
+issued to the underlying pipelines. The algorithm prioritizes older instructions
+over younger instructions.
+
+Write-Back and Retire Stage
+"""""""""""""""""""""""""""
+Issued instructions are moved from the ReadyQueue to the IssuedQueue. There,
+instructions wait until they reach the write-back stage. At that point, they
+get removed from the queue and the retire control unit is notified.
+
+When instructions are executed, the retire control unit flags the
+instruction as "ready to retire."
+
+Instructions are retired in program order. The register file is notified of
+the retirement so that it can free the physical registers that were allocated
+for the instruction during the register renaming stage.
+
+Load/Store Unit and Memory Consistency Model
+""""""""""""""""""""""""""""""""""""""""""""
+To simulate an out-of-order execution of memory operations, :program:`llvm-mca`
+utilizes a simulated load/store unit (LSUnit) to simulate the speculative
+execution of loads and stores.
+
+Each load (or store) consumes an entry in the load (or store) queue. Users can
+specify flags ``-lqueue`` and ``-squeue`` to limit the number of entries in the
+load and store queues respectively. The queues are unbounded by default.
+
+The LSUnit implements a relaxed consistency model for memory loads and stores.
+The rules are:
+
+1. A younger load is allowed to pass an older load only if there are no
+   intervening stores or barriers between the two loads.
+2. A younger load is allowed to pass an older store provided that the load does
+   not alias with the store.
+3. A younger store is not allowed to pass an older store.
+4. A younger store is not allowed to pass an older load.
+
+By default, the LSUnit optimistically assumes that loads do not alias
+(``-noalias=true``) store operations. Under this assumption, younger loads are
+always allowed to pass older stores. Essentially, the LSUnit does not attempt
+to run any alias analysis to predict when loads and stores do not alias with
+each other.
+
+Note that, in the case of write-combining memory, rule 3 could be relaxed to
+allow reordering of non-aliasing store operations. That being said, at the
+moment, there is no way to further relax the memory model (``-noalias`` is the
+only option). Essentially, there is no option to specify a different memory
+type (e.g., write-back, write-combining, write-through; etc.) and consequently
+to weaken, or strengthen, the memory model.
+
+Other limitations are:
+
+* The LSUnit does not know when store-to-load forwarding may occur.
+* The LSUnit does not know anything about cache hierarchy and memory types.
+* The LSUnit does not know how to identify serializing operations and memory
+  fences.
+
+The LSUnit does not attempt to predict if a load or store hits or misses the L1
+cache. It only knows if an instruction "MayLoad" and/or "MayStore." For
+loads, the scheduling model provides an "optimistic" load-to-use latency (which
+usually matches the load-to-use latency for when there is a hit in the L1D).
+
+:program:`llvm-mca` does not know about serializing operations or memory-barrier
+like instructions. The LSUnit conservatively assumes that an instruction which
+has both "MayLoad" and unmodeled side effects behaves like a "soft"
+load-barrier. That means, it serializes loads without forcing a flush of the
+load queue. Similarly, instructions that "MayStore" and have unmodeled side
+effects are treated like store barriers. A full memory barrier is a "MayLoad"
+and "MayStore" instruction with unmodeled side effects. This is inaccurate, but
+it is the best that we can do at the moment with the current information
+available in LLVM.
+
+A load/store barrier consumes one entry of the load/store queue. A load/store
+barrier enforces ordering of loads/stores. A younger load cannot pass a load
+barrier. Also, a younger store cannot pass a store barrier. A younger load
+has to wait for the memory/load barrier to execute. A load/store barrier is
+"executed" when it becomes the oldest entry in the load/store queue(s). That
+also means, by construction, all of the older loads/stores have been executed.
+
+In conclusion, the full set of load/store consistency rules are:
+
+#. A store may not pass a previous store.
+#. A store may not pass a previous load (regardless of ``-noalias``).
+#. A store has to wait until an older store barrier is fully executed.
+#. A load may pass a previous load.
+#. A load may not pass a previous store unless ``-noalias`` is set.
+#. A load has to wait until an older load barrier is fully executed.

@@ -838,7 +838,7 @@ To configure LLVM, follow these steps:

 .. code-block:: console

-  % cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=prefix=/install/path
+  % cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/install/path
     [other options] SRC_ROOT

 Compiling the LLVM Suite Source Code
@@ -4588,9 +4588,12 @@ DIExpression
 ``DIExpression`` nodes represent expressions that are inspired by the DWARF
 expression language. They are used in :ref:`debug intrinsics<dbg_intrinsics>`
 (such as ``llvm.dbg.declare`` and ``llvm.dbg.value``) to describe how the
-referenced LLVM variable relates to the source language variable.
+referenced LLVM variable relates to the source language variable. Debug
+intrinsics are interpreted left-to-right: start by pushing the value/address
+operand of the intrinsic onto a stack, then repeatedly push and evaluate
+opcodes from the DIExpression until the final variable description is produced.

-The current supported vocabulary is limited:
+The current supported opcode vocabulary is limited:

 - ``DW_OP_deref`` dereferences the top of the expression stack.
 - ``DW_OP_plus`` pops the last two entries from the expression stack, adds
@@ -4610,12 +4613,30 @@ The current supported vocabulary is limited:
 - ``DW_OP_stack_value`` marks a constant value.

 DWARF specifies three kinds of simple location descriptions: Register, memory,
-and implicit location descriptions. Register and memory location descriptions
-describe the *location* of a source variable (in the sense that a debugger might
-modify its value), whereas implicit locations describe merely the *value* of a
-source variable. DIExpressions also follow this model: A DIExpression that
-doesn't have a trailing ``DW_OP_stack_value`` will describe an *address* when
-combined with a concrete location.
+and implicit location descriptions. Note that a location description is
+defined over certain ranges of a program, i.e. the location of a variable may
+change over the course of the program. Register and memory location
+descriptions describe the *concrete location* of a source variable (in the
+sense that a debugger might modify its value), whereas *implicit locations*
+describe merely the actual *value* of a source variable which might not exist
+in registers or in memory (see ``DW_OP_stack_value``).
+
+A ``llvm.dbg.addr`` or ``llvm.dbg.declare`` intrinsic describes an indirect
+value (the address) of a source variable. The first operand of the intrinsic
+must be an address of some kind. A DIExpression attached to the intrinsic
+refines this address to produce a concrete location for the source variable.
+
+A ``llvm.dbg.value`` intrinsic describes the direct value of a source variable.
+The first operand of the intrinsic may be a direct or indirect value. A
+DIExpression attached to the intrinsic refines the first operand to produce a
+direct value. For example, if the first operand is an indirect value, it may be
+necessary to insert ``DW_OP_deref`` into the DIExpression in order to produce a
+valid debug intrinsic.
+
+.. note::
+
+   A DIExpression is interpreted in the same way regardless of which kind of
+   debug intrinsic it's attached to.

 .. code-block:: text

@@ -244,6 +244,11 @@ argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a
 description of the variable. The third argument is a `complex expression
 <LangRef.html#diexpression>`_.

+An `llvm.dbg.value` intrinsic describes the *value* of a source variable
+directly, not its address. Note that the value operand of this intrinsic may
+be indirect (i.e., a pointer to the source variable), provided that interpreting
+the complex expression derives the direct value.
+
 Object lifetimes and scoping
 ============================

@@ -17,7 +17,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Support/type_traits.h"
 #include <algorithm>
 #include <cstddef>
 #include <initializer_list>
 #include <iterator>
@@ -43,6 +43,7 @@ class LoopInfo;
 class PHINode;
 class SelectInst;
 class TargetLibraryInfo;
+class PhiValues;
 class Value;

 /// This is the AA result object for the basic, local, and stateless alias
@@ -60,19 +61,22 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
   AssumptionCache &AC;
   DominatorTree *DT;
   LoopInfo *LI;
+  PhiValues *PV;

 public:
   BasicAAResult(const DataLayout &DL, const Function &F,
                 const TargetLibraryInfo &TLI, AssumptionCache &AC,
-                DominatorTree *DT = nullptr, LoopInfo *LI = nullptr)
-      : AAResultBase(), DL(DL), F(F), TLI(TLI), AC(AC), DT(DT), LI(LI) {}
+                DominatorTree *DT = nullptr, LoopInfo *LI = nullptr,
+                PhiValues *PV = nullptr)
+      : AAResultBase(), DL(DL), F(F), TLI(TLI), AC(AC), DT(DT), LI(LI), PV(PV)
+        {}

   BasicAAResult(const BasicAAResult &Arg)
       : AAResultBase(Arg), DL(Arg.DL), F(Arg.F), TLI(Arg.TLI), AC(Arg.AC),
-        DT(Arg.DT), LI(Arg.LI) {}
+        DT(Arg.DT), LI(Arg.LI), PV(Arg.PV) {}
   BasicAAResult(BasicAAResult &&Arg)
       : AAResultBase(std::move(Arg)), DL(Arg.DL), F(Arg.F), TLI(Arg.TLI),
-        AC(Arg.AC), DT(Arg.DT), LI(Arg.LI) {}
+        AC(Arg.AC), DT(Arg.DT), LI(Arg.LI), PV(Arg.PV) {}

   /// Handle invalidation events in the new pass manager.
   bool invalidate(Function &Fn, const PreservedAnalyses &PA,
@@ -682,7 +682,7 @@ bool sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
                      SmallVectorImpl<unsigned> &SortedIndices);

 /// Returns true if the memory operations \p A and \p B are consecutive.
 /// This is a simple API that does not depend on the analysis pass.
 bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
                          ScalarEvolution &SE, bool CheckType = true);

@@ -734,7 +734,7 @@ class LoopAccessLegacyAnalysis : public FunctionPass {
/// accesses of a loop.
///
/// It runs the analysis for a loop on demand. This can be initiated by
/// querying the loop access info via AM.getResult<LoopAccessAnalysis>.
/// getResult return a LoopAccessInfo object. See this class for the
/// specifics of what information is provided.
class LoopAccessAnalysis
@@ -44,6 +44,7 @@ class Instruction;
 class LoadInst;
 class PHITransAddr;
 class TargetLibraryInfo;
+class PhiValues;
 class Value;

 /// A memory dependence query can return one of three different answers.
@@ -360,13 +361,14 @@ class MemoryDependenceResults {
   AssumptionCache &AC;
   const TargetLibraryInfo &TLI;
   DominatorTree &DT;
+  PhiValues &PV;
   PredIteratorCache PredCache;

 public:
   MemoryDependenceResults(AliasAnalysis &AA, AssumptionCache &AC,
                           const TargetLibraryInfo &TLI,
-                          DominatorTree &DT)
-      : AA(AA), AC(AC), TLI(TLI), DT(DT) {}
+                          DominatorTree &DT, PhiValues &PV)
+      : AA(AA), AC(AC), TLI(TLI), DT(DT), PV(PV) {}

   /// Handle invalidation in the new PM.
   bool invalidate(Function &F, const PreservedAnalyses &PA,
@@ -10,7 +10,7 @@
/// Contains a collection of routines for determining if a given instruction is
/// guaranteed to execute if a given point in control flow is reached. The most
/// common example is an instruction within a loop being provably executed if we
/// branch to the header of its containing loop.
///
//===----------------------------------------------------------------------===//

@@ -58,7 +58,7 @@ void computeLoopSafetyInfo(LoopSafetyInfo *, Loop *);
bool isGuaranteedToExecute(const Instruction &Inst, const DominatorTree *DT,
                           const Loop *CurLoop,
                           const LoopSafetyInfo *SafetyInfo);

}

#endif
@@ -326,7 +326,7 @@ class TargetTransformInfoImplBase {
  bool haveFastSqrt(Type *Ty) { return false; }

  bool isFCmpOrdCheaperThanFCmpZero(Type *Ty) { return true; }

  unsigned getFPOpCost(Type *Ty) { return TargetTransformInfo::TCC_Basic; }

  int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
@@ -464,7 +464,7 @@ class Value;
/// This is equivalent to saying that all instructions within the basic block
/// are guaranteed to transfer execution to their successor within the basic
/// block. This has the same assumptions w.r.t. undefined behavior as the
/// instruction variant of this function.
bool isGuaranteedToTransferExecutionToSuccessor(const BasicBlock *BB);

/// Return true if this function can prove that the instruction I
@@ -856,6 +856,7 @@ HANDLE_DW_UT(0x06, split_type)
 // TODO: Add Mach-O and COFF names.
 // Official DWARF sections.
 HANDLE_DWARF_SECTION(DebugAbbrev, ".debug_abbrev", "debug-abbrev")
+HANDLE_DWARF_SECTION(DebugAddr, ".debug_addr", "debug-addr")
 HANDLE_DWARF_SECTION(DebugAranges, ".debug_aranges", "debug-aranges")
 HANDLE_DWARF_SECTION(DebugInfo, ".debug_info", "debug-info")
 HANDLE_DWARF_SECTION(DebugTypes, ".debug_types", "debug-types")
@@ -413,8 +413,10 @@ enum {

 // ARM Specific e_flags
 enum : unsigned {
-  EF_ARM_SOFT_FLOAT = 0x00000200U,
-  EF_ARM_VFP_FLOAT = 0x00000400U,
+  EF_ARM_SOFT_FLOAT = 0x00000200U,     // Legacy pre EABI_VER5
+  EF_ARM_ABI_FLOAT_SOFT = 0x00000200U, // EABI_VER5
+  EF_ARM_VFP_FLOAT = 0x00000400U,      // Legacy pre EABI_VER5
+  EF_ARM_ABI_FLOAT_HARD = 0x00000400U, // EABI_VER5
   EF_ARM_EABI_UNKNOWN = 0x00000000U,
   EF_ARM_EABI_VER1 = 0x01000000U,
   EF_ARM_EABI_VER2 = 0x02000000U,
@@ -104,12 +104,12 @@ class GCStrategy {
  const std::string &getName() const { return Name; }

  /// By default, write barriers are replaced with simple store
  /// instructions. If true, you must provide a custom pass to lower
  /// calls to \@llvm.gcwrite.
  bool customWriteBarrier() const { return CustomWriteBarriers; }

  /// By default, read barriers are replaced with simple load
  /// instructions. If true, you must provide a custom pass to lower
  /// calls to \@llvm.gcread.
  bool customReadBarrier() const { return CustomReadBarriers; }

@@ -146,7 +146,7 @@ class GCStrategy {
  }

  /// By default, roots are left for the code generator so it can generate a
  /// stack map. If true, you must provide a custom pass to lower
  /// calls to \@llvm.gcroot.
  bool customRoots() const { return CustomRoots; }

@@ -786,7 +786,7 @@ class LegalizerInfo {
  ///   setAction({G_ADD, 0, LLT::scalar(32)}, Legal);
  ///   setLegalizeScalarToDifferentSizeStrategy(
  ///     G_ADD, 0, widenToLargerTypesAndNarrowToLargest);
  /// will end up defining getAction({G_ADD, 0, T}) to return the following
  /// actions for different scalar types T:
  ///   LLT::scalar(1)..LLT::scalar(31): {WidenScalar, 0, LLT::scalar(32)}
  ///   LLT::scalar(32):                 {Legal, 0, LLT::scalar(32)}
@@ -814,7 +814,7 @@ class LegalizerInfo {
    VectorElementSizeChangeStrategies[OpcodeIdx][TypeIdx] = S;
  }

  /// A SizeChangeStrategy for the common case where legalization for a
  /// particular operation consists of only supporting a specific set of type
  /// sizes. E.g.
  ///   setAction({G_DIV, 0, LLT::scalar(32)}, Legal);
@@ -942,6 +942,16 @@ class MachineIRBuilderBase {
   /// \return a MachineInstrBuilder for the newly created instruction.
   MachineInstrBuilder buildAtomicRMWUmin(unsigned OldValRes, unsigned Addr,
                                          unsigned Val, MachineMemOperand &MMO);

+  /// Build and insert \p Res = G_BLOCK_ADDR \p BA
+  ///
+  /// G_BLOCK_ADDR computes the address of a basic block.
+  ///
+  /// \pre setBasicBlock or setMI must have been called.
+  /// \pre \p Res must be a generic virtual register of a pointer type.
+  ///
+  /// \return The newly created instruction.
+  MachineInstrBuilder buildBlockAddress(unsigned Res, const BlockAddress *BA);
 };

 /// A CRTP class that contains methods for building instructions that can
@@ -27,15 +27,15 @@ namespace llvm {
  uint32_t r_symbolnum; // symbol index if r_extern == 1 else section index
  bool     r_pcrel;     // was relocated pc-relative already
  uint8_t  r_length;    // length = 2 ^ r_length
  bool     r_extern;    //
  uint8_t  r_type;      // if not 0, machine-specific relocation type.
  bool     r_scattered; // 1 = scattered, 0 = non-scattered
  int32_t  r_value;     // the value the item to be relocated is referring
                        // to.
public:
  uint32_t getPackedFields() const {
    if (r_scattered)
      return (1 << 31) | (r_pcrel << 30) | ((r_length & 3) << 28) |
             ((r_type & 15) << 24) | (r_address & 0x00FFFFFF);
    else
      return (r_symbolnum << 8) | (r_pcrel << 7) | ((r_length & 3) << 5) |
@ -45,8 +45,8 @@ namespace llvm {
|
||||
uint32_t getRawAddress() const { return r_address; }
|
||||
|
||||
MachORelocation(uint32_t addr, uint32_t index, bool pcrel, uint8_t len,
|
||||
bool ext, uint8_t type, bool scattered = false,
|
||||
int32_t value = 0) :
|
||||
bool ext, uint8_t type, bool scattered = false,
|
||||
int32_t value = 0) :
|
||||
r_address(addr), r_symbolnum(index), r_pcrel(pcrel), r_length(len),
|
||||
r_extern(ext), r_type(type), r_scattered(scattered), r_value(value) {}
|
||||
};
|
||||
|
@@ -105,7 +105,7 @@ class MachineModuleInfo : public ImmutablePass {
   /// basic block's address of label.
   MMIAddrLabelMap *AddrLabelSymbols;

   // TODO: Ideally, what we'd like is to have a switch that allows emitting
   // synchronous (precise at call-sites only) CFA into .eh_frame. However,
   // even under this switch, we'd like .debug_frame to be precise when using
   // -g. At this moment, there's no way to specify that some CFI directives
@@ -19,6 +19,7 @@
 #include "llvm/CodeGen/LiveRegUnits.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/CodeGen/LivePhysRegs.h"

 namespace llvm {
 namespace outliner {
@@ -74,6 +75,13 @@ struct Candidate {
   /// cost model information.
   LiveRegUnits LRU;

+  /// Contains the accumulated register liveness information for the
+  /// instructions in this \p Candidate.
+  ///
+  /// This is optionally used by the target to determine which registers have
+  /// been used across the sequence.
+  LiveRegUnits UsedInSequence;
+
   /// Return the number of instructions in this Candidate.
   unsigned getLength() const { return Len; }

@@ -137,6 +145,12 @@ struct Candidate {
     // outlining candidate.
     std::for_each(MBB->rbegin(), (MachineBasicBlock::reverse_iterator)front(),
                   [this](MachineInstr &MI) { LRU.stepBackward(MI); });
+
+    // Walk over the sequence itself and figure out which registers were used
+    // in the sequence.
+    UsedInSequence.init(TRI);
+    std::for_each(front(), std::next(back()),
+                  [this](MachineInstr &MI) { UsedInSequence.accumulate(MI); });
   }
 };

@@ -252,7 +252,7 @@ class TargetRegisterInfo;
     MachineInstr *Instr = nullptr; ///< Alternatively, a MachineInstr.

   public:
     SUnit *OrigNode = nullptr; ///< If not this, the node from which this node
                                /// was cloned. (SD scheduling only)

     const MCSchedClassDesc *SchedClass =
@@ -156,7 +156,7 @@ class StatepointOpers {
   // TODO:: we should change the STATEPOINT representation so that CC and
   // Flags should be part of meta operands, with args and deopt operands, and
   // gc operands all prefixed by their length and a type code. This would be
   // much more consistent.
 public:
   // These values are aboolute offsets into the operands of the statepoint
   // instruction.
@@ -718,7 +718,7 @@ class TargetLoweringBase {
   /// always broken down into scalars in some contexts. This occurs even if the
   /// vector type is legal.
   virtual unsigned getVectorTypeBreakdownForCallingConv(
-      LLVMContext &Context, EVT VT, EVT &IntermediateVT,
+      LLVMContext &Context, CallingConv::ID CC, EVT VT, EVT &IntermediateVT,
       unsigned &NumIntermediates, MVT &RegisterVT) const {
     return getVectorTypeBreakdown(Context, VT, IntermediateVT, NumIntermediates,
                                   RegisterVT);
@@ -1174,7 +1174,7 @@ class TargetLoweringBase {
   /// are legal for some operations and not for other operations.
   /// For MIPS all vector types must be passed through the integer register set.
   virtual MVT getRegisterTypeForCallingConv(LLVMContext &Context,
-                                            EVT VT) const {
+                                            CallingConv::ID CC, EVT VT) const {
     return getRegisterType(Context, VT);
   }

@@ -1182,6 +1182,7 @@ class TargetLoweringBase {
   /// this occurs when a vector type is used, as vector are passed through the
   /// integer register set.
   virtual unsigned getNumRegistersForCallingConv(LLVMContext &Context,
+                                                 CallingConv::ID CC,
                                                  EVT VT) const {
     return getNumRegisters(Context, VT);
   }
@@ -3489,10 +3490,10 @@ class TargetLowering : public TargetLoweringBase {
   //
   SDValue BuildSDIV(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
                     bool IsAfterLegalization,
-                    std::vector<SDNode *> *Created) const;
+                    SmallVectorImpl<SDNode *> &Created) const;
   SDValue BuildUDIV(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
                     bool IsAfterLegalization,
-                    std::vector<SDNode *> *Created) const;
+                    SmallVectorImpl<SDNode *> &Created) const;

   /// Targets may override this function to provide custom SDIV lowering for
   /// power-of-2 denominators. If the target returns an empty SDValue, LLVM
@@ -3500,7 +3501,7 @@ class TargetLowering : public TargetLoweringBase {
   /// operations.
   virtual SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor,
                                 SelectionDAG &DAG,
-                                std::vector<SDNode *> *Created) const;
+                                SmallVectorImpl<SDNode *> &Created) const;

   /// Indicate whether this target prefers to combine FDIVs with the same
   /// divisor. If the transform should never be done, return zero. If the
@@ -3690,7 +3691,7 @@ class TargetLowering : public TargetLoweringBase {
 /// Given an LLVM IR type and return type attributes, compute the return value
 /// EVTs and flags, and optionally also the offsets, if the return value is
 /// being lowered to memory.
-void GetReturnInfo(Type *ReturnType, AttributeList attr,
+void GetReturnInfo(CallingConv::ID CC, Type *ReturnType, AttributeList attr,
                    SmallVectorImpl<ISD::OutputArg> &Outs,
                    const TargetLowering &TLI, const DataLayout &DL);

@@ -16,7 +16,7 @@

 #include "llvm/Pass.h"
 #include "llvm/Support/CodeGen.h"
 #include <cassert>
 #include <string>

 namespace llvm {
@@ -456,7 +456,7 @@ class TargetRegisterInfo : public MCRegisterInfo {
   /// stack frame offset. The first register is closest to the incoming stack
   /// pointer if stack grows down, and vice versa.
   /// Notice: This function does not take into account disabled CSRs.
   ///         In most cases you will want to use instead the function
   ///         getCalleeSavedRegs that is implemented in MachineRegisterInfo.
   virtual const MCPhysReg*
   getCalleeSavedRegs(const MachineFunction *MF) const = 0;
@@ -518,7 +518,7 @@ class TargetRegisterInfo : public MCRegisterInfo {
   /// guaranteed to be restored before any uses. This is useful for targets that
   /// have call sequences where a GOT register may be updated by the caller
   /// prior to a call and is guaranteed to be restored (also by the caller)
   /// after the call.
   virtual bool isCallerPreservedPhysReg(unsigned PhysReg,
                                         const MachineFunction &MF) const {
     return false;
@@ -143,7 +143,6 @@ CV_SYMBOL(S_MANSLOT            , 0x1120)
 CV_SYMBOL(S_MANMANYREG         , 0x1121)
 CV_SYMBOL(S_MANREGREL          , 0x1122)
 CV_SYMBOL(S_MANMANYREG2       , 0x1123)
-CV_SYMBOL(S_UNAMESPACE         , 0x1124)
 CV_SYMBOL(S_DATAREF            , 0x1126)
 CV_SYMBOL(S_ANNOTATIONREF      , 0x1128)
 CV_SYMBOL(S_TOKENREF           , 0x1129)
@@ -255,6 +254,7 @@ SYMBOL_RECORD_ALIAS(S_GMANDATA , 0x111d, ManagedGlobalData, DataSym)
 SYMBOL_RECORD(S_LTHREAD32      , 0x1112, ThreadLocalDataSym)
 SYMBOL_RECORD_ALIAS(S_GTHREAD32, 0x1113, GlobalTLS, ThreadLocalDataSym)

+SYMBOL_RECORD(S_UNAMESPACE     , 0x1124, UsingNamespaceSym)

 #undef CV_SYMBOL
 #undef SYMBOL_RECORD
@@ -942,6 +942,19 @@ class ThreadLocalDataSym : public SymbolRecord {
   uint32_t RecordOffset;
 };

+// S_UNAMESPACE
+class UsingNamespaceSym : public SymbolRecord {
+public:
+  explicit UsingNamespaceSym(SymbolRecordKind Kind) : SymbolRecord(Kind) {}
+  explicit UsingNamespaceSym(uint32_t RecordOffset)
+      : SymbolRecord(SymbolRecordKind::RegRelativeSym),
+        RecordOffset(RecordOffset) {}
+
+  StringRef Name;
+
+  uint32_t RecordOffset;
+};
+
 // S_ANNOTATION

 using CVSymbol = CVRecord<SymbolKind>;
@@ -154,6 +154,8 @@ enum DIDumpType : unsigned {
 struct DIDumpOptions {
   unsigned DumpType = DIDT_All;
   unsigned RecurseDepth = -1U;
+  uint16_t Version = 0; // DWARF version to assume when extracting.
+  uint8_t AddrSize = 4; // Address byte size to assume when extracting.
   bool ShowAddresses = true;
   bool ShowChildren = false;
   bool ShowParents = false;
@@ -323,6 +323,10 @@ class DWARFContext : public DIContext {
   /// have initialized the relevant target descriptions.
   Error loadRegisterInfo(const object::ObjectFile &Obj);

+  /// Get address size from CUs.
+  /// TODO: refactor compile_units() to make this const.
+  uint8_t getCUAddrSize();
+
 private:
   /// Return the compile unit which contains instruction with provided
   /// address.
@@ -51,6 +51,8 @@ class DWARFDataExtractor : public DataExtractor {
   /// reflect the absolute address of this pointer.
   Optional<uint64_t> getEncodedPointer(uint32_t *Offset, uint8_t Encoding,
                                        uint64_t AbsPosOffset = 0) const;
+
+  size_t size() const { return Section == nullptr ? 0 : Section->Data.size(); }
 };

 } // end namespace llvm
include/llvm/DebugInfo/DWARF/DWARFDebugAddr.h (new file, 98 lines)
@@ -0,0 +1,98 @@
+//===- DWARFDebugAddr.h -------------------------------------*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===------------------------------------------------------------------===//
+
+#ifndef LLVM_DEBUGINFO_DWARFDEBUGADDR_H
+#define LLVM_DEBUGINFO_DWARFDEBUGADDR_H
+
+#include "llvm/BinaryFormat/Dwarf.h"
+#include "llvm/DebugInfo/DIContext.h"
+#include "llvm/DebugInfo/DWARF/DWARFDataExtractor.h"
+#include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include <cstdint>
+#include <map>
+#include <vector>
+
+namespace llvm {
+
+class Error;
+class raw_ostream;
+
+/// A class representing an address table as specified in DWARF v5.
+/// The table consists of a header followed by an array of address values from
+/// .debug_addr section.
+class DWARFDebugAddrTable {
+public:
+  struct Header {
+    /// The total length of the entries for this table, not including the length
+    /// field itself.
+    uint32_t Length = 0;
+    /// The DWARF version number.
+    uint16_t Version = 5;
+    /// The size in bytes of an address on the target architecture. For
+    /// segmented addressing, this is the size of the offset portion of the
+    /// address.
+    uint8_t AddrSize;
+    /// The size in bytes of a segment selector on the target architecture.
+    /// If the target system uses a flat address space, this value is 0.
+    uint8_t SegSize = 0;
+  };
+
+private:
+  dwarf::DwarfFormat Format;
+  uint32_t HeaderOffset;
+  Header HeaderData;
+  uint32_t DataSize = 0;
+  std::vector<uint64_t> Addrs;
+
+public:
+  void clear();
+
+  /// Extract an entire table, including all addresses.
+  Error extract(DWARFDataExtractor Data, uint32_t *OffsetPtr,
+                uint16_t Version, uint8_t AddrSize,
+                std::function<void(Error)> WarnCallback);
+
+  uint32_t getHeaderOffset() const { return HeaderOffset; }
+  uint8_t getAddrSize() const { return HeaderData.AddrSize; }
+  void dump(raw_ostream &OS, DIDumpOptions DumpOpts = {}) const;
+
+  /// Return the address based on a given index.
+  Expected<uint64_t> getAddrEntry(uint32_t Index) const;
+
+  /// Return the size of the table header including the length
+  /// but not including the addresses.
+  uint8_t getHeaderSize() const {
+    switch (Format) {
+    case dwarf::DwarfFormat::DWARF32:
+      return 8; // 4 + 2 + 1 + 1
+    case dwarf::DwarfFormat::DWARF64:
+      return 16; // 12 + 2 + 1 + 1
+    }
+    llvm_unreachable("Invalid DWARF format (expected DWARF32 or DWARF64)");
+  }
+
+  /// Returns the length of this table, including the length field, or 0 if the
+  /// length has not been determined (e.g. because the table has not yet been
+  /// parsed, or there was a problem in parsing).
+  uint32_t getLength() const;
+
+  /// Verify that the given length is valid for this table.
+  bool hasValidLength() const { return getLength() != 0; }
+
+  /// Invalidate Length field to stop further processing.
+  void invalidateLength() { HeaderData.Length = 0; }
+
+  /// Returns the length of the array of addresses.
+  uint32_t getDataSize() const;
+};
+
+} // end namespace llvm
+
+#endif // LLVM_DEBUGINFO_DWARFDEBUGADDR_H
@@ -46,7 +46,7 @@ class DWARFDie {

 public:
   DWARFDie() = default;
-  DWARFDie(DWARFUnit *Unit, const DWARFDebugInfoEntry * D) : U(Unit), Die(D) {}
+  DWARFDie(DWARFUnit *Unit, const DWARFDebugInfoEntry *D) : U(Unit), Die(D) {}

   bool isValid() const { return U && Die; }
   explicit operator bool() const { return isValid(); }
@@ -82,9 +82,7 @@ class DWARFDie {
   }

   /// Returns true for a valid DIE that terminates a sibling chain.
-  bool isNULL() const {
-    return getAbbreviationDeclarationPtr() == nullptr;
-  }
+  bool isNULL() const { return getAbbreviationDeclarationPtr() == nullptr; }

   /// Returns true if DIE represents a subprogram (not inlined).
   bool isSubprogramDIE() const;
@@ -129,7 +127,6 @@ class DWARFDie {
   void dump(raw_ostream &OS, unsigned indent = 0,
             DIDumpOptions DumpOpts = DIDumpOptions()) const;
-

   /// Convenience zero-argument overload for debugging.
   LLVM_DUMP_METHOD void dump() const;

@@ -275,12 +272,16 @@ class DWARFDie {

   iterator begin() const;
   iterator end() const;
+
+  std::reverse_iterator<iterator> rbegin() const;
+  std::reverse_iterator<iterator> rend() const;
+
   iterator_range<iterator> children() const;
 };

-class DWARFDie::attribute_iterator :
-    public iterator_facade_base<attribute_iterator, std::forward_iterator_tag,
-                                const DWARFAttribute> {
+class DWARFDie::attribute_iterator
+    : public iterator_facade_base<attribute_iterator, std::forward_iterator_tag,
+                                  const DWARFAttribute> {
   /// The DWARF DIE we are extracting attributes from.
   DWARFDie Die;
   /// The value vended to clients via the operator*() or operator->().
@@ -288,6 +289,9 @@ class DWARFDie::attribute_iterator :
   /// The attribute index within the abbreviation declaration in Die.
   uint32_t Index;

+  friend bool operator==(const attribute_iterator &LHS,
+                         const attribute_iterator &RHS);
+
   /// Update the attribute index and attempt to read the attribute value. If the
   /// attribute is able to be read, update AttrValue and the Index member
   /// variable. If the attribute value is not able to be read, an appropriate
@@ -303,12 +307,21 @@ class DWARFDie::attribute_iterator :
   attribute_iterator &operator--();
   explicit operator bool() const { return AttrValue.isValid(); }
   const DWARFAttribute &operator*() const { return AttrValue; }
-  bool operator==(const attribute_iterator &X) const { return Index == X.Index; }
 };

+inline bool operator==(const DWARFDie::attribute_iterator &LHS,
+                       const DWARFDie::attribute_iterator &RHS) {
+  return LHS.Index == RHS.Index;
+}
+
+inline bool operator!=(const DWARFDie::attribute_iterator &LHS,
+                       const DWARFDie::attribute_iterator &RHS) {
+  return !(LHS == RHS);
+}
+
 inline bool operator==(const DWARFDie &LHS, const DWARFDie &RHS) {
   return LHS.getDebugInfoEntry() == RHS.getDebugInfoEntry() &&
          LHS.getDwarfUnit() == RHS.getDwarfUnit();
 }

 inline bool operator!=(const DWARFDie &LHS, const DWARFDie &RHS) {
@@ -323,11 +336,15 @@ class DWARFDie::iterator
     : public iterator_facade_base<iterator, std::bidirectional_iterator_tag,
                                   const DWARFDie> {
   DWARFDie Die;

+  friend std::reverse_iterator<llvm::DWARFDie::iterator>;
+  friend bool operator==(const DWARFDie::iterator &LHS,
+                         const DWARFDie::iterator &RHS);
+
 public:
   iterator() = default;

-  explicit iterator(DWARFDie D) : Die(D) {
-  }
+  explicit iterator(DWARFDie D) : Die(D) {}

   iterator &operator++() {
     Die = Die.getSibling();
@@ -339,11 +356,19 @@ class DWARFDie::iterator
     return *this;
   }

   explicit operator bool() const { return Die.isValid(); }
   const DWARFDie &operator*() const { return Die; }
-  bool operator==(const iterator &X) const { return Die == X.Die; }
 };

+inline bool operator==(const DWARFDie::iterator &LHS,
+                       const DWARFDie::iterator &RHS) {
+  return LHS.Die == RHS.Die;
+}
+
+inline bool operator!=(const DWARFDie::iterator &LHS,
+                       const DWARFDie::iterator &RHS) {
+  return !(LHS == RHS);
+}
+
 // These inline functions must follow the DWARFDie::iterator definition above
 // as they use functions from that class.
 inline DWARFDie::iterator DWARFDie::begin() const {
@@ -360,4 +385,80 @@ inline iterator_range<DWARFDie::iterator> DWARFDie::children() const {

 } // end namespace llvm

+namespace std {
+
+template <>
+class reverse_iterator<llvm::DWARFDie::iterator>
+    : public llvm::iterator_facade_base<
+          reverse_iterator<llvm::DWARFDie::iterator>,
+          bidirectional_iterator_tag, const llvm::DWARFDie> {
+
+private:
+  llvm::DWARFDie Die;
+  bool AtEnd;
+
+public:
+  reverse_iterator(llvm::DWARFDie::iterator It)
+      : Die(It.Die), AtEnd(!It.Die.getPreviousSibling()) {
+    if (!AtEnd)
+      Die = Die.getPreviousSibling();
+  }
+
+  reverse_iterator<llvm::DWARFDie::iterator> &operator++() {
+    assert(!AtEnd && "Incrementing rend");
+    llvm::DWARFDie D = Die.getPreviousSibling();
+    if (D)
+      Die = D;
+    else
+      AtEnd = true;
+    return *this;
+  }
+
+  reverse_iterator<llvm::DWARFDie::iterator> &operator--() {
+    if (AtEnd) {
+      AtEnd = false;
+      return *this;
+    }
+    Die = Die.getSibling();
+    assert(!Die.isNULL() && "Decrementing rbegin");
+    return *this;
+  }
+
+  const llvm::DWARFDie &operator*() const {
+    assert(Die.isValid());
+    return Die;
+  }
+
+  // FIXME: We should be able to specify the equals operator as a friend, but
+  // that causes the compiler to think the operator overload is ambiguous
+  // with the friend declaration and the actual definition as candidates.
+  bool equals(const reverse_iterator<llvm::DWARFDie::iterator> &RHS) const {
+    return Die == RHS.Die && AtEnd == RHS.AtEnd;
+  }
+};
+
+} // namespace std
+
+namespace llvm {
+
+inline bool operator==(const std::reverse_iterator<DWARFDie::iterator> &LHS,
+                       const std::reverse_iterator<DWARFDie::iterator> &RHS) {
+  return LHS.equals(RHS);
+}
+
+inline bool operator!=(const std::reverse_iterator<DWARFDie::iterator> &LHS,
+                       const std::reverse_iterator<DWARFDie::iterator> &RHS) {
+  return !(LHS == RHS);
+}
+
+inline std::reverse_iterator<DWARFDie::iterator> DWARFDie::rbegin() const {
+  return llvm::make_reverse_iterator(end());
+}
+
+inline std::reverse_iterator<DWARFDie::iterator> DWARFDie::rend() const {
+  return llvm::make_reverse_iterator(begin());
+}
+
+} // end namespace llvm
+
 #endif // LLVM_DEBUGINFO_DWARFDIE_H
@@ -14,7 +14,10 @@
 #include "llvm/Support/thread.h"
+#include <map>
 #include <mutex>
+#include <set>
 #include <sstream>
 #include <string>
 #include <vector>

 namespace llvm {
 namespace orc {
@@ -205,6 +208,42 @@ std::mutex RPCTypeName<std::vector<T>>::NameMutex;
 template <typename T>
 std::string RPCTypeName<std::vector<T>>::Name;

+template <typename T> class RPCTypeName<std::set<T>> {
+public:
+  static const char *getName() {
+    std::lock_guard<std::mutex> Lock(NameMutex);
+    if (Name.empty())
+      raw_string_ostream(Name)
+          << "std::set<" << RPCTypeName<T>::getName() << ">";
+    return Name.data();
+  }
+
+private:
+  static std::mutex NameMutex;
+  static std::string Name;
+};
+
+template <typename T> std::mutex RPCTypeName<std::set<T>>::NameMutex;
+template <typename T> std::string RPCTypeName<std::set<T>>::Name;
+
+template <typename K, typename V> class RPCTypeName<std::map<K, V>> {
+public:
+  static const char *getName() {
+    std::lock_guard<std::mutex> Lock(NameMutex);
+    if (Name.empty())
+      raw_string_ostream(Name)
+          << "std::map<" << RPCTypeNameSequence<K, V>() << ">";
+    return Name.data();
+  }
+
+private:
+  static std::mutex NameMutex;
+  static std::string Name;
+};
+
+template <typename K, typename V>
+std::mutex RPCTypeName<std::map<K, V>>::NameMutex;
+template <typename K, typename V> std::string RPCTypeName<std::map<K, V>>::Name;
+
 /// The SerializationTraits<ChannelT, T> class describes how to serialize and
 /// deserialize an instance of type T to/from an abstract channel of type
@@ -527,15 +566,20 @@ class SerializationTraits<ChannelT, Expected<T>, Error> {
 };

 /// SerializationTraits default specialization for std::pair.
-template <typename ChannelT, typename T1, typename T2>
-class SerializationTraits<ChannelT, std::pair<T1, T2>> {
+template <typename ChannelT, typename T1, typename T2, typename T3, typename T4>
+class SerializationTraits<ChannelT, std::pair<T1, T2>, std::pair<T3, T4>> {
 public:
-  static Error serialize(ChannelT &C, const std::pair<T1, T2> &V) {
-    return serializeSeq(C, V.first, V.second);
+  static Error serialize(ChannelT &C, const std::pair<T3, T4> &V) {
+    if (auto Err = SerializationTraits<ChannelT, T1, T3>::serialize(C, V.first))
+      return Err;
+    return SerializationTraits<ChannelT, T2, T4>::serialize(C, V.second);
   }

-  static Error deserialize(ChannelT &C, std::pair<T1, T2> &V) {
-    return deserializeSeq(C, V.first, V.second);
+  static Error deserialize(ChannelT &C, std::pair<T3, T4> &V) {
+    if (auto Err =
+            SerializationTraits<ChannelT, T1, T3>::deserialize(C, V.first))
+      return Err;
+    return SerializationTraits<ChannelT, T2, T4>::deserialize(C, V.second);
   }
 };

@@ -589,6 +633,9 @@ class SerializationTraits<ChannelT, std::vector<T>> {

   /// Deserialize a std::vector<T> to a std::vector<T>.
   static Error deserialize(ChannelT &C, std::vector<T> &V) {
+    assert(V.empty() &&
+           "Expected default-constructed vector to deserialize into");
+
     uint64_t Count = 0;
     if (auto Err = deserializeSeq(C, Count))
       return Err;
@@ -602,6 +649,92 @@ class SerializationTraits<ChannelT, std::vector<T>> {
   }
 };

+template <typename ChannelT, typename T, typename T2>
+class SerializationTraits<ChannelT, std::set<T>, std::set<T2>> {
+public:
+  /// Serialize a std::set<T> from std::set<T2>.
+  static Error serialize(ChannelT &C, const std::set<T2> &S) {
+    if (auto Err = serializeSeq(C, static_cast<uint64_t>(S.size())))
+      return Err;
+
+    for (const auto &E : S)
+      if (auto Err = SerializationTraits<ChannelT, T, T2>::serialize(C, E))
+        return Err;
+
+    return Error::success();
+  }
+
+  /// Deserialize a std::set<T> to a std::set<T>.
+  static Error deserialize(ChannelT &C, std::set<T2> &S) {
+    assert(S.empty() && "Expected default-constructed set to deserialize into");
+
+    uint64_t Count = 0;
+    if (auto Err = deserializeSeq(C, Count))
+      return Err;
+
+    while (Count-- != 0) {
+      T2 Val;
+      if (auto Err = SerializationTraits<ChannelT, T, T2>::deserialize(C, Val))
+        return Err;
+
+      auto Added = S.insert(Val).second;
+      if (!Added)
+        return make_error<StringError>("Duplicate element in deserialized set",
+                                       orcError(OrcErrorCode::UnknownORCError));
+    }
+
+    return Error::success();
+  }
+};
+
+template <typename ChannelT, typename K, typename V, typename K2, typename V2>
+class SerializationTraits<ChannelT, std::map<K, V>, std::map<K2, V2>> {
+public:
+  /// Serialize a std::map<K, V> from std::map<K2, V2>.
+  static Error serialize(ChannelT &C, const std::map<K2, V2> &M) {
+    if (auto Err = serializeSeq(C, static_cast<uint64_t>(M.size())))
+      return Err;
+
+    for (const auto &E : M) {
+      if (auto Err =
+              SerializationTraits<ChannelT, K, K2>::serialize(C, E.first))
+        return Err;
+      if (auto Err =
+              SerializationTraits<ChannelT, V, V2>::serialize(C, E.second))
+        return Err;
+    }
+
+    return Error::success();
+  }
+
+  /// Deserialize a std::map<K, V> to a std::map<K, V>.
+  static Error deserialize(ChannelT &C, std::map<K2, V2> &M) {
+    assert(M.empty() && "Expected default-constructed map to deserialize into");
+
+    uint64_t Count = 0;
+    if (auto Err = deserializeSeq(C, Count))
+      return Err;
+
+    while (Count-- != 0) {
+      std::pair<K2, V2> Val;
+      if (auto Err =
+              SerializationTraits<ChannelT, K, K2>::deserialize(C, Val.first))
+        return Err;
+
+      if (auto Err =
+              SerializationTraits<ChannelT, V, V2>::deserialize(C, Val.second))
+        return Err;
+
+      auto Added = M.insert(Val).second;
+      if (!Added)
+        return make_error<StringError>("Duplicate element in deserialized map",
+                                       orcError(OrcErrorCode::UnknownORCError));
+    }
+
+    return Error::success();
+  }
+};
+
 } // end namespace rpc
 } // end namespace orc
 } // end namespace llvm

@@ -236,3 +236,4 @@ def : MergeRule<"adjustCallerSSPLevel">;
 def : MergeRule<"adjustCallerStackProbes">;
 def : MergeRule<"adjustCallerStackProbeSize">;
 def : MergeRule<"adjustMinLegalVectorWidth">;
+def : MergeRule<"adjustNullPointerValidAttr">;
@@ -547,7 +547,7 @@ class Instruction : public User,
   /// may have side effects cannot be removed without semantically changing the
   /// generated program.
   bool isSafeToRemove() const;

   /// Return true if the instruction is a variety of EH-block.
   bool isEHPad() const {
     switch (getOpcode()) {
@@ -4016,7 +4016,7 @@ class InvokeInst : public CallBase<InvokeInst> {
   void setDoesNotThrow() {
     addAttribute(AttributeList::FunctionIndex, Attribute::NoUnwind);
   }

   /// Return the function called, or null if this is an
   /// indirect function invocation.
   ///
@@ -541,7 +541,7 @@ let IntrProperties = [IntrInaccessibleMemOnly] in {
                                                     [ LLVMMatchType<0>,
                                                       llvm_metadata_ty,
                                                       llvm_metadata_ty ]>;
   def int_experimental_constrained_exp : Intrinsic<[ llvm_anyfloat_ty ],
                                                    [ LLVMMatchType<0>,
                                                      llvm_metadata_ty,
                                                      llvm_metadata_ty ]>;
@@ -1191,7 +1191,7 @@ def int_amdgcn_ds_bpermute :
 // Deep learning intrinsics.
 //===----------------------------------------------------------------------===//

-// f32 %r = llvm.amdgcn.fdot2(v2f16 %a, v2f16 %b, f32 %c)
+// f32 %r = llvm.amdgcn.fdot2(v2f16 %a, v2f16 %b, f32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %c
 def int_amdgcn_fdot2 :
   GCCBuiltin<"__builtin_amdgcn_fdot2">,
@@ -1200,12 +1200,13 @@ def int_amdgcn_fdot2 :
     [
       llvm_v2f16_ty, // %a
       llvm_v2f16_ty, // %b
-      llvm_float_ty  // %c
+      llvm_float_ty, // %c
+      llvm_i1_ty     // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// i32 %r = llvm.amdgcn.sdot2(v2i16 %a, v2i16 %b, i32 %c)
+// i32 %r = llvm.amdgcn.sdot2(v2i16 %a, v2i16 %b, i32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %c
 def int_amdgcn_sdot2 :
   GCCBuiltin<"__builtin_amdgcn_sdot2">,
@@ -1214,12 +1215,13 @@ def int_amdgcn_sdot2 :
     [
       llvm_v2i16_ty, // %a
       llvm_v2i16_ty, // %b
-      llvm_i32_ty    // %c
+      llvm_i32_ty,   // %c
+      llvm_i1_ty     // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// u32 %r = llvm.amdgcn.udot2(v2u16 %a, v2u16 %b, u32 %c)
+// u32 %r = llvm.amdgcn.udot2(v2u16 %a, v2u16 %b, u32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %c
 def int_amdgcn_udot2 :
   GCCBuiltin<"__builtin_amdgcn_udot2">,
@@ -1228,12 +1230,13 @@ def int_amdgcn_udot2 :
     [
       llvm_v2i16_ty, // %a
       llvm_v2i16_ty, // %b
-      llvm_i32_ty    // %c
+      llvm_i32_ty,   // %c
+      llvm_i1_ty     // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// i32 %r = llvm.amdgcn.sdot4(v4i8 (as i32) %a, v4i8 (as i32) %b, i32 %c)
+// i32 %r = llvm.amdgcn.sdot4(v4i8 (as i32) %a, v4i8 (as i32) %b, i32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] + %c
 def int_amdgcn_sdot4 :
   GCCBuiltin<"__builtin_amdgcn_sdot4">,
@@ -1242,12 +1245,13 @@ def int_amdgcn_sdot4 :
     [
       llvm_i32_ty, // %a
      llvm_i32_ty, // %b
-      llvm_i32_ty  // %c
+      llvm_i32_ty, // %c
+      llvm_i1_ty   // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// u32 %r = llvm.amdgcn.udot4(v4u8 (as u32) %a, v4u8 (as u32) %b, u32 %c)
+// u32 %r = llvm.amdgcn.udot4(v4u8 (as u32) %a, v4u8 (as u32) %b, u32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] + %c
 def int_amdgcn_udot4 :
   GCCBuiltin<"__builtin_amdgcn_udot4">,
@@ -1256,12 +1260,13 @@ def int_amdgcn_udot4 :
     [
       llvm_i32_ty, // %a
       llvm_i32_ty, // %b
-      llvm_i32_ty  // %c
+      llvm_i32_ty, // %c
+      llvm_i1_ty   // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// i32 %r = llvm.amdgcn.sdot8(v8i4 (as i32) %a, v8i4 (as i32) %b, i32 %c)
+// i32 %r = llvm.amdgcn.sdot8(v8i4 (as i32) %a, v8i4 (as i32) %b, i32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] +
 //      %a[4] * %b[4] + %a[5] * %b[5] + %a[6] * %b[6] + %a[7] * %b[7] + %c
 def int_amdgcn_sdot8 :
@@ -1271,12 +1276,13 @@ def int_amdgcn_sdot8 :
     [
       llvm_i32_ty, // %a
       llvm_i32_ty, // %b
-      llvm_i32_ty  // %c
+      llvm_i32_ty, // %c
+      llvm_i1_ty   // %clamp
     ],
     [IntrNoMem, IntrSpeculatable]
   >;

-// u32 %r = llvm.amdgcn.udot8(v8u4 (as u32) %a, v8u4 (as u32) %b, u32 %c)
+// u32 %r = llvm.amdgcn.udot8(v8u4 (as u32) %a, v8u4 (as u32) %b, u32 %c, i1 %clamp)
 // %r = %a[0] * %b[0] + %a[1] * %b[1] + %a[2] * %b[2] + %a[3] * %b[3] +
 //      %a[4] * %b[4] + %a[5] * %b[5] + %a[6] * %b[6] + %a[7] * %b[7] + %c
 def int_amdgcn_udot8 :
@@ -1286,7 +1292,8 @@ def int_amdgcn_udot8 :
     [
       llvm_i32_ty, // %a
       llvm_i32_ty, // %b
-      llvm_i32_ty  // %c
+      llvm_i32_ty, // %c
+      llvm_i1_ty   // %clamp
|
||||
],
|
||||
[IntrNoMem, IntrSpeculatable]
|
||||
>;
|
||||
|
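The dot-product formula spelled out in the intrinsic comments above (sum of lane products plus an accumulator) can be sketched as a host-side reference model. This is only an illustration of the documented formula, not the GPU implementation; the name `sdot2_ref` and the saturating interpretation of the `i1 %clamp` operand are assumptions.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical reference model for the llvm.amdgcn.sdot2 formula above:
//   %r = %a[0] * %b[0] + %a[1] * %b[1] + %c
// The i1 %clamp operand is assumed to saturate the result to the i32
// range rather than letting it wrap (an assumption from the name).
int32_t sdot2_ref(int16_t a0, int16_t a1, int16_t b0, int16_t b1,
                  int32_t c, bool clamp) {
  // Widen to 64 bits so the intermediate sum cannot overflow.
  int64_t r = int64_t(a0) * b0 + int64_t(a1) * b1 + c;
  if (clamp)
    r = std::min<int64_t>(std::max<int64_t>(r, INT32_MIN), INT32_MAX);
  return static_cast<int32_t>(r);
}
```

The sdot4/sdot8 variants follow the same shape with more lanes packed into an i32 operand.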
@ -275,7 +275,7 @@ def int_arm_stc : GCCBuiltin<"__builtin_arm_stc">,
   Intrinsic<[], [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty], []>;
def int_arm_stcl : GCCBuiltin<"__builtin_arm_stcl">,
   Intrinsic<[], [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty], []>;
def int_arm_stc2 : GCCBuiltin<"__builtin_arm_stc2">,
   Intrinsic<[], [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty], []>;
def int_arm_stc2l : GCCBuiltin<"__builtin_arm_stc2l">,
   Intrinsic<[], [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty], []>;
@ -1,10 +1,10 @@
//===- IntrinsicsPowerPC.td - Defines PowerPC intrinsics ---*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file defines all of the PowerPC-specific intrinsics.

@ -122,21 +122,21 @@ class PowerPC_Vec_FFF_Intrinsic<string GCCIntSuffix>

/// PowerPC_Vec_BBB_Intrinsic - A PowerPC intrinsic that takes two v16i8
/// vectors and returns one.  These intrinsics have no side effects.
class PowerPC_Vec_BBB_Intrinsic<string GCCIntSuffix>
  : PowerPC_Vec_Intrinsic<GCCIntSuffix,
                          [llvm_v16i8_ty], [llvm_v16i8_ty, llvm_v16i8_ty],
                          [IntrNoMem]>;

/// PowerPC_Vec_HHH_Intrinsic - A PowerPC intrinsic that takes two v8i16
/// vectors and returns one.  These intrinsics have no side effects.
class PowerPC_Vec_HHH_Intrinsic<string GCCIntSuffix>
  : PowerPC_Vec_Intrinsic<GCCIntSuffix,
                          [llvm_v8i16_ty], [llvm_v8i16_ty, llvm_v8i16_ty],
                          [IntrNoMem]>;

/// PowerPC_Vec_WWW_Intrinsic - A PowerPC intrinsic that takes two v4i32
/// vectors and returns one.  These intrinsics have no side effects.
class PowerPC_Vec_WWW_Intrinsic<string GCCIntSuffix>
  : PowerPC_Vec_Intrinsic<GCCIntSuffix,
                          [llvm_v4i32_ty], [llvm_v4i32_ty, llvm_v4i32_ty],
                          [IntrNoMem]>;

@ -267,7 +267,7 @@ let TargetPrefix = "ppc" in {  // All intrinsics start with "llvm.ppc.".
  def int_ppc_altivec_vcmpgtud : GCCBuiltin<"__builtin_altivec_vcmpgtud">,
              Intrinsic<[llvm_v2i64_ty], [llvm_v2i64_ty, llvm_v2i64_ty],
                        [IntrNoMem]>;

  def int_ppc_altivec_vcmpequw : GCCBuiltin<"__builtin_altivec_vcmpequw">,
              Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, llvm_v4i32_ty],
                        [IntrNoMem]>;

@ -283,7 +283,7 @@ let TargetPrefix = "ppc" in {  // All intrinsics start with "llvm.ppc.".
  def int_ppc_altivec_vcmpnezw : GCCBuiltin<"__builtin_altivec_vcmpnezw">,
              Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty, llvm_v4i32_ty],
                        [IntrNoMem]>;

  def int_ppc_altivec_vcmpequh : GCCBuiltin<"__builtin_altivec_vcmpequh">,
              Intrinsic<[llvm_v8i16_ty], [llvm_v8i16_ty, llvm_v8i16_ty],
                        [IntrNoMem]>;

@ -355,7 +355,7 @@ let TargetPrefix = "ppc" in {  // All intrinsics start with "llvm.ppc.".
  def int_ppc_altivec_vcmpnezw_p : GCCBuiltin<"__builtin_altivec_vcmpnezw_p">,
              Intrinsic<[llvm_i32_ty],[llvm_i32_ty,llvm_v4i32_ty,llvm_v4i32_ty],
                        [IntrNoMem]>;

  def int_ppc_altivec_vcmpequh_p : GCCBuiltin<"__builtin_altivec_vcmpequh_p">,
              Intrinsic<[llvm_i32_ty],[llvm_i32_ty,llvm_v8i16_ty,llvm_v8i16_ty],
                        [IntrNoMem]>;

@ -474,10 +474,10 @@ let TargetPrefix = "ppc" in {  // All PPC intrinsics start with "llvm.ppc.".
            Intrinsic<[llvm_v4i32_ty], [llvm_v8i16_ty, llvm_v8i16_ty,
                       llvm_v4i32_ty], [IntrNoMem]>;
  def int_ppc_altivec_vmsumshs : GCCBuiltin<"__builtin_altivec_vmsumshs">,
            Intrinsic<[llvm_v4i32_ty], [llvm_v8i16_ty, llvm_v8i16_ty,
                       llvm_v4i32_ty], [IntrNoMem]>;
  def int_ppc_altivec_vmsumubm : GCCBuiltin<"__builtin_altivec_vmsumubm">,
            Intrinsic<[llvm_v4i32_ty], [llvm_v16i8_ty, llvm_v16i8_ty,
                       llvm_v4i32_ty], [IntrNoMem]>;
  def int_ppc_altivec_vmsumuhm : GCCBuiltin<"__builtin_altivec_vmsumuhm">,
            Intrinsic<[llvm_v4i32_ty], [llvm_v8i16_ty, llvm_v8i16_ty,

@ -544,7 +544,7 @@ let TargetPrefix = "ppc" in {  // All PPC intrinsics start with "llvm.ppc.".

  // Other multiplies.
  def int_ppc_altivec_vmladduhm : GCCBuiltin<"__builtin_altivec_vmladduhm">,
            Intrinsic<[llvm_v8i16_ty], [llvm_v8i16_ty, llvm_v8i16_ty,
                       llvm_v8i16_ty], [IntrNoMem]>;

  // Packs.

@ -626,21 +626,21 @@ let TargetPrefix = "ppc" in {  // All PPC intrinsics start with "llvm.ppc.".

  // Add Extended Quadword
  def int_ppc_altivec_vaddeuqm : GCCBuiltin<"__builtin_altivec_vaddeuqm">,
          Intrinsic<[llvm_v1i128_ty],
                    [llvm_v1i128_ty, llvm_v1i128_ty, llvm_v1i128_ty],
                    [IntrNoMem]>;
  def int_ppc_altivec_vaddecuq : GCCBuiltin<"__builtin_altivec_vaddecuq">,
          Intrinsic<[llvm_v1i128_ty],
                    [llvm_v1i128_ty, llvm_v1i128_ty, llvm_v1i128_ty],
                    [IntrNoMem]>;

  // Sub Extended Quadword
  def int_ppc_altivec_vsubeuqm : GCCBuiltin<"__builtin_altivec_vsubeuqm">,
          Intrinsic<[llvm_v1i128_ty],
                    [llvm_v1i128_ty, llvm_v1i128_ty, llvm_v1i128_ty],
                    [IntrNoMem]>;
  def int_ppc_altivec_vsubecuq : GCCBuiltin<"__builtin_altivec_vsubecuq">,
          Intrinsic<[llvm_v1i128_ty],
                    [llvm_v1i128_ty, llvm_v1i128_ty, llvm_v1i128_ty],
                    [IntrNoMem]>;
}

@ -657,7 +657,7 @@ def int_ppc_altivec_vslw : PowerPC_Vec_WWW_Intrinsic<"vslw">;

// Right Shifts.
def int_ppc_altivec_vsr   : PowerPC_Vec_WWW_Intrinsic<"vsr">;
def int_ppc_altivec_vsro  : PowerPC_Vec_WWW_Intrinsic<"vsro">;

def int_ppc_altivec_vsrb  : PowerPC_Vec_BBB_Intrinsic<"vsrb">;
def int_ppc_altivec_vsrh  : PowerPC_Vec_HHH_Intrinsic<"vsrh">;
def int_ppc_altivec_vsrw  : PowerPC_Vec_WWW_Intrinsic<"vsrw">;

@ -679,10 +679,10 @@ let TargetPrefix = "ppc" in {  // All PPC intrinsics start with "llvm.ppc.".
              Intrinsic<[llvm_v16i8_ty], [llvm_ptr_ty], [IntrNoMem]>;

  def int_ppc_altivec_vperm : GCCBuiltin<"__builtin_altivec_vperm_4si">,
              Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty,
                         llvm_v4i32_ty, llvm_v16i8_ty], [IntrNoMem]>;
  def int_ppc_altivec_vsel : GCCBuiltin<"__builtin_altivec_vsel_4si">,
              Intrinsic<[llvm_v4i32_ty], [llvm_v4i32_ty,
                         llvm_v4i32_ty, llvm_v4i32_ty], [IntrNoMem]>;
  def int_ppc_altivec_vgbbd : GCCBuiltin<"__builtin_altivec_vgbbd">,
              Intrinsic<[llvm_v16i8_ty], [llvm_v16i8_ty], [IntrNoMem]>;
@ -285,7 +285,7 @@ class PMTopLevelManager {
  SpecificBumpPtrAllocator<AUFoldingSetNode> AUFoldingSetNodeAllocator;

  // Maps from a pass to it's associated entry in UniqueAnalysisUsages.  Does
  // not own the storage associated with either key or value..
  DenseMap<Pass *, AnalysisUsage*> AnUsageMap;

  /// Collection of PassInfo objects found via analysis IDs and in this top
@ -325,7 +325,7 @@ class Statepoint
  explicit Statepoint(CallSite CS) : Base(CS) {}
};

/// Common base class for representing values projected from a statepoint.
/// Currently, the only projections available are gc.result and gc.relocate.
class GCProjectionInst : public IntrinsicInst {
public:
@ -101,10 +101,10 @@ class User : public Value {
  void operator delete(void *Usr);
  /// Placement delete - required by std, called if the ctor throws.
  void operator delete(void *Usr, unsigned) {
    // Note: If a subclass manipulates the information which is required to calculate the
    // Usr memory pointer, e.g. NumUserOperands, the operator delete of that subclass has
    // to restore the changed information to the original value, since the dtor of that class
    // is not called if the ctor fails.
    User::operator delete(Usr);

#ifndef LLVM_ENABLE_EXCEPTIONS
@ -113,10 +113,10 @@ class User : public Value {
  }
  /// Placement delete - required by std, called if the ctor throws.
  void operator delete(void *Usr, unsigned, bool) {
    // Note: If a subclass manipulates the information which is required to calculate the
    // Usr memory pointer, e.g. NumUserOperands, the operator delete of that subclass has
    // to restore the changed information to the original value, since the dtor of that class
    // is not called if the ctor fails.
    User::operator delete(Usr);

#ifndef LLVM_ENABLE_EXCEPTIONS
@ -44,7 +44,7 @@ namespace {
      llvm::LLVMContext Context;
      (void)new llvm::Module("", Context);
      (void)new llvm::UnreachableInst(Context);
      (void) llvm::createVerifierPass();
    }
  } ForceVMCoreLinking;
}
@ -362,6 +362,13 @@ class MCDwarfLineAddr {
  static void Encode(MCContext &Context, MCDwarfLineTableParams Params,
                     int64_t LineDelta, uint64_t AddrDelta, raw_ostream &OS);

  /// Utility function to encode a Dwarf pair of LineDelta and AddrDeltas using
  /// fixed length operands.
  static bool FixedEncode(MCContext &Context,
                          MCDwarfLineTableParams Params,
                          int64_t LineDelta, uint64_t AddrDelta,
                          raw_ostream &OS, uint32_t *Offset, uint32_t *Size);

  /// Utility function to emit the encoding to a streamer.
  static void Emit(MCStreamer *MCOS, MCDwarfLineTableParams Params,
                   int64_t LineDelta, uint64_t AddrDelta);
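The variable-length `Encode` path above is built around the standard DWARF line-program "special opcode" calculation. A rough sketch follows, using the common default parameters (`line_base` -5, `line_range` 14, `opcode_base` 13); the actual values come from `MCDwarfLineTableParams` and may differ.

```cpp
#include <cstdint>

// Sketch of the DWARF line-program "special opcode" computation
// (DWARF v4, section 6.2.5.1). LineBase/LineRange/OpcodeBase below are
// the common default parameters, assumed purely for illustration.
constexpr int64_t LineBase = -5;
constexpr uint64_t LineRange = 14;
constexpr uint64_t OpcodeBase = 13;

// Returns the special opcode encoding this (LineDelta, AddrDelta) pair,
// or 0 when the pair is out of range and needs standard opcodes instead.
unsigned specialOpcode(int64_t LineDelta, uint64_t AddrDelta) {
  if (LineDelta < LineBase || LineDelta >= LineBase + int64_t(LineRange))
    return 0;
  uint64_t Op =
      uint64_t(LineDelta - LineBase) + LineRange * AddrDelta + OpcodeBase;
  return Op <= 255 ? unsigned(Op) : 0;
}
```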
@ -149,6 +149,7 @@ class MCEncodedFragment : public MCFragment {
    case MCFragment::FT_Relaxable:
    case MCFragment::FT_CompactEncodedInst:
    case MCFragment::FT_Data:
    case MCFragment::FT_Dwarf:
      return true;
    }
  }
@ -232,7 +233,7 @@ class MCEncodedFragmentWithFixups :
  static bool classof(const MCFragment *F) {
    MCFragment::FragmentType Kind = F->getKind();
    return Kind == MCFragment::FT_Relaxable || Kind == MCFragment::FT_Data ||
           Kind == MCFragment::FT_CVDefRange;
           Kind == MCFragment::FT_CVDefRange || Kind == MCFragment::FT_Dwarf;;
  }
};

@ -514,7 +515,7 @@ class MCLEBFragment : public MCFragment {
  }
};

class MCDwarfLineAddrFragment : public MCFragment {
class MCDwarfLineAddrFragment : public MCEncodedFragmentWithFixups<8, 1> {
  /// LineDelta - the value of the difference between the two line numbers
  /// between two .loc dwarf directives.
  int64_t LineDelta;
@ -523,15 +524,11 @@ class MCDwarfLineAddrFragment : public MCFragment {
  /// make up the address delta between two .loc dwarf directives.
  const MCExpr *AddrDelta;

  SmallString<8> Contents;

public:
  MCDwarfLineAddrFragment(int64_t LineDelta, const MCExpr &AddrDelta,
                          MCSection *Sec = nullptr)
      : MCFragment(FT_Dwarf, false, Sec), LineDelta(LineDelta),
        AddrDelta(&AddrDelta) {
    Contents.push_back(0);
  }
      : MCEncodedFragmentWithFixups<8, 1>(FT_Dwarf, false, Sec),
        LineDelta(LineDelta), AddrDelta(&AddrDelta) {}

  /// \name Accessors
  /// @{
@ -540,9 +537,6 @@ class MCDwarfLineAddrFragment : public MCFragment {

  const MCExpr &getAddrDelta() const { return *AddrDelta; }

  SmallString<8> &getContents() { return Contents; }
  const SmallString<8> &getContents() const { return Contents; }

  /// @}

  static bool classof(const MCFragment *F) {
@ -64,7 +64,7 @@ class MCInstrAnalysis {

  /// Returns true if at least one of the register writes performed by
  /// \param Inst implicitly clears the upper portion of all super-registers.
  ///
  /// Example: on X86-64, a write to EAX implicitly clears the upper half of
  /// RAX. Also (still on x86) an XMM write perfomed by an AVX 128-bit
  /// instruction implicitly clears the upper portion of the correspondent
@ -87,6 +87,19 @@ class MCInstrAnalysis {
                                          const MCInst &Inst,
                                          APInt &Writes) const;

  /// Returns true if \param Inst is a dependency breaking instruction for the
  /// given subtarget.
  ///
  /// The value computed by a dependency breaking instruction is not dependent
  /// on the inputs. An example of dependency breaking instruction on X86 is
  /// `XOR %eax, %eax`.
  /// TODO: In future, we could implement an alternative approach where this
  /// method returns `true` if the input instruction is not dependent on
  /// some/all of its input operands. An APInt mask could then be used to
  /// identify independent operands.
  virtual bool isDependencyBreaking(const MCSubtargetInfo &STI,
                                    const MCInst &Inst) const;

  /// Given a branch instruction try to get the address the branch
  /// targets. Return true on success, and the address in Target.
  virtual bool
@ -15,7 +15,7 @@ namespace llvm {

/// AsmCond - Class to support conditional assembly
///
/// The conditional assembly feature (.if, .else, .elseif and .endif) is
/// implemented with AsmCond that tells us what we are in the middle of
/// processing.  Ignore can be either true or false.  When true we are ignoring
/// the block of code in the middle of a conditional.

@ -297,8 +297,8 @@ class MCStreamer {
  /// If the comment includes embedded \n's, they will each get the comment
  /// prefix as appropriate.  The added comment should not end with a \n.
  /// By default, each comment is terminated with an end of line, i.e. the
  /// EOL param is set to true by default. If one prefers not to end the
  /// comment with a new line then the EOL param should be passed
  /// with a false value.
  virtual void AddComment(const Twine &T, bool EOL = true) {}
@ -333,7 +333,7 @@ class MachOObjectFile : public ObjectFile {

  relocation_iterator locrel_begin() const;
  relocation_iterator locrel_end() const;

  void moveRelocationNext(DataRefImpl &Rel) const override;
  uint64_t getRelocationOffset(DataRefImpl Rel) const override;
  symbol_iterator getRelocationSymbol(DataRefImpl Rel) const override;
@ -231,7 +231,7 @@ AnalysisType &Pass::getAnalysisID(AnalysisID PI) const {
  // should be a small number, we just do a linear search over a (dense)
  // vector.
  Pass *ResultPass = Resolver->findImplPass(PI);
  assert(ResultPass &&
         "getAnalysis*() called on an analysis that was not "
         "'required' by pass!");

@ -9,7 +9,7 @@
//
// This file defines PassRegistry, a class that is used in the initialization
// and registration of passes. At application startup, passes are registered
// with the PassRegistry, which is later provided to the PassManager for
// dependency resolution and similar tasks.
//
//===----------------------------------------------------------------------===//
@ -207,7 +207,7 @@ struct CounterMappingRegion {
    /// A CodeRegion associates some code with a counter
    CodeRegion,

    /// An ExpansionRegion represents a file expansion region that associates
    /// a source range with the expansion of a virtual source file, such as
    /// for a macro instantiation or #include file.
    ExpansionRegion,

@ -213,6 +213,8 @@ enum {
  // Tag_ABI_VFP_args, (=28), uleb128
  BaseAAPCS = 0,
  HardFPAAPCS = 1,
  ToolChainFPPCS = 2,
  CompatibleFPAAPCS = 3,

  // Tag_FP_HP_extension, (=36), uleb128
  AllowHPFP = 1, // Allow use of Half Precision FP
@ -15,7 +15,7 @@

namespace llvm {

/// An auxiliary type to facilitate extraction of 3-byte entities.
struct Uint24 {
  uint8_t Bytes[3];
  Uint24(uint8_t U) {
@ -530,11 +530,10 @@ class DominatorTreeBase {
  /// CFG about its children and inverse children. This implies that deletions
  /// of CFG edges must not delete the CFG nodes before calling this function.
  ///
  /// Batch updates should be generally faster when performing longer sequences
  /// of updates than calling insertEdge/deleteEdge manually multiple times, as
  /// it can reorder the updates and remove redundant ones internally.
  /// The batch updater is also able to detect sequences of zero and exactly one
  /// update -- it's optimized to do less work in these cases.
  /// The applyUpdates function can reorder the updates and remove redundant
  /// ones internally. The batch updater is also able to detect sequences of
  /// zero and exactly one update -- it's optimized to do less work in these
  /// cases.
  ///
  /// Note that for postdominators it automatically takes care of applying
  /// updates on reverse edges internally (so there's no need to swap the
@ -854,10 +853,15 @@ class DominatorTreeBase {
    assert(isReachableFromEntry(B));
    assert(isReachableFromEntry(A));

    const unsigned ALevel = A->getLevel();
    const DomTreeNodeBase<NodeT> *IDom;
    while ((IDom = B->getIDom()) != nullptr && IDom != A && IDom != B)

    // Don't walk nodes above A's subtree. When we reach A's level, we must
    // either find A or be in some other subtree not dominated by A.
    while ((IDom = B->getIDom()) != nullptr && IDom->getLevel() >= ALevel)
      B = IDom;  // Walk up the tree
    return IDom != nullptr;

    return B == A;
  }

  /// Wipe this tree's state without releasing any resources.
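The new level-based dominance walk above can be sketched in isolation. `Node` here is a hypothetical stand-in for `DomTreeNodeBase`: climb B's immediate-dominator chain, but never above A's level; once at A's level, either the walk landed on A or B sits in a subtree not dominated by A.

```cpp
struct Node {
  const Node *IDom = nullptr; // Immediate dominator; null for the root.
  unsigned Level = 0;         // Depth in the dominator tree.
};

// Minimal sketch of the level-bounded dominance query shown in the
// diff above. Walking stops as soon as the idom would rise above A's
// level, so the loop does at most Level(B) - Level(A) + 1 steps.
bool dominates(const Node *A, const Node *B) {
  const unsigned ALevel = A->Level;
  const Node *IDom;
  while ((IDom = B->IDom) != nullptr && IDom->Level >= ALevel)
    B = IDom; // Walk up the tree.
  return B == A;
}
```

Compared with the old `IDom != A && IDom != B` loop, this bounds the walk by tree level instead of walking all the way to the root on a negative answer.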
@ -43,7 +43,6 @@ class MemoryBuffer {
  const char *BufferStart; // Start of the buffer.
  const char *BufferEnd;   // End of the buffer.

protected:
  MemoryBuffer() = default;

@ -148,9 +147,6 @@ class MemoryBuffer {
  virtual BufferKind getBufferKind() const = 0;

  MemoryBufferRef getMemBufferRef() const;

private:
  virtual void anchor();
};

/// This class is an extension of MemoryBuffer, which allows copy-on-write
@ -49,6 +49,9 @@ class SmallVectorMemoryBuffer : public MemoryBuffer {
    init(this->SV.begin(), this->SV.end(), false);
  }

  // Key function.
  ~SmallVectorMemoryBuffer() override;

  StringRef getBufferIdentifier() const override { return BufferName; }

  BufferKind getBufferKind() const override { return MemoryBuffer_Malloc; }
@ -56,7 +59,6 @@ class SmallVectorMemoryBuffer : public MemoryBuffer {
private:
  SmallVector<char, 0> SV;
  std::string BufferName;
  void anchor() override;
};

} // namespace llvm
@ -470,12 +470,15 @@ HANDLE_TARGET_OPCODE(G_BSWAP)
/// Generic AddressSpaceCast.
HANDLE_TARGET_OPCODE(G_ADDRSPACE_CAST)

/// Generic block address
HANDLE_TARGET_OPCODE(G_BLOCK_ADDR)

// TODO: Add more generic opcodes as we move along.

/// Marker for the end of the generic opcode.
/// This is used to check if an opcode is in the range of the
/// generic opcodes.
HANDLE_TARGET_OPCODE_MARKER(PRE_ISEL_GENERIC_OPCODE_END, G_ADDRSPACE_CAST)
HANDLE_TARGET_OPCODE_MARKER(PRE_ISEL_GENERIC_OPCODE_END, G_BLOCK_ADDR)

/// BUILTIN_OP_END - This must be the last enum value in this list.
/// The target-specific post-isel opcode values start here.

@ -38,10 +38,12 @@
#ifndef LLVM_SUPPORT_XXHASH_H
#define LLVM_SUPPORT_XXHASH_H

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"

namespace llvm {
uint64_t xxHash64(llvm::StringRef Data);
uint64_t xxHash64(llvm::ArrayRef<uint8_t> Data);
}

#endif
@ -131,6 +131,13 @@ def G_ADDRSPACE_CAST : GenericInstruction {
  let InOperandList = (ins type1:$src);
  let hasSideEffects = 0;
}

def G_BLOCK_ADDR : GenericInstruction {
  let OutOperandList = (outs type0:$dst);
  let InOperandList = (ins unknown:$ba);
  let hasSideEffects = 0;
}

//------------------------------------------------------------------------------
// Binary ops.
//------------------------------------------------------------------------------

@ -1,10 +1,10 @@
//===- TargetCallingConv.td - Target Calling Conventions ---*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file defines the target-independent interfaces with which targets
@ -13,7 +13,7 @@
// an instruction. Each MCInstPredicate class has a well-known semantic, and it
// is used by a PredicateExpander to generate code for MachineInstr and/or
// MCInst.
//
// MCInstPredicate definitions can be used to construct MCSchedPredicate
// definitions. An MCSchedPredicate can be used in place of a SchedPredicate
// when defining SchedReadVariant and SchedWriteVariant used by a processor

@ -63,7 +63,7 @@
//
// New MCInstPredicate classes must be added to this file. For each new class
// XYZ, an "expandXYZ" method must be added to the PredicateExpander.
//
//===----------------------------------------------------------------------===//

// Forward declarations.
@ -82,7 +82,7 @@ class SpeculativeExecutionPass
  bool considerHoistingFromTo(BasicBlock &FromBlock, BasicBlock &ToBlock);

  // If true, this pass is a nop unless the target architecture has branch
  // divergence.
  const bool OnlyIfDivergentTarget = false;

  TargetTransformInfo *TTI = nullptr;

@ -74,7 +74,7 @@ class Value;
  /// vararg functions can be extracted.  This is safe, if all vararg handling
  /// code is extracted, including vastart.  If AllowAlloca is true, then
  /// extraction of blocks containing alloca instructions would be possible,
  /// however code extractor won't validate whether extraction is legal.
  CodeExtractor(ArrayRef<BasicBlock *> BBs, DominatorTree *DT = nullptr,
                bool AggregateArgs = false, BlockFrequencyInfo *BFI = nullptr,
                BranchProbabilityInfo *BPI = nullptr,
@ -18,7 +18,7 @@
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/Support/AtomicOrdering.h"

@ -134,7 +134,7 @@ class RewriteSymbolPass : public PassInfoMixin<RewriteSymbolPass> {
private:
  void loadAndParseMapFiles();

  SymbolRewriter::RewriteDescriptorList Descriptors;
};

} // end namespace llvm
@ -142,7 +142,7 @@ void AliasSet::addPointer(AliasSetTracker &AST, PointerRec &Entry,
      Alias = SetMayAlias;
      AST.TotalMayAliasSetSize += size();
    } else {
      // First entry of must alias must have maximum size!
      P->updateSizeAndAAInfo(Size, AAInfo);
    }
    assert(Result != NoAlias && "Cannot be part of must set!");
@ -251,9 +251,9 @@ void AliasSetTracker::clear() {
  for (PointerMapType::iterator I = PointerMap.begin(), E = PointerMap.end();
       I != E; ++I)
    I->second->eraseFromList();

  PointerMap.clear();

  // The alias sets should all be clear now.
  AliasSets.clear();
}
@ -269,7 +269,7 @@ AliasSet *AliasSetTracker::mergeAliasSetsForPointer(const Value *Ptr,
  for (iterator I = begin(), E = end(); I != E;) {
    iterator Cur = I++;
    if (Cur->Forward || !Cur->aliasesPointer(Ptr, Size, AAInfo, AA)) continue;

    if (!FoundSet) {      // If this is the first alias set ptr can go into.
      FoundSet = &*Cur;   // Remember it.
    } else {              // Otherwise, we must merge the sets.
@ -336,13 +336,13 @@ AliasSet &AliasSetTracker::getAliasSetForPointer(Value *Pointer,
    // Return the set!
    return *Entry.getAliasSet(*this)->getForwardedTarget(*this);
  }

  if (AliasSet *AS = mergeAliasSetsForPointer(Pointer, Size, AAInfo)) {
    // Add it to the alias set it aliases.
    AS->addPointer(*this, Entry, Size, AAInfo);
    return *AS;
  }

  // Otherwise create a new alias set to hold the loaded pointer.
  AliasSets.push_back(new AliasSet());
  AliasSets.back().addPointer(*this, Entry, Size, AAInfo);
@ -526,10 +526,10 @@ void AliasSetTracker::deleteValue(Value *PtrVal) {
      AS->SetSize--;
      TotalMayAliasSetSize--;
    }

    // Stop using the alias set.
    AS->dropRef(*this);

    PointerMap.erase(I);
  }
@ -28,6 +28,7 @@
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/PhiValues.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/CallSite.h"
@ -93,7 +94,8 @@ bool BasicAAResult::invalidate(Function &Fn, const PreservedAnalyses &PA,
  // depend on them.
  if (Inv.invalidate<AssumptionAnalysis>(Fn, PA) ||
      (DT && Inv.invalidate<DominatorTreeAnalysis>(Fn, PA)) ||
      (LI && Inv.invalidate<LoopAnalysis>(Fn, PA)))
      (LI && Inv.invalidate<LoopAnalysis>(Fn, PA)) ||
      (PV && Inv.invalidate<PhiValuesAnalysis>(Fn, PA)))
    return true;

  // Otherwise this analysis result remains valid.
@ -1527,34 +1529,70 @@ AliasResult BasicAAResult::aliasPHI(const PHINode *PN, LocationSize PNSize,
      return Alias;
  }

  SmallPtrSet<Value *, 4> UniqueSrc;
  SmallVector<Value *, 4> V1Srcs;
  bool isRecursive = false;
  for (Value *PV1 : PN->incoming_values()) {
    if (isa<PHINode>(PV1))
      // If any of the source itself is a PHI, return MayAlias conservatively
      // to avoid compile time explosion. The worst possible case is if both
      // sides are PHI nodes. In which case, this is O(m x n) time where 'm'
      // and 'n' are the number of PHI sources.
  if (PV) {
    // If we have PhiValues then use it to get the underlying phi values.
    const PhiValues::ValueSet &PhiValueSet = PV->getValuesForPhi(PN);
    // If we have more phi values than the search depth then return MayAlias
    // conservatively to avoid compile time explosion. The worst possible case
    // is if both sides are PHI nodes. In which case, this is O(m x n) time
    // where 'm' and 'n' are the number of PHI sources.
    if (PhiValueSet.size() > MaxLookupSearchDepth)
      return MayAlias;

    if (EnableRecPhiAnalysis)
      if (GEPOperator *PV1GEP = dyn_cast<GEPOperator>(PV1)) {
        // Check whether the incoming value is a GEP that advances the pointer
        // result of this PHI node (e.g. in a loop). If this is the case, we
        // would recurse and always get a MayAlias. Handle this case specially
        // below.
        if (PV1GEP->getPointerOperand() == PN && PV1GEP->getNumIndices() == 1 &&
            isa<ConstantInt>(PV1GEP->idx_begin())) {
          isRecursive = true;
          continue;
    // Add the values to V1Srcs
    for (Value *PV1 : PhiValueSet) {
      if (EnableRecPhiAnalysis) {
        if (GEPOperator *PV1GEP = dyn_cast<GEPOperator>(PV1)) {
          // Check whether the incoming value is a GEP that advances the pointer
          // result of this PHI node (e.g. in a loop). If this is the case, we
          // would recurse and always get a MayAlias. Handle this case specially
          // below.
          if (PV1GEP->getPointerOperand() == PN && PV1GEP->getNumIndices() == 1 &&
              isa<ConstantInt>(PV1GEP->idx_begin())) {
            isRecursive = true;
            continue;
          }
        }
      }

      if (UniqueSrc.insert(PV1).second)
        V1Srcs.push_back(PV1);
    }
  } else {
    // If we don't have PhiInfo then just look at the operands of the phi itself
    // FIXME: Remove this once we can guarantee that we have PhiInfo always
    SmallPtrSet<Value *, 4> UniqueSrc;
    for (Value *PV1 : PN->incoming_values()) {
      if (isa<PHINode>(PV1))
        // If any of the source itself is a PHI, return MayAlias conservatively
        // to avoid compile time explosion. The worst possible case is if both
        // sides are PHI nodes. In which case, this is O(m x n) time where 'm'
        // and 'n' are the number of PHI sources.
        return MayAlias;

      if (EnableRecPhiAnalysis)
        if (GEPOperator *PV1GEP = dyn_cast<GEPOperator>(PV1)) {
          // Check whether the incoming value is a GEP that advances the pointer
          // result of this PHI node (e.g. in a loop). If this is the case, we
          // would recurse and always get a MayAlias. Handle this case specially
          // below.
          if (PV1GEP->getPointerOperand() == PN && PV1GEP->getNumIndices() == 1 &&
|
||||
isa<ConstantInt>(PV1GEP->idx_begin())) {
|
||||
isRecursive = true;
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
if (UniqueSrc.insert(PV1).second)
|
||||
V1Srcs.push_back(PV1);
|
||||
}
|
||||
}
|
||||
|
||||
// If V1Srcs is empty then that means that the phi has no underlying non-phi
|
||||
// value. This should only be possible in blocks unreachable from the entry
|
||||
// block, but return MayAlias just in case.
|
||||
if (V1Srcs.empty())
|
||||
return MayAlias;
|
||||
|
||||
// If this PHI node is recursive, set the size of the accessed memory to
|
||||
// unknown to represent all the possible values the GEP could advance the
|
||||
// pointer to.
|
||||
@ -1879,7 +1917,8 @@ BasicAAResult BasicAA::run(Function &F, FunctionAnalysisManager &AM) {
|
||||
AM.getResult<TargetLibraryAnalysis>(F),
|
||||
AM.getResult<AssumptionAnalysis>(F),
|
||||
&AM.getResult<DominatorTreeAnalysis>(F),
|
||||
AM.getCachedResult<LoopAnalysis>(F));
|
||||
AM.getCachedResult<LoopAnalysis>(F),
|
||||
AM.getCachedResult<PhiValuesAnalysis>(F));
|
||||
}
|
||||
|
||||
BasicAAWrapperPass::BasicAAWrapperPass() : FunctionPass(ID) {
|
||||
@ -1891,12 +1930,12 @@ char BasicAAWrapperPass::ID = 0;
|
||||
void BasicAAWrapperPass::anchor() {}
|
||||
|
||||
INITIALIZE_PASS_BEGIN(BasicAAWrapperPass, "basicaa",
|
||||
"Basic Alias Analysis (stateless AA impl)", true, true)
|
||||
"Basic Alias Analysis (stateless AA impl)", false, true)
|
||||
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
|
||||
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
|
||||
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
|
||||
INITIALIZE_PASS_END(BasicAAWrapperPass, "basicaa",
|
||||
"Basic Alias Analysis (stateless AA impl)", true, true)
|
||||
"Basic Alias Analysis (stateless AA impl)", false, true)
|
||||
|
||||
FunctionPass *llvm::createBasicAAWrapperPass() {
|
||||
return new BasicAAWrapperPass();
|
||||
@ -1907,10 +1946,12 @@ bool BasicAAWrapperPass::runOnFunction(Function &F) {
|
||||
auto &TLIWP = getAnalysis<TargetLibraryInfoWrapperPass>();
|
||||
auto &DTWP = getAnalysis<DominatorTreeWrapperPass>();
|
||||
auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();
|
||||
auto *PVWP = getAnalysisIfAvailable<PhiValuesWrapperPass>();
|
||||
|
||||
Result.reset(new BasicAAResult(F.getParent()->getDataLayout(), F, TLIWP.getTLI(),
|
||||
ACT.getAssumptionCache(F), &DTWP.getDomTree(),
|
||||
LIWP ? &LIWP->getLoopInfo() : nullptr));
|
||||
LIWP ? &LIWP->getLoopInfo() : nullptr,
|
||||
PVWP ? &PVWP->getResult() : nullptr));
|
||||
|
||||
return false;
|
||||
}
|
||||
@ -1920,6 +1961,7 @@ void BasicAAWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
|
||||
AU.addRequired<AssumptionCacheTracker>();
|
||||
AU.addRequired<DominatorTreeWrapperPass>();
|
||||
AU.addRequired<TargetLibraryInfoWrapperPass>();
|
||||
AU.addUsedIfAvailable<PhiValuesWrapperPass>();
|
||||
}
|
||||
|
||||
BasicAAResult llvm::createLegacyPMBasicAAResult(Pass &P, Function &F) {
|
||||
|
@@ -124,7 +124,7 @@ namespace {
}

char CFGPrinterLegacyPass::ID = 0;
INITIALIZE_PASS(CFGPrinterLegacyPass, "dot-cfg", "Print CFG of function to 'dot' file",
INITIALIZE_PASS(CFGPrinterLegacyPass, "dot-cfg", "Print CFG of function to 'dot' file",
                false, true)

PreservedAnalyses CFGPrinterPass::run(Function &F,

@@ -166,7 +166,7 @@ void CallGraphNode::print(raw_ostream &OS) const {
    OS << "Call graph node for function: '" << F->getName() << "'";
  else
    OS << "Call graph node <<null function>>";

  OS << "<<" << this << ">>  #uses=" << getNumReferences() << '\n';

  for (const auto &I : *this) {

@@ -41,7 +41,7 @@ using namespace llvm;

#define DEBUG_TYPE "cgscc-passmgr"

static cl::opt<unsigned>
static cl::opt<unsigned>
    MaxIterations("max-cg-scc-iterations", cl::ReallyHidden, cl::init(4));

STATISTIC(MaxSCCIterations, "Maximum CGSCCPassMgr iterations on one SCC");

@@ -97,13 +97,13 @@ class CGPassManager : public ModulePass, public PMDataManager {
  }

  PassManagerType getPassManagerType() const override {
    return PMT_CallGraphPassManager;
    return PMT_CallGraphPassManager;
  }

private:
  bool RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
                         bool &DevirtualizedCall);

  bool RunPassOnSCC(Pass *P, CallGraphSCC &CurSCC,
                    CallGraph &CG, bool &CallGraphUpToDate,
                    bool &DevirtualizedCall);

@@ -142,21 +142,21 @@ bool CGPassManager::RunPassOnSCC(Pass *P, CallGraphSCC &CurSCC,
      if (EmitICRemark)
        emitInstrCountChangedRemark(P, M, InstrCount);
    }

    // After the CGSCCPass is done, when assertions are enabled, use
    // RefreshCallGraph to verify that the callgraph was correctly updated.
#ifndef NDEBUG
    if (Changed)
      RefreshCallGraph(CurSCC, CG, true);
#endif

    return Changed;
  }

  assert(PM->getPassManagerType() == PMT_FunctionPassManager &&
         "Invalid CGPassManager member");
  FPPassManager *FPP = (FPPassManager*)P;

  // Run pass P on all functions in the current SCC.
  for (CallGraphNode *CGN : CurSCC) {
    if (Function *F = CGN->getFunction()) {

@@ -168,7 +168,7 @@ bool CGPassManager::RunPassOnSCC(Pass *P, CallGraphSCC &CurSCC,
      F->getContext().yield();
    }
  }

  // The function pass(es) modified the IR, they may have clobbered the
  // callgraph.
  if (Changed && CallGraphUpToDate) {

@@ -199,7 +199,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,

  bool MadeChange = false;
  bool DevirtualizedCall = false;

  // Scan all functions in the SCC.
  unsigned FunctionNo = 0;
  for (CallGraphSCC::iterator SCCIdx = CurSCC.begin(), E = CurSCC.end();

@@ -207,14 +207,14 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
    CallGraphNode *CGN = *SCCIdx;
    Function *F = CGN->getFunction();
    if (!F || F->isDeclaration()) continue;

    // Walk the function body looking for call sites.  Sync up the call sites in
    // CGN with those actually in the function.

    // Keep track of the number of direct and indirect calls that were
    // invalidated and removed.
    unsigned NumDirectRemoved = 0, NumIndirectRemoved = 0;

    // Get the set of call sites currently in the function.
    for (CallGraphNode::iterator I = CGN->begin(), E = CGN->end(); I != E; ) {
      // If this call site is null, then the function pass deleted the call

@@ -226,7 +226,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
          CallSites.count(I->first) ||

          // If the call edge is not from a call or invoke, or it is a
          // instrinsic call, then the function pass RAUW'd a call with
          // instrinsic call, then the function pass RAUW'd a call with
          // another value. This can happen when constant folding happens
          // of well known functions etc.
          !CallSite(I->first) ||

@@ -236,18 +236,18 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
            CallSite(I->first).getCalledFunction()->getIntrinsicID()))) {
        assert(!CheckingMode &&
               "CallGraphSCCPass did not update the CallGraph correctly!");

        // If this was an indirect call site, count it.
        if (!I->second->getFunction())
          ++NumIndirectRemoved;
        else
        else
          ++NumDirectRemoved;

        // Just remove the edge from the set of callees, keep track of whether
        // I points to the last element of the vector.
        bool WasLast = I + 1 == E;
        CGN->removeCallEdge(I);

        // If I pointed to the last element of the vector, we have to bail out:
        // iterator checking rejects comparisons of the resultant pointer with
        // end.

@@ -256,10 +256,10 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
        E = CGN->end();
        continue;
      }

      assert(!CallSites.count(I->first) &&
             "Call site occurs in node multiple times");

      CallSite CS(I->first);
      if (CS) {
        Function *Callee = CS.getCalledFunction();

@@ -269,7 +269,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
      }
      ++I;
    }

    // Loop over all of the instructions in the function, getting the callsites.
    // Keep track of the number of direct/indirect calls added.
    unsigned NumDirectAdded = 0, NumIndirectAdded = 0;

@@ -280,7 +280,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
      if (!CS) continue;
      Function *Callee = CS.getCalledFunction();
      if (Callee && Callee->isIntrinsic()) continue;

      // If this call site already existed in the callgraph, just verify it
      // matches up to expectations and remove it from CallSites.
      DenseMap<Value*, CallGraphNode*>::iterator ExistingIt =

@@ -290,11 +290,11 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,

        // Remove from CallSites since we have now seen it.
        CallSites.erase(ExistingIt);

        // Verify that the callee is right.
        if (ExistingNode->getFunction() == CS.getCalledFunction())
          continue;

        // If we are in checking mode, we are not allowed to actually mutate
        // the callgraph.  If this is a case where we can infer that the
        // callgraph is less precise than it could be (e.g. an indirect call

@@ -303,10 +303,10 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
        if (CheckingMode && CS.getCalledFunction() &&
            ExistingNode->getFunction() == nullptr)
          continue;

        assert(!CheckingMode &&
               "CallGraphSCCPass did not update the CallGraph correctly!");

        // If not, we either went from a direct call to indirect, indirect to
        // direct, or direct to different direct.
        CallGraphNode *CalleeNode;

@@ -328,7 +328,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
          MadeChange = true;
          continue;
        }

        assert(!CheckingMode &&
               "CallGraphSCCPass did not update the CallGraph correctly!");

@@ -341,11 +341,11 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
          CalleeNode = CG.getCallsExternalNode();
          ++NumIndirectAdded;
        }

        CGN->addCalledFunction(CS, CalleeNode);
        MadeChange = true;
      }

      // We scanned the old callgraph node, removing invalidated call sites and
      // then added back newly found call sites.  One thing that can happen is
      // that an old indirect call site was deleted and replaced with a new direct

@@ -359,13 +359,13 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
      if (NumIndirectRemoved > NumIndirectAdded &&
          NumDirectRemoved < NumDirectAdded)
        DevirtualizedCall = true;

      // After scanning this function, if we still have entries in callsites, then
      // they are dangling pointers.  WeakTrackingVH should save us for this, so
      // abort if
      // this happens.
      assert(CallSites.empty() && "Dangling pointers found in call sites map");

      // Periodically do an explicit clear to remove tombstones when processing
      // large scc's.
      if ((FunctionNo & 15) == 15)

@@ -392,7 +392,7 @@ bool CGPassManager::RefreshCallGraph(const CallGraphSCC &CurSCC, CallGraph &CG,
bool CGPassManager::RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
                                      bool &DevirtualizedCall) {
  bool Changed = false;

  // Keep track of whether the callgraph is known to be up-to-date or not.
  // The CGSSC pass manager runs two types of passes:
  // CallGraphSCC Passes and other random function passes.  Because other

@@ -406,7 +406,7 @@ bool CGPassManager::RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
  for (unsigned PassNo = 0, e = getNumContainedPasses();
       PassNo != e; ++PassNo) {
    Pass *P = getContainedPass(PassNo);

    // If we're in -debug-pass=Executions mode, construct the SCC node list,
    // otherwise avoid constructing this string as it is expensive.
    if (isPassDebuggingExecutionsOrMore()) {

@@ -423,23 +423,23 @@ bool CGPassManager::RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
      dumpPassInfo(P, EXECUTION_MSG, ON_CG_MSG, Functions);
    }
    dumpRequiredSet(P);

    initializeAnalysisImpl(P);

    // Actually run this pass on the current SCC.
    Changed |= RunPassOnSCC(P, CurSCC, CG,
                            CallGraphUpToDate, DevirtualizedCall);

    if (Changed)
      dumpPassInfo(P, MODIFICATION_MSG, ON_CG_MSG, "");
    dumpPreservedSet(P);

    verifyPreservedAnalysis(P);

    verifyPreservedAnalysis(P);
    removeNotPreservedAnalysis(P);
    recordAvailableAnalysis(P);
    removeDeadPasses(P, "", ON_CG_MSG);
  }

  // If the callgraph was left out of date (because the last pass run was a
  // functionpass), refresh it before we move on to the next SCC.
  if (!CallGraphUpToDate)

@@ -452,7 +452,7 @@ bool CGPassManager::RunAllPassesOnSCC(CallGraphSCC &CurSCC, CallGraph &CG,
bool CGPassManager::runOnModule(Module &M) {
  CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
  bool Changed = doInitialization(CG);

  // Walk the callgraph in bottom-up SCC order.
  scc_iterator<CallGraph*> CGI = scc_begin(&CG);

@@ -485,7 +485,7 @@ bool CGPassManager::runOnModule(Module &M) {
    DevirtualizedCall = false;
    Changed |= RunAllPassesOnSCC(CurSCC, CG, DevirtualizedCall);
  } while (Iteration++ < MaxIterations && DevirtualizedCall);

  if (DevirtualizedCall)
    LLVM_DEBUG(dbgs() << "  CGSCCPASSMGR: Stopped iteration after "
                      << Iteration

@@ -500,7 +500,7 @@ bool CGPassManager::runOnModule(Module &M) {
/// Initialize CG
bool CGPassManager::doInitialization(CallGraph &CG) {
  bool Changed = false;
  for (unsigned i = 0, e = getNumContainedPasses(); i != e; ++i) {
  for (unsigned i = 0, e = getNumContainedPasses(); i != e; ++i) {
    if (PMDataManager *PM = getContainedPass(i)->getAsPMDataManager()) {
      assert(PM->getPassManagerType() == PMT_FunctionPassManager &&
             "Invalid CGPassManager member");

@@ -515,7 +515,7 @@ bool CGPassManager::doInitialization(CallGraph &CG) {
/// Finalize CG
bool CGPassManager::doFinalization(CallGraph &CG) {
  bool Changed = false;
  for (unsigned i = 0, e = getNumContainedPasses(); i != e; ++i) {
  for (unsigned i = 0, e = getNumContainedPasses(); i != e; ++i) {
    if (PMDataManager *PM = getContainedPass(i)->getAsPMDataManager()) {
      assert(PM->getPassManagerType() == PMT_FunctionPassManager &&
             "Invalid CGPassManager member");

@@ -541,7 +541,7 @@ void CallGraphSCC::ReplaceNode(CallGraphNode *Old, CallGraphNode *New) {
      Nodes[i] = New;
      break;
    }

  // Update the active scc_iterator so that it doesn't contain dangling
  // pointers to the old CallGraphNode.
  scc_iterator<CallGraph*> *CGI = (scc_iterator<CallGraph*>*)Context;

@@ -555,18 +555,18 @@ void CallGraphSCC::ReplaceNode(CallGraphNode *Old, CallGraphNode *New) {
/// Assign pass manager to manage this pass.
void CallGraphSCCPass::assignPassManager(PMStack &PMS,
                                         PassManagerType PreferredType) {
  // Find CGPassManager
  // Find CGPassManager
  while (!PMS.empty() &&
         PMS.top()->getPassManagerType() > PMT_CallGraphPassManager)
    PMS.pop();

  assert(!PMS.empty() && "Unable to handle Call Graph Pass");
  CGPassManager *CGP;

  if (PMS.top()->getPassManagerType() == PMT_CallGraphPassManager)
    CGP = (CGPassManager*)PMS.top();
  else {
    // Create new Call Graph SCC Pass Manager if it does not exist.
    // Create new Call Graph SCC Pass Manager if it does not exist.
    assert(!PMS.empty() && "Unable to create Call Graph Pass Manager");
    PMDataManager *PMD = PMS.top();

@@ -608,7 +608,7 @@ namespace {
  class PrintCallGraphPass : public CallGraphSCCPass {
    std::string Banner;
    raw_ostream &OS;       // raw_ostream to print on.

  public:
    static char ID;

@@ -640,10 +640,10 @@ namespace {
      }
      return false;
    }

    StringRef getPassName() const override { return "Print CallGraph IR"; }
  };

} // end anonymous namespace.

char PrintCallGraphPass::ID = 0;
@@ -272,7 +272,7 @@ void DemandedBits::performAnalysis() {
    // Analysis already completed for this function.
    return;
  Analyzed = true;

  Visited.clear();
  AliveBits.clear();

@@ -367,7 +367,7 @@ void DemandedBits::performAnalysis() {

APInt DemandedBits::getDemandedBits(Instruction *I) {
  performAnalysis();

  const DataLayout &DL = I->getModule()->getDataLayout();
  auto Found = AliveBits.find(I);
  if (Found != AliveBits.end())

@@ -409,7 +409,7 @@ bool GlobalsAAResult::AnalyzeIndirectGlobalMemory(GlobalVariable *GV) {
  if (Constant *C = GV->getInitializer())
    if (!C->isNullValue())
      return false;

  // Walk the user list of the global.  If we find anything other than a direct
  // load or store, bail out.
  for (User *U : GV->users()) {

@@ -464,7 +464,7 @@ bool GlobalsAAResult::AnalyzeIndirectGlobalMemory(GlobalVariable *GV) {
  return true;
}

void GlobalsAAResult::CollectSCCMembership(CallGraph &CG) {
void GlobalsAAResult::CollectSCCMembership(CallGraph &CG) {
  // We do a bottom-up SCC traversal of the call graph.  In other words, we
  // visit all callees before callers (leaf-first).
  unsigned SCCID = 0;

@@ -633,7 +633,7 @@ static bool isNonEscapingGlobalNoAliasWithLoad(const GlobalValue *GV,
  Inputs.push_back(V);
  do {
    const Value *Input = Inputs.pop_back_val();

    if (isa<GlobalValue>(Input) || isa<Argument>(Input) || isa<CallInst>(Input) ||
        isa<InvokeInst>(Input))
      // Arguments to functions or returns from functions are inherently

@@ -654,7 +654,7 @@ static bool isNonEscapingGlobalNoAliasWithLoad(const GlobalValue *GV,
    if (auto *LI = dyn_cast<LoadInst>(Input)) {
      Inputs.push_back(GetUnderlyingObject(LI->getPointerOperand(), DL));
      continue;
    }
    }
    if (auto *SI = dyn_cast<SelectInst>(Input)) {
      const Value *LHS = GetUnderlyingObject(SI->getTrueValue(), DL);
      const Value *RHS = GetUnderlyingObject(SI->getFalseValue(), DL);

@@ -672,7 +672,7 @@ static bool isNonEscapingGlobalNoAliasWithLoad(const GlobalValue *GV,
      }
      continue;
    }

    return false;
  } while (!Inputs.empty());

@@ -754,7 +754,7 @@ bool GlobalsAAResult::isNonEscapingGlobalNoAlias(const GlobalValue *GV,
      // non-addr-taken globals.
      continue;
    }

    // Recurse through a limited number of selects, loads and PHIs. This is an
    // arbitrary depth of 4, lower numbers could be used to fix compile time
    // issues if needed, but this is generally expected to be only be important
@@ -65,6 +65,48 @@ static Value *SimplifyCastInst(unsigned, Value *, Type *,
static Value *SimplifyGEPInst(Type *, ArrayRef<Value *>, const SimplifyQuery &,
                              unsigned);

static Value *foldSelectWithBinaryOp(Value *Cond, Value *TrueVal,
                                     Value *FalseVal) {
  BinaryOperator::BinaryOps BinOpCode;
  if (auto *BO = dyn_cast<BinaryOperator>(Cond))
    BinOpCode = BO->getOpcode();
  else
    return nullptr;

  CmpInst::Predicate ExpectedPred, Pred1, Pred2;
  if (BinOpCode == BinaryOperator::Or) {
    ExpectedPred = ICmpInst::ICMP_NE;
  } else if (BinOpCode == BinaryOperator::And) {
    ExpectedPred = ICmpInst::ICMP_EQ;
  } else
    return nullptr;

  // %A = icmp eq %TV, %FV
  // %B = icmp eq %X, %Y (and one of these is a select operand)
  // %C = and %A, %B
  // %D = select %C, %TV, %FV
  // -->
  // %FV

  // %A = icmp ne %TV, %FV
  // %B = icmp ne %X, %Y (and one of these is a select operand)
  // %C = or %A, %B
  // %D = select %C, %TV, %FV
  // -->
  // %TV
  Value *X, *Y;
  if (!match(Cond, m_c_BinOp(m_c_ICmp(Pred1, m_Specific(TrueVal),
                                      m_Specific(FalseVal)),
                             m_ICmp(Pred2, m_Value(X), m_Value(Y)))) ||
      Pred1 != Pred2 || Pred1 != ExpectedPred)
    return nullptr;

  if (X == TrueVal || X == FalseVal || Y == TrueVal || Y == FalseVal)
    return BinOpCode == BinaryOperator::Or ? TrueVal : FalseVal;

  return nullptr;
}
/// For a boolean type or a vector of boolean type, return false or a vector
/// with every element false.
static Constant *getFalse(Type *Ty) {
@@ -1283,6 +1325,23 @@ static Value *SimplifyLShrInst(Value *Op0, Value *Op1, bool isExact,
  if (match(Op0, m_NUWShl(m_Value(X), m_Specific(Op1))))
    return X;

  // ((X << A) | Y) >> A -> X  if effective width of Y is not larger than A.
  // We can return X as we do in the above case since OR alters no bits in X.
  // SimplifyDemandedBits in InstCombine can do more general optimization for
  // bit manipulation. This pattern aims to provide opportunities for other
  // optimizers by supporting a simple but common case in InstSimplify.
  Value *Y;
  const APInt *ShRAmt, *ShLAmt;
  if (match(Op1, m_APInt(ShRAmt)) &&
      match(Op0, m_c_Or(m_NUWShl(m_Value(X), m_APInt(ShLAmt)), m_Value(Y))) &&
      *ShRAmt == *ShLAmt) {
    const KnownBits YKnown = computeKnownBits(Y, Q.DL, 0, Q.AC, Q.CxtI, Q.DT);
    const unsigned Width = Op0->getType()->getScalarSizeInBits();
    const unsigned EffWidthY = Width - YKnown.countMinLeadingZeros();
    if (EffWidthY <= ShRAmt->getZExtValue())
      return X;
  }

  return nullptr;
}
@@ -3752,6 +3811,9 @@ static Value *SimplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal,
          simplifySelectWithICmpCond(Cond, TrueVal, FalseVal, Q, MaxRecurse))
    return V;

  if (Value *V = foldSelectWithBinaryOp(Cond, TrueVal, FalseVal))
    return V;

  return nullptr;
}

@@ -4604,149 +4666,131 @@ static bool maskIsAllZeroOrUndef(Value *Mask) {
  return true;
}

template <typename IterTy>
static Value *SimplifyIntrinsic(Function *F, IterTy ArgBegin, IterTy ArgEnd,
                                const SimplifyQuery &Q, unsigned MaxRecurse) {
static Value *simplifyUnaryIntrinsic(Function *F, Value *Op0,
                                     const SimplifyQuery &Q) {
  // Idempotent functions return the same result when called repeatedly.
  Intrinsic::ID IID = F->getIntrinsicID();
  unsigned NumOperands = std::distance(ArgBegin, ArgEnd);
  if (IsIdempotent(IID))
    if (auto *II = dyn_cast<IntrinsicInst>(Op0))
      if (II->getIntrinsicID() == IID)
        return II;

  // Unary Ops
  if (NumOperands == 1) {
    // Perform idempotent optimizations
    if (IsIdempotent(IID)) {
      if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(*ArgBegin)) {
        if (II->getIntrinsicID() == IID)
          return II;
      }
    }

    Value *IIOperand = *ArgBegin;
    Value *X;
    switch (IID) {
    case Intrinsic::fabs: {
      if (SignBitMustBeZero(IIOperand, Q.TLI))
        return IIOperand;
      return nullptr;
    }
    case Intrinsic::bswap: {
      // bswap(bswap(x)) -> x
      if (match(IIOperand, m_BSwap(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::bitreverse: {
      // bitreverse(bitreverse(x)) -> x
      if (match(IIOperand, m_BitReverse(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::exp: {
      // exp(log(x)) -> x
      if (Q.CxtI->hasAllowReassoc() &&
          match(IIOperand, m_Intrinsic<Intrinsic::log>(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::exp2: {
      // exp2(log2(x)) -> x
      if (Q.CxtI->hasAllowReassoc() &&
          match(IIOperand, m_Intrinsic<Intrinsic::log2>(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::log: {
      // log(exp(x)) -> x
      if (Q.CxtI->hasAllowReassoc() &&
          match(IIOperand, m_Intrinsic<Intrinsic::exp>(m_Value(X))))
        return X;
      return nullptr;
    }
    case Intrinsic::log2: {
      // log2(exp2(x)) -> x
      if (Q.CxtI->hasAllowReassoc() &&
          match(IIOperand, m_Intrinsic<Intrinsic::exp2>(m_Value(X)))) {
        return X;
      }
      return nullptr;
    }
    default:
      return nullptr;
    }
  Value *X;
  switch (IID) {
  case Intrinsic::fabs:
    if (SignBitMustBeZero(Op0, Q.TLI)) return Op0;
    break;
  case Intrinsic::bswap:
    // bswap(bswap(x)) -> x
    if (match(Op0, m_BSwap(m_Value(X)))) return X;
    break;
  case Intrinsic::bitreverse:
    // bitreverse(bitreverse(x)) -> x
    if (match(Op0, m_BitReverse(m_Value(X)))) return X;
    break;
  case Intrinsic::exp:
    // exp(log(x)) -> x
    if (Q.CxtI->hasAllowReassoc() &&
        match(Op0, m_Intrinsic<Intrinsic::log>(m_Value(X)))) return X;
    break;
  case Intrinsic::exp2:
    // exp2(log2(x)) -> x
    if (Q.CxtI->hasAllowReassoc() &&
        match(Op0, m_Intrinsic<Intrinsic::log2>(m_Value(X)))) return X;
    break;
  case Intrinsic::log:
    // log(exp(x)) -> x
    if (Q.CxtI->hasAllowReassoc() &&
        match(Op0, m_Intrinsic<Intrinsic::exp>(m_Value(X)))) return X;
    break;
  case Intrinsic::log2:
    // log2(exp2(x)) -> x
    if (Q.CxtI->hasAllowReassoc() &&
        match(Op0, m_Intrinsic<Intrinsic::exp2>(m_Value(X)))) return X;
    break;
  default:
    break;
  }

  // Binary Ops
  if (NumOperands == 2) {
    Value *LHS = *ArgBegin;
    Value *RHS = *(ArgBegin + 1);
    Type *ReturnType = F->getReturnType();
  return nullptr;
}

    switch (IID) {
    case Intrinsic::usub_with_overflow:
    case Intrinsic::ssub_with_overflow: {
      // X - X -> { 0, false }
      if (LHS == RHS)
        return Constant::getNullValue(ReturnType);

      // X - undef -> undef
      // undef - X -> undef
      if (isa<UndefValue>(LHS) || isa<UndefValue>(RHS))
        return UndefValue::get(ReturnType);

      return nullptr;
    }
    case Intrinsic::uadd_with_overflow:
    case Intrinsic::sadd_with_overflow: {
      // X + undef -> undef
      if (isa<UndefValue>(LHS) || isa<UndefValue>(RHS))
        return UndefValue::get(ReturnType);

      return nullptr;
    }
    case Intrinsic::umul_with_overflow:
    case Intrinsic::smul_with_overflow: {
      // 0 * X -> { 0, false }
      // X * 0 -> { 0, false }
      if (match(LHS, m_Zero()) || match(RHS, m_Zero()))
        return Constant::getNullValue(ReturnType);

      // undef * X -> { 0, false }
      // X * undef -> { 0, false }
      if (match(LHS, m_Undef()) || match(RHS, m_Undef()))
        return Constant::getNullValue(ReturnType);

      return nullptr;
    }
    case Intrinsic::load_relative: {
      Constant *C0 = dyn_cast<Constant>(LHS);
      Constant *C1 = dyn_cast<Constant>(RHS);
      if (C0 && C1)
static Value *simplifyBinaryIntrinsic(Function *F, Value *Op0, Value *Op1,
                                      const SimplifyQuery &Q) {
  Intrinsic::ID IID = F->getIntrinsicID();
  Type *ReturnType = F->getReturnType();
  switch (IID) {
  case Intrinsic::usub_with_overflow:
  case Intrinsic::ssub_with_overflow:
    // X - X -> { 0, false }
    if (Op0 == Op1)
      return Constant::getNullValue(ReturnType);
    // X - undef -> undef
    // undef - X -> undef
    if (isa<UndefValue>(Op0) || isa<UndefValue>(Op1))
      return UndefValue::get(ReturnType);
    break;
  case Intrinsic::uadd_with_overflow:
  case Intrinsic::sadd_with_overflow:
    // X + undef -> undef
    if (isa<UndefValue>(Op0) || isa<UndefValue>(Op1))
      return UndefValue::get(ReturnType);
    break;
  case Intrinsic::umul_with_overflow:
  case Intrinsic::smul_with_overflow:
    // 0 * X -> { 0, false }
    // X * 0 -> { 0, false }
    if (match(Op0, m_Zero()) || match(Op1, m_Zero()))
      return Constant::getNullValue(ReturnType);
    // undef * X -> { 0, false }
    // X * undef -> { 0, false }
    if (match(Op0, m_Undef()) || match(Op1, m_Undef()))
      return Constant::getNullValue(ReturnType);
    break;
  case Intrinsic::load_relative:
    if (auto *C0 = dyn_cast<Constant>(Op0))
      if (auto *C1 = dyn_cast<Constant>(Op1))
        return SimplifyRelativeLoad(C0, C1, Q.DL);
      return nullptr;
    }
    case Intrinsic::powi:
      if (ConstantInt *Power = dyn_cast<ConstantInt>(RHS)) {
        // powi(x, 0) -> 1.0
        if (Power->isZero())
          return ConstantFP::get(LHS->getType(), 1.0);
        // powi(x, 1) -> x
        if (Power->isOne())
          return LHS;
      }
      return nullptr;
    case Intrinsic::maxnum:
    case Intrinsic::minnum:
      // If one argument is NaN, return the other argument.
      if (match(LHS, m_NaN()))
        return RHS;
      if (match(RHS, m_NaN()))
        return LHS;
      return nullptr;
    default:
      return nullptr;
    break;
  case Intrinsic::powi:
    if (auto *Power = dyn_cast<ConstantInt>(Op1)) {
      // powi(x, 0) -> 1.0
      if (Power->isZero())
        return ConstantFP::get(Op0->getType(), 1.0);
      // powi(x, 1) -> x
      if (Power->isOne())
        return Op0;
    }
    break;
  case Intrinsic::maxnum:
  case Intrinsic::minnum:
    // If one argument is NaN, return the other argument.
    if (match(Op0, m_NaN())) return Op1;
    if (match(Op1, m_NaN())) return Op0;
    break;
  default:
    break;
  }

  // Simplify calls to llvm.masked.load.*
  return nullptr;
}

template <typename IterTy>
static Value *simplifyIntrinsic(Function *F, IterTy ArgBegin, IterTy ArgEnd,
                                const SimplifyQuery &Q) {
  // Intrinsics with no operands have some kind of side effect. Don't simplify.
  unsigned NumOperands = std::distance(ArgBegin, ArgEnd);
  if (NumOperands == 0)
    return nullptr;

  Intrinsic::ID IID = F->getIntrinsicID();
  if (NumOperands == 1)
    return simplifyUnaryIntrinsic(F, ArgBegin[0], Q);

  if (NumOperands == 2)
    return simplifyBinaryIntrinsic(F, ArgBegin[0], ArgBegin[1], Q);
|
||||
|
||||
// Handle intrinsics with 3 or more arguments.
|
||||
switch (IID) {
|
||||
case Intrinsic::masked_load: {
|
||||
Value *MaskArg = ArgBegin[2];
|
||||
@ -4756,6 +4800,19 @@ static Value *SimplifyIntrinsic(Function *F, IterTy ArgBegin, IterTy ArgEnd,
|
||||
return PassthruArg;
|
||||
return nullptr;
|
||||
}
|
||||
case Intrinsic::fshl:
|
||||
case Intrinsic::fshr: {
|
||||
Value *ShAmtArg = ArgBegin[2];
|
||||
const APInt *ShAmtC;
|
||||
if (match(ShAmtArg, m_APInt(ShAmtC))) {
|
||||
// If there's effectively no shift, return the 1st arg or 2nd arg.
|
||||
// TODO: For vectors, we could check each element of a non-splat constant.
|
||||
APInt BitWidth = APInt(ShAmtC->getBitWidth(), ShAmtC->getBitWidth());
|
||||
if (ShAmtC->urem(BitWidth).isNullValue())
|
||||
return ArgBegin[IID == Intrinsic::fshl ? 0 : 1];
|
||||
}
|
||||
return nullptr;
|
||||
}
|
||||
default:
|
||||
return nullptr;
|
||||
}
|
||||
@ -4780,7 +4837,7 @@ static Value *SimplifyCall(ImmutableCallSite CS, Value *V, IterTy ArgBegin,
|
||||
return nullptr;
|
||||
|
||||
if (F->isIntrinsic())
|
||||
if (Value *Ret = SimplifyIntrinsic(F, ArgBegin, ArgEnd, Q, MaxRecurse))
|
||||
if (Value *Ret = simplifyIntrinsic(F, ArgBegin, ArgEnd, Q))
|
||||
return Ret;
|
||||
|
||||
if (!canConstantFoldCallTo(CS, F))
|
||||
|
@ -725,7 +725,7 @@ bool LazyValueInfoImpl::solveBlockValueNonLocal(ValueLatticeElement &BBLV,
  // frequently arranged such that dominating ones come first and we quickly
  // find a path to function entry. TODO: We should consider explicitly
  // canonicalizing to make this true rather than relying on this happy
  // accident.
  for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
    ValueLatticeElement EdgeResult;
    if (!getEdgeValue(Val, *PI, BB, EdgeResult))
@ -176,8 +176,8 @@ const SCEV *llvm::replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE,

/// Calculate Start and End points of memory access.
/// Let's assume A is the first access and B is a memory access on N-th loop
/// iteration. Then B is calculated as:
///   B = A + Step*N .
/// Step value may be positive or negative.
/// N is a calculated back-edge taken count:
///     N = (TripCount > 0) ? RoundDown(TripCount -1 , VF) : 0
@ -1317,7 +1317,7 @@ bool MemoryDepChecker::couldPreventStoreLoadForward(uint64_t Distance,
  return false;
}

/// Given a non-constant (unknown) dependence-distance \p Dist between two
/// memory accesses, that have the same stride whose absolute value is given
/// in \p Stride, and that have the same type size \p TypeByteSize,
/// in a loop whose takenCount is \p BackedgeTakenCount, check if it is
@ -1336,19 +1336,19 @@ static bool isSafeDependenceDistance(const DataLayout &DL, ScalarEvolution &SE,

  // If we can prove that
  //      (**) |Dist| > BackedgeTakenCount * Step
  // where Step is the absolute stride of the memory accesses in bytes,
  // then there is no dependence.
  //
  // Rationale:
  // We basically want to check if the absolute distance (|Dist/Step|)
  // is >= the loop iteration count (or > BackedgeTakenCount).
  // This is equivalent to the Strong SIV Test (Practical Dependence Testing,
  // Section 4.2.1); Note, that for vectorization it is sufficient to prove
  // that the dependence distance is >= VF; This is checked elsewhere.
  // But in some cases we can prune unknown dependence distances early, and
  // even before selecting the VF, and without a runtime test, by comparing
  // the distance against the loop iteration count. Since the vectorized code
  // will be executed only if LoopCount >= VF, proving distance >= LoopCount
  // also guarantees that distance >= VF.
  //
  const uint64_t ByteStride = Stride * TypeByteSize;
@ -1360,8 +1360,8 @@ static bool isSafeDependenceDistance(const DataLayout &DL, ScalarEvolution &SE,
  uint64_t DistTypeSize = DL.getTypeAllocSize(Dist.getType());
  uint64_t ProductTypeSize = DL.getTypeAllocSize(Product->getType());

  // The dependence distance can be positive/negative, so we sign extend Dist;
  // The multiplication of the absolute stride in bytes and the
  // backedgeTakenCount is non-negative, so we zero extend Product.
  if (DistTypeSize > ProductTypeSize)
    CastedProduct = SE.getZeroExtendExpr(Product, Dist.getType());
@ -2212,24 +2212,24 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
             "versioning:");
  LLVM_DEBUG(dbgs() << " Ptr: " << *Ptr << " Stride: " << *Stride << "\n");

  // Avoid adding the "Stride == 1" predicate when we know that
  // Stride >= Trip-Count. Such a predicate will effectively optimize a single
  // or zero iteration loop, as Trip-Count <= Stride == 1.
  //
  // TODO: We are currently not making a very informed decision on when it is
  // beneficial to apply stride versioning. It might make more sense that the
  // users of this analysis (such as the vectorizer) will trigger it, based on
  // their specific cost considerations; For example, in cases where stride
  // versioning does not help resolving memory accesses/dependences, the
  // vectorizer should evaluate the cost of the runtime test, and the benefit
  // of various possible stride specializations, considering the alternatives
  // of using gather/scatters (if available).

  const SCEV *StrideExpr = PSE->getSCEV(Stride);
  const SCEV *BETakenCount = PSE->getBackedgeTakenCount();

  // Match the types so we can compare the stride and the BETakenCount.
  // The Stride can be positive/negative, so we sign extend Stride;
  // The backedgeTakenCount is non-negative, so we zero extend BETakenCount.
  const DataLayout &DL = TheLoop->getHeader()->getModule()->getDataLayout();
  uint64_t StrideTypeSize = DL.getTypeAllocSize(StrideExpr->getType());
@ -2243,7 +2243,7 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
    CastedBECount = SE->getZeroExtendExpr(BETakenCount, StrideExpr->getType());
  const SCEV *StrideMinusBETaken = SE->getMinusSCEV(CastedStride, CastedBECount);
  // Since TripCount == BackEdgeTakenCount + 1, checking:
  // "Stride >= TripCount" is equivalent to checking:
  // Stride - BETakenCount > 0
  if (SE->isKnownPositive(StrideMinusBETaken)) {
    LLVM_DEBUG(
@ -118,7 +118,7 @@ bool MemDepPrinter::runOnFunction(Function &F) {
    } else {
      SmallVector<NonLocalDepResult, 4> NLDI;
      assert( (isa<LoadInst>(Inst) || isa<StoreInst>(Inst) ||
               isa<VAArgInst>(Inst)) && "Unknown memory instruction!");
      MDA.getNonLocalPointerDependency(Inst, NLDI);

      DepSet &InstDeps = Deps[Inst];
@ -26,6 +26,7 @@
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/OrderedBasicBlock.h"
#include "llvm/Analysis/PHITransAddr.h"
#include "llvm/Analysis/PhiValues.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Attributes.h"
@ -1513,6 +1514,8 @@ void MemoryDependenceResults::invalidateCachedPointerInfo(Value *Ptr) {
  RemoveCachedNonLocalPointerDependencies(ValueIsLoadPair(Ptr, false));
  // Flush load info for the pointer.
  RemoveCachedNonLocalPointerDependencies(ValueIsLoadPair(Ptr, true));
  // Invalidate phis that use the pointer.
  PV.invalidateValue(Ptr);
}

void MemoryDependenceResults::invalidateCachedPredecessors() {
@ -1671,6 +1674,9 @@ void MemoryDependenceResults::removeInstruction(Instruction *RemInst) {
    }
  }

  // Invalidate phis that use the removed instruction.
  PV.invalidateValue(RemInst);

  assert(!NonLocalDeps.count(RemInst) && "RemInst got reinserted?");
  LLVM_DEBUG(verifyRemoved(RemInst));
}
@ -1730,7 +1736,8 @@ MemoryDependenceAnalysis::run(Function &F, FunctionAnalysisManager &AM) {
  auto &AC = AM.getResult<AssumptionAnalysis>(F);
  auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
  auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
  return MemoryDependenceResults(AA, AC, TLI, DT);
  auto &PV = AM.getResult<PhiValuesAnalysis>(F);
  return MemoryDependenceResults(AA, AC, TLI, DT, PV);
}

char MemoryDependenceWrapperPass::ID = 0;
@ -1741,6 +1748,7 @@ INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(PhiValuesWrapperPass)
INITIALIZE_PASS_END(MemoryDependenceWrapperPass, "memdep",
                    "Memory Dependence Analysis", false, true)

@ -1758,6 +1766,7 @@ void MemoryDependenceWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.setPreservesAll();
  AU.addRequired<AssumptionCacheTracker>();
  AU.addRequired<DominatorTreeWrapperPass>();
  AU.addRequired<PhiValuesWrapperPass>();
  AU.addRequiredTransitive<AAResultsWrapperPass>();
  AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();
}
@ -1773,7 +1782,8 @@ bool MemoryDependenceResults::invalidate(Function &F, const PreservedAnalyses &P
  // Check whether the analyses we depend on became invalid for any reason.
  if (Inv.invalidate<AAManager>(F, PA) ||
      Inv.invalidate<AssumptionAnalysis>(F, PA) ||
      Inv.invalidate<DominatorTreeAnalysis>(F, PA))
      Inv.invalidate<DominatorTreeAnalysis>(F, PA) ||
      Inv.invalidate<PhiValuesAnalysis>(F, PA))
    return true;

  // Otherwise this analysis result remains valid.
@ -1789,6 +1799,7 @@ bool MemoryDependenceWrapperPass::runOnFunction(Function &F) {
  auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
  auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
  auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
  MemDep.emplace(AA, AC, TLI, DT);
  auto &PV = getAnalysis<PhiValuesWrapperPass>().getResult();
  MemDep.emplace(AA, AC, TLI, DT, PV);
  return false;
}
@ -235,7 +235,7 @@ class MustExecuteAnnotatedWriter : public AssemblyAnnotationWriter {
  }

  void printInfoComment(const Value &V, formatted_raw_ostream &OS) override {
    if (!MustExec.count(&V))
      return;

@ -245,7 +245,7 @@ class MustExecuteAnnotatedWriter : public AssemblyAnnotationWriter {
      OS << " ; (mustexec in " << NumLoops << " loops: ";
    else
      OS << " ; (mustexec in: ";

    bool first = true;
    for (const Loop *L : Loops) {
      if (!first)
@ -264,6 +264,6 @@ bool MustExecutePrinter::runOnFunction(Function &F) {

  MustExecuteAnnotatedWriter Writer(F, DT, LI);
  F.print(dbgs(), &Writer);

  return false;
}
@ -4839,7 +4839,7 @@ ScalarEvolution::createAddRecFromPHIWithCastsImpl(const SCEVUnknown *SymbolicPHI

  // Construct the extended SCEV: (Ext ix (Trunc iy (Expr) to ix) to iy)
  // for each of StartVal and Accum
  auto getExtendedExpr = [&](const SCEV *Expr,
                             bool CreateSignExtend) -> const SCEV * {
    assert(isLoopInvariant(Expr, L) && "Expr is expected to be invariant");
    const SCEV *TruncatedExpr = getTruncateExpr(Expr, TruncTy);
@ -4935,11 +4935,11 @@ ScalarEvolution::createAddRecFromPHIWithCasts(const SCEVUnknown *SymbolicPHI) {
  return Rewrite;
}

// FIXME: This utility is currently required because the Rewriter currently
// does not rewrite this expression:
// {0, +, (sext ix (trunc iy to ix) to iy)}
// into {0, +, %step},
// even when the following Equal predicate exists:
// "%step == (sext ix (trunc iy to ix) to iy)".
bool PredicatedScalarEvolution::areAddRecsEqualWithPreds(
    const SCEVAddRecExpr *AR1, const SCEVAddRecExpr *AR2) const {
@ -721,7 +721,7 @@ struct ReductionData {
static Optional<ReductionData> getReductionData(Instruction *I) {
  Value *L, *R;
  if (m_BinOp(m_Value(L), m_Value(R)).match(I))
    return ReductionData(RK_Arithmetic, I->getOpcode(), L, R);
  if (auto *SI = dyn_cast<SelectInst>(I)) {
    if (m_SMin(m_Value(L), m_Value(R)).match(SI) ||
        m_SMax(m_Value(L), m_Value(R)).match(SI) ||
@ -730,8 +730,8 @@ static Optional<ReductionData> getReductionData(Instruction *I) {
        m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
        m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
      auto *CI = cast<CmpInst>(SI->getCondition());
      return ReductionData(RK_MinMax, CI->getOpcode(), L, R);
    }
    if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
        m_UMax(m_Value(L), m_Value(R)).match(SI)) {
      auto *CI = cast<CmpInst>(SI->getCondition());
@ -851,11 +851,11 @@ static ReductionKind matchPairwiseReduction(const ExtractElementInst *ReduxRoot,

  // We look for a sequence of shuffle,shuffle,add triples like the following
  // that builds a pairwise reduction tree.
  //
  // (X0, X1, X2, X3)
  // (X0 + X1, X2 + X3, undef, undef)
  // ((X0 + X1) + (X2 + X3), undef, undef, undef)
  //
  // %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
  //       <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
  // %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
@ -916,7 +916,7 @@ matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,

  // We look for a sequence of shuffles and adds like the following matching one
  // fadd, shuffle vector pair at a time.
  //
  // %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
  //             <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  // %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
@ -927,7 +927,7 @@ matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,

  unsigned MaskStart = 1;
  Instruction *RdxOp = RdxStart;
  SmallVector<int, 32> ShuffleMask(NumVecElems, 0);
  unsigned NumVecElemsRemain = NumVecElems;
  while (NumVecElemsRemain - 1) {
    // Check for the right reduction operation.
@ -1093,7 +1093,7 @@ int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
  case Instruction::InsertElement: {
    const InsertElementInst * IE = cast<InsertElementInst>(I);
    ConstantInt *CI = dyn_cast<ConstantInt>(IE->getOperand(2));
    unsigned Idx = -1;
    if (CI)
      Idx = CI->getZExtValue();
    return getVectorInstrCost(I->getOpcode(),
@ -1104,7 +1104,7 @@ int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
    // TODO: Identify and add costs for insert/extract subvector, etc.
    if (Shuffle->changesLength())
      return -1;

    if (Shuffle->isIdentity())
      return 0;

@ -71,7 +71,7 @@
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>

using namespace llvm;
using namespace llvm::PatternMatch;
@ -3828,7 +3828,7 @@ static bool checkRippleForSignedAdd(const KnownBits &LHSKnown,

  // If either of the values is known to be non-negative, adding them can only
  // overflow if the second is also non-negative, so we can assume that.
  // Two non-negative numbers will only overflow if there is a carry to the
  // sign bit, so we can check if even when the values are as big as possible
  // there is no overflow to the sign bit.
  if (LHSKnown.isNonNegative() || RHSKnown.isNonNegative()) {
@ -3855,7 +3855,7 @@ static bool checkRippleForSignedAdd(const KnownBits &LHSKnown,
  }

  // If we reached here it means that we know nothing about the sign bits.
  // In this case we can't know if there will be an overflow, since by
  // changing the sign bits any two values can be made to overflow.
  return false;
}
@ -3905,7 +3905,7 @@ static OverflowResult computeOverflowForSignedAdd(const Value *LHS,
  // operands.
  bool LHSOrRHSKnownNonNegative =
      (LHSKnown.isNonNegative() || RHSKnown.isNonNegative());
  bool LHSOrRHSKnownNegative =
      (LHSKnown.isNegative() || RHSKnown.isNegative());
  if (LHSOrRHSKnownNonNegative || LHSOrRHSKnownNegative) {
    KnownBits AddKnown = computeKnownBits(Add, DL, /*Depth=*/0, AC, CxtI, DT);
@ -4454,7 +4454,7 @@ static SelectPatternResult matchMinMax(CmpInst::Predicate Pred,
  SPR = matchMinMaxOfMinMax(Pred, CmpLHS, CmpRHS, TrueVal, FalseVal, Depth);
  if (SPR.Flavor != SelectPatternFlavor::SPF_UNKNOWN)
    return SPR;

  if (Pred != CmpInst::ICMP_SGT && Pred != CmpInst::ICMP_SLT)
    return {SPF_UNKNOWN, SPNB_NA, false};

@ -4630,7 +4630,7 @@ static SelectPatternResult matchSelectPattern(CmpInst::Predicate Pred,
    case FCmpInst::FCMP_OLE: return {SPF_FMINNUM, NaNBehavior, Ordered};
    }
  }

  if (isKnownNegation(TrueVal, FalseVal)) {
    // Sign-extending LHS does not change its sign, so TrueVal/FalseVal can
    // match against either LHS or sext(LHS).
@ -842,7 +842,7 @@ static void maybeSetDSOLocal(bool DSOLocal, GlobalValue &GV) {
}

/// parseIndirectSymbol:
///   ::= GlobalVar '=' OptionalLinkage OptionalPreemptionSpecifier
///                     OptionalVisibility OptionalDLLStorageClass
///                     OptionalThreadLocal OptionalUnnamedAddr
//                      'alias|ifunc' IndirectSymbol
@ -3935,7 +3935,7 @@ bool LLParser::ParseMDField(LocTy Loc, StringRef Name, EmissionKindField &Result
  Lex.Lex();
  return false;
}

template <>
bool LLParser::ParseMDField(LocTy Loc, StringRef Name,
                            DwarfAttEncodingField &Result) {
@ -3809,7 +3809,7 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
        continue;
      // The mapping from OriginalId to GUID may return a GUID
      // that corresponds to a static variable. Filter it out here.
      // This can happen when
      // 1) There is a call to a library function which does not have
      //    a CallValidId;
      // 2) There is a static variable with the OriginalGUID identical
@ -46,7 +46,7 @@ class LLVM_LIBRARY_VISIBILITY AntiDepBreaker {
                                   MachineBasicBlock::iterator End,
                                   unsigned InsertPosIndex,
                                   DbgValueVector &DbgValues) = 0;

  /// Update liveness information to account for the current
  /// instruction, which will not be scheduled.
  virtual void Observe(MachineInstr &MI, unsigned Count,
@ -24,8 +24,26 @@ unsigned AddressPool::getIndex(const MCSymbol *Sym, bool TLS) {
  return IterBool.first->second.Number;
}

void AddressPool::emitHeader(AsmPrinter &Asm, MCSection *Section) {
  static const uint8_t AddrSize = Asm.getDataLayout().getPointerSize();
  Asm.OutStreamer->SwitchSection(Section);

  uint64_t Length = sizeof(uint16_t) // version
                    + sizeof(uint8_t) // address_size
                    + sizeof(uint8_t) // segment_selector_size
                    + AddrSize * Pool.size(); // entries
  Asm.emitInt32(Length); // TODO: Support DWARF64 format.
  Asm.emitInt16(Asm.getDwarfVersion());
  Asm.emitInt8(AddrSize);
  Asm.emitInt8(0); // TODO: Support non-zero segment_selector_size.
}

// Emit addresses into the section given.
void AddressPool::emit(AsmPrinter &Asm, MCSection *AddrSection) {
  if (Asm.getDwarfVersion() >= 5)
    emitHeader(Asm, AddrSection);

  if (Pool.empty())
    return;

@ -50,6 +50,9 @@ class AddressPool {
  bool hasBeenUsed() const { return HasBeenUsed; }

  void resetUsedFlag() { HasBeenUsed = false; }

private:
  void emitHeader(AsmPrinter &Asm, MCSection *Section);
};

} // end namespace llvm
@ -364,7 +364,9 @@ DwarfDebug::DwarfDebug(AsmPrinter *A, Module *M)
  else
    UseSectionsAsReferences = DwarfSectionsAsReferences == Enable;

  GenerateTypeUnits = GenerateDwarfTypeUnits;
  // Don't generate type units for unsupported object file formats.
  GenerateTypeUnits =
      A->TM.getTargetTriple().isOSBinFormatELF() && GenerateDwarfTypeUnits;

  TheAccelTableKind = computeAccelTableKind(
      DwarfVersion, GenerateTypeUnits, DebuggerTuning, A->TM.getTargetTriple());
@ -886,8 +888,7 @@ void DwarfDebug::endModule() {
    emitDebugInfoDWO();
    emitDebugAbbrevDWO();
    emitDebugLineDWO();
    // Emit DWO addresses.
    AddrPool.emit(*Asm, Asm->getObjFileLowering().getDwarfAddrSection());
    emitDebugAddr();
  }

  // Emit info into the dwarf accelerator table sections.
@ -2136,7 +2137,7 @@ void DwarfDebug::emitDebugRanges() {
    return;
  }

  if (getDwarfVersion() >= 5 && NoRangesPresent())
  if (NoRangesPresent())
    return;

  // Start the dwarf ranges section.
@ -2297,6 +2298,12 @@ void DwarfDebug::emitDebugStrDWO() {
                         OffSec, /* UseRelativeOffsets = */ false);
}

// Emit DWO addresses.
void DwarfDebug::emitDebugAddr() {
  assert(useSplitDwarf() && "No split dwarf?");
  AddrPool.emit(*Asm, Asm->getObjFileLowering().getDwarfAddrSection());
}

MCDwarfDwoLineTable *DwarfDebug::getDwoLineTable(const DwarfCompileUnit &CU) {
  if (!useSplitDwarf())
    return nullptr;

@ -447,6 +447,9 @@ class DwarfDebug : public DebugHandlerBase {
  /// Emit the debug str dwo section.
  void emitDebugStrDWO();

  /// Emit DWO addresses.
  void emitDebugAddr();

  /// Flags to let the linker know we have emitted new style pubnames. Only
  /// emit it here if we don't have a skeleton CU for split dwarf.
  void addGnuPubAttributes(DwarfCompileUnit &U, DIE &D) const;
@ -112,7 +112,7 @@ class DwarfExpression {
  uint64_t OffsetInBits = 0;
  unsigned DwarfVersion;

  /// Sometimes we need to add a DW_OP_bit_piece to describe a subregister.
  unsigned SubRegisterSizeInBits = 0;
  unsigned SubRegisterOffsetInBits = 0;

@ -95,6 +95,6 @@ bool DwarfFile::addScopeVariable(LexicalScope *LS, DbgVariable *Var) {
    }
  } else {
    ScopeVars.Locals.push_back(Var);
  }
  return true;
}
@ -1182,7 +1182,7 @@ DIE *DwarfUnit::getOrCreateModule(const DIModule *M) {
    addString(MDie, dwarf::DW_AT_LLVM_include_path, M->getIncludePath());
  if (!M->getISysRoot().empty())
    addString(MDie, dwarf::DW_AT_LLVM_isysroot, M->getISysRoot());

  return &MDie;
}

@ -1691,7 +1691,7 @@ void DwarfUnit::emitCommonHeader(bool UseOffsets, dwarf::UnitType UT) {
}

void DwarfTypeUnit::emitHeader(bool UseOffsets) {
  DwarfUnit::emitCommonHeader(UseOffsets,
                              DD->useSplitDwarf() ? dwarf::DW_UT_split_type
                                                  : dwarf::DW_UT_type);
  Asm->OutStreamer->AddComment("Type Signature");
|
||||
|
||||
/// Convert an atomic load of a non-integral type to an integer load of the
|
||||
/// equivalent bitwidth. See the function comment on
|
||||
/// convertAtomicStoreToIntegerType for background.
|
||||
/// convertAtomicStoreToIntegerType for background.
|
||||
LoadInst *AtomicExpand::convertAtomicLoadToIntegerType(LoadInst *LI) {
|
||||
auto *M = LI->getModule();
|
||||
Type *NewTy = getCorrespondingIntegerType(LI->getType(),
|
||||
M->getDataLayout());
|
||||
|
||||
IRBuilder<> Builder(LI);
|
||||
|
||||
|
||||
Value *Addr = LI->getPointerOperand();
|
||||
Type *PT = PointerType::get(NewTy,
|
||||
Addr->getType()->getPointerAddressSpace());
|
||||
Value *NewAddr = Builder.CreateBitCast(Addr, PT);
|
||||
|
||||
|
||||
auto *NewLI = Builder.CreateLoad(NewAddr);
|
||||
NewLI->setAlignment(LI->getAlignment());
|
||||
NewLI->setVolatile(LI->isVolatile());
|
||||
@ -452,7 +452,7 @@ StoreInst *AtomicExpand::convertAtomicStoreToIntegerType(StoreInst *SI) {
|
||||
Type *NewTy = getCorrespondingIntegerType(SI->getValueOperand()->getType(),
|
||||
M->getDataLayout());
|
||||
Value *NewVal = Builder.CreateBitCast(SI->getValueOperand(), NewTy);
|
||||
|
||||
|
||||
Value *Addr = SI->getPointerOperand();
|
||||
Type *PT = PointerType::get(NewTy,
|
||||
Addr->getType()->getPointerAddressSpace());
|
||||
@ -920,14 +920,14 @@ Value *AtomicExpand::insertRMWLLSCLoop(
|
||||
/// the equivalent bitwidth. We used to not support pointer cmpxchg in the
|
||||
/// IR. As a migration step, we convert back to what use to be the standard
|
||||
/// way to represent a pointer cmpxchg so that we can update backends one by
|
||||
/// one.
|
||||
/// one.
|
||||
AtomicCmpXchgInst *AtomicExpand::convertCmpXchgToIntegerType(AtomicCmpXchgInst *CI) {
|
||||
auto *M = CI->getModule();
|
||||
Type *NewTy = getCorrespondingIntegerType(CI->getCompareOperand()->getType(),
|
||||
M->getDataLayout());
|
||||
|
||||
IRBuilder<> Builder(CI);
|
||||
|
||||
|
||||
Value *Addr = CI->getPointerOperand();
|
||||
Type *PT = PointerType::get(NewTy,
|
||||
Addr->getType()->getPointerAddressSpace());
|
||||
@ -935,8 +935,8 @@ AtomicCmpXchgInst *AtomicExpand::convertCmpXchgToIntegerType(AtomicCmpXchgInst *
|
||||
|
||||
Value *NewCmp = Builder.CreatePtrToInt(CI->getCompareOperand(), NewTy);
|
||||
Value *NewNewVal = Builder.CreatePtrToInt(CI->getNewValOperand(), NewTy);
|
||||
|
||||
|
||||
|
||||
|
||||
auto *NewCI = Builder.CreateAtomicCmpXchg(NewAddr, NewCmp, NewNewVal,
|
||||
CI->getSuccessOrdering(),
|
||||
CI->getFailureOrdering(),
|
||||
|
@ -8,7 +8,7 @@
|
||||
//===----------------------------------------------------------------------===//
|
||||
//
|
||||
// This file contains the boilerplate required to define our various built in
|
||||
// gc lowering strategies.
|
||||
// gc lowering strategies.
|
||||
//
|
||||
//===----------------------------------------------------------------------===//
|
||||
|
||||
|
@ -530,7 +530,7 @@ BreakAntiDependencies(const std::vector<SUnit> &SUnits,
    // Kill instructions can define registers but are really nops, and there
    // might be a real definition earlier that needs to be paired with uses
    // dominated by this kill.

    // FIXME: It may be possible to remove the isKill() restriction once PR18663
    // has been properly fixed. There can be value in processing kills as seen
    // in the AggressiveAntiDepBreaker class.
@ -159,7 +159,7 @@ GCStrategy *GCModuleInfo::getGCStrategy(const StringRef Name) {
  auto NMI = GCStrategyMap.find(Name);
  if (NMI != GCStrategyMap.end())
    return NMI->getValue();

  for (auto& Entry : GCRegistry::entries()) {
    if (Name == Entry.getName()) {
      std::unique_ptr<GCStrategy> S = Entry.instantiate();
|
||||
}
|
||||
|
||||
if (GCRegistry::begin() == GCRegistry::end()) {
|
||||
// In normal operation, the registry should not be empty. There should
|
||||
// In normal operation, the registry should not be empty. There should
|
||||
// be the builtin GCs if nothing else. The most likely scenario here is
|
||||
// that we got here without running the initializers used by the Registry
|
||||
// that we got here without running the initializers used by the Registry
|
||||
// itself and it's registration mechanism.
|
||||
const std::string error = ("unsupported GC: " + Name).str() +
|
||||
const std::string error = ("unsupported GC: " + Name).str() +
|
||||
" (did you remember to link and initialize the CodeGen library?)";
|
||||
report_fatal_error(error);
|
||||
} else
|
||||
|
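The lookup above follows a common registry pattern: consult a name-keyed table, instantiate on a hit, and treat an entirely empty registry as "the static initializers never ran". A minimal sketch of that pattern, assuming only the standard library (all names here are hypothetical, not LLVM's `GCRegistry` API):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a registered strategy and its factory table.
struct Strategy { std::string Name; };
using Factory = std::function<std::unique_ptr<Strategy>()>;

std::map<std::string, Factory> &registry() {
  static std::map<std::string, Factory> R; // filled by initializers
  return R;
}

std::unique_ptr<Strategy> getStrategy(const std::string &Name) {
  if (registry().empty())
    // An empty registry means the library's initializers never ran,
    // the same diagnosis the hunk above reports via report_fatal_error.
    throw std::runtime_error("unsupported GC: " + Name +
                             " (did you remember to link and initialize the "
                             "CodeGen library?)");
  auto It = registry().find(Name);
  if (It == registry().end())
    throw std::runtime_error("unsupported GC: " + Name);
  return It->second(); // instantiate, as Entry.instantiate() does
}
```

Distinguishing "registry empty" from "name not found" matters because the two failures have different fixes: a link/initialization problem versus a genuinely unknown strategy name.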
@ -11,6 +11,7 @@
//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/GlobalISel/IRTranslator.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallSet.h"
@ -33,6 +34,7 @@
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
@ -1503,6 +1505,8 @@ bool IRTranslator::translate(const Constant &C, unsigned Reg) {
      Ops.push_back(getOrCreateVReg(*CV->getOperand(i)));
    }
    EntryBuilder.buildMerge(Reg, Ops);
  } else if (auto *BA = dyn_cast<BlockAddress>(&C)) {
    EntryBuilder.buildBlockAddress(Reg, BA);
  } else
    return false;
@ -1611,19 +1615,20 @@ bool IRTranslator::runOnMachineFunction(MachineFunction &CurMF) {
    ArgIt++;
  }

  // And translate the function!
  for (const BasicBlock &BB : F) {
    MachineBasicBlock &MBB = getMBB(BB);
  // Need to visit defs before uses when translating instructions.
  ReversePostOrderTraversal<const Function *> RPOT(&F);
  for (const BasicBlock *BB : RPOT) {
    MachineBasicBlock &MBB = getMBB(*BB);
    // Set the insertion point of all the following translations to
    // the end of this basic block.
    CurBuilder.setMBB(MBB);

    for (const Instruction &Inst : BB) {
    for (const Instruction &Inst : *BB) {
      if (translate(Inst))
        continue;

      OptimizationRemarkMissed R("gisel-irtranslator", "GISelFailure",
                                 Inst.getDebugLoc(), &BB);
                                 Inst.getDebugLoc(), BB);
      R << "unable to translate instruction: " << ore::NV("Opcode", &Inst);

      if (ORE->allowExtraAnalysis("gisel-irtranslator")) {
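The hunk above switches block iteration from source order to a reverse post-order traversal so that, ignoring back edges, each block is translated before its successors, i.e. defs are seen before uses. A minimal sketch of reverse post-order on a small string-keyed graph (DFS post-order, then reversed; the graph type and names are illustrative assumptions, not LLVM's `ReversePostOrderTraversal`):

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical CFG representation: block name -> successor names.
using Graph = std::map<std::string, std::vector<std::string>>;

// DFS post-order: a node is emitted only after all of its successors.
void postOrder(const Graph &G, const std::string &N,
               std::set<std::string> &Seen, std::vector<std::string> &Out) {
  if (!Seen.insert(N).second)
    return;
  auto It = G.find(N);
  if (It != G.end())
    for (const auto &Succ : It->second)
      postOrder(G, Succ, Seen, Out);
  Out.push_back(N);
}

// Reversing post-order puts every node before its successors
// (except along back edges), which is the defs-before-uses property.
std::vector<std::string> reversePostOrder(const Graph &G,
                                          const std::string &Entry) {
  std::set<std::string> Seen;
  std::vector<std::string> Out;
  postOrder(G, Entry, Seen, Out);
  std::reverse(Out.begin(), Out.end());
  return Out;
}
```

For a diamond CFG (entry branching to two blocks that rejoin at exit), reverse post-order always yields entry first and exit last, regardless of the order the branches are stored in.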
@ -809,6 +809,15 @@ MachineIRBuilderBase::buildAtomicRMWUmin(unsigned OldValRes, unsigned Addr,
                   MMO);
}

MachineInstrBuilder
MachineIRBuilderBase::buildBlockAddress(unsigned Res, const BlockAddress *BA) {
#ifndef NDEBUG
  assert(getMRI()->getType(Res).isPointer() && "invalid res type");
#endif

  return buildInstr(TargetOpcode::G_BLOCK_ADDR).addDef(Res).addBlockAddress(BA);
}

void MachineIRBuilderBase::validateTruncExt(unsigned Dst, unsigned Src,
                                            bool IsExtend) {
#ifndef NDEBUG
@ -56,7 +56,7 @@
// - it makes linker optimizations less useful (order files, LOHs, ...)
// - it forces usage of indexed addressing (which isn't necessarily "free")
// - it can increase register pressure when the uses are disparate enough.
//
// We use heuristics to discover the best global grouping we can (cf cl::opts).
//
// ===---------------------------------------------------------------------===//