[Aarch64] Customer lowering of CTPOP to SIMD should check for NEON
availability
This ensures llvm's AArch64 backend does not emit floating point
instructions if they are disabled.
Fix transformation of add with pc argument to adr for non-immediate
arguments.
This fixes an "Unimplemented" error when assembling certain ARM add
instructions with pc-relative arguments.
Reported by: sbruno
PR: 196412, 196423
PowerPC: CTR shouldn't fire if a TLS call is in the loop
Determining the address of a TLS variable results in a function call in
certain TLS models. This means that a simple ICmpInst might actually
result in invalidating the CTR register.
In such cases, do not attempt to rely on the CTR register for loop
optimization purposes.
This fixes PR22034.
Differential Revision: http://reviews.llvm.org/D6786
This fixes a "Invalid PPC CTR loop" error when compiling parts of libc
for PowerPC-32.
[PowerPC] Replace foul hackery with real calls to __tls_get_addr
My original support for the general dynamic and local dynamic TLS
models contained some fairly obtuse hacks to generate calls to
__tls_get_addr when lowering a TargetGlobalAddress. Rather than
generating real calls, special GET_TLS_ADDR nodes were used to wrap
the calls and only reveal them at assembly time. I attempted to
provide correct parameter and return values by chaining CopyToReg and
CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not
fully correct. Problems were seen with two back-to-back stores to TLS
variables, where the call sequences ended up overlapping with unhappy
results. Additionally, since these weren't real calls, the proper
register side effects of a call were not recorded, so clobbered values
were kept live across the calls.
The proper thing to do is to lower these into calls in the first
place. This is relatively straightforward; see the changes to
PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp.
The changes here are standard call lowering, except that we need to
track the fact that these calls will require a relocation. This is
done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the
TargetGlobalAddress operand that appears earlier in the sequence.
The calls to LowerCallTo() eventually find their way to
LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(),
which calls PrepareCall(). In PrepareCall(), we detect the calls to
__tls_get_addr and immediately snag the TargetGlobalTLSAddress with
the annotated relocation information. This becomes an extra operand
on the call following the callee, which is expected for nodes of type
tlscall. We change the call opcode to CALL_TLS for this case. Back
in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only,
since we require a TOC-restore nop following the call for the 64-bit
ABIs.
During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td
convert the CALL_TLS nodes into BL_TLS nodes, and convert the
CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code
removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS
nodes can now be emitted normally using their patterns and the
associated printTLSCall print method.
Finally, as a result of these changes, all references to get-tls-addr
in its various guises are no longer used, so they have been removed.
There are existing TLS tests to verify the changes haven't messed
anything up). I've added one new test that verifies that the problem
with the original code has been fixed.
This fixes a fatal "Bad machine code" error when compiling parts of
libgomp for 32-bit PowerPC.
Add parsing of 'foo@local".
Summary:
Currently, it supports generating, but not parsing, this expression.
Test added as well.
Test Plan: New test added, no regressions due to this.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6672
Pull in r224494 from upstream llvm trunk (by Justin Hibbits):
Add a corresponding '@LOCAL' parse to match r224415.
Pointed out by Jim Grosbach.
[PowerPC] Add JMP_SLOT relocation definitions
This will be required by upcoming patches for LLDB support.
Patch by Justin Hibbits!
Pull in r221510 from upstream llvm trunk (by Justin Hibbits):
Add Position-independent Code model Module API.
Summary:
This makes PIC levels a Module flag attribute, which can be queried by the
backend. The flag is named `PIC Level`, and can have a value of:
0 - Backend-default
1 - Small-model (-fpic)
2 - Large-model (-fPIC)
These match the `-pic-level' command line argument for clang, and the value of the
preprocessor macro `__PIC__'.
Test Plan:
New flags tests specific for the 'PIC Level' module flag.
Tests to be added as part of a future commit for PowerPC, which will use this new API.
Reviewers: rafael, echristo
Reviewed By: rafael, echristo
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D5882
Pull in r221791 from upstream llvm trunk (by Justin Hibbits):
Add support for small-model PIC for PowerPC.
Summary:
Large-model was added first. With the addition of support for multiple PIC
models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI. This
generates more optimal code, for shared libraries with less than about 16380
data objects.
Test Plan: Test cases added or updated
Reviewers: joerg, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, mcrosier, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D5399
Together, these changes implement small-model PIC support for PowerPC.
Thanks to Justin Hibbits and Roman Divacky for their assistance in
getting this working.
Divacky):
Introduce CPUStringIsValid() into MCSubtargetInfo and use it for ARM
.cpu parsing.
Previously .cpu directive in ARM assembler didnt switch to the new
CPU and therefore acted as a nop. This implemented real action for
.cpu and eg. allows to assembler FreeBSD kernel with -integrated-as.
Change the name to be in style.
Add a FIXME as requested by Renato Golin.
arm asm: Let .fpu enable instructions, PR20447.
I'm not very happy with duplicating the fpu->feature mapping in ARMAsmParser.cpp
and in clang's driver. See the bug for a patch that doesn't do that, and the
review thread [1] for why this duplication exists.
1: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140811/231052.html
This makes the .fpu directive work properly, so we can successfully
assemble several .S files using the directive, under lib/libc/arm.
Allow CP10/CP11 operations on ARMv5/v6
Those registers are VFP/NEON and vector instructions should be used instead,
but old cores rely on those co-processors to enable VFP unwinding. This change
was prompted by the libc++abi's unwinding routine and is also present in many
legacy low-level bare-metal code that we ought to compile/assemble.
Fixing bug PR20025 and allowing PR20529 to proceed with a fix in libc++abi.
This enables assembling certain ARM instructions used in libgcc.
Revert "Added inst combine transforms for single bit tests from Chris's note"
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.
A test has been added to ensure that we don't regress again.
This fixes a miscompilation in libapr, which caused problems in svnlite.
AArch64: add support for dynamic-loader relocations
LLD needs them, and it's good to be able to print them properly when
our object dumpers encounter them.
Patch by Daniel Stewart.
This is needed for supporting the upgrade to a newer LLDB snapshot.
which gcc in base cannot handle. Replace these with C++98 equivalents.
While here, add the patch for the adapted fix.
Reported by: bz, kib
Pointy hat to: dim
MFC after: 1 week
X-MFC-With: r274442
Totally forget deallocated SDNodes in SDDbgInfo.
What would happen before that commit is that the SDDbgValues associated with
a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep
a map entry keyed by the SDNode pointer pointing to this list of invalidated
SDDbgNodes. As the memory gets reused, the list might get wrongly associated
with another new SDNode. As the SDDbgValues are cloned when they are transfered,
this can lead to an exponential number of SDDbgValues being produced during
DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893
Note that the previous behavior wasn't really buggy as the invalidation made
sure that the SDDbgValues won't be used. This commit can be considered a
memory optimization and as such is really hard to validate in a unit-test.
This should fix abnormally large memory usage and resulting OOM crashes
when compiling certain ports with debug information.
Reported by: Dmitry Marakasov <amdmi3@amdmi3.ru>
Upstream PRs: http://llvm.org/PR19031http://llvm.org/PR20893
MFC after: 1 week
AsmParser: Disable Darwin-style macro argument expansion on non-darwin targets.
There is code in the wild that relies on $0 not being expanded.
This fixes some cases of using $ signs in literals being incorrectly
assembled.
Reported by: Richard Henderson
Upstream PR: http://llvm.org/PR21500
MFC after: 3 days
Set trunc store action to Expand for all X86 targets.
When compiling without SSE2, isTruncStoreLegal(F64, F32) would return
Legal, whereas with SSE2 it would return Expand. And since the Target
doesn't seem to actually handle a truncstore for double -> float, it
would just output a store of a full double in the space for a float
hence overwriting other bits on the stack.
Patch by Luqman Aden!
This should fix clang -O0 on i386 assigning garbage to floats, in
certain scenarios.
PR: 187437
Submitted by: cebd@gmail.com
Obtained from: http://llvm.org/viewvc/llvm-project?rev=217410&view=rev
MFC after: 3 days
Reverts 271025 but still functionally patches it. Original intent is still
the same. Pointed out by rdivacky.
MFV: Only emit movw on ARMv6T2
Building for the FreeBSD default target ARMv6 was emitting movw ASM on certain
test cases (found building qmake4/5 for ARM). Don't do that, moreover, the AS
in base doesn't understand this instruction for this target. One would need
to use --integrated-as to get this to build if desired.
http://llvm.org/viewvc/llvm-project?view=revision&revision=216989
Submitted by: ian
Reviewed by: dim
Obtained from: llvm.org
MFC after: 2 days
Relnotes: yes
Building for the FreeBSD default target ARMv6 was emitting movw ASM on certain
test cases (found building qmake4/5 for ARM). Don't do that, moreover, the AS
in base doesn't understand this instruction for this target. One would need
to use --integrated-as to get this to build if desired.
http://llvm.org/viewvc/llvm-project?view=revision&revision=216989
Submitted by: ian
Reviewed by: dim
Obtained from: llvm.org
MFC after: 2 days
[PPC64] Fix PR20071 (fctiduz generated for targets lacking that
instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation
for floating-point conversion to integer. The fctiduz instruction
was added in Power ISA 2.06 (i.e., Power7 and later). However, this
instruction is being generated regardless of which 64-bit PowerPC
target is selected.
The intent is for fast-isel to punt to DAG selection when this
instruction is not available. This patch implements that change.
For testing purposes, the existing fast-isel-conversion.ll test adds
a RUN line for -mcpu=970 and tests for the expected code generation.
Additionally, the existing test fast-isel-conversion-p5.ll was found
to be incorrectly expecting the unavailable instruction to be
generated. I've removed these test variants since we have adequate
coverage in fast-isel-conversion.ll.
This is needed to compile clang with debug+asserts on older powerpc64
and ppc970 targets.
Requested by: jhibbits
MFC after: 3 days
Legalizer: Add support for splitting insert_subvectors.
We handle this by spilling the whole thing to the stack and doing the
insertion as a store.
PR19492. This happens in real code because the vectorizer creates
v2i128 when AVX is enabled.
This fixes a "fatal error: error in backend: Do not know how to split
the result of this operator!" message encountered during compilation of
the net-p2p/libtorrent-rasterbar port.
Reported by: Evgeniy <iron@mail.ua>
MFC after: 3 days
Debug info: Support variadic functions.
Variadic functions have an unspecified parameter tag after the last
argument. In IR this is represented as an unspecified parameter in the
subroutine type.
Paired commit with CFE r202185.
rdar://problem/13690847
This re-applies r202184 + a bugfix in DwarfDebug's argument handling.
This merge includes a change to use the LLVM 3.4 API in
lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:
DwarfUnit -> CompileUnit
Sponsored by: DARPA, AFRL
ISel: Make VSELECT selection terminate in cases where the condition type has to
be split and the result type widened.
When the condition of a vselect has to be split it makes no sense widening the
vselect and thereby widening the condition. We end up in an endless loop of
widening (vselect result type) and splitting (condition mask type) doing this.
Instead, split both the condition and the vselect and widen the result.
I ran this over the test suite with i686 and mattr=+sse and saw no regressions.
Fixes PR18036.
With this fix the original problem case from the graphics/rawtherapee
port (posted in http://llvm.org/PR18036 ) now compiles within ~97MB RSS.
Reported by: mandree
MFC after: 1 week
Reland "Fix miscompile of MS inline assembly with stack realignment"
This re-lands commit r196876, which was reverted in r196879.
The tests have been fixed to pass on platforms with a stack alignment
larger than 4.
Update to clang side tests will land shortly.
Pull in r196986 from upstream llvm trunk (by Reid Kleckner):
Revert the backend fatal error from r196939
The combination of inline asm, stack realignment, and dynamic allocas
turns out to be too common to reject out of hand.
ASan inserts empy inline asm fragments and uses aligned allocas.
Compiling any trivial function containing a dynamic alloca with ASan is
enough to trigger the check.
XFAIL the test cases that would be miscompiled and add one that uses the
relevant functionality.
Pull in r202930 from upstream llvm trunk (by Hans Wennborg):
Check for dynamic allocas and inline asm that clobbers sp before building
selection dag (PR19012)
In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).
The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.
This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.
Differential Revision: http://llvm-reviews.chandlerc.com/D2954
Together, these commits fix the problem encountered in the devel/emacs
port on the i386 architecture, where a combination of stack realignment,
alloca() and memcpy() could incidentally clobber the %esi register,
leading to segfaults in the temacs build-time utility.
See also: http://llvm.org/PR18171 and http://llvm.org/PR19012
Reported by: ashish
PR: ports/183064
MFC after: 1 week
Fix a crash that occurs when PWD is invalid.
MCJIT needs to be able to run in hostile environments, even when PWD
is invalid. There's no need to crash MCJIT in this case.
The obvious fix is to simply leave MCContext's CompilationDir empty
when PWD can't be determined. This way, MCJIT clients,
and other clients that link with LLVM don't need a valid working directory.
If we do want to guarantee valid CompilationDir, that should be done
only for clients of getCompilationDir(). This is as simple as checking
for an empty string.
The only current use of getCompilationDir is EmitGenDwarfInfo, which
won't conceivably run with an invalid working dir. However, in the
purely hypothetically and untestable case that this happens, the
AT_comp_dir will be omitted from the compilation_unit DIE.
This should help fix assertions occurring with ports-mgmt/tinderbox,
when it is using jails, and sometimes invalidates clang's current
working directory.
Reported by: decke
MFC after: 2 weeks
X-MFC-With: r261991
Lower FNEG just like FABS to fneg[ds] and fmov[ds], thus avoiding
expensive libcall. Also, Qp_neg is not implemented on at least
FreeBSD. This is also what gcc is doing.
SPARC: Implement TRAP lowering. Matches what GCC emits.
This lets clang emit "ta 5" for trap instructions on sparc64, instead of
emitting a call to abort(), making it possible to link the kernel.
Implement SPARCv9 atomic_swap_64 with a pseudo.
The SWAP instruction only exists in a 32-bit variant, but the 64-bit
atomic swap can be implemented in terms of CASX, like the other
atomic rmw primitives.
Submitted by: rdivacky
all of the features in the current working draft of the upcoming C++
standard, provisionally named C++1y.
The code generator's performance is greatly increased, and the loop
auto-vectorizer is now enabled at -Os and -O2 in addition to -O3. The
PowerPC backend has made several major improvements to code generation
quality and compile time, and the X86, SPARC, ARM32, Aarch64 and SystemZ
backends have all seen major feature work.
Release notes for llvm and clang can be found here:
<http://llvm.org/releases/3.4/docs/ReleaseNotes.html>
<http://llvm.org/releases/3.4/tools/clang/docs/ReleaseNotes.html>
MFC after: 1 month
Don't use nopl in cpus that don't support it.
Patch by Mikulas Patocka. I added the test. I checked that for cpu names that
gas knows about, it also doesn't generate nopl.
The modified cpus:
i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta
Crusoe, Microsoft VirtualBox - see
https://bbs.archlinux.org/viewtopic.php?pid=775414
k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs
via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that
Via c3 and c3-Nehemiah don't have nopl
PR: bin/185777
MFC after: 3 days
X86: cvtpi2ps is just an SSE instruction with MMX operands. It has no AVX
equivalent.
Give it the right register format so we can also emit it when AVX is enabled.
This should fix a "Cannot select: intrinsic %llvm.x86.sse.cvtpi2ps" fatal error
in clang while building the gnuradio port for amd64.
Reported by: db
MFC after: 3 days
The basic problem is that some mainstream programs cannot deal with the way
clang optimizes tail calls, as in this example:
int foo(void);
int bar(void) {
return foo();
}
where the call is transformed to:
calll .L0$pb
.L0$pb:
popl %eax
.Ltmp0:
addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
movl foo@GOT(%eax), %eax
popl %ebp
jmpl *%eax # TAILCALL
However, the GOT references must all be resolved at dlopen() time, and so this
approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which
usually populates the PLT with stubs that perform the actual resolving.
This patch changes X86TargetLowering::LowerCall() to skip tail call
optimization, if the called function is a global or external symbol.
This fixes problems with loading X.org driver modules, which could occur
when X.org was compiled on i386 with tailcall optimization on, for which
ports r312583 was committed as a workaround. After this change, the
workaround can be removed.
MFC after: 3 days
CaptureTracking: Plug a loophole in the "too many uses" heuristic.
The heuristic was added to avoid spending too much compile time in a
specially crafted test case (PR17461, PR16474) with many uses on a
select or bitcast instruction can still trigger the slow case. Add a
check for that case.
This only affects compile time, don't have a good way to test it.
This fixes the excessive compile time spent on a specific file of the
graphics/rawtherapee port.
Reported by: mandree
MFC after: 3 days
X86: Don't fold spills into SSE operations if the stack is unaligned.
Regalloc can emit unaligned spills nowadays, but we can't fold the
spills into SSE ops if we can't guarantee alignment. PR12250.
This fixes unaligned SSE accesses (leading to a SIGBUS) which could
occur in the ffmpeg ports.
Approved by: re (kib)
Reported by: tijl
MFC after: 3 days
Add ms_abi and sysv_abi attribute handling.
Based on a patch by Benno Rice!
This will help to develop EFI support.
Approved by: re (kib)
Verified by: benno
MFC after: 1 week
Remove invalid assert in DAGTypeLegalizer::RemapValue
There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks
which, in part, says:
// Note that these invariants may not hold momentarily when processing a node:
// the node being processed may be put in a map before being marked Processed.
Unfortunately, this assert would be valid only if the above-mentioned invariant
held unconditionally. This was causing llc to assert when, in fact,
everything was fine.
Thanks to Richard Sandiford for investigating this issue!
Fixes PR16562.
This fixes assertions which could occur in the multimedia/ffmpeg1 and
multimedia/ffmpeg2 ports.
Approved by: re (hrs)
Reported by: Matthias Apitz <guru@unixarea.de>
MFC after: 3 days
The X86FixupLEAs pass for Intel Atom must not call
convertToThreeAddress on ADD16rr opcodes, if src1 != src, since that
would cause convertToThreeAddress to try to create a virtual register.
This is not permitted after register allocation, which is when the
X86FixupLEAs pass runs.
This patch fixes PR16785.
Pull in r191715 from upstream llvm trunk:
Forgot to add a break statement.
This should enable building the x11-toolskits/libXaw port with
CPUTYPE=atom.
Approved by: re (gjb)
Reported by: Kenta Suzumoto <kentas@hush.com>
MFC after: 3 days
ISelDAG: spot chain cycles involving MachineNodes
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
Specifically, this fixes the long-standing assertion failure when
compiling the multimedia/gstreamer port on i386. Thanks to Tijl
Coosemans for his help in getting upstream to fix it.
Approved by: re (marius)
InstCombine: Check for zero shift amounts before subtracting one
causing integer overflow.
PR17026. Also avoid undefined shifts and shift amounts larger than 64
bits (those are always undef because we can't represent integer types
that large).
This should fix assertion failures when building the emulators/xmame
port.
Reported by: bapt
Author: Daniel Malea <daniel.malea@intel.com>
Date: Thu Aug 1 21:18:16 2013 +0000
Fixed the Intel-syntax X86 disassembler to respect the (existing)
option for hexadecimal immediates, to match AT&T syntax. This also
brings a new option for C-vs-MASM-style hex.
Patch by Richard Mitton
Reviewed: http://llvm-reviews.chandlerc.com/D1243
FastISel can only append to basic blocks.
Compute the insertion point from the end of the basic block instead of
skipping labels from the front.
This caused failures in landing pads when live-in copies where inserted
before instruction selection.
I missed this change in r252720; without it, certain compilation flags
can cause exception labels to not be generated, but still referenced,
leading to link errors.
Reported by: zeising
MFC after: 3 days
Add MachineBasicBlock::addLiveIn().
This function adds a live-in physical register to an MBB and ensures
that it is copied to a virtual register immediately.
Pull in r185615 from llvm trunk:
Live-in copies go *after* EH_LABELs.
This will soon be tested by exception handling working at all.
Pull in r185617 from llvm trunk:
Simplify landing pad lowering.
Stop using the ISD::EXCEPTIONADDR and ISD::EHSELECTION when lowering
landing pad arguments. These nodes were previously legalized into
CopyFromReg nodes, but that never worked properly because the
CopyFromReg node weren't guaranteed to be scheduled at the top of the
basic block.
This meant the exception pointer and selector registers could be
clobbered before being copied to a virtual register.
This patch copies the two physical registers to virtual registers at
the beginning of the basic block, and lowers the landingpad instruction
directly to two CopyFromReg nodes reading the *virtual* registers. This
is safe because virtual registers don't get clobbered.
A future patch will remove the ISD::EXCEPTIONADDR and ISD::EHSELECTION
nodes.
Together, these changes fix llvm PR 16038 ('qt4 webcore file results in
"Bad machine code: Using an undefined physical register"'), and should
make it possible again to compile the www/qt4-webkit port again on the
i386 arch, without using a CPUTYPE=i686 or higher setting.
the stack in a leaf function that uses TLS.
The issue is, when using TLS, the function is no longer a leaf as it calls
__aeabi_read_tp. With statically linked programs this is not an issue as
it doesn't make use of the stack, however with dynamically linked
applications we enter rtld which does use the stack and makes assumptions
about it's alignment.
This is only a temporary fix until a better patch can be made and submitted
upstream.
Make PrologEpilogInserter save/restore all callee saved registers in
functions which call __builtin_unwind_init()
__builtin_unwind_init() is an undocumented gcc intrinsic which has
this effect, and is used in libgcc_eh.
Goes part of the way toward fixing PR8541.
This obsoletes the ugly hack to libgcc's unwind code from r245272, and
should also work for other arches, so revert the hack too.
[ms-inline asm] Fix a crasher when we fail on a direct match.
The issue was that the MatchingInlineAsm and VariantID args to the
MatchInstructionImpl function weren't being set properly. Specifically, when
parsing intel syntax, the parser thought it was parsing inline assembly in the
at&t dialect; that will never be the case.
The crash was caused when the emitter tried to emit the instruction, but the
operands weren't set. When parsing inline assembly we only set the opcode, not
the operands, which is used to lookup the instruction descriptor.
rdar://13854391 and PR15945
Also, this commit reverts r176036. Now that we're correctly parsing the intel
syntax the pushad/popad don't match properly. I've reimplemented that fix using
a MnemonicAlias.
Pull in r183907 from llvm trunk:
X86: Make the cmov aliases work with intel syntax too.
These commits make a number of Intel-style inline assembly mnemonics
aliases (occurring in several ports) work properly, which could cause
assertions otherwise.
Reported by: kwm, bapt
PR15662: Optimized debug info produces out of order function
parameters
When a function is inlined we lazily construct the variables
representing the function's parameters. After that, we add any
remaining unused parameters.
If the function doesn't use all the parameters, or uses them out of
order, then the DWARF would produce them in that order, producing a
parameter order that doesn't match the source.
This fix causes us to always keep the arg variables at the start of
the variable list & in the original order from the source.
Reported by: avg
MFC after: 1 week
LoopVectorize: LoopSimplify can't canonicalize loops with an
indirectbr in it, don't assert on those cases.
Fixes PR16139.
This should fix clang assertion failures when optimizing at -O3, similar
to:
Assertion failed: (TheLoop->getLoopPreheader() && "No preheader!!"),
function canVectorize, file
contrib/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp, line 2171.
Reported by: O. Hartmann <ohartman@zedat.fu-berlin.de>
PR: ports/178332, ports/178977
MFC after: 3 days
LoopVectorize: getConsecutiveVector must respect signed arithmetic
We were passing an i32 to ConstantInt::get where an i64 was needed and we must
also pass the sign if we pass negatives numbers. The start index passed to
getConsecutiveVector must also be signed.
Should fix PR15882.
This should fix Firefox crashes some people have been reporting, when it
is compiled with -O3.
LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make
sure that the order in which the elements are scalarized is the same
as the original order.
This fixes a miscompilation in FreeBSD's regex library.
This should fix lib/libc/regex/regcomp.c at -O3 with clang 3.3 r178860
on CPUs with SSE. Before this change, the vectorizer could incorrectly
rearrange the second loop in computejumps(), leading to possibly invalid
entries in the re_gets::charjump table.
The net result was that for example "sed s/@CC@/foo/" failed to work
correctly, leading to trouble with many configure scripts.
upcoming 3.3 release (branching and freezing expected in a few weeks).
Preliminary release notes can be found at the usual location:
<http://llvm.org/docs/ReleaseNotes.html>
An MFC is planned once the actual 3.3 release is finished.
X86: Disable cmov-memory patterns on subtargets without cmov.
Fixes PR15115.
For the i386 arch, this should enable cmov instructions only on
-march=pentiumpro and higher. Since our default CPU is i486, cmov
instructions will now be disabled by default.
MFC after: 1 week
MCParser: Reject .balign with non-pow2 alignments.
GNU as rejects them and there are configure scripts in the wild that
check if the assembler rejects ".align 3" to determine whether the
alignment is in bytes or powers of two.
MFC after: 3 days
X86: Disable generation of rep;movsl when %esi is used as a base pointer.
This happens when there is both stack realignment and a dynamic alloca in the
function. If we overwrite %esi (rep;movsl uses fixed registers) we'll lose the
base pointer and the next register spill will write into oblivion.
Fixes PR15249 and unbreaks firefox on i386/freebsd. Mozilla uses dynamic allocas
and freebsd a 4 byte stack alignment.
MFC after: 1 week
Fix another SROA crasher, PR14601.
This was a silly oversight, we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.
This should fix the following assertion failure:
Assertion failed: (CanSROA), function visitUsers, file
/usr/src/lib/clang/libllvmscalaropts/../../../contrib/llvm/lib/Transforms/Scalar/SROA.cpp,
line 2395.
Reported by: gerald
X86: fcmov doesn't handle all possible EFLAGS, fall back to a branch
for the others.
Otherwise it will try to use SSE patterns and fail horribly if sse is
disabled.
Fixes PR14035.
This should fix the following assertion failure:
Assertion failed: (Reg >= X86::FP0 && Reg <= X86::FP6 && "Expected FP
register!"), function getFPReg, file
contrib/llvm/lib/Target/X86/X86FloatingPoint.cpp, line 330.
which can show up when compiling contrib/compiler-rt, using -march=i686
through -march=pentium3 (CPU's which do support fcmov, but don't support
SSE2).
MFC after: 1 week
Make sure always-inline functions get inlined. <rdar://problem/12423986>
Without this change, when the estimated cost for inlining a function with
an "alwaysinline" attribute was lower than the inlining threshold, the
getInlineCost function was returning that estimated cost rather than the
special InlineCost::AlwaysInlineCost value. That is fine in the normal
inlining case, but it can fail when the inliner considers the opportunity
cost of inlining into an internal or linkonce-odr function. It may decide
not to inline the always-inline function in that case. The fix here is just
to make getInlineCost always return the special value for always-inline
functions. I ran into this building clang with libc++. Tablegen failed to
link because of an always-inline function that was not inlined. I have been
unable to reduce the testcase down to a reasonable size.
This should fix the link errors that were reported when atf-run was
compiled with clang -stdlib=libc++. In this case, at -O3 optimization,
some calls to basic_ios::clear() were not inlined, even when the
function was marked __always_inline__.
Reported by: Jan Beich <jbeich@tormail.org>
MFC after: 1 week
X86: Disable long nops for all cpus prior to pentiumpro/i686.
This is the safest approach for now. If you think long nops matter a
lot for performance, compile with -march=i686 or higher. :)
MFC after: 3 days
When creating MCAsmBackend pass the CPU string as well. In X86AsmBackend
store this and use it to not emit long nops when the CPU is geode which
doesnt support them.
Fixes PR11212.
Pull in r164133 from upstream clang trunk:
Follow up on llvm r164132.
This should prevent illegal instructions when building world on Geode
CPUs (e.g. Soekris).
MFC after: 3 days
X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe
fp math.
This should make clang emit calls to libm for sinf/cosf by default.
MFC after: 1 week
Allow unique_file to take a mode for file permissions, but default
to user only read/write.
and r156592 from upstream clang trunk:
For final output files create them with mode 0664 to match other
compilers and expected defaults.
This should fix clang creating files with mode 0600.
Reported by: James <james@hicag.org>
MFC after: 3 days
Make sure the non-SSE lowering for fences correctly clobbers EFLAGS.
PR11768.
In particular, this fixes segfaults during the build of devel/icu on
i386. The __sync_synchronize() builtin used for implementing icu's
internal barrier could lead to incorrect behaviour.
MFC after: 3 days
too-thorough cleanup of unused files, in r213695. Also make sure these
get installed under /usr/share/doc.
Submitted by: rwatson, brooks
Pointy hat to: dim
MFC after: 3 days
LLVM_HOSTTRIPLE that is defined during the cross-tools stage.
Using clang, you can now build amd64 world and kernel on i386, and vice
versa. Other arches still need work.
There are several bugfixes in this update, but the most important one is
to ensure __start_ and __stop_ symbols for linker sets and kernel module
metadata are always emitted in object files:
http://llvm.org/bugs/show_bug.cgi?id=9292
Before this fix, if you compiled kernel modules with clang, they would
not be properly processed by kldxref, and if they had any dependencies,
the kernel would fail to load those. Another problem occurred when
attempting to mount a tmpfs filesystem, which would result in 'operation
not supported by device'.
This commit merges the latest LLVM sources from the vendor space. It
also updates the build glue to match the new sources. Clang's version
number is changed to match LLVM's, which means /usr/include/clang/2.0
has been renamed to /usr/include/clang/2.8.
Obtained from: projects/clangbsd