[SLP] Vectorize for all-constant entries.
This should fix libc++'s iostream initialization SIGBUSing on amd64,
whenever the global cout symbol is not aligned to 16 bytes.
Some further explanation: libc++'s iostream.cpp contains the definitions
of std::cout, std::cerr and so on. These global objects are effectively
declared with an alignment of 8 bytes. When an executable is linked
against libc++.so, it can sometimes get a copy of the global object,
which is then at the same alignment.
However, with clang 3.7.0, the initialization of these global objects
will incorrectly use SSE instructions (e.g. movdqa), whenever the
optimization level is high enough, and SSE is enabled, such as on amd64.
When any of these objects is not aligned to 16 bytes, this will result
in a SIGBUS during iostream initialization. In contrast, clang 3.6.x
and earlier took the 8 byte alignment into consideration, and avoided
SSE for those particular operations.
After bisecting of upstream changes, I found that the above revision
caused the change of this behavior, so I am reverting it now as a
workaround, while a discussion and test case is being prepared for
upstream.
set div/rem default values to 'expensive' in TargetTransformInfo's
cost model
...because that's what the cost model was intended to do.
As discussed in D12882, this fix has a temporary unintended
consequence for SimplifyCFG: it causes us to not speculate an fdiv.
However, two wrongs make PR24818 right, and two wrongs make PR24343
act right even though it's really still wrong.
I intend to correct SimplifyCFG and add to CodeGenPrepare to account
for this cost model change and preserve the righteousness for the bug
report cases.
https://llvm.org/bugs/show_bug.cgi?id=24818https://llvm.org/bugs/show_bug.cgi?id=24343
Differential Revision: http://reviews.llvm.org/D12882
This fixes the too-eager fdiv hoisting in pow(), which could lead to
unexpected floating point exceptions.
Add missing atomic libcall support.
Support for emitting libcalls for __atomic_fetch_nand and
__atomic_{add,sub,and,or,xor,nand}_fetch was missing; add it, and some
test cases.
Differential Revision: http://reviews.llvm.org/D10847
This fixes "cannot compile this atomic library call yet" errors when
compiling code which calls the above builtins, on arm < v6.
Reverting r239883 and r240720:
------------------------------------------------------------------------
r239883 | echristo | 2015-06-17 00:09:32 -0700 (Wed, 17 Jun 2015) | 16 lines
Update the intel intrinsic headers to use the target attribute support.
This involved removing the conditional inclusion and replacing them
with target attributes matching the original conditional inclusion
and checks. The testcase update removes the macro checks for each
file and replaces them with usage of the __target__ attribute, e.g.:
int __attribute__((__target__(("sse3")))) foo(int a) {
_mm_mwait(0, 0);
return 4;
}
This usage does require the enclosing function have the requisite
__target__ attribute for inlining and code generation - also for
any macro intrinsic uses in the enclosing function. There's no change
for existing uses of the intrinsic headers.
------------------------------------------------------------------------
------------------------------------------------------------------------
r240720 | silvas | 2015-06-25 16:22:11 -0700 (Thu, 25 Jun 2015) | 6 lines
Remove `requires` for x86 CPU features.
Ever since the target attributes change, we don't need to guard these
headers with `requires`. Actually it's a bit worse, because if we do
then they are included textually under the covers, causing declarations
to appear in submodules they aren't supposed to be in.
------------------------------------------------------------------------
This reverts the changes to the intrinsics headers in trunk, which could
result in some ports' configure scripts misdetecting SSE (and higher)
support.
[SCCP] Turn loads of null into undef instead of zero initialized values
Surprisingly, this is a correctness issue: the mmx type exists for
calling convention purposes, LLVM doesn't have a zero representation for
them.
This partially fixes PR23999.
Pull in r241143 from upstream llvm trunk (by David Majnemer):
[LoopUnroll] Use undef for phis with no value live
We would create a phi node with a zero initialized operand instead of
undef in the case where no value was originally available. This was
problematic for x86_mmx which has no null value.
These fix a "Cannot create a null constant of that type!" error when
compiling the graphics/sdl2_gfx port with MMX enabled.
Reported by: amdmi3
Notable upstream commits (upstream revision in parens):
- Add a JSON producer to LLDB (228636)
- Don't crash on bad DWARF expression (228729)
- Add support of DWARFv3 DW_OP_form_tls_address (231342)
- Assembly profiler for MIPS64 (232619)
- Handle FreeBSD/arm64 core files (233273)
- Read/Write register for MIPS64 (233685)
- Rework LLDB system initialization (233758)
- SysV ABI for aarch64 (236098)
- MIPS software single stepping (236696)
- FreeBSD/arm live debugging support (237303)
- Assembly profiler for mips32 (237420)
- Parse function name from DWARF DW_AT_abstract_origin (238307)
- Improve LLDB prompt handling (238313)
- Add real time signals support to FreeBSDSignals (238316)
- Fix race in IOHandlerProcessSTDIO (238423)
- MIPS64 Branch instruction emulation for SW single stepping (238820)
- Improve OSType initialization in elf object file's arch_spec (239148)
- Emulation of MIPS64 floating-point branch instructions (239996)
- ABI Plugin for MIPS32 (239997)
- ABI Plugin for MIPS64 (240123)
- MIPS32 branch emulation and single stepping (240373)
- Improve instruction emulation based stack unwinding on ARM (240533)
- Add branch emulation to aarch64 instruction emulator (240769)
MC: Allow multiple comma-separated expressions on the .uleb128 directive.
For compatiblity with GNU as. Binutils documents this as
'.uleb128 expressions'. Subtle, isn't it?
Reported by: sbruno
PR: 199554
MFC after: 3 days
Fix assert instantiating string init of static variable
... when the variable's type is a typedef of a ConstantArrayType. Just
look through the typedef (and any other sugar). We only use the
constant array type here to get the element count.
This fixes an assertion failure when building the games/redeclipse port.
Reported by: amdmi3
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU
linkers ld.bfd and ld.gold currently only support a subset of the
whole range of AArch64 ELF TLS relocations. Furthermore, they assume
that some of the code sequences to access thread-local variables are
produced in a very specific sequence. When the sequence is not as the
linker expects, it can silently mis-relaxe/mis-optimize the
instructions.
Even if that wouldn't be the case, it's good to produce the exact
sequence, as that ensures that linkers can perform optimizing
relaxations.
This patch:
* implements support for 16MiB TLS area size instead of 4GiB TLS area
size. Ideally clang would grow an -mtls-size option to allow support
for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even
modern ld.bfd and ld.gold linkers do not support the associated
relocations. An option (-aarch64-elf-ldtls-generation) is added to
enable generation of local dynamic code sequence, but is off by
default.
* makes sure that the exact expected code sequence for local dynamic
and general dynamic accesses is produced, by making use of a new
pseudo instruction. The patch also removes two
(AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing
AArch64-specific pseudo SDNode instructions that are superseded by
the new one (TLSDESC_CALLSEQ).
Submitted by: Kristof Beyls
Differential Revision: https://reviews.freebsd.org/D2175
ARM: treat [N x i32] and [N x i64] as AAPCS composite types
The logic is almost there already, with our special homogeneous
aggregate handling. Tweaking it like this allows front-ends to emit
AAPCS compliant code without ever having to count registers or add
discarded padding arguments.
Only arrays of i32 and i64 are needed to model AAPCS rules, but I
decided to apply the logic to all integer arrays for more consistency.
This fixes a possible "Unexpected member type for HA" error when
compiling lib/msun/bsdsrc/b_tgamma.c for armv6.
Reported by: Jakub Palider <jpa@semihalf.com>
These are generated, and not "optimized" in any way, since I am not
entirely sure of the syntax or format of this type of file. Feel free
to suggest ways of shortening these lists.
The general idea is the same for all three files, though:
* Get rid of upstream build infrastructure (CMakeLists, Makefiles, etc)
* Delete tests, tools and utilities we don't want or use (including
samples)
* Remove various bits of upstream metadata files that we don't want or
use (.arcconfig, .gitignore, etc)
LoopRotate: When reconstructing loop simplify form don't split edges
from indirectbrs.
Yet another chapter in the endless story. While this looks like we
leave the loop in a non-canonical state this replicates the logic in
LoopSimplify so it doesn't diverge from the canonical form in any way.
http://llvm.org/PR21968
This fixes a "Cannot split critical edge from IndirectBrInst" assertion
failure when building the devel/radare2 port.
PR: 195480, 196987
MFC after: 3 days
FreeBSD core files have no section table and thus LLDB's OS and vendor
detection logic does not work. If we encounter such an ELF file, update
an unknown OS to match the host.
This is not really the correct way to handle this, but more extensive
rework of ObjectFileELF will be needed and this change restores cross-
arch core debugging until that can be completed.
There's an unfortunate layering issue between LLDB's Process/POSIX and
Process/{FreeBSD,Linux}, exposed by a refactoring in upstream revision
218568. Work around it by adding explicit #if defined(__FreeBSD__)
guards to include the correct header.
[mips] Enable arithmetic and binary operations for the i128 data type.
Summary:
This patch adds support for some operations that were missing from
128-bit integer types (add/sub/mul/sdiv/udiv... etc.). With these
changes we can support the __int128_t and __uint128_t data types
from C/C++.
Depends on D7125
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7143
This fixes "error in backend" messages, when compiling parts of
compiler-rt using 128-bit integer types for mips64.
Reported by: sbruno
PR: 197259
[FastIsel][X86] Fix invalid register replacement for bool args
Summary:
Consider the following IR:
%3 = load i8* undef
%4 = trunc i8 %3 to i1
%5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
ret %jl_value_t.0* %5
Bools (that are the result of direct truncs) are lowered as whatever
the argument to the trunc was and a "and 1", causing the part of the
MBB responsible for this argument to look something like this:
%vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7
Later, when the load is lowered, it will insert
%vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14
but remember to (at the end of isel) replace vreg7 by vreg15. Now for
the bug. In fast isel lowering, we mistakenly mark vreg8 as the result
of the load instead of the trunc. This adds a fixup to have
vreg8 replaced by whatever the result of the load is as well, so
we end up with
%vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15
which is an SSA violation and causes problems later down the road.
This fixes PR21557.
Test Plan: Test test case from PR21557 is added to the test suite.
Reviewers: ributzka
Reviewed By: ributzka
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6245
This fixes a possible assertion failure when compiling toolbox.cxx from
LibreOffice 4.3.5.
Reported by: kwm
[X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a
reserved call frame), and perform rudimentary call folding. It still doesn't
have a heuristic, so it is enabled only for optsize/minsize, with stack
alignment <= 8, where it ought to be a fairly clear win.
(Re-commit of r227728)
Differential Revision: http://reviews.llvm.org/D6789
This helps to get sys/boot/i386/boot2 below the required size again,
when optimizing with -Oz.
Allows Clang to use LLVM's fixes-x18 option
This patch allows clang to have llvm reserve the x18
platform register on AArch64. FreeBSD will use this in the kernel for
per-cpu data but has no need to reserve this register in userland so
will need this flag to reserve it.
This uses llvm r226664 to allow this register to be reserved.
Patch by Andrew Turner.
Requested by: andrew
AArch64: add backend option to reserve x18 (platform register)
AAPCS64 says that it's up to the platform to specify whether x18 is
reserved, and a first step on that way is to add a flag controlling
it.
From: Andrew Turner <andrew@fubar.geek.nz>
Requested by: andrew
triple ids
This only allows testing and does not change the defaults for mips/mips64.
They still build/use gcc by default.
Differential Revision: https://reviews.freebsd.org/D1190
Reviewed by: dim
[Aarch64] Customer lowering of CTPOP to SIMD should check for NEON
availability
This ensures llvm's AArch64 backend does not emit floating point
instructions if they are disabled.
Fix transformation of add with pc argument to adr for non-immediate
arguments.
This fixes an "Unimplemented" error when assembling certain ARM add
instructions with pc-relative arguments.
Reported by: sbruno
PR: 196412, 196423
PR20228: don't retain a pointer to a vector element after the
container has been resized.
This fixes a possible crash when compiling certain parts of libc++'s
type_traits header.
PowerPC: CTR shouldn't fire if a TLS call is in the loop
Determining the address of a TLS variable results in a function call in
certain TLS models. This means that a simple ICmpInst might actually
result in invalidating the CTR register.
In such cases, do not attempt to rely on the CTR register for loop
optimization purposes.
This fixes PR22034.
Differential Revision: http://reviews.llvm.org/D6786
This fixes a "Invalid PPC CTR loop" error when compiling parts of libc
for PowerPC-32.
[PowerPC] Replace foul hackery with real calls to __tls_get_addr
My original support for the general dynamic and local dynamic TLS
models contained some fairly obtuse hacks to generate calls to
__tls_get_addr when lowering a TargetGlobalAddress. Rather than
generating real calls, special GET_TLS_ADDR nodes were used to wrap
the calls and only reveal them at assembly time. I attempted to
provide correct parameter and return values by chaining CopyToReg and
CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not
fully correct. Problems were seen with two back-to-back stores to TLS
variables, where the call sequences ended up overlapping with unhappy
results. Additionally, since these weren't real calls, the proper
register side effects of a call were not recorded, so clobbered values
were kept live across the calls.
The proper thing to do is to lower these into calls in the first
place. This is relatively straightforward; see the changes to
PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp.
The changes here are standard call lowering, except that we need to
track the fact that these calls will require a relocation. This is
done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the
TargetGlobalAddress operand that appears earlier in the sequence.
The calls to LowerCallTo() eventually find their way to
LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(),
which calls PrepareCall(). In PrepareCall(), we detect the calls to
__tls_get_addr and immediately snag the TargetGlobalTLSAddress with
the annotated relocation information. This becomes an extra operand
on the call following the callee, which is expected for nodes of type
tlscall. We change the call opcode to CALL_TLS for this case. Back
in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only,
since we require a TOC-restore nop following the call for the 64-bit
ABIs.
During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td
convert the CALL_TLS nodes into BL_TLS nodes, and convert the
CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code
removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS
nodes can now be emitted normally using their patterns and the
associated printTLSCall print method.
Finally, as a result of these changes, all references to get-tls-addr
in its various guises are no longer used, so they have been removed.
There are existing TLS tests to verify the changes haven't messed
anything up). I've added one new test that verifies that the problem
with the original code has been fixed.
This fixes a fatal "Bad machine code" error when compiling parts of
libgomp for 32-bit PowerPC.
Add parsing of 'foo@local".
Summary:
Currently, it supports generating, but not parsing, this expression.
Test added as well.
Test Plan: New test added, no regressions due to this.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6672
Pull in r224494 from upstream llvm trunk (by Justin Hibbits):
Add a corresponding '@LOCAL' parse to match r224415.
Pointed out by Jim Grosbach.
[PowerPC] Add JMP_SLOT relocation definitions
This will be required by upcoming patches for LLDB support.
Patch by Justin Hibbits!
Pull in r221510 from upstream llvm trunk (by Justin Hibbits):
Add Position-independent Code model Module API.
Summary:
This makes PIC levels a Module flag attribute, which can be queried by the
backend. The flag is named `PIC Level`, and can have a value of:
0 - Backend-default
1 - Small-model (-fpic)
2 - Large-model (-fPIC)
These match the `-pic-level' command line argument for clang, and the value of the
preprocessor macro `__PIC__'.
Test Plan:
New flags tests specific for the 'PIC Level' module flag.
Tests to be added as part of a future commit for PowerPC, which will use this new API.
Reviewers: rafael, echristo
Reviewed By: rafael, echristo
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D5882
Pull in r221791 from upstream llvm trunk (by Justin Hibbits):
Add support for small-model PIC for PowerPC.
Summary:
Large-model was added first. With the addition of support for multiple PIC
models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI. This
generates more optimal code, for shared libraries with less than about 16380
data objects.
Test Plan: Test cases added or updated
Reviewers: joerg, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, mcrosier, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D5399
Together, these changes implement small-model PIC support for PowerPC.
Thanks to Justin Hibbits and Roman Divacky for their assistance in
getting this working.
Implement vaarg lowering for ppc32. Lowering of scalars and
aggregates is supported. Complex numbers are not.
This adds va_args support for PowerPC (32 bit) to clang.
Implement vaarg lowering for ppc32. Lowering of scalars and
aggregates is supported. Complex numbers are not.
This adds va_args support for PowerPC (32 bit) to clang.
Reviewed by: jhibbits
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D1308