[PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes
soft-float disable all vector instruction sets for both 32-bit and
64-bit modes. This latter part is necessary because the PPC backend
canonicalizes many Altivec vector types to floating-point types, and
so soft-float breaks scalarization support for many operations. Both
for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware
floating-point also disables vector instructions (embedded targets
without hardware floating point support are unlikely to have Altivec,
etc. and operating system kernels desiring not to use floating-point
registers to lower syscall cost are unlikely to want to use vector
registers either). If someone needs this to work, we'll need to
change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is
enabled, hardware floating-point support needs to be expressed as a
positive feature, like the others, and not a negative feature,
because target features cannot have dependencies on the disabling of
some other feature. So +soft-float has now become -hard-float.
Fixes PR26970.
Pull in r283061 from upstream clang trunk (by Hal Finkel):
[PowerPC] Enable soft-float for PPC64, and +soft-float -> -hard-float
Enable soft-float support on PPC64, as the backend now supports it.
Also, the backend now uses -hard-float instead of +soft-float, so set
the target features accordingly.
Fixes PR26970.
Reported by: Mark Millard
PR: 214433
[PPC] Set SP after loading data from stack frame, if no red zone is
present
Follow-up to r280705: Make sure that the SP is only restored after
all data is loaded from the stack frame, if there is no red zone.
This completes the fix for
https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24466
Reported by: Mark Millard
PR: 214433
Call Frame Optimization on i386 and libunwind, by disallowing the
optimization for i386-freebsd12.
This should fix some instances of broken exception handling when frame
pointers are omitted, in particular some unittests run during the build
of editors/libreoffice.
This hack will be removed as soon as upstream has implemented a more
permanent fix for this problem.
Upstream PR: https://llvm.org/bugs/show_bug.cgi?id=30879
Reviewed by: emaste
PR: 212343
[AArch64] Don't blindly lower f16/f128 FCCMPs.
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.
Fixes PR26803.
This fixes a fatal "Cannot select" backend error when building the
net/freerdp port for AArch64.
PR: 214380
MFC after: 3 days
[AArch64] PR28877: Don't assume we're running after legalization when
creating vcvtfp2fxs
Summary:
The DAG combine transformation that was generating the
aarch64_neon_vcvtfp2fxs node was assuming that all inputs where legal
and wasn't accounting that the input could be a v4f64 if we're trying
to do the transformation before legalization. We now bail out in this
case.
All illegal types besides v4f64 were already rejected.
Fixes https://llvm.org/bugs/show_bug.cgi?id=28877
Reviewers: jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23261
This fixes several ports on AArch64.
Requested by: andrew
MFC after: 3 days
Previously most messages included a newline in the string, but a few of
them were missing. Fix these and simplify by just adding the newline in
the _LIBUNWIND_LOG macro itself.
While here correct 'libuwind' typo (missing 'n').
Upstream LLVM libunwind commits r280086 and r280103.
[x86] don't try to create a vector integer inst for an SSE1 target
(PR30512)
This bug was introduced with:
http://reviews.llvm.org/rL272511
We need to restrict the lowering to v4f32 comparisons because that's
all SSE1 can handle.
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=28044
This avoids a "Do not know how to custom type legalize this operation"
error when building the multimedia/ffmpeg port on i386 with SSE enabled.
[PPC] Claim stack frame before storing into it, if no red zone is
present
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of
it there is no guarantee that this part of the stack will not be
modified by any interrupt. To avoid this, make sure to claim the
stack frame first before storing into it.
This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24093
Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on
PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the
GCC-compatible __builtin_dwarf_cfa() builtin. As pointed out in
PR26761, this is currently broken on PowerPC (and likely on ARM as
well). Currently, @llvm.eh.dwarf.cfa is lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however,
does not work for PowerPC. Because of the way that the stack layout
works, the canonical frame address is not exactly (FRAMEADDR +
FRAME_TO_ARGS_OFFSET) on PowerPC (there is a lower save-area offset
as well), so it is not just a matter of implementing
FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its semantics --
We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA
construct itself (since it can be easily represented as a
fixed-offset FrameIndex)). Mips currently does this, but by using a
custom lowering for ADD that specifically recognizes the (FRAMEADDR,
FRAME_TO_ARGS_OFFSET) pattern.
This change introduces a ISD::EH_DWARF_CFA node, which by default
expands using the existing logic, but can be directly lowered by the
target. Mips is updated to use this method (which simplifies its
implementation, and I suspect makes it more robust), and updates
PowerPC to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
[PowerPC] Don't spill the frame pointer twice
When a function contains something, such as inline asm, which
explicitly clobbers the register used as the frame pointer, don't
spill it twice. If we need a frame pointer, it will be saved/restored
in the prologue/epilogue code. Explicitly spilling it again will
reuse the same spill slot used by the prologue/epilogue code, thus
clobbering the saved value. The same applies to the base-pointer or
PIC-base register.
Partially fixes PR26856. Thanks to Ulrich for his analysis and the
small inline-asm reproducer.
[PowerPC] Add support for -mlongcall
The "long call" option forces the use of the indirect calling
sequence for all calls (even those that don't really need it). GCC
provides this option; This is helpful, under certain circumstances,
for building very-large binaries, and some other specialized use
cases.
Fixes PR19098.
Pull in r280041 from upstream clang trunk (by Hal Finkel):
[PowerPC] Add support for -mlongcall
Add support for GCC's PowerPC -mlongcall option; the backend supports
the corresponding target feature as of r280040.
Fixes PR19098.
Don't reduce the width of vector mul if the target doesn't support
SSE2.
The patch is to fix PR30298, which is caused by rL272694. The
solution is to bail out if the target has no SSE2.
Differential Revision: https://reviews.llvm.org/D24288
This fixes building the multimedia/libx264 port on i386.
[AArch64] Return the correct size for TLSDESC_CALLSEQ
The branch relaxation pass is computing the wrong offsets because it assumes
TLSDESC_CALLSEQ eats up 4 bytes, when in fact it is lowered to an instruction
sequence taking up 16 bytes. This can become a problem in huge files with lots
of TLS accesses, as it may slowly move branch targets out of the range computed
by the branch relaxation pass.
Fixes PR24234 https://llvm.org/bugs/show_bug.cgi?id=24234
Differential Revision: https://reviews.llvm.org/D22870
This fixes "error in backend: fixup value out of range" when compiling
the misc/talkfilters port for AArch64.
Reported by: sbruno
PR: 201762
MFC after: 3 days
positive diagnostics from -Wvarargs about enum parameters, e.g.:
cddl/contrib/opensolaris/lib/libnvpair/libnvpair.c:388:15: error: passing an object that undergoes default argument promotion to 'va_start' has undefined behavior
[-Werror,-Wvarargs]
va_start(ap, which);
^
cddl/contrib/opensolaris/lib/libnvpair/libnvpair.c:382:66: note: parameter of type 'enum nvlist_prtctl_fmt' is declared here
nvlist_prtctl_dofmt(nvlist_prtctl_t pctl, enum nvlist_prtctl_fmt which, ...)
^
Fix for pr24346: arm asm label calculation error in sub
Some ARM instructions encode 32-bit immediates as a 8-bit integer
(0-255) and a 4-bit rotation (0-30, even) in its least significant 12
bits. The original fixup, FK_Data_4, patches the instruction by the
value bit-to-bit, regardless of the encoding. For example, assuming
the label L1 and L2 are 0x0 and 0x104 respectively, the following
instruction:
add r0, r0, #(L2 - L1) ; expects 0x104, i.e., 260
would be assembled to the following, which adds 1 to r0, instead of
260:
e2800104 add r0, r0, #4, 2 ; equivalently 1
The new fixup kind fixup_arm_mod_imm takes care of the encoding:
e2800f41 add r0, r0, #260
Patch by Ting-Yuan Huang!
This fixes label calculation for ARM assembly, and is needed to enable
ARM assembly sources for OpenSSL.
Requested by: jkim
MFC after: 3 days
[X86] AMD Bobcat CPU (btver1) doesn't support XSAVE
btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't
support XSAVE.
Differential Revision: http://reviews.llvm.org/D17682
Pull in r262782 from upstream llvm trunk (by Simon Pilgrim):
[X86] AMD Bobcat CPU (btver1) doesn't support XSAVE
btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't
support XSAVE.
Differential Revision: http://reviews.llvm.org/D17683
This ensures clang does not emit AVX instructions for CPUTYPE=btver1.
Reported by: Michel Depeige <demik+freebsd@lostwave.net>
PR: 211864
MFC after: 3 days
_Unwind_Exception is required to be double word aligned. GCC has
interpreted this to mean "use the maximum useful alignment for the
target" so follow that lead.
Obtained from: LLVM review D22543
FreeBSD uses LLVM's libunwind on FreeBSD/arm64 today (and we expect to
use it more widely in the future) and it requires the EH frame segment
in static binaries.
Reviewed by: dim
Obtained from: Clang commit r266123
MFC after: 3 days
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7250
For historical reasons Darwin/i386 has ebp and esp swapped in the
eh_frame register numbering. That is:
Darwin Other
Reg # eh_frame eh_frame DWARF
===== ======== ======== =====
4 ebp esp esp
5 esp ebp ebp
Although the UNW_X86_* constants are not supposed to be coupled to
DWARF / eh_frame numbering they are currently conflated in LLVM
libunwind, and thus we require the non-Darwin numbering.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
This may be reworked upstream but in the interim should address the
stack usage issue reported in the PR.
PR: 206384
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The key improvement is that it may be built without cross-unwinding
support, which significantly reduces the stack space requirement.
MFC after: 1 week
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7123
Only attempt to detect AVG if SSE2 is available
Summary:
In PR29973 Sanjay Patel reported an assertion failure when a certain
loop was optimized, for a target without SSE2 support. It turned out
this was because of the AVG pattern detection introduced in rL253952.
Prevent the assertion failure by bailing out early in
`detectAVGPattern()`, if the target does not support SSE2.
Also add a minimized test case.
Reviewers: congh, eli.friedman, spatel
Subscribers: emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D20905
This should fix assertion failures ("Requires at least SSE2!") when
building the games/0ad port with CPUTYPE=pentium3.
Reported by: madpilot
[VectorUtils] Fix nasty use-after-free
In truncateToMinimalBitwidths() we were RAUW'ing an instruction then
erasing it. However, that intruction could be cached in the map we're
iterating over. The first check is "I->use_empty()" which in most
cases would return true, as the (deleted) object was RAUW'd first so
would have zero use count. However in some cases the object could
have been polluted or written over and this wouldn't be the case.
Also it makes valgrind, asan and traditionalists who don't like their
compiler to crash sad.
No testcase as there are no externally visible symptoms apart from a
crash if the stars align.
Fixes PR26509.
This should fix crashes when building a number of ports on arm64.
Reported by: andrew
Make __FreeBSD_cc_version predefined macro configurable at build time
The `FreeBSDTargetInfo` class has always set the `__FreeBSD_cc_version`
predefined macro to a rather static value, calculated from the major OS
version.
In the FreeBSD base system, we will start incrementing the value of this
macro whenever we make any signifant change to clang, so we need a way
to configure the macro's value at build time.
Use `FREEBSD_CC_VERSION` for this, which we can define in the FreeBSD
build system using either the `-D` command line option, or an include
file. Stock builds will keep the earlier value.
Differential Revision: http://reviews.llvm.org/D20037
Follow-up commits will start using the __FreeBSD_cc_version to determine
whether a bootstrap compiler has to be built during buildworld.
Pass dwarf-version to cc1as.
Fix PR26999 - crashing in cc1as with any '*bsd' target.
This should fix possible crashes when using -g in combination with
-save-temps.
[X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI.
This fixes PR27071.
This should fix an assertion failure during buildworld, when using -Os,
and targeting either i386 directly, or building the 32-bit libraries on
amd64.
Reported by: Eric Camachat <eric.camachat@gmail.com>
Add <atomic> to ThreadPool.h, since std::atomic is used
Summary:
Apparently, when compiling with gcc 5.3.2 for powerpc64, the order of
headers is such that it gets an error about std::atomic<> use in
ThreadPool.h, since this header is not included explicitly. See also:
https://llvm.org/bugs/show_bug.cgi?id=27058
Fix this by including <atomic>. Patch by Bryan Drewery.
Reviewers: chandlerc, joker.eph
Subscribers: bdrewery, llvm-commits
Differential Revision: http://reviews.llvm.org/D18460
the safe point to insert the prologue and epilogue of the function) on
X86. This prevents problems with some functions using TLS, such as in
jemalloc, and which was the cause for Address Sanitizer crashes. The
correct fix is still being discussed upstream.
Fix PR26134: When substituting into default template arguments, keep
CurContext unchanged.
Or, do not set Sema's CurContext to the template declaration's when
substituting into default template arguments of said template
declaration.
If we do push the template declaration context on to Sema, and the
template declaration is at namespace scope, Sema can get confused and
try and do odr analysis when substituting into default template
arguments, even though the substitution could be occurring within a
dependent context.
I'm not sure why this was being done, perhaps there was concern that
if a default template argument referred to a previous template
parameter, it might not be found during substitution - but all
regression tests pass, and I can't craft a test that would cause it
to fails (if some one does, please inform me, and i'll craft a
different fix for the PR).
This patch removes a single line of code, but unfortunately adds more
than it removes, because of the tests. Some day I still hope to
commit a patch that removes far more lines than it adds, while
leaving clang better for it ;)
Sorry that r253590 ("Change the expression evaluation context from
Unevaluated to ConstantEvaluated while substituting into non-type
template argument defaults") caused the PR!
This fix will be merged to the upstream release_38 branch soon, but we
need it now, to fix a failure in the databases/sfcgal port.
[DwarfDebug] Move MergeValues to .cpp, NFC
Pull in r257979 from upstream llvm trunk, by Keno Fischer:
[DwarfDebug] Don't merge DebugLocEntries if their pieces overlap
Summary:
Later in DWARF emission we check that DebugLocEntries have
non-overlapping pieces, so we should create any such entries
by merging here.
Fixes PR26163.
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D16249
Again, these will be merged to the official release_38 branch soon, but
we need them ASAP.
be merged to the official release_38 branch soon, but we need it ASAP):
Stop increasing alignment of externally-visible globals on ELF
platforms.
With ELF, the alignment of a global variable in a shared library will
get copied into an executables linked against it, if the executable even
accesss the variable. So, it's not possible to implicitly increase
alignment based on access patterns, or you'll break existing binaries.
This happened to affect libc++'s std::cout symbol, for example. See
thread: http://thread.gmane.org/gmane.comp.compilers.clang.devel/45311
(This is a re-commit of r257719, without the bug reported in
PR26144. I've tweaked the code to not assert-fail in
enforceKnownAlignment when computeKnownBits doesn't recurse far enough
to find the underlying Alloca/GlobalObject value.)
Differential Revision: http://reviews.llvm.org/D16145
was not recognized anymore for arm targets. Fix this by adding the
correct sub-arch to the xscale definition in ARMTargetParser.def. This
fix (from Andrew Turner) has also been submitted upstream.
llvm's LinkAllPasses.h. This caused some of the calls not to be
emitted, if the optimization level was -O2 or higher.
Conversely, if you used -O1 or lower, calls to e.g. RunningOnValgrind()
would be emitted, leading to link failures, because we did not include
Valgrind.cpp into libllvmsupport. Therefore, add it unconditionally.
Noticed by: ian
from upstream clang trunk, which sets the default debug tuning back to
gdb. The lldb debug tuning is not yet grokked completely by our ELF
manipulation tools.
As with previous imports a number of plugins not immediately relevant
to FreeBSD have been excluded:
ABIMacOSX_i386
ABIMacOSX_arm
ABIMacOSX_arm64
ABISysV_hexagon
AppleObjCRuntimeV2
AppleObjCRuntimeV1
SystemRuntimeMacOSX
RenderScriptRuntime
GoLanguageRuntime
GoLanguage
ObjCLanguage
ObjCPlusPlusLanguage
ObjectFilePECOFF
DynamicLoaderWindowsDYLD
platform_linux
platform_netbsd
PlatformWindows
PlatformKalimba
platform_android
DynamicLoaderMacOSXDYLD
ObjectContainerUniversalMachO
PlatformRemoteiOS
PlatformMacOSX
OperatingSystemGo
printed with -v. We have historically put a date stamp there (roughly
corresponding to the date of import), but this has never been used for
anything, and the patch has also never been upstreamed, so let's get rid
of it now.
bugfix-only release, with no new features.
Please note that from 3.5.0 onwards, clang and llvm require C++11
support to build; see UPDATING for more information.
breakpoint. The value doesn't need to be adjusted as it is already
correctly returned from the kernel.
This allows lldb to set breakpoints, and stop on them, however more work
is needed, for example single stepping fails to stop.
Discussed with: emaste
Change this to DWARF2, in the simplest way possible. (Upstream, this
was fixed in clang trunk r250173, but this was done along with a lot of
shuffling around of debug option handling, so it cannot be applied
as-is.)
Noticed by: des
MFC after: 3 days
Refactor library decision for -fopenmp support from Darwin into a
function for sharing with other platforms.
Pull in r248424 from upstream clang trunk (by Jörg Sonnenberger):
Push OpenMP linker flags after linker input on Darwin. Don't add any
libraries if -nostdlib is specified. Test.
Pull in r248426 from upstream clang trunk (by Jörg Sonnenberger):
Support linking against OpenMP runtime on NetBSD.
Pull in r250657 from upstream clang trunk (by Dimitry Andric):
Support linking against OpenMP runtime on FreeBSD.
[x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.
This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.
In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.
Example (-mattr=+avx):
%0 = trunc <8 x i32> %a to <8 x i23>
%1 = icmp eq <8 x i23> %0, zeroinitializer
The initial selection dag for the code above is:
v8i1 = setcc t5, t7, seteq:ch
t5: v8i23 = truncate t2
t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
t7: v8i32 = build_vector of all zeroes.
The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
v8i16 = setcc t32, t25, seteq:ch
where t32 and t25 are now values of type v8i32.
Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25
However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
%vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3
Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.
This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.
This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.
Differential Revision: http://reviews.llvm.org/D13660
This should fix the "Cannot emit physreg copy instruction" errors when
compiling contrib/wpa/src/common/ieee802_11_common.c, and CPUTYPE is set
to a CPU supporting AVX (e.g. sandybridge, ivybridge).
[SLP] Vectorize for all-constant entries.
This should fix libc++'s iostream initialization SIGBUSing on amd64,
whenever the global cout symbol is not aligned to 16 bytes.
Some further explanation: libc++'s iostream.cpp contains the definitions
of std::cout, std::cerr and so on. These global objects are effectively
declared with an alignment of 8 bytes. When an executable is linked
against libc++.so, it can sometimes get a copy of the global object,
which is then at the same alignment.
However, with clang 3.7.0, the initialization of these global objects
will incorrectly use SSE instructions (e.g. movdqa), whenever the
optimization level is high enough, and SSE is enabled, such as on amd64.
When any of these objects is not aligned to 16 bytes, this will result
in a SIGBUS during iostream initialization. In contrast, clang 3.6.x
and earlier took the 8 byte alignment into consideration, and avoided
SSE for those particular operations.
After bisecting of upstream changes, I found that the above revision
caused the change of this behavior, so I am reverting it now as a
workaround, while a discussion and test case is being prepared for
upstream.
set div/rem default values to 'expensive' in TargetTransformInfo's
cost model
...because that's what the cost model was intended to do.
As discussed in D12882, this fix has a temporary unintended
consequence for SimplifyCFG: it causes us to not speculate an fdiv.
However, two wrongs make PR24818 right, and two wrongs make PR24343
act right even though it's really still wrong.
I intend to correct SimplifyCFG and add to CodeGenPrepare to account
for this cost model change and preserve the righteousness for the bug
report cases.
https://llvm.org/bugs/show_bug.cgi?id=24818https://llvm.org/bugs/show_bug.cgi?id=24343
Differential Revision: http://reviews.llvm.org/D12882
This fixes the too-eager fdiv hoisting in pow(), which could lead to
unexpected floating point exceptions.
Add missing atomic libcall support.
Support for emitting libcalls for __atomic_fetch_nand and
__atomic_{add,sub,and,or,xor,nand}_fetch was missing; add it, and some
test cases.
Differential Revision: http://reviews.llvm.org/D10847
This fixes "cannot compile this atomic library call yet" errors when
compiling code which calls the above builtins, on arm < v6.
Reverting r239883 and r240720:
------------------------------------------------------------------------
r239883 | echristo | 2015-06-17 00:09:32 -0700 (Wed, 17 Jun 2015) | 16 lines
Update the intel intrinsic headers to use the target attribute support.
This involved removing the conditional inclusion and replacing them
with target attributes matching the original conditional inclusion
and checks. The testcase update removes the macro checks for each
file and replaces them with usage of the __target__ attribute, e.g.:
int __attribute__((__target__(("sse3")))) foo(int a) {
_mm_mwait(0, 0);
return 4;
}
This usage does require the enclosing function have the requisite
__target__ attribute for inlining and code generation - also for
any macro intrinsic uses in the enclosing function. There's no change
for existing uses of the intrinsic headers.
------------------------------------------------------------------------
------------------------------------------------------------------------
r240720 | silvas | 2015-06-25 16:22:11 -0700 (Thu, 25 Jun 2015) | 6 lines
Remove `requires` for x86 CPU features.
Ever since the target attributes change, we don't need to guard these
headers with `requires`. Actually it's a bit worse, because if we do
then they are included textually under the covers, causing declarations
to appear in submodules they aren't supposed to be in.
------------------------------------------------------------------------
This reverts the changes to the intrinsics headers in trunk, which could
result in some ports' configure scripts misdetecting SSE (and higher)
support.
[SCCP] Turn loads of null into undef instead of zero initialized values
Surprisingly, this is a correctness issue: the mmx type exists for
calling convention purposes, LLVM doesn't have a zero representation for
them.
This partially fixes PR23999.
Pull in r241143 from upstream llvm trunk (by David Majnemer):
[LoopUnroll] Use undef for phis with no value live
We would create a phi node with a zero initialized operand instead of
undef in the case where no value was originally available. This was
problematic for x86_mmx which has no null value.
These fix a "Cannot create a null constant of that type!" error when
compiling the graphics/sdl2_gfx port with MMX enabled.
Reported by: amdmi3
Notable upstream commits (upstream revision in parens):
- Add a JSON producer to LLDB (228636)
- Don't crash on bad DWARF expression (228729)
- Add support of DWARFv3 DW_OP_form_tls_address (231342)
- Assembly profiler for MIPS64 (232619)
- Handle FreeBSD/arm64 core files (233273)
- Read/Write register for MIPS64 (233685)
- Rework LLDB system initialization (233758)
- SysV ABI for aarch64 (236098)
- MIPS software single stepping (236696)
- FreeBSD/arm live debugging support (237303)
- Assembly profiler for mips32 (237420)
- Parse function name from DWARF DW_AT_abstract_origin (238307)
- Improve LLDB prompt handling (238313)
- Add real time signals support to FreeBSDSignals (238316)
- Fix race in IOHandlerProcessSTDIO (238423)
- MIPS64 Branch instruction emulation for SW single stepping (238820)
- Improve OSType initialization in elf object file's arch_spec (239148)
- Emulation of MIPS64 floating-point branch instructions (239996)
- ABI Plugin for MIPS32 (239997)
- ABI Plugin for MIPS64 (240123)
- MIPS32 branch emulation and single stepping (240373)
- Improve instruction emulation based stack unwinding on ARM (240533)
- Add branch emulation to aarch64 instruction emulator (240769)
MC: Allow multiple comma-separated expressions on the .uleb128 directive.
For compatiblity with GNU as. Binutils documents this as
'.uleb128 expressions'. Subtle, isn't it?
Reported by: sbruno
PR: 199554
MFC after: 3 days
Fix assert instantiating string init of static variable
... when the variable's type is a typedef of a ConstantArrayType. Just
look through the typedef (and any other sugar). We only use the
constant array type here to get the element count.
This fixes an assertion failure when building the games/redeclipse port.
Reported by: amdmi3
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU
linkers ld.bfd and ld.gold currently only support a subset of the
whole range of AArch64 ELF TLS relocations. Furthermore, they assume
that some of the code sequences to access thread-local variables are
produced in a very specific sequence. When the sequence is not as the
linker expects, it can silently mis-relaxe/mis-optimize the
instructions.
Even if that wouldn't be the case, it's good to produce the exact
sequence, as that ensures that linkers can perform optimizing
relaxations.
This patch:
* implements support for 16MiB TLS area size instead of 4GiB TLS area
size. Ideally clang would grow an -mtls-size option to allow support
for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even
modern ld.bfd and ld.gold linkers do not support the associated
relocations. An option (-aarch64-elf-ldtls-generation) is added to
enable generation of local dynamic code sequence, but is off by
default.
* makes sure that the exact expected code sequence for local dynamic
and general dynamic accesses is produced, by making use of a new
pseudo instruction. The patch also removes two
(AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing
AArch64-specific pseudo SDNode instructions that are superseded by
the new one (TLSDESC_CALLSEQ).
Submitted by: Kristof Beyls
Differential Revision: https://reviews.freebsd.org/D2175
ARM: treat [N x i32] and [N x i64] as AAPCS composite types
The logic is almost there already, with our special homogeneous
aggregate handling. Tweaking it like this allows front-ends to emit
AAPCS compliant code without ever having to count registers or add
discarded padding arguments.
Only arrays of i32 and i64 are needed to model AAPCS rules, but I
decided to apply the logic to all integer arrays for more consistency.
This fixes a possible "Unexpected member type for HA" error when
compiling lib/msun/bsdsrc/b_tgamma.c for armv6.
Reported by: Jakub Palider <jpa@semihalf.com>
These are generated, and not "optimized" in any way, since I am not
entirely sure of the syntax or format of this type of file. Feel free
to suggest ways of shortening these lists.
The general idea is the same for all three files, though:
* Get rid of upstream build infrastructure (CMakeLists, Makefiles, etc)
* Delete tests, tools and utilities we don't want or use (including
samples)
* Remove various bits of upstream metadata files that we don't want or
use (.arcconfig, .gitignore, etc)