[X86] Add 'sahf' CPU feature to frontend
Summary:
Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the
`+sahf` feature for the backend, for bug 36028 (Incorrect use of
pushf/popf enables/disables interrupts on amd64 kernels). This was
originally submitted in bug 36037 by Jonathan Looney
<jonlooney@gmail.com>.
As described there, GCC also uses `-msahf` for this feature, and the
backend already recognizes the `+sahf` feature. All that is needed is
to teach clang to pass this on to the backend.
The mapping of feature support onto CPUs may not be complete; rather,
it was chosen to match LLVM's idea of which CPUs support this feature
(see lib/Target/X86/X86.td).
I also updated the affected test case (CodeGen/attr-target-x86.c) to
match the emitted output.
Reviewers: craig.topper, coby, efriedma, rsmith
Reviewed By: craig.topper
Subscribers: emaste, cfe-commits
Differential Revision: https://reviews.llvm.org/D43394
Pull in r328944 from upstream llvm trunk (by Chandler Carruth):
[x86] Expose more of the condition conversion routines in the public
API for X86's instruction information. I've now got a second patch
under review that needs these same APIs. This bit is nicely
orthogonal and obvious, so landing it. NFC.
Pull in r329414 from upstream llvm trunk (by Craig Topper):
[X86] Merge itineraries for CLC, CMC, and STC.
These are very simple flag setting instructions that appear to only
be a single uop. They're unlikely to need this separation.
Pull in r329657 from upstream llvm trunk (by Chandler Carruth):
[x86] Introduce a pass to begin more systematically fixing PR36028
and similar issues.
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the
necessary state in a GPR. In the vast majority of cases, these uses
are cmovCC and jCC instructions. For such cases, we can very easily
save and restore the necessary information by simply inserting a
setCC into a GPR where the original flags are live, and then testing
that GPR directly to feed the cmov or conditional branch.
However, things are a bit more tricky if arithmetic is using the
flags. This patch handles the vast majority of cases that seem to
come up in practice: adc, adcx, adox, rcl, and rcr; all without
taking advantage of partially preserved EFLAGS as LLVM doesn't
currently model that at all.
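As a rough illustration of the kind of source that exercises these
arithmetic consumers of EFLAGS (a sketch only; whether a particular
build actually needs an EFLAGS copy depends on register allocation,
and the file and function names here are made up):
```
#include <x86intrin.h>

/* A multi-word addition that threads the carry flag through consecutive
 * _addcarry_u64 calls, i.e. a chain of adc instructions whose EFLAGS input
 * must stay live between them. */
unsigned char add192(const unsigned long long a[3],
                     const unsigned long long b[3],
                     unsigned long long out[3])
{
    unsigned char c = 0;
    c = _addcarry_u64(c, a[0], b[0], &out[0]);
    c = _addcarry_u64(c, a[1], b[1], &out[1]);
    c = _addcarry_u64(c, a[2], b[2], &out[2]);
    return c;   /* the final carry is also consumed as a plain value */
}
```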
There are a large number of operations that technically observe
EFLAGS currently but shouldn't in this case -- they typically are
using DF. Currently, they will not be handled by this approach.
However, I have never seen this issue come up in practice. It is
already pretty rare to have these patterns come up in practical code
with LLVM. I had to resort to writing MIR tests to cover most of the
logic in this pass already. I suspect even with its current amount
of coverage of arithmetic users of EFLAGS it will be a significant
improvement over the current use of pushf/popf. It will also produce
substantially faster code in most of the common patterns.
This patch also removes all of the old lowering for EFLAGS copies,
and the hack that forced us to use a frame pointer when EFLAGS copies
were found anywhere in a function so that the dynamic stack
adjustment wasn't a problem. None of this is needed as we now lower
all of these copies directly in MI and without requiring stack
adjustments.
Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things
tripping me up while working on this.
Differential Revision: https://reviews.llvm.org/D45146
Pull in r329673 from upstream llvm trunk (by Chandler Carruth):
[x86] Model the direction flag (DF) separately from the rest of
EFLAGS.
This cleans up a number of operations that only claimed to use EFLAGS
due to using DF. But no instructions which we think of as setting
EFLAGS actually modify DF (other than things like popf) and so this
needlessly creates uses of EFLAGS that aren't really there.
In fact, DF is so restrictive it is pretty easy to model. Only STD,
CLD, and the whole-flags writes (WRFLAGS and POPF) need to model
this.
I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.
Adding this extra register also uncovered a failure to use the
correct datatype to hold X86 registers, and I've corrected that as
necessary here.
Differential Revision: https://reviews.llvm.org/D45154
Together, these should ensure clang does not use pushf/popf sequences to
save and restore flags, avoiding problems with unrelated flags (such as
the interrupt flag) being restored unexpectedly.
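As a quick way to see the new flag in action (a sketch only; the file
name and exact invocation are illustrative):
```
/* flags.c -- compile with something like:
 *   clang -target x86_64-unknown-freebsd12.0 -msahf -S -emit-llvm flags.c
 * and the emitted function attributes should contain "+sahf" in
 * "target-features", which is what allows the backend to use lahf/sahf
 * rather than pushf/popf where flags must be saved and restored. */
int dummy(void) { return 0; }
```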
Requested by: jtl
PR: 225330
MFC after: 1 week
Strip @VER suffixes from the LTO output.
This fixes pr36623.
The problem is that we have to parse versions out of names before LTO
so that LTO can use that information.
When we get the LTO produced .o files, we replace the previous symbols
with the LTO produced ones, but they still have @ in their names.
We could just trim the name directly, but calling parseSymbolVersion
to do it is simpler.
This is a follow-up to r331366, since we discovered that lld could
append version strings to symbols twice, when using Link Time
Optimization.
MFC after: 3 months
X-MFC-With: r327952
Don't treat .symver as a regular alias definition.
This patch starts simplifying the handling of .symver.
For now it just moves the responsibility for creating an alias down to
the streamer. With that the asm streamer can pass a .symver unchanged,
which is nice since gas cannot parse "foo@bar = zed".
In a follow-up I hope to move the handling down to the writer so that
we don't need special hacks for avoiding breaking names with @@@ on
Windows.
Pull in r327160 from upstream llvm trunk (by Rafael Espindola):
Delay creating an alias for @@@.
With this we only create an alias for @@@ once we know if it should
use @ or @@. This avoids last minutes renames and hacks to handle MS
names.
This only handles the ELF writer. LTO still has issues with @@@
aliases.
Pull in r327928 from upstream llvm trunk (by Vitaly Buka):
Object: Move attribute calculation into RecordStreamer. NFC
Summary: Preparation for D44274
Reviewers: pcc, espindola
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D44276
Pull in r327930 from upstream llvm trunk (by Vitaly Buka):
Object: Fix handling of @@@ in .symver directive
Summary:
name@@@nodename is going to be replaced with name@@nodename if the symbol is
defined in the assembled file, or name@nodename if undefined.
https://sourceware.org/binutils/docs/as/Symver.html
Fixes PR36623
Reviewers: pcc, espindola
Subscribers: mehdi_amini, hiraditya
Differential Revision: https://reviews.llvm.org/D44274
Together, these changes fix handling of @@@ in .symver directives when
doing Link Time Optimization.
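For reference, a minimal sketch of the @@@ convention handled here (the
symbol and version names are made up):
```
/* Because foo_impl is defined in this file, the assembler turns
 * "foo@@@VERS_1.0" into the default versioned definition "foo@@VERS_1.0";
 * had foo_impl been undefined, it would instead become a reference to
 * "foo@VERS_1.0", per the sourceware documentation quoted above. */
__asm__(".symver foo_impl, foo@@@VERS_1.0");
int foo_impl(int x) { return x + 1; }
```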
Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after: 3 months
X-MFC-With: r327952
This is originally based on a patch from David Chisnall for soft-float
N64 but has since been updated to support O32, N32, and hard-float ABIs.
The soft-float O32, N32, and N64 support has been committed upstream.
The hard-float changes are still in review upstream.
Enable LLVM_LIBUNWIND on mips when building with a suitable (C++11-capable)
toolchain. This has been tested with external GCC for all ABIs, and with
clang for O32 and N64.
Reviewed by: emaste
Obtained from: CheriBSD (original N64 patch)
Sponsored by: DARPA / AFRL
Differential Revision: https://reviews.freebsd.org/D14701
[CodeGen] Fix TBAA info for accesses to members of base classes
Resolves:
Bug 35724 - regression (r315984): fatal error: error in backend:
Broken function found (Did not see access type in access path!)
https://bugs.llvm.org/show_bug.cgi?id=35724
Differential Revision: https://reviews.llvm.org/D41547
This fixes "Did not see access type in access path" fatal errors when
building the devel/gdb port (version 8.1).
Reported by: jbeich
PR: 226658
MFC after: 3 months
X-MFC-With: r327952
[ConstantFolding, InstSimplify] Handle more vector GEPs
This patch addresses some additional cases where the compiler crashes
upon encountering vector GEPs. This should fix PR36116.
Differential Revision: https://reviews.llvm.org/D44219
Reference: https://bugs.llvm.org/show_bug.cgi?id=36116
This fixes an assertion when building the emulators/snes9x port.
Reported by: jbeich
PR: 225471
MFC after: 3 months
X-MFC-With: r327952
[ARM] Fix for PR36577
Don't PerformSHLSimplify if the given node is used by a node that
also uses a constant because we may get stuck in an infinite combine
loop.
bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577
Patch by Sam Parker.
Differential Revision: https://reviews.llvm.org/D44097
This fixes a hang when compiling one particular file in java/openjdk8
for armv6 and armv7.
Reported by: swills
PR: 226388
PR36157: When injecting an implicit function declaration in C89, find
the right DeclContext rather than injecting it wherever we happen to
be.
This avoids creating functions whose DeclContext is a struct or
similar.
This fixes assertion failures when parsing certain not-completely-valid
struct declarations.
Reported by: ae
PR: 225862
MFC after: 3 months
X-MFC-With: r327952
Fix for #31362 - ms_abi is implemented incorrectly for values >=16
bytes.
Summary:
This patch is a fix for following issue:
https://bugs.llvm.org/show_bug.cgi?id=31362 The problem was caused by
the front end lowering C calling conventions without taking into account
calling conventions enforced by an attribute. In this case win64cc was
not correctly lowered on targets other than Windows.
Reviewed By: rnk (Reid Kleckner)
Differential Revision: https://reviews.llvm.org/D43016
Author: belickim <mateusz.belicki@intel.com>
This fixes clang 6.0.0 assertions when building the emulators/wine and
emulators/wine-devel ports, and should also make it use the correct
Windows calling conventions. Bump __FreeBSD_version to make the fix
easy to detect.
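A minimal sketch of the affected pattern (constructed for illustration,
not taken from the bug report):
```
/* A 32-byte aggregate: the Windows x64 convention passes and returns values
 * of this size indirectly, which the frontend previously got wrong when the
 * convention was forced by an attribute on a non-Windows target. */
struct big { char bytes[32]; };

__attribute__((ms_abi)) struct big make_big(void);

struct big use_big(void)
{
    return make_big();   /* call site must follow the win64 convention */
}
```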
PR: 224863
MFC after: 3 months
X-MFC-With: r327952
6.0.0 (branches/release_60 r324090).
This introduces retpoline support, with the -mretpoline flag. The
upstream initial commit message (r323155 by Chandler Carruth) contains
quite a bit of explanation. Quoting:
Introduce the "retpoline" x86 mitigation technique for variant #2 of
the speculative execution vulnerabilities disclosed today,
specifically identified by CVE-2017-5715, "Branch Target Injection",
which is one of the two halves of Spectre.
Summary:
First, we need to explain the core of the vulnerability. Note that
this is a very incomplete description, please see the Project Zero
blog post for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative
execution of the processor to some "gadget" of executable code by
poisoning the prediction of indirect branches with the address of
that gadget. The gadget in turn contains an operation that provides a
side channel for reading data. Most commonly, this will look like a
load of secret data followed by a branch on the loaded value and then
a load of some predictable cache line. The attacker then uses timing
of the processor's cache to determine which direction the branch took
*in the speculative execution*, and in turn what one bit of the
loaded value was. Due to the nature of these timing side channels and
the branch predictor on Intel processors, this allows an attacker to
leak data only accessible to a privileged domain (like the kernel)
back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In
many cases, the compiler can simply use directed conditional branches
and a small search tree. LLVM already has support for lowering
switches in this way and the first step of this patch is to disable
jump-table lowering of switches and introduce a pass to rewrite
explicit indirectbr sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as a
trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures
the processor predicts the return to go to a controlled, known
location. The retpoline then "smashes" the return address pushed onto
the stack by the call with the desired target of the original
indirect call. The result is a predicted return to the next
instruction after a call (which can be used to trap speculative
execution within an infinite loop) and an actual indirect branch to
an arbitrary address.
On 64-bit x86 ABIs, this is especially easy to do in the compiler by
using a guaranteed scratch register to pass the target into this
device. For 32-bit ABIs there isn't a guaranteed scratch register
and so several different retpoline variants are introduced to use a
scratch register if one is available in the calling convention and to
otherwise use direct stack push/pop sequences to pass the target
address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the
retpoline thunk by the compiler to allow for custom thunks if users
want them. These are particularly useful in environments like
kernels that routinely do hot-patching on boot and want to hot-patch
their thunk to different code sequences. They can write this custom
thunk and use `-mretpoline-external-thunk` *in addition* to
`-mretpoline`. In this case, on x86-64 the thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or, in
the case of the `push` suffix, on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are
from precompiled runtimes (such as crt0.o and similar). The ones we
have found are not really attackable, and so we have not focused on
them here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z
retpolineplt` (or use similar functionality from some other linker).
We strongly recommend also using `-z now` as non-lazy binding allows
the retpoline-mitigated PLT to be substantially smaller.
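As a hedged sketch of the usage described above (the file and function
names are made up), an ordinary indirect call is what the mitigation
rewrites:
```
/* dispatch.c -- built with something like:
 *   clang -O2 -mretpoline -c dispatch.c
 * and, for dynamic executables, linked with lld using -z retpolineplt
 * (ideally also -z now), as described above. */
typedef int (*handler_t)(int);

int dispatch(handler_t h, int arg)
{
    /* This indirect call is emitted through a retpoline thunk rather than a
     * bare indirect branch; with -mretpoline-external-thunk it would go
     * through a user-provided __llvm_external_retpoline_* thunk instead. */
    return h(arg);
}
```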
When manually applying transformations similar to `-mretpoline` to the
Linux kernel, we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately
2%) even for extremely syscall-heavy applications. This is largely
due to the small number of indirect branches that occur in
performance sensitive paths of the kernel.
When using these patches on statically linked applications,
especially C++ applications, you should expect to see a much more
dramatic performance hit. For microbenchmarks that are switch-,
indirect-, or virtual-call heavy, we have seen overheads ranging from
10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically
reduce the impact of hot indirect calls (by speculatively promoting
them to direct calls) and allow optimized search trees to be used to
lower switches. If you need to deploy these techniques in C++
applications, we *strongly* recommend that you ensure all hot call
targets are statically linked (avoiding PLT indirection) and use both
PGO and ThinLTO. Well tuned servers using all of these techniques saw
5% - 10% overhead from the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality
available as soon as possible. Happy for more code review, but we'd
really like to get these patches landed and backported ASAP for
obvious reasons. We're planning to backport this to both 6.0 and 5.0
release streams and get a 5.0 release with just this cherry picked
ASAP for distros and vendors.
This patch is the work of a number of people over the past month:
Eric, Reid, Rui, and myself. I'm mailing it out as a single commit
due to the time sensitive nature of landing this and the need to
backport it. Huge thanks to everyone who helped out here, and
everyone at Intel who helped out in discussions about how to craft
this. Also, credit goes to Paul Turner (at Google, but not an LLVM
contributor) for much of the underlying retpoline design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
MFC after: 3 months
X-MFC-With: r327952
PR: 224669
The root problem is that we were creating a PT_LOAD just for the header.
That was technically valid, but inconvenient: we should not be making
the ELF discontinuous.
The solution is to allow a section with LMAExpr to be added to a PT_LOAD
if that PT_LOAD doesn't already have a LMAExpr.
LLVM PR: 36017
Obtained from: LLVM r323625 by Rafael Espindola
If two sections are in the same PT_LOAD, their relative offsets,
virtual addresses, and physical addresses are all the same.
[Rafael] initially wanted to have a single global LMAOffset, on the
assumption that every ELF file was in practice loaded contiguously in
both physical and virtual memory.
Unfortunately that is not the case. The linux kernel has:
LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0xced000 0xced000 R E 0x200000
LOAD 0x1000000 0xffffffff81e00000 0x0000000001e00000 0x15f000 0x15f000 RW 0x200000
LOAD 0x1200000 0x0000000000000000 0x0000000001f5f000 0x01b198 0x01b198 RW 0x200000
LOAD 0x137b000 0xffffffff81f7b000 0x0000000001f7b000 0x116000 0x1ec000 RWE 0x200000
The delta for all but the third PT_LOAD is the same:
0xffffffff80000000. [Rafael] thinks the third one is a hack for implementing
per-CPU data, but we can't break that.
Obtained from: LLVM r323456 by Rafael Espindola
This fixes the crash reported at [LLVM] PR36083.
The issue is that we were trying to put all the sections in the same
PT_LOAD and crashing trying to write past the end of the file.
This also adds accounting for used space in LMARegion; without it, all
3 PT_LOADs would have the same physical address.
Obtained from: LLVM r323449 by Rafael Espindola
[X86] Make -mavx512f imply -mfma and -mf16c in the frontend like it
does in the backend.
Similarly, make -mno-fma and -mno-f16c imply -mno-avx512f.
Without this, "-mno-sse -mavx512f" ends up with avx512f being enabled
in the frontend but disabled in the backend.
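One way to observe the implication (a sketch; the file name and
invocation are illustrative):
```
/* check_avx512.c -- with this change, a flag combination such as
 *   clang -mavx512f -mno-fma -c check_avx512.c
 * disables AVX512F along with FMA in the frontend, so the #error below
 * should never trigger; previously the two macros could disagree. */
#if defined(__AVX512F__) && !defined(__FMA__)
#error "AVX512F enabled without FMA -- inconsistent feature state"
#endif
int check_avx512_dummy;
```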
Reported by: pawel
PR: 225488
[COST]Fix PR35865: Fix cost model evaluation for shuffle on X86.
Summary:
If the vector type is transformed to a non-vector single type, the
compiler may crash trying to get vector information about the non-vector
type.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41862
This should fix "Not a vector MVT!" errors when building the
games/dhewm3 port.
Reported by: jbeich
PR: 225271
[ValueTracking] remove overzealous assert
The test is derived from a failing fuzz test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5008
Credit to @rksimon for pointing out the problem.
This should fix "Bad flavor while matching min/max" errors when building
the graphics/libsixel and science/kst2 ports.
Reported by: jbeich
PR: 225268, 225269
When a section placement (AT) command references the section itself,
the physical address of the section in the ELF header was calculated
incorrectly due to alignment happening right after the location
pointer's value was captured.
The problem was diagnosed and the first version of the patch written
by Erick Reyes.
Obtained from: LLVM r322421 by Rafael Espindola
The problem we had with it is that anything inside an AT is an
expression, so we failed to parse the section name because of the - in
it.
Requested by: royger
Obtained from: LLVM r322801 by Rafael Espindola
The AT> lma_region expression allows specifying the memory region
for a section's load address.
Should fix [upstream LLVM] PR35684.
LLVM review: https://reviews.llvm.org/D41397
Obtained from: LLVM r322359 by George Rimar
Allow usage of X86-prefixes as separate instrs.
Differential Revision: https://reviews.llvm.org/D42102
This should fix parse errors when x86 prefixes (such as 'lock' and
'rep') are followed by various non-mnemonic tokens, e.g. comments, .byte
directives and labels.
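For illustration, a constructed example of the now-accepted pattern
(inline assembly with a prefix separated from its instruction by a
comment):
```
/* The 'lock' prefix stands on its own statement, followed by a comment;
 * the integrated assembler now parses it as a separate (prefix) instruction
 * and still applies it to the following incl. */
void atomic_inc(unsigned *p)
{
    __asm__ __volatile__(
        "lock # prefix followed by a non-mnemonic token\n\t"
        "incl %0"
        : "+m" (*p));
}
```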
PR: 224669,225054
[LV] Don't call recordVectorLoopValueForInductionCast for
newly-created IV from a trunc.
Summary:
This method is supposed to be called for IVs that have casts in their
use-def chains that are completely ignored after vectorization under
PSE. However, for truncates of such IVs the same InductionDescriptor
is used during creation/widening of both original IV based on PHINode
and new IV based on TruncInst.
This leads to an unintended second call to
recordVectorLoopValueForInductionCast with a VectorLoopVal set to the
newly created IV for a trunc, and causes an assert due to an attempt to
store new information for an already existing entry in the map. This is
wrong and should not be done.
Fixes PR35773.
Reviewers: dorit, Ayal, mssimpso
Reviewed By: dorit
Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41913
This should fix "Vector value already set for part" assertions when
building the net/iodine and sysutils/daa2iso ports.
Reported by: jbeich
PR: 224867,224868
[SLP] Fix PR35777: Incorrect handling of aggregate values.
Summary:
Fixes the bug with incorrect handling of InsertValue|InsertElement
instructions in the SLP vectorizer. Currently, we may use incorrect
ExtractElement instructions as the operands of the original
InsertValue|InsertElement instructions.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41767
This should fix "Invalid InsertValueInst operands!" errors when building
certain parts of editors/libreoffice.
Reported by: jbeich
PR: 225086
Fix thread race between SectionPiece's OutputOff and Live members
Summary:
As reported in bug 35788, rL316280 reintroduces a race between two
members of SectionPiece, which share the same 64 bit memory location.
To fix the race, check the hash before checking the Live member, as
suggested by Rafael.
Reviewers: ruiu, rafael
Reviewed By: ruiu
Subscribers: smeenai, emaste, llvm-commits
Differential Revision: https://reviews.llvm.org/D41884
[CGP] Fix Complex addressing mode for offset
If the offsets differ in the two addressing modes, we can continue only
if ScaleReg is not set, because we would use it to merge the different
offsets.
It should fix PR35799 and PR35805.
Reviewers: john.brawn, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41227
This should fix "ScaledReg == nullptr" assertions when building the
graphics/xpx, mail/alpine and editors/pico-alpine ports.
Reported by: jbeich
PR: 224866, 224995
Do not use parallelForEach to call maybeCompress().
Currently LLVM's parallelForEach has a problem with reentrancy.
That caused https://bugs.llvm.org/show_bug.cgi?id=35788 (lld sometimes
hangs while linking Ruby 2.4) because maybeCompress calls writeTo,
which uses parallelForEach.
This patch avoids using parallelForEach to call maybeCompress, to
work around the issue.
This should fix potential hangs when linking parts of ruby24.
[ELF] Compress debug sections after assignAddresses and support
custom layout
Previously, in r320472, I moved the calculation of section offsets
and sizes for compressed debug sections into maybeCompress, which
happens before assignAddresses, so that the compression had the
required information. However, I failed to take account of
relocations that patch such sections. This had two effects:
1. A race condition existed when a debug section referred to a
different debug section (see PR35788).
2. References to symbols in non-debug sections would be patched
incorrectly. This is because the addresses of such symbols are not
calculated until after assignAddresses (this was a partial
regression caused by r320472, but they could still have been
broken before, in the event that a custom layout was used in a
linker script).
assignAddresses does not need to know about the output section size
of non-allocatable sections, because they do not affect the value of
Dot. This means that there is no longer a reason not to support
custom layout of compressed debug sections, as far as I'm aware.
These two points allow for delaying when maybeCompress can be called,
removing the need for the loop I previously added to calculate the
section size, and therefore the race condition. Furthermore, by
delaying, we fix the issues of relocations getting incorrect symbol
values, because they have now all been finalized.
This should fix thread race conditions when linking parts of ruby24.
We normally want to ignore SHT_NOBITS sections when computing
offsets. The sh_offset of the section itself seems to be irrelevant, and:
* If the section is in the middle of a PT_LOAD, it will make no
difference on the computed offset of the followup section.
* If it is in the end of a PT_LOAD, we want to avoid its alignment
changing the offset of the followup sections.
The issue is if it is at the start of the PT_LOAD. In that case we do
have to align it so that the following sections have congruent
address and offset modulo the page size. We were not handling this
case.
This should fix linking the FreeBSD kernel.
In particular, this fixes ctfmerge and/or objcopy throwing "Layout
constraint violation" errors when processing an lld-linked kernel.