lli: LLVM ERROR: Cannot select: X86ISD::WrapperRIP TargetGlobalTLSAddress:i64

Running the following code with clang++ -S -emit-llvm main.cpp && lli main.ll on Linux (Debian)

#include <future>

int main () {
  return std::async([]{return 1;}).get();
}

fails to run on lli due to the following error:

LLVM ERROR: Cannot select: 0xd012e0: 
     i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i8** @_ZSt15__once_callable> 0 [TF=10]

 0xd020c0: i64 = TargetGlobalTLSAddress<i8** @_ZSt15__once_callable> 0 [TF=10]
In function: _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_

Questions:

What does it mean?

Are there any compiler-flags that fix this problem?

What specific features is libstdc++ using that cause this issue?

How does my problem relate to Bug 21431 ?

The motivation behind these questions is to understand the differences between libc++ and libstdc++ that lead to this specific error message (on Linux) in LLVM’s OrcJIT.

P.S.: I’ve also asked this question on Stack Overflow.


  • LLVM-dev (clang is mostly about the frontend and this is a backend failure); you may have a better chance of getting an answer there.

I’ve seen the same problem, but didn’t find a solution back then.
I can give a hint that it is related to thread-local storage (notice TLS in the name).

The same result can be reproduced by this simple program:

    thread_local int x = 0;
    int main() {
      return 0;
    }

When compiled into IR it produces a similar error:

LLVM ERROR: Cannot select: t19: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=19]
  t18: i64 = TargetGlobalTLSAddress<i32* @x> 0 [TF=19]
In function: _ZTW1x

Questions:

What does it mean?

It's a missing feature/bug in the x86 backend. The specific problem is
that it seems we don't support thread-local variables with what Clang
& GCC would call "-mcmodel=large".

That's a compilation mode in which the emitted assembly can run
from anywhere in memory. During normal compilation "small" is used:
the compiler assumes code will end up in the low 2GB of memory, and
the linker makes sure this is true.

Unfortunately when JITing, you can't usually guarantee that the OS
will give you memory in the low 2GB, so the default is "large", which
is less robust since it's less commonly used.
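To make the 2GB constraint concrete: x86-64 RIP-relative addressing encodes a signed 32-bit displacement, so a reference only works if the target lies within roughly ±2GB of the instruction. The helper below is a minimal sketch of that check; the name fitsInRel32 and the example addresses are made up for illustration and are not part of LLVM.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if `to` can be reached from `from`
// with a signed 32-bit displacement, as RIP-relative addressing requires.
bool fitsInRel32(std::uint64_t from, std::uint64_t to) {
    // Unsigned subtraction wraps; reinterpreting the result as signed
    // gives the (possibly negative) distance between the two addresses.
    const auto delta = static_cast<std::int64_t>(to - from);
    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}
```

With statically linked code both addresses sit in the low 2GB, so the check passes. A JIT that receives memory at an arbitrary address from the OS (say, somewhere near 0x7f0000000000) cannot rely on that, which is why it falls back to the large code model.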

Are there any compiler-flags that fix this problem?

Unfortunately it doesn't seem so. Using "lli -relocation-model=pic"
gets around the immediate problem but then the JIT dynamic loader
can't cope with the relocations that get generated.

What specific features is libstdc++ using that cause this issue?

Any thread-local storage would do it.
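For the original report, the failing function named in the error is an instantiation of std::call_once, and the symbol it touches, _ZSt15__once_callable, demangles to std::__once_callable. So a smaller reproducer than std::async is probably a bare std::call_once. The sketch below assumes the libstdc++ version from the report; newer libstdc++ releases may implement call_once differently and not hit the thread-local at all. The function name runOnceDemo is made up for illustration.

```cpp
#include <mutex>

// Hypothetical reproducer: with the libstdc++ from the report, std::call_once
// stores the callable in the thread-local std::__once_callable before
// invoking it, which is the TLS access the backend fails to select.
int runOnceDemo() {
    std::once_flag flag;
    int value = 0;
    std::call_once(flag, [&value] { value = 1; });
    return value;
}
```

Compiled natively this simply returns 1. Calling it from main and running the resulting IR through lli should be enough to check whether a given toolchain still trips over the same TLS access.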

How does my problem relate to Bug 21431?

Looks like exactly the same issue.

Cheers.

Tim.

Thanks for your quick answer.

> It’s a missing feature/bug in the x86 backend

Did anyone try to implement an IR transformer to emulate thread_local?

> JIT dynamic loader can’t cope with the relocations that get generated.

How difficult would it be to fix this?

> I’ve seen the same problem, but didn’t find a solution back then.
> I can give a hint that it is related to thread-local storage (notice TLS in the name).
>
> The same result can be reproduced by this simple program:
>
>     thread_local int x = 0;
>     int main() {
>       return 0;
>     }
>
> When compiled into IR it produces a similar error:
>
> LLVM ERROR: Cannot select: t19: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=19]
>   t18: i64 = TargetGlobalTLSAddress<i32* @x> 0 [TF=19]
> In function: _ZTW1x

Interestingly, this works on my machine.

llvm-ir attached

main.ll (814 Bytes)

Got a minimal example now:

    extern thread_local int tls;
    int main() {
      tls = 42;
      return 0;
    }

llvm-ir:

; ModuleID = 'main.cpp'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@tls = thread_local global i32 0, align 4

; Function Attrs: norecurse uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %2 = call i32* @_ZTW3tls()
  store i32 37, i32* %2, align 4
  ret i32 0
}

define weak_odr hidden i32* @_ZTW3tls() {
  ret i32* @tls
}

attributes #0 = { norecurse uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = !{!"clang version 3.8.1-12 (tags/RELEASE_381/final)"}

I should have mentioned that I run it on OS X and it doesn’t work =\
IR also attached.

main.ll (1.35 KB)

What exactly do the compiler flags -femulated-tls and tls-model do?
Why does TLS emulation not solve the problem?

Looking at the generated IR, it does not seem to remove the thread_local variable declarations.
What is the reasoning behind that?

> What exactly do the compiler flags `-femulated-tls` and `tls-model` do?
> Why does TLS emulation not solve the problem?

It requires runtime support, specifically the __emutls_get_address
function by the looks of it. That's not available on all platforms
(it's not supplied by macOS for example) and is likely to be slower
than native TLS if it is.
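To illustrate what that runtime support has to do, here is a minimal sketch of an emulated-TLS lookup: each thread-local variable is described by a control object, and a runtime function hands back a per-thread copy of it on demand. The names EmuTlsControl and emuTlsGetAddress and the struct layout are assumptions for illustration only; this is not the real __emutls_get_address implementation or its ABI.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical descriptor for one emulated thread-local variable.
struct EmuTlsControl {
    std::size_t size;   // size of the variable in bytes
    const void* init;   // initial value, or nullptr for zero-initialization
};

// Hypothetical runtime entry point: returns the calling thread's private
// copy of the variable described by `control`, creating it on first use.
void* emuTlsGetAddress(const EmuTlsControl* control) {
    using Key = std::pair<std::thread::id, std::uintptr_t>;
    static std::mutex lock;
    static std::map<Key, std::vector<unsigned char>> storage;

    std::lock_guard<std::mutex> guard(lock);
    const Key key(std::this_thread::get_id(),
                  reinterpret_cast<std::uintptr_t>(control));
    auto it = storage.find(key);
    if (it == storage.end()) {
        // First access from this thread: allocate and initialize a copy.
        std::vector<unsigned char> bytes(control->size, 0);
        if (control->init != nullptr && control->size != 0)
            std::memcpy(bytes.data(), control->init, control->size);
        it = storage.emplace(key, std::move(bytes)).first;
    }
    return it->second.data();
}
```

Every access goes through a function call plus a lookup (here even a global lock), whereas native TLS is usually a couple of instructions off a segment register. That is the kind of overhead that makes the emulated model likely to be slower.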

> Looking at the generated IR, it seems not to remove thread_local variable declarations.
> What is the reasoning behind that?

Using "-emit-llvm" doesn't actually print out the very final form of
the LLVM IR. There are a few passes that run on the IR but are
considered part of the final "CodeGen" phase. As a rule of thumb those
are passes that are required for correctness reasons, in this case
because without the LowerEmuTLS.cpp pass affected backends couldn't
handle TLS constructs.

Unfortunately it doesn't look like lli has support for emulated TLS
either, though that would be pretty simple to add.

Cheers.

Tim.

> Unfortunately it doesn’t look like lli has support for emulated TLS either, though that would be pretty simple to add.

As an experiment I’ve added llvm::createLowerEmuTLSPass to lli, which added @__emutls_v.x and @__emutls_v.t.
However, I didn’t have any __emutls_get_address calls in my IR.
Is there an LLVM pass or compiler flag that replaces thread_locals with appropriate __emutls_get_address calls?

Oh, interesting. Looks like the pass only does a pretty small part of
the work. The rest happens to the DAG in
TargetLowering::LowerToTLSEmulatedModel, based on
TargetOptions::EmulatedTLS (there's a reasonable chance that would
also automatically add the pass).

Tim.

[Adding llvm-dev back to list]

Thanks for sharing your insights.
So in theory I could build an LLVM pass that calls TargetLowering::LowerToTLSEmulatedModel for each llvm::Function, and it should work if I link a runtime that provides __emutls_get_address.

I'm afraid not, that function is called as part of converting from a
Function to a MachineFunction. It operates on a completely different
representation to normal LLVM IR. The only way to trigger it is to set
the right field in the TargetOptions.

>It's a missing feature/bug in the x86 backend. The specific problem is
>that it seems we don't support thread-local variables with what Clang
>& GCC would call "-mcmodel=large".

Would fixing this be a better approach?

Yes, I think that'd be the proper fix for the issue. It's possible the
JIT parts don't support TLS at all yet either, which would mean we
have to implement it there.

How difficult would it be?

I don't think there are any fundamental difficulties; the ABI already
exists because GCC can cope. On the JIT side, it'll be a case of
supporting some relocations (pretty simple) and getting the output
object's thread-local sections registered with the system somehow
(more difficult, I'm not sure what each platform's callbacks are).

It's probably measured in days even for someone who knows many of the
details already though.

Cheers.

Tim.

I’m currently looking into a patch that was written for JuliaLang and is supposed to fix TLS on Linux.
Unfortunately it is based on LLVM 3.6 and uses RuntimeDyldELF::findGOTEntry.
Is there an equivalent method in LLVM 4.0?

Over a year has passed now. Has anybody made any progress on this issue?