About thread_local in LLVM 3.0

Hi LLVM,

I am using LLVM 3.0, and I have a question about __thread in C and
thread_local in LLVM IR: the code generated at -O1 and -O4 and the
code after link-time optimization behave differently.

////////////////////////////////////
The following C code is from the LLVM testcase
(SingleSource/UnitTests/Threads/2010-12-08-tls.c)

#include <stdio.h>

__thread int a = 4;

int foo (void)
{
  return a;
}

int main (void) {
  printf("a is %d\n", foo());
  return 0;
}

It declares a __thread variable. The programs built by both GCC and
clang print "a is 4" and then exit. I used the default command-line
options:
gcc 2010-12-08-tls.c
clang 2010-12-08-tls.c

////////////////////////////////////
The following code (2010-12-08-tls.O1.ll) is generated by
clang -c -O1 -emit-llvm 2010-12-08-tls.c -o 2010-12-08-tls.O1.bc
llvm-dis 2010-12-08-tls.O1.bc

@a = thread_local global i32 4, align 4
@.str = private unnamed_addr constant [9 x i8] c"a is %d\0A\00", align 1

define i32 @foo() nounwind readonly {
  %1 = load i32* @a, align 4, !tbaa !0
  ret i32 %1
}

define i32 @main() nounwind {
  %1 = tail call i32 @foo()
  %2 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([9 x i8]* @.str, i32 0, i32 0), i32 %1) nounwind
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}

If I JIT the code with "lli 2010-12-08-tls.O1.bc", I get

0 lli 0x091ac1a8
1 lli 0x091ac7e7
2 0xffffe400 + 0
3 lli 0x08f7a103
llvm::ExecutionEngine::runFunctionAsMain(llvm::Function*,
std::vector<std::string, std::allocator<std::string> > const&, char
const* const*) + 1459
4 lli 0x0861df9e main + 3374
5 libc.so.6 0xb75b7ace __libc_start_main + 254
Stack dump:
0. Program arguments: lli 2010-12-08-tls.O4.bc
Segmentation fault

If I run the code in the interpreter with "lli --force-interpreter=true
2010-12-08-tls.O1.bc", I get
  "a is 24"

////////////////////////////////////
The following code (2010-12-08-tls.O4.ll) is generated by
clang -c -O4 -emit-llvm 2010-12-08-tls.c -o 2010-12-08-tls.O4.bc
llvm-dis 2010-12-08-tls.O4.bc

@a = thread_local global i32 4, align 4
@.str = private unnamed_addr constant [9 x i8] c"a is %d\0A\00", align 1

define i32 @foo() nounwind readonly {
  %1 = load i32* @a, align 4, !tbaa !0
  ret i32 %1
}

define i32 @main() nounwind {
  %1 = load i32* @a, align 4, !tbaa !0
  %2 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([9 x i8]* @.str, i32 0, i32 0), i32 %1) nounwind
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}

As we can see, -O4 inlines foo into main, but otherwise its output is
the same as at -O1. I get the same results as above from both the JIT
and the interpreter.

////////////////////////////////////
The following code (2010-12-08-tls.ld.ll) is generated by
opt -std-link-opt 2010-12-08-tls.O1.bc -o 2010-12-08-tls.ld.bc
llvm-dis 2010-12-08-tls.ld.bc

@.str = private unnamed_addr constant [9 x i8] c"a is %d\0A\00", align 1

define i32 @main() nounwind {
  %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([9 x i8]* @.str, i32 0, i32 0), i32 4) nounwind
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

We can see that the link-time optimizations value-numbered %1 to the
constant 4. Now both the JIT and the interpreter print "a is 4", which
is the same as the native output of the clang and gcc builds. I am not
familiar with thread_local or with __thread in C, and was wondering
whether this is the expected behavior of thread_local.
Thanks.

Hi Jianzhou Zhao, I think it is simple: the JIT doesn't support thread local
storage, and thus aborts when it sees you trying to use a thread local variable.
It would be nicer if it printed a helpful error message though. On my machine
I get
   $ lli tl.ll
   Cannot allocate thread local storage on this arch!
   UNREACHABLE executed at llvm/lib/Target/X86/X86JITInfo.cpp:576!

The new MC-JIT (pass -use-mcjit to lli) should support this one day, but I'm
not sure it does yet. As for the interpreter, it should output an error
message saying it doesn't support thread local storage rather than producing
a random wrong result.

You could open a bug report about the poor error messages. As for thread local
support I think the MC-JIT developers have this on their list of things to do
already.

Ciao, Duncan.