Thread migration during function execution, semantics of thread local variables

Hi LLVM-dev,

I am working on a runtime system that supports task migration, i.e. a task can be moved between different threads. A function can therefore start executing on one thread, call another function (which might itself call into the runtime), and then resume on a different thread. This poses a problem for thread-local variables. As an example, take the program below. After the call to callee we might have switched threads, and thus we need to recalculate the location of the thread-local variable.

@var = available_externally thread_local global i32 0, align 4

declare void @callee()

define signext i32 @main() nounwind {
entry:
  %0 = load i32, i32* @var, align 4
  call void @callee()
  %1 = load i32, i32* @var, align 4
  %2 = icmp eq i32 %0, %1
  %3 = zext i1 %2 to i32
  ret i32 %3
}

As far as I can tell, there is currently no mechanism to inform LLVM that thread migration might occur, so the behaviour you get depends on the backend.

For example, compiling for x86_64-unknown-linux-gnu, we get:

movq var@GOTTPOFF(%rip), %rbx
movl %fs:(%rbx), %ebp
callq callee@PLT
xorl %eax, %eax
cmpl %fs:(%rbx), %ebp

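This happens to be correct: only the offset of var is kept in %rbx, and the %fs-relative access after the call is resolved against the segment base of whichever thread is running at that point. On Darwin, on the other hand: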

movq _var@TLVP(%rip), %rdi
callq *(%rdi)
movq %rax, %rbx
movl (%rax), %ebp
callq _callee
xorl %eax, %eax
cmpl (%rbx), %ebp

the address of the TLS variable gets CSE'd (it is kept in %rbx across the call), and thus the second load could read the previous thread's copy of var.
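For what it is worth, here is a C-level sketch of a workaround one could try in the meantime. It is only an illustration under assumptions, not something LLVM guarantees: var_addr is a hypothetical accessor, the empty asm statement is a GCC/Clang extension used purely as an optimization barrier, and the callee here is a stand-in that does not actually migrate anything. The idea is to route every access through a non-inlined function that the compiler cannot treat as side-effect-free, so the TLS address is recomputed after each call.

```c
/* Thread-local variable, playing the role of @var in the IR above. */
static _Thread_local int var = 0;

/* Hypothetical accessor. noinline keeps the TLS address computation out of
 * the caller; the empty asm with a "memory" clobber (GCC/Clang extension)
 * gives the function a side effect, so two calls should not be merged. */
__attribute__((noinline)) int *var_addr(void) {
    __asm__ volatile("" ::: "memory");
    return &var;
}

/* Stand-in for a call that may migrate the task to another thread.
 * In this sketch it does nothing and stays on the same thread. */
__attribute__((noinline)) void callee(void) {
    __asm__ volatile("" ::: "memory");
}

/* Mirrors @main from the IR: load, call, load again, compare. */
int same_before_and_after(void) {
    int before = *var_addr();
    callee();
    int after = *var_addr();   /* address is looked up again here */
    return before == after;
}
```

Without real migration same_before_and_after() returns 1. The cost is an extra call per access, and I am not certain this is sufficient for every backend or LTO configuration.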

Has there been any prior work on supporting thread migration + thread local storage?

Kind regards,

There was a discussion on a very similar topic with regard to C++20 coroutines back in November/December 2020, entitled “[RFC] Coroutine and pthread_self”. It discusses exactly the same issues you will run into, although for coroutines the issue only occurs in early optimization passes, because eventually the coroutine with yield-points gets transformed into a “normal” function.

Note that TLS access is not the only problem you have. The removal of redundant function calls across a thread switch will also be a problem, e.g. as enabled by LLVM IR’s “readnone” attribute (which is generated from the C attribute __attribute__((const)), present e.g. on pthread_self).
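To make the redundant-call issue concrete, here is a small single-threaded C sketch. All names in it (current_worker, worker_id, migrate, migrated) are hypothetical, and migration is only simulated by bumping a global. The accessor is deliberately declared without __attribute__((const)); with that attribute the compiler would be allowed to reuse the first result after the call to migrate.

```c
/* Simulated "which worker am I running on" state. In a real runtime this
 * would come from the thread itself, as pthread_self does. */
static int current_worker = 0;

/* Hazardous declaration, shown for contrast only and not used here:
 *     int worker_id(void) __attribute__((const));
 * It promises the result depends only on the arguments, so the compiler
 * may fold the two calls in migrated() into one. */
int worker_id(void) {
    return current_worker;
}

/* Simulates the task resuming on a different worker. */
void migrate(void) {
    current_worker += 1;
}

/* Returns 1 if the worker id observed after migrate() differs from the
 * one observed before it. */
int migrated(void) {
    int before = worker_id();
    migrate();
    int after = worker_id();   /* must be a fresh call, not a reused value */
    return before != after;
}
```

As written, migrated() returns 1. Anything that answers "which thread am I on" would need to stay free of const/readnone in a runtime with migration, at least until the work mentioned in that thread lands.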

See the thread starting here:

and then into the next month here:

The work in this area has not yet been completed.