OSX: Darwin Kernel Version 22.1.0 (Apple M1 chip)
Clang: Apple clang version 14.0.0 (clang-1400.0.29.202)
PoC C++ Code:
thread_local int* tls = nullptr;
// using libcontext to jump stack.
void jump_stack();
void* test() {
    // Before jump_stack, assume we are running on thread 1.
    int *cur_tls = tls;
    jump_stack();
    // After jump_stack, we are running on another thread (thread 2),
    // so tls needs to be reloaded for that thread.
    cur_tls = tls;
}
compile command :
clang++ -c test.cpp --std=c++11 -g -O0
generated code:
; void* test() {
0: ff c3 00 d1 sub sp, sp, #48
4: fd 7b 02 a9 stp x29, x30, [sp, #32]
8: fd 83 00 91 add x29, sp, #32
c: 00 00 00 90 adrp x0, 0x0 <ltmp0+0xc>
10: 00 00 40 f9 ldr x0, [x0]
14: 08 00 40 f9 ldr x8, [x0]
18: 00 01 3f d6 blr x8
1c: e0 07 00 f9 str x0, [sp, #8]
; int *cur_tls = tls;
20: 08 00 40 f9 ldr x8, [x0]
24: e8 0b 00 f9 str x8, [sp, #16]
; jump_stack();
28: 00 00 00 94 bl 0x28 <ltmp0+0x28>
2c: e0 07 40 f9 ldr x0, [sp, #8]
; cur_tls = tls;
30: 08 00 40 f9 ldr x8, [x0]
34: e8 0b 00 f9 str x8, [sp, #16]
; }
38: a0 83 5f f8 ldur x0, [x29, #-8]
3c: fd 7b 42 a9 ldp x29, x30, [sp, #32]
40: ff c3 00 91 add sp, sp, #48
44: c0 03 5f d6 ret
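Before jump_stack, the address of tls is resolved once (the blr at offset 0x18) and cached at [sp, #8]. After jump_stack, that cached address is reloaded from [sp, #8] and dereferenced to set cur_tls (stored at [sp, #16]), so cur_tls gets the tls that belongs to thread 1, not thread 2.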
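To make the effect concrete without libcontext, here is a minimal sketch that mimics what the listing above does: it resolves the address of tls once on one thread and then dereferences that same address from a second thread. The function name cached_address_is_stale and the use of std::thread are assumptions for illustration only; they are not part of the original PoC.

```cpp
#include <cassert>
#include <thread>

thread_local int* tls = nullptr;

// Hypothetical helper: returns true if an address of `tls` that was resolved
// on the calling thread still refers to the calling thread's copy when it is
// dereferenced from a different thread.
bool cached_address_is_stale() {
    int a = 1, b = 2;
    tls = &a;             // thread 1's copy of tls
    int** cached = &tls;  // address resolved once, on thread 1
    bool stale = false;
    std::thread other([&] {
        tls = &b;         // thread 2's copy of tls
        // *cached reads thread 1's copy; a fresh access reads thread 2's copy.
        stale = (*cached == &a) && (tls == &b) && (cached != &tls);
    });
    other.join();         // thread 1 stays alive while thread 2 runs
    return stale;
}
```

Calling cached_address_is_stale() from main should return true: the cached address keeps pointing at the first thread's variable, which is the same situation test() ends up in after jump_stack resumes on another thread.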
Are there any clang options to disable this optimization, so that a thread_local variable is always reloaded for the current thread?
On Linux (5.15.0-66-generic, x86_64), using Ubuntu clang version 14.0.6, the generated code is:
0000000000000000 <_Z4testv>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 64 48 8b 04 25 00 00 mov %fs:0x0,%rax
f: 00 00
11: 48 89 45 f8 mov %rax,-0x8(%rbp)
15: e8 00 00 00 00 callq 1a <_Z4testv+0x1a>
1a: 64 48 8b 04 25 00 00 mov %fs:0x0,%rax
21: 00 00
23: 48 89 45 f8 mov %rax,-0x8(%rbp)
27: 31 c0 xor %eax,%eax
29: 48 83 c4 10 add $0x10,%rsp
2d: 5d pop %rbp
2e: c3 retq
After jump_stack, tls is reloaded from %fs:0x0, so it works as expected.
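One possible workaround, offered as an untested assumption rather than a confirmed fix, is to read tls through a separate non-inlined function, so that each call resolves the address for whichever thread is running at that moment. The helper name current_tls_slot is hypothetical, and __attribute__((noinline)) is the GCC/Clang attribute syntax. At higher optimization levels the compiler may still be able to merge two such calls, so this is a sketch to experiment with, not a guarantee.

```cpp
#include <cassert>
#include <thread>

thread_local int* tls = nullptr;

// Hypothetical helper (not part of the original PoC): resolves the address of
// the calling thread's `tls` inside its own non-inlined function.
__attribute__((noinline)) int** current_tls_slot() {
    return &tls;
}

// Checks that two live threads get different slots from the helper.
bool slots_differ_across_threads() {
    int** main_slot = current_tls_slot();
    int** other_slot = nullptr;
    std::thread other([&] { other_slot = current_tls_slot(); });
    other.join();
    assert(other_slot != nullptr);
    return main_slot != other_slot;
}
```

In test(), the second read would then become cur_tls = *current_tls_slot(); placed after jump_stack(). Whether this actually avoids the cached address on Darwin with Apple clang 14 would need to be confirmed by inspecting the disassembly again.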