Sorry, but I disagree with this conclusion.
I don’t think that POSIX defines any of the API/ABI involved in TLS here - that’s mostly the C++ standard, with the Itanium C++ ABI on top of it.
Anyway, on to more concrete matters: this isn’t about the storage or how it’s allocated and accessed; both variables use the same mechanism. From the Compiler Explorer link above:
testY():
push rax
cmp qword ptr [rip + thread-local initialization routine for y@GOTPCREL], 0
je .LBB1_2
call thread-local initialization routine for y@PLT
.LBB1_2:
mov rax, qword ptr [rip + y@GOTTPOFF]
mov eax, dword ptr fs:[rax]
pop rcx
ret
testZ():
mov rax, qword ptr [rip + z@GOTTPOFF]
mov eax, dword ptr fs:[rax]
ret
Both of these use mov rax, qword ptr [rip + y@GOTTPOFF] (or z@GOTTPOFF for testZ) followed by mov eax, dword ptr fs:[rax] to access the TLS storage for the variable directly - the initializer doesn’t return anything that we use for locating it.
This is all about invoking a potential routine to initialize the storage for this variable - if that requires runtime code.
The example above does show one class with a constexpr constructor, and one entirely without constructors - but even so, as long as we don’t see the definition of the TLS variables, we don’t know how they are meant to be initialized.
If we leave out the constexpr from the MyType constructor, and if we remove the extern from the variable declarations, we get one hint at this (Compiler Explorer). From Clang:
<source>:16:17: error: initializer for thread-local variable must be a constant expression
16 | __thread MyType z;
| ^
<source>:16:17: note: use 'thread_local' to allow this
1 error generated.
And from GCC:
<source>:16:17: error: non-local variable 'z' declared '__thread' needs dynamic initialization
16 | __thread MyType z;
| ^
<source>:16:17: note: C++11 'thread_local' allows dynamic initialization and destruction
Compiler returned: 1
So that explains one half of it; __thread doesn’t allow dynamic initialization of such variables, and therefore there’s no need to call any initializer for it.
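To make the __thread/thread_local difference concrete, here is a minimal sketch. The MyType below is a hypothetical stand-in for the class in the Compiler Explorer example (its real definition isn’t shown in this post), and readBoth is a helper name made up for illustration:

```cpp
// Hypothetical stand-in for MyType: a constexpr constructor and a trivial
// destructor, so the variables below need no runtime initialization code.
struct MyType {
    int value;
    constexpr MyType() : value(42) {}
    int get() const { return value; }
};

// __thread (a GCC/Clang extension) only accepts constant initialization.
// If the constructor above were not constexpr, this is the line that the
// compilers reject with the errors quoted above.
static __thread MyType z;

// thread_local accepts the same definition, and would keep accepting it
// even with a non-constexpr constructor (at the cost of an initializer).
static thread_local MyType y;

// Both variables are defined in this translation unit and are constant
// initialized, so reading them is a plain TLS access.
int readBoth() { return z.get() + y.get(); }
```

Since both definitions are visible and constant initialized, readBoth() should compile down to plain TLS loads without any call to an initialization routine.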
Also, if we re-add the constexpr, but keep the extern removed, so that we see the definition of the variables, then the testX and testY functions lose the calls to the init functions - as we know these types don’t require any initialization. In that case, the file also contains a definition of the thread-local variable initialization routine, so we can see for ourselves that it doesn’t do anything about allocating the storage.
But given the complete definitions of classes MyType and MyType2 here, one with only a constexpr constructor, and the other one lacking constructors, it would seem reasonable that they wouldn’t require any dynamic initialization at all?
No - the variables can be initialized with other things that require runtime calls. We could have this:
class MyType2 {
    int x, y, z;
public:
    void put(int v) { x=v; }
    int get() { return x; }
};
MyType2 getInitializer();
thread_local MyType2 y = getInitializer();
In this case (Compiler Explorer) we do have a non-constexpr initialization of y, which C++ allows for thread_local variables. It requires any translation unit that doesn’t see the definition of the variable to try to call the initializer function, if it exists.
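As a rough illustration of what that dynamic initialization means at runtime, the sketch below counts how often the initializer runs. Everything here is simplified and hypothetical: MyType2 is reduced to a single public member, getInitializer is given a body that bumps a counter, and initCalls, readY and initializerCallsForTwoThreads are names invented for the example.

```cpp
#include <atomic>
#include <thread>

// Counts how many times the dynamic initializer has run (illustration only).
static std::atomic<int> initCalls{0};

// Simplified, hypothetical version of MyType2.
struct MyType2 {
    int x;
    int get() const { return x; }
};

// Not constexpr, so y below needs dynamic (runtime) initialization.
static MyType2 getInitializer() {
    ++initCalls;
    return MyType2{7};
}

static thread_local MyType2 y = getInitializer();

int readY() { return y.get(); }

// Starts two threads that each read y twice and returns how many times the
// initializer ran in the meantime, or -1 if a thread saw an unexpected value.
int initializerCallsForTwoThreads() {
    const int before = initCalls.load();
    int seenA = 0, seenB = 0;
    std::thread a([&] { seenA = readY() + readY(); });
    std::thread b([&] { seenB = readY() + readY(); });
    a.join();
    b.join();
    return (seenA == 14 && seenB == 14) ? initCalls.load() - before : -1;
}
```

Each new thread gets its own copy of y, and the initializer runs once for each of them, no matter how many times that thread reads the variable afterwards.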
Given this overhead, it’s simplest to keep thread_local variables static, so that all accesses are within the same translation unit; that avoids both having to call the potential initializer and having to generate it.
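One way to arrange that, sketched under the assumption that other translation units only need to read and update the value: keep the variable with internal linkage and expose ordinary functions instead. The names g_counter, incrementCounter and currentCounter are hypothetical.

```cpp
namespace {
// Internal linkage: only this translation unit ever touches the variable
// directly, so the compiler always sees its definition and initializer.
thread_local int g_counter = 0;
}  // namespace

// Other translation units call these ordinary functions instead of
// declaring `extern thread_local int g_counter;` themselves.
int incrementCounter() { return ++g_counter; }
int currentCounter() { return g_counter; }
```

Callers pay for a normal function call, but no other translation unit has to reference the variable’s initialization routine.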
(As a further side note: the call to the initializer function also first checks whether the initializer function even exists; it’s referenced as a weak symbol, so that it resolves to a null pointer if it doesn’t exist at all. In GCC targeting mingw, there are known issues with weak symbols, which makes the whole concept of cross-translation-unit thread_local variables prone to not working there - see e.g. [lldb] Make the thread_local g_global_boundary accessed from a single… · llvm/llvm-project@7106f58 · GitHub for an example of a fix for such a case.)