Thread-local storage doesn't optimize in C++ with clang-cl

(I’m on Windows, migrating to clang-cl from MSVC in this case, on x64 hardware)

If I compile the following code as C++ with clang-cl and /Ox, the accesses to thread-local storage are not optimized at all; the generated code looks the same as without the /Ox flag. It’s as if thread-local storage were being treated like volatile, which it is more or less the opposite of (though any thread-local variable could of course also be declared volatile). What I mean is that every access to the structure in thread-local storage appears to re-run the initialization of the thread-local storage space. Compiling it as C, the thread-local accesses are somewhat better optimized, though I would really expect the region to be initialized once and just referenced afterwards; apparently that’s not what compilers do.

MSVC and GCC both optimize better

#include <stdio.h>

#if !defined( __NO_THREAD_LOCAL__ ) && ( defined( _MSC_VER ) || defined( __WATCOMC__ ) )
#  define HAS_TLS 1
#  ifdef __cplusplus
#    define DeclareThreadLocal static thread_local
#    define DeclareThreadVar  thread_local
#  else
#    define DeclareThreadLocal static __declspec(thread)
#    define DeclareThreadVar __declspec(thread)
#  endif
#elif !defined( __NO_THREAD_LOCAL__ ) && ( defined( __GNUC__ ) || defined( __MAC__ ) )
#  define HAS_TLS 1
#  ifdef __cplusplus
#    define DeclareThreadLocal static thread_local
#    define DeclareThreadVar thread_local
#  else
#    define DeclareThreadLocal static __thread
#    define DeclareThreadVar __thread
#  endif
#else
// if no HAS_TLS
#  define DeclareThreadLocal static
#  define DeclareThreadVar
#endif


struct my_thread_info {
	int pThread;
	int nThread;
};
DeclareThreadLocal  struct my_thread_info _MyThreadInfo;

int f( void ) {
    if( !_MyThreadInfo.pThread )
        _MyThreadInfo.pThread = 1;
    int a = _MyThreadInfo.pThread;
    int b = _MyThreadInfo.pThread;
    int c = _MyThreadInfo.pThread;
    printf( "Use vars %d %d %d\n", a, b, c );
    return _MyThreadInfo.pThread;
}


int main( void ) {
    f();
}

According to the Clang Compiler User’s Manual (Clang 19.0.0git documentation):

/Ox Deprecated (same as /Og /Oi /Ot /Oy /Ob2); use /O2 instead

Can you try what this does with /O2 or /Ot (aka -O3 in clang++)?

I don’t think changing the /O flag will help. Here’s the example above on Godbolt: Compiler Explorer

The TLS initialization stuff was implemented in D115456 ([MS] Implement on-demand TLS initialization for Microsoft CXX ABI). Maybe that has some hint about what’s going wrong.

Well - I can get clang on Linux - I don’t have clang for Windows, only clang-cl… and Linux thread-local storage is a different ABI… but it does appear to optimize there…

And it appears the command-line options for clang-cl are cl-compatible, not GCC-compatible…

A handy trick to know is that clang detects its own name to decide on its driver. On Linux, clang-cl and clang++ are symlinks to clang, while on Windows they are copies. So if you copy clang-cl and rename it to clang++, you’ll get the GCC-like driver.