(I'm on Windows, on x64 hardware, migrating from MSVC to clang-cl in this case.)
If I compile the following code as C++ with clang-cl and /Ox, the accesses to thread-local storage are not optimized in any way; the generated code looks the same as without /Ox. It seems as if thread-local storage is treated like volatile, which is almost the opposite of what it means (and any thread-local variable could still be declared volatile explicitly if that were wanted). In other words, it's as if every access to the structure in thread-local storage sets up access to the thread-local storage space all over again. (Compiled as C, the thread-local accesses are at least somewhat optimized, though I would really expect the region to be set up once and simply referenced afterwards; apparently that's not what compilers do.)
MSVC and GCC both optimize this better.
```cpp
#include <stdio.h>

#if !defined( __NO_THREAD_LOCAL__ ) && ( defined( _MSC_VER ) || defined( __WATCOMC__ ) )
# define HAS_TLS 1
# ifdef __cplusplus
#  define DeclareThreadLocal static thread_local
#  define DeclareThreadVar thread_local
# else
#  define DeclareThreadLocal static __declspec(thread)
#  define DeclareThreadVar __declspec(thread)
# endif
#elif !defined( __NO_THREAD_LOCAL__ ) && ( defined( __GNUC__ ) || defined( __MAC__ ) )
# define HAS_TLS 1
# ifdef __cplusplus
#  define DeclareThreadLocal static thread_local
#  define DeclareThreadVar thread_local
# else
#  define DeclareThreadLocal static __thread
#  define DeclareThreadVar __thread
# endif
#else
// if no HAS_TLS
# define DeclareThreadLocal static
# define DeclareThreadVar
#endif

struct my_thread_info {
    int pThread;
    int nThread;
};

DeclareThreadLocal struct my_thread_info _MyThreadInfo;

int f( void ) {
    if( !_MyThreadInfo.pThread )
        _MyThreadInfo.pThread = 1;
    int a = _MyThreadInfo.pThread;
    int b = _MyThreadInfo.pThread;
    int c = _MyThreadInfo.pThread;
    printf( "Use vars %d %d %d\n", a, b, c );
    return _MyThreadInfo.pThread;
}

int main( void ) {
    f();
}
```