Status of 'thread_local'


What is the status of 'thread_local' support in clang? Is it something that will be delivered in the next few months? Later? Never? Or has it already been delivered and the website just not updated?


The 3.2 release notes have a note about TLS (3.2 isn't out yet):

AFAIK, thread locals have been supported in C for some time via the __thread keyword, on platforms that support it (OS X 10.7 for Mach-O, ELF platforms, …).

-- Jean-Daniel

That tells me about LLVM, but what about clang supporting the keyword?

Is it C-only? I ran a test with C++ and it didn't work. Does it work with C++, and do I need to pass some flag to the compiler to enable it? I'll poke around with Google and see what info I can dig up.

Can you define “does not work”?

I just compiled the following code in C++ mode, and it generated the expected LLVM assembly:

__thread int foo;

int main(int argc, char **argv) { return 0; }

====== output:

@foo = thread_local global i32 0, align 4

define i32 @main(i32 %argc, i8** %argv) nounwind uwtable ssp {
  %retval = alloca i32, align 4

It shouldn't be. I ran a test with C++ and it *did* work. Or, at least
it gave correct LLVM IR (and correct assembly).

My (C++) test was:

__thread int foo;

int bar() {
    return foo;
}

Compiled with bog-standard C++ options, this worked fine. Could you
tell us what you thought was wrong with your output?
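To check runtime behaviour rather than just the IR, the same test can be extended to run on two threads. A minimal sketch, assuming a compiler that accepts both the GNU __thread extension and C++11 std::thread; the helpers set_and_read and threads_see_separate_copies are illustrative names, not from anyone's code in this thread:

```cpp
#include <cassert>
#include <thread>

__thread int foo;  // GNU extension: one zero-initialized copy per thread

int bar() {
    return foo;
}

// Illustrative helper: write the calling thread's copy of foo, then read it
// back through bar().
static void set_and_read(int value, int* out) {
    foo = value;
    *out = bar();
}

// Runs set_and_read on two threads. Returns true if each thread saw only its
// own value and the calling thread's copy of foo was left untouched.
bool threads_see_separate_copies() {
    foo = 7;  // the calling thread's copy
    int r1 = 0;
    int r2 = 0;
    std::thread t1(set_and_read, 1, &r1);
    std::thread t2(set_and_read, 2, &r2);
    t1.join();
    t2.join();
    return r1 == 1 && r2 == 2 && bar() == 7;
}
```

Wherever __thread works for a trivially constructible type like int, threads_see_separate_copies() should return true.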



Hmm. I just ran the test against the 10.7 SDK again and it worked. I’m not sure what I had wrong the first time.

OK, “__thread” works for me. I have to encapsulate the implementation anyway, as my code currently uses boost::thread_specific_ptr and it must continue to work on 10.6.
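One way to encapsulate this so the same code also works on 10.6 is to wrap the POSIX thread-specific data API (pthread_key_create, pthread_getspecific, pthread_setspecific), which does not depend on compiler TLS support and runs a destructor for each thread's value when that thread exits. A minimal sketch, assuming pthreads and C++11; the ThreadSpecific class and its get() method are hypothetical and are not boost::thread_specific_ptr's interface:

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical wrapper (not boost's API): one lazily created T per thread,
// stored under a POSIX thread-specific data key. The key's destructor deletes
// a thread's T when that thread exits via pthread_exit or by returning from
// its start routine.
template <typename T>
class ThreadSpecific {
public:
    ThreadSpecific() {
        int rc = pthread_key_create(&key_, &ThreadSpecific::destroy);
        assert(rc == 0 && "pthread_key_create failed");
        (void)rc;  // silence unused-variable warnings when NDEBUG is set
    }

    ~ThreadSpecific() {
        // Only the calling thread's value is reachable here; values still
        // owned by other live threads are not freed.
        destroy(pthread_getspecific(key_));
        pthread_key_delete(key_);
    }

    ThreadSpecific(const ThreadSpecific&) = delete;
    ThreadSpecific& operator=(const ThreadSpecific&) = delete;

    // Returns the calling thread's instance, value-initializing it on first use.
    T& get() {
        T* p = static_cast<T*>(pthread_getspecific(key_));
        if (p == nullptr) {
            p = new T();
            pthread_setspecific(key_, p);
        }
        return *p;
    }

private:
    static void destroy(void* p) { delete static_cast<T*>(p); }

    pthread_key_t key_;
};
```

The value is created lazily on the first get() in each thread, so threads that never touch it pay nothing. Note that returning from main does not run thread-specific data destructors, so the main thread's value is only freed if the wrapper itself is destroyed on that thread.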

Thanks, all, for the info.


What is the lifetime of a __thread variable? Is it destroyed when the thread exits?

Pretty much. From C++11 (3.7.2):

"All variables declared with the thread_local keyword have thread storage duration. The storage for these entities shall last for the duration of the thread in which they are created. There is a distinct object or reference per thread, and use of the declared name refers to the entity associated with the current thread."

"A variable with thread storage duration shall be initialized before its first odr-use (3.2) and, if constructed, shall be destroyed on thread exit."
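Under those rules each thread that uses the variable gets its own object, constructed on first use in that thread and destroyed when that thread exits. A small sketch that counts constructions and destructions, assuming a toolchain with full C++11 thread_local support (which, as the rest of this thread shows, clang on OS X did not yet have); Tracker, tracker() and run_threads() are illustrative names:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> g_constructed(0);
std::atomic<int> g_destroyed(0);

// Illustrative type with a non-trivial constructor and destructor.
struct Tracker {
    int counter = 0;
    Tracker() { ++g_constructed; }
    ~Tracker() { ++g_destroyed; }
};

// Function-local thread_local: initialized the first time each thread passes
// through the declaration, destroyed when that thread exits.
Tracker& tracker() {
    thread_local Tracker t;
    return t;
}

// Starts n threads that each use tracker(), joins them, and returns the total
// number of Tracker constructions so far (the counters are cumulative).
int run_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([] { tracker().counter += 1; });
    }
    for (auto& th : threads) {
        th.join();
    }
    return g_constructed.load();
}
```

With four worker threads, a conforming implementation should report four constructions, and by the time each join() returns that thread's Tracker should already have been destroyed.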

I wouldn't swear to LLVM implementing these semantics yet, though. The
initial TLS specification forbade non-trivial constructors and made no
mention of destructors. In fact, from a quick test it looks like LLVM
destroys the object at program exit, and only in one thread (unless
__cxa_atexit and, to a lesser extent, .init_array are more magical than
I knew).


OK. I'll test it and see.

OK. I ran the test with ‘__thread’. I did not get the expected results.

When I ran the code below, I got a single instance of ‘a’, created on thread 2 and later destroyed on the main thread by cxa_exit().

If I moved ‘x’ to be a global variable (file scope), I still got one instance constructed by cxx_global_var_init() before main was invoked.

I verified that multiple worker threads are running while this app executes, so I would expect each worker thread to get its own copy of ‘x’.

Is my expectation wrong? Is this feature not fully implemented or is there some option that needs to be enabled?

I tested building against the 10.7 and 10.8 SDKs, using Xcode 4.5.1 with clang and libc++ in C++11 mode.

#include <Foundation/Foundation.h>

#include <dispatch/dispatch.h>

#include <cstddef>
#include <functional>
#include <iostream>
#include <memory>
#include <thread>

class a
{
public:
    a(size_t theLoopId) : mLoopId(theLoopId), counter(0)
    {
        std::thread::id tmpCurrentThreadId = std::this_thread::get_id();
        std::hash<std::thread::id> tmpThreadHasher;
        mThreadId = tmpThreadHasher(tmpCurrentThreadId);

        NSLog(@"construct. thread=%zX, loop=%zX", mThreadId, mLoopId);
    }

    ~a()
    {
        NSLog(@"destruct. thread=%zX, loop=%zX", mThreadId, mLoopId);
    }

    size_t mThreadId;
    size_t mLoopId;
    size_t counter;
};

// Strips one level of pointer: drop_pointer<T*>::type is T.
template <typename tpType>
struct drop_pointer
{
};

template <typename tpType>
struct drop_pointer<tpType*>
{
    typedef tpType type;
};

struct deleteDispatchgroup
{
    void operator()(dispatch_group_t thePtr)
    {
        //not needed if ARC is enabled on 10.8+
#if __MAC_OS_X_VERSION_MAX_ALLOWED < 1080 || !__has_feature(objc_arc)
        dispatch_release(thePtr);
#endif
    }
};

//__thread a x(-1);

int main(int argc, const char * argv[])
{
    dispatch_queue_t tmpDispatchQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    std::unique_ptr< drop_pointer<dispatch_group_t>::type, deleteDispatchgroup > tmpDipatchGroupSPtr( dispatch_group_create() );

    size_t tmpCounter = 0;

    for(size_t i = 1; i <= 50; ++i)
    {
        auto tmpHandler = [&tmpCounter, i] // (size_t theId)
        {
            static __thread a x(i);

            for(size_t j= 0; j < 10000000; ++j)
            {
                tmpCounter += j * i;
                x.counter += j * i;
            }
        };

        dispatch_group_async(tmpDipatchGroupSPtr.get(), tmpDispatchQueue, tmpHandler);
    }

    dispatch_group_wait(tmpDipatchGroupSPtr.get(), DISPATCH_TIME_FOREVER);

    // insert code here…
    std::cout << "Hello, World!\n";
    return tmpCounter;
}


Sorry. I just re-read your message and realized that I had confirmed what you already said. I didn’t correctly understand what you were telling me on the first pass.

So I suppose the answer to my original question is that this feature is a work in progress, and I should stick with boost::thread_specific_ptr for now.



I just updated to Xcode 4.6. I note the following:

$ /Applications/ --version
Apple clang version 4.2 (tags/Apple/clang-424.0.11) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

My understanding from the release notes is that LLVM 3.2 supports thread-local storage. I just re-ran the '__thread' test from the last time I asked about this, and I still get just one instance of the object rather than one per thread.

Is this something that should be working?


Nothing has changed in this area recently; the change mentioned in the
3.2 release notes primarily affects optimization, not user-visible


In addition to Eli's informative answer, please note that while LLVM and Clang are open-source, Xcode 4.6 and the "Apple clang version 4.2" are still under NDA. Please do not discuss their specifics outside of the closed developer forums.

(I hate to be that guy, but...)

Thanks for the info.

Is this functionality going to be implemented in clang on OS X any time soon, or is it a ways off?


Thread-local storage for simple objects (which have a trivial
constructor and destructor) should be working. Beyond that, I don't know
of anyone planning to work on thread-local storage in the near future.
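Because __thread only handles objects that need no dynamic construction or destruction, a common workaround is to keep the thread-local object trivial and do the setup lazily on first use in each thread. A minimal sketch of that pattern; PerThreadState, t_state and state() are hypothetical names, and nothing here runs cleanup at thread exit (that would still need something like a pthread key destructor):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>

// Trivial type (no user-provided constructor or destructor), so it can be a
// __thread variable. Like any object with thread storage duration and no
// initializer, it starts zero-initialized in every thread.
struct PerThreadState {
    bool initialized;
    std::size_t thread_hash;
    std::size_t counter;
};

static __thread PerThreadState t_state;

// Returns the calling thread's state, doing the one-time setup a constructor
// would otherwise have done the first time each thread calls it.
PerThreadState& state() {
    if (!t_state.initialized) {
        t_state.thread_hash = std::hash<std::thread::id>()(std::this_thread::get_id());
        t_state.counter = 0;
        t_state.initialized = true;
    }
    return t_state;
}
```

The initialized flag works because every new thread sees its own zero-filled copy of t_state, so the setup branch runs exactly once per thread.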


OK. When I removed the constructor and destructor from my class, I got the expected behavior.