status of 'thread_local'

My question asked for clarification of the LLVM 3.2 release notes and of clang's implementation of thread-local storage support. None of this information requires revealing Apple's state secrets to Google or Microsoft, who I am sure don't have Apple developer accounts and don't download Apple developer seeds for themselves.

-James

The __thread keyword is a C extension (it is not part of the standard). Using it with C++ is even less specified than using it with C.

Moreover, it has already been stated in the previous discussion that supporting C++ TLS requires OS support. Updating Xcode does not change that.

-- Jean-Daniel

As far as I understand, Mac OS X has supported TLS since 10.7 Lion. At least the dynamic linker has code which seems to indicate it is supported.

There are two things to consider. First, the system must know how to reserve memory on each thread to store the thread-local variable (the part Lion implements), which is enough to use TLS with simple C constructs.
Second, the system must support invoking initialization/cleanup functions for TLS values each time a thread is created or destroyed. This part is required to support C++ objects with a constructor or destructor. I didn't dig into the dyld / libc / kernel sources for OS X 10.8, but I don't think it supports this yet.
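A minimal sketch of the two parts described above, written against standard C++11 `thread_local` rather than `__thread`. The names `constructed`, `destroyed`, `Tracked`, `worker` and `run_workers` are made up for illustration. The sketch assumes a compiler and OS that implement both parts; on a platform that only provides the first part, the `Tracked` variable would not be expected to compile or behave this way.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counters, used only to observe per-thread construction/destruction.
std::atomic<int> constructed{0};
std::atomic<int> destroyed{0};

// Part 1: plain data with a constant initializer. The system only has to
// reserve per-thread storage; no code needs to run at thread start or exit.
thread_local int plain_counter = 0;

// Part 2: a type with a constructor and a destructor. Each thread that uses
// the variable needs its own construction and, at thread exit, destruction.
struct Tracked {
    int value;
    Tracked() : value(0) { ++constructed; }
    ~Tracked() { ++destroyed; }
};
thread_local Tracked tracked;

// Touches both thread-local variables and returns this thread's plain_counter.
int worker(int increments) {
    for (int i = 0; i < increments; ++i) {
        ++plain_counter;
        ++tracked.value;
    }
    return plain_counter;
}

// Runs `threads` workers and reports whether each one saw only its own increments.
bool run_workers(int threads, int increments) {
    std::vector<int> results(threads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&results, t, increments] { results[t] = worker(increments); });
    }
    for (auto& th : pool) th.join();
    for (int r : results) {
        if (r != increments) return false;  // a shared variable would exceed this
    }
    return true;
}
```

If the variables were shared instead of per-thread, `run_workers` would report counts larger than `increments`, and only one `Tracked` object would ever be constructed, which matches the "one instance rather than one per thread" symptom reported in this thread.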

--
/Jacob Carlborg

_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

-- Jean-Daniel

Right, I guess I was mostly thinking of the first part.

The __thread keyword is a C extension (it is not part of the standard). Using it with C++ is even less specified than using it with C.

so?

If the 'thread_local' keyword were implemented, I'd use that, but it isn't implemented, so I'm testing with what is there. I don't really care what the keyword is.

Moreover, it has already been stated in the previous discussion that supporting C++ TLS requires OS support. Updating Xcode does not change that.

You'll recall that it was suggested to me that LLVM 3.2 offered something new and interesting in that regard. So, I was checking it out. It didn't work as I expected, so, simply put, I was asking for education. Eli then gave me the meaningful answer that allowed me to understand what was happening. The question for me is whether what is implemented is good enough for me to replace boost::thread_specific_ptr on 10.7+.

BTW: I went looking for clear and precise documentation on the web about clang's built-in support for thread-local storage, but I could not find it. Had I found that, I would not have come here asking questions about it.

-James

The Release Notes comment is not about clang TLS support, but about LLVM TLS support (especially the ability to specify a model for TLS variables).
If you want to know what the new feature is, the LLVM documentation gives you exactly what you're looking for: http://llvm.org/docs/LangRef.html#globalvars

-- Jean-Daniel

I read that. What I deduced from that information was that perhaps LLVM 3.1 was incapable of properly supporting '__thread'. So, when I acquired LLVM 3.2, I re-tried the test and found that it still did not work. So, I was seeking to educate myself as to why it didn't work. Is such behavior not understandable for some reason? I'm not really sure what you are seeking to communicate to me on this topic. As far as I am concerned, the question is answered...I understand the limitations and the thread can terminate.

-James

hi.

I just updated to Xcode 4.6. I note the following:

$ /Applications/Xcode46-DP1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang --version
Apple clang version 4.2 (tags/Apple/clang-424.0.11) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

It is my understanding from the release notes that LLVM 3.2 supports thread-local storage. I just re-ran my test using the '__thread' keyword from the last time I asked about this and I still just get one instance of the object rather than one per thread.

gcc 4.8 now implements thread_local with a performance penalty for global thread_local variables: http://gcc.gnu.org/gcc-4.8/changes.html#cxx

I guess that function-local thread_local variables can use the same scheme for initialization as function-local static variables.

I would be very interested to know what this “penalty” is. I have a couple of ideas about what it could be, but no idea what it really is.

– Matthieu

Actually, it looks like GCC converts each thread_local access into a function call that lazily initializes the thread_local variable.

http://stackoverflow.com/questions/13106049/c11-gcc-4-8-thread-local-performance-penalty (especially the third answer, which was posted after your last comment on this same page)
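To make the "function call with lazy initialization" idea concrete, here is a hand-written sketch of that shape. It is not GCC's actual generated code; `initializer_runs`, `expensive_initial_value`, `value_ready`, `value_storage`, `lazy_value` and `read_in_new_thread` are hypothetical names chosen for the example. The point is only that every access pays for a guard check, and that the initializer runs once per thread on first use.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical counter: how many times the initializer has run, across all threads.
std::atomic<int> initializer_runs{0};

// Stands in for an arbitrary initializer with side effects that the
// compiler cannot evaluate at compile time.
int expensive_initial_value() {
    ++initializer_runs;
    return 42;
}

// Raw per-thread storage plus a per-thread guard flag. Both have constant
// initializers, so they need nothing beyond plain per-thread memory.
thread_local bool value_ready = false;
thread_local int value_storage = 0;

// Every access goes through this wrapper: check the guard, initialize on
// first use in the calling thread, then hand back the storage.
int& lazy_value() {
    if (!value_ready) {
        value_storage = expensive_initial_value();
        value_ready = true;
    }
    return value_storage;
}

// Modifies and reads the variable from a fresh thread.
int read_in_new_thread() {
    int seen = 0;
    std::thread t([&seen] {
        lazy_value() += 1;    // first use in this thread runs the initializer
        seen = lazy_value();  // later uses only pay for the guard check
    });
    t.join();
    return seen;
}
```

The guard check on each call to `lazy_value()` is one plausible reading of the "performance penalty" mentioned in the GCC 4.8 release notes.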

There are two things I would like to know, though: how it handles destruction at the end of the thread, and why it can’t avoid the access penalty for POD and basic types. The compiler should be smart enough to detect which types require complex access and which types support direct access.

– Matthieu

– Jean-Daniel

Thanks!

I had not seen Kenny’s answer, glad someone finally had more than an educated guess to present.

Regarding the penalty for simple POD construction, unfortunately it might not be trivial. For a static thread_local it’s quite obvious whether the value can be computed right off the bat or not; for an extern thread_local, however, the initializer is invisible, so the optimization is not possible.

Regarding destructors, I don’t see how it could be supported.

All in all, I would have preferred that they went with a scheme similar to C++ globals, with one function for initialization and another for destruction, called on thread entry and thread exit. Furthermore, it’s unclear to me what interactions this has with the std::async deferred policy. With the implementation being able to use a thread pool under the hood, this would require recycling the thread_local variables… and I doubt it’s covered.

– Matthieu

It has nothing to do with the type of the object. In C, the variable has to be initialized with a constant. In C++, the initializer can be an expression with arbitrary side-effects. If we can’t see that initializer, then the only way we can use simple access is to force initialization aggressively at the start of every thread, which is technically legal under the standard but not really acceptable.

Destruction involves registering a thread-local destructor with the runtime during lazy initialization.
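As an illustration of registering a destructor during lazy initialization, the sketch below emulates the idea with POSIX thread-specific data (`pthread_key_create`, `pthread_getspecific`, `pthread_setspecific`). This is an analogy, not the mechanism the compiler runtime uses; `Resource`, `live_objects`, `this_thread_resource`, `Observation` and `use_in_new_thread` are hypothetical names, and error codes from the pthread calls are ignored for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <thread>

// Hypothetical counter of Resource objects currently alive in any thread.
std::atomic<int> live_objects{0};

struct Resource {
    int uses = 0;
    Resource() { ++live_objects; }
    ~Resource() { --live_objects; }
};

pthread_key_t resource_key;
pthread_once_t resource_key_once = PTHREAD_ONCE_INIT;

// Called by the pthread runtime at thread exit for each non-null value.
void destroy_resource(void* p) { delete static_cast<Resource*>(p); }

// Creates the key once per process and attaches the destructor to it.
void make_key() { pthread_key_create(&resource_key, destroy_resource); }

// Lazy per-thread initialization: the first call in a thread creates the
// object and stores it under the key, which schedules its destruction.
Resource& this_thread_resource() {
    pthread_once(&resource_key_once, make_key);
    void* p = pthread_getspecific(resource_key);
    if (p == nullptr) {
        p = new Resource();
        pthread_setspecific(resource_key, p);
    }
    return *static_cast<Resource*>(p);
}

struct Observation {
    int uses;                // uses counted by the worker thread
    int live_inside_thread;  // live Resource objects seen from inside it
};

// Uses the per-thread resource `times` times from a fresh thread.
Observation use_in_new_thread(int times) {
    Observation seen{0, 0};
    std::thread t([&seen, times] {
        for (int i = 0; i < times; ++i) ++this_thread_resource().uses;
        seen.uses = this_thread_resource().uses;
        seen.live_inside_thread = live_objects.load();
    });
    t.join();
    return seen;
}
```

This is roughly the pattern that library-level solutions such as a thread-specific pointer rely on: threads that never touch the variable never pay for constructing or destroying it.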

There was quite a long thread on cxx-abi-dev about this.

John.

Thank you for the answers and for the pointer. I will have a look at the list archives for the details.

– Jean-Daniel