How to use thread local storage with ORC JIT?

Hello everyone,

I’m using the ORC JIT to load and execute some IR files which I compiled from source with clang-cl; the host is 64-bit Windows. Apparently, Clang decided to use thread-local storage for some of the code. When loading and JITing those files I get the following undefined references:

__emutls_get_address

__emutls_v._Init_thread_epoch

A while ago I was told that those references come from compiler-rt, so I built that project and at least found “__emutls_get_address” in the “clang_rt.builtins-x86_64.lib” file. However, I haven’t found “__emutls_v._Init_thread_epoch” anywhere and don’t know what to do with this symbol.

Any ideas?

Also, in case you see double - I asked this question in the LLVM Discord already with no big success…

Kind greetings

Björn

I am no expert, so take anything I say with a pinch of salt, but my understanding is that Orc JIT does not support native thread-local storage, it just emulates it. I think what you have to do is to compile your source code with Clang’s -femulated-tls option, to avoid native thread-local storage calls.
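For context, as far as I understand emulated TLS: every access to a thread-local variable x becomes a call to __emutls_get_address, passing the address of a per-variable control object named __emutls_v.x. That would explain why both kinds of names show up as undefined references. Below is a toy model of that lookup, not compiler-rt's implementation. Everything in the toy namespace (Control, get_address, counter) is a made-up name, and the sketch leans on native thread_local internally purely to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <thread>
#include <unordered_map>

namespace toy {

// Hypothetical stand-in for the control object emitted per variable
// (the real one is named __emutls_v.<name> and has a different layout).
struct Control {
    std::size_t size;        // size of the variable in bytes
    const void* init_value;  // initial bytes, or nullptr for zero-init
};

// Hypothetical stand-in for __emutls_get_address: returns this thread's
// private copy of the variable described by `c`, creating it on first use.
void* get_address(const Control* c) {
    assert(c != nullptr);
    thread_local std::unordered_map<const Control*,
                                    std::unique_ptr<unsigned char[]>> slots;
    auto& slot = slots[c];
    if (!slot) {
        slot = std::make_unique<unsigned char[]>(c->size);  // zero-filled
        if (c->init_value) std::memcpy(slot.get(), c->init_value, c->size);
    }
    return slot.get();
}

// What `thread_local int counter = 42;` roughly turns into in this model.
const int counter_init = 42;
const Control counter_control{sizeof(int), &counter_init};
int& counter() { return *static_cast<int*>(get_address(&counter_control)); }

// Each thread sees its own copy, starting from the initial value.
bool threads_get_separate_copies() {
    counter() = 1;  // modify the calling thread's copy
    int seen_in_worker = 0;
    std::thread worker([&] {
        seen_in_worker = counter();  // fresh copy, still the initial value
        counter() = 7;               // does not affect the calling thread
    });
    worker.join();
    return seen_in_worker == 42 && counter() == 1;
}

}  // namespace toy
```

The point of the model is only that the JIT'd code needs two things resolved: the lookup function and one control object per thread-local variable.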

Geoff

Hey Geoff,

Sadly this didn’t change the undefined symbols I encounter :c

+Lang Hames (though I think he’s not got much bandwidth for a few months - so responses may be delayed)

Hi All,

Björn, I know we discussed this on Discord already, but something just occurred to me: are you compiling your target executable (whatever you’re JITing into, probably the same process as your JIT) with -femulated-tls? If so, have you made sure to include at least one thread-local variable that is either used or marked as used (to ensure it’s not dead-stripped)? JIT’d code will be looking for those symbols in your executable, so you’ll need to make sure they’re linked in there.
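A minimal sketch of what such a thread-local in the host executable could look like. The names host_tls_anchor and touch_host_tls are made up for illustration, and it assumes the host is built with -femulated-tls as discussed above; whether this is enough to make the runtime symbol visible to the JIT on Windows is not something this sketch can confirm.

```cpp
#include <cassert>
#include <thread>

// Hypothetical name: any thread-local that is really used should do.
thread_local int host_tls_anchor = 0;

// Touching the variable from a function that is actually called keeps the
// access, and whatever runtime support it needs, in the final executable.
int touch_host_tls() { return ++host_tls_anchor; }

// Sanity check: a new thread starts from its own zero-initialized copy.
bool host_tls_is_per_thread() {
    touch_host_tls();  // the calling thread's copy is now at least 1
    int first_value_in_worker = -1;
    std::thread worker([&] { first_value_in_worker = touch_host_tls(); });
    worker.join();
    assert(host_tls_anchor >= 1);
    return first_value_in_worker == 1;
}
```

Calling touch_host_tls() once from the host's startup code is the "used" variant of the advice; the "marked as used" variant would rely on a compiler attribute instead and is not shown here.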

I am no expert, so take anything I say with a pinch of salt, but my understanding is that Orc JIT does not support native thread-local storage, it just emulates it.

That is true for now, but I hope to have native thread locals supported (on MachO at least) by LLVM 12. The ORC runtime prototype [1] already contains support for POD thread locals, and I will be aiming to add support for thread locals with nontrivial constructors in the near future.

– Lang.

Hey Lang,

I can’t remember anymore but I think I haven’t tried with -femulated-tls.

I also forwarded the question to the Cling people because I think they got it running, but I haven’t heard back from them either. I decided against emulated TLS and tried resolving the Windows symbols myself by adding another IR file, compiled from the following source:
    #include <intrin.h>  // _InterlockedCompareExchange, _InterlockedIncrement, _InterlockedAnd
    #include <limits>    // std::numeric_limits

    extern "C"
    {
        void Sleep(unsigned long dwMilliseconds);

        static constexpr int EpochStart = std::numeric_limits<int>::min();

        unsigned int _tls_index = 0;
        int _Init_global_epoch = EpochStart;
        int _Init_thread_epoch = EpochStart;

        void _Init_thread_header(volatile int* ptss)
        {
            while(true)
            {
                /* Try to acquire the first initialization lock */
                int oldTss = _InterlockedCompareExchange(reinterpret_cast<volatile long*>(ptss), -1, 0);
                if(oldTss == -1)
                {
                    /* Busy, wait for the other thread to do the initialization */
                    Sleep(0);
                    continue;
                }

                /* Either we acquired the lock and the caller will do the initialization
                   or the initialization is complete and the caller will skip it */
                break;
            }
        }

        void _Init_thread_footer(int* ptss)
        {
            /* Initialization is complete: store the new epoch in the guard */
            *ptss = _InterlockedIncrement(reinterpret_cast<volatile long*>(&_Init_global_epoch));
        }

        void _Init_thread_abort(volatile int* ptss)
        {
            /* Abort the initialization */
            _InterlockedAnd(reinterpret_cast<volatile long*>(ptss), 0);
        }
    }

This was good enough for me until I hear of a better way to do this, because I have a feeling that it is not really working. However, my static variables were only initialized once no matter how many threads I threw at them.

I haven’t mentioned this solution before because I’m not very confident about its reliability. Maybe I should show it to the Cling people too…
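To make the "initialized only once" observation easier to check outside the JIT, here is a portable sketch of the same guard protocol using std::atomic in place of the Interlocked intrinsics. It is a simplification under assumptions: all names in the sketch namespace are made up, and guard_header returns a bool to say whether the caller owns the initialization, whereas the real _Init_thread_header returns nothing and leaves that check to the caller.

```cpp
#include <atomic>
#include <cassert>
#include <limits>
#include <thread>
#include <vector>

namespace sketch {

constexpr int kEpochStart = std::numeric_limits<int>::min();
std::atomic<int> global_epoch{kEpochStart};

// Guard states: 0 = not initialized, -1 = initialization in progress,
// anything else = initialized (holds the epoch at completion).
// Returns true if the caller must run the initializer.
bool guard_header(std::atomic<int>& guard) {
    for (;;) {
        int expected = 0;
        if (guard.compare_exchange_strong(expected, -1)) return true;
        if (expected != -1) return false;  // someone already finished
        std::this_thread::yield();         // someone else is initializing
    }
}

// Marks initialization as complete by storing a fresh epoch in the guard.
void guard_footer(std::atomic<int>& guard) {
    guard.store(global_epoch.fetch_add(1) + 1);
}

// Resets the guard so that another thread can retry the initialization.
void guard_abort(std::atomic<int>& guard) { guard.store(0); }

// Races `num_threads` threads on one guard and reports how many of them
// ended up running the initializer.
int run_demo(int num_threads) {
    std::atomic<int> guard{0};
    std::atomic<int> init_count{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([&] {
            if (guard_header(guard)) {
                init_count.fetch_add(1);  // stands in for the real initializer
                guard_footer(guard);
            }
        });
    }
    for (auto& w : workers) w.join();
    assert(guard.load() != 0 && guard.load() != -1);
    return init_count.load();
}

}  // namespace sketch
```

Under this model exactly one thread wins the compare-exchange, which matches what I saw. It says nothing about whether plain globals for _tls_index and _Init_thread_epoch are a sound replacement for the real CRT ones.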

Kind greetings

Björn

Bjoern,

In your 15 February email you said you tried that and it didn’t help…?

Geoff

Ohhh right!
I did try it…. Thanks for the reminder >o<

I think you will need to compile both the JIT’d code and the executor process with -femulated-tls and include at least one thread-local in the executor in order for this to work: If you only compile your JIT’d code with -femulated-tls then you’ll have a reference to __emutls_get_address as you’ve seen, but no definition for it. That symbol would usually be pulled in from compiler-rt by the static linker, but you’re running under the JIT. Your two options are to either add compiler-rt to your JIT (this is untested on Windows, and consequently risky), or to compile your executor with -femulated-tls and include at least one thread local in it – this will force the static linker to compile __emutls_get_address into your executor (lower risk, but may not work if __emutls_get_address is marked non-exported from compiler-rt).

– Lang.