TLS with MCJIT (an experimental patch)

andykaylor · May 8, 2013, 8:54pm

Hi David,

Following up on the problems we discussed yesterday on IRC regarding TLS with MCJIT, I’ve put together the attached experimental patch.

This patch makes three changes:

1. SectionMemoryManager is changed to request memory below the 2GB boundary by default.
2. sys::Memory::allocateMappedMemory is changed to set the MAP_32BIT flag if the requested “near” block is below the 2GB boundary.
3. RuntimeDyldELF is changed to recognize the possibility of external data symbols.

Of these changes, items 2 and 3 are probably reasonable things to commit into trunk, and depending on how this turns out I will do so. Item 1 is a bit heavy-handed as presented here, but it suggests the type of thing that subclasses of SectionMemoryManager could do to make this work. If we had a way to communicate the code model to the memory manager from RuntimeDyld/MCJIT (and we obviously should!) then SectionMemoryManager could do something like this when small or medium memory models are selected on applicable platforms.
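For illustration only (this is not the code in the attached patch), the low-memory request can be sketched with plain mmap. The helper names `allocateLow` and `isBelow2GB` are hypothetical, and the sketch assumes a POSIX system; MAP_32BIT is used only where the platform headers define it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Hypothetical helper, not part of LLVM: map `size` bytes of read/write
// memory, asking the kernel for a low address where it knows how to.
// Returns nullptr on failure.
void *allocateLow(std::size_t size) {
  assert(size > 0 && "mmap rejects zero-length mappings");
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_32BIT
  // Only defined on some platforms (Linux/x86-64): place the mapping in the
  // first 2GB of the address space.
  flags |= MAP_32BIT;
#endif
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// True if the whole block [p, p + size) lies below the 2GB boundary.
bool isBelow2GB(const void *p, std::size_t size) {
  auto start = reinterpret_cast<std::uintptr_t>(p);
  return start + size <= (std::uintptr_t{1} << 31);
}
```

Where MAP_32BIT is not available the call still succeeds, but the block can land anywhere, so a caller would have to check the result with something like `isBelow2GB` before relying on 32-bit relocations.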

When I tried this patch with the test case you provided yesterday it got through the compilation phase with lli using the small code model and the static relocation model, but it ultimately failed (but failed gracefully) because it couldn’t resolve the ‘_ThreadRuneLocale’ symbol. Resolution of external symbols is meant to be handled by the memory manager, so I thought perhaps you could get something working with this patch.

Please give this a try and let me know how it works.

Thanks,

Andy

tls-experimental.patch (2.41 KB)

davidchisnall · May 9, 2013, 1:52am

Hi,

Unfortunately, I can't compile this patch. MAP_32BIT is a Linuxism that doesn't work on FreeBSD (or OS X, or, as far as I can tell, anywhere except Linux). We can consider adding something similar to FreeBSD (although I'm hesitant to encourage anything that increases the determinism of the memory layout of JITed code, for security reasons), but it doesn't seem ideal.

David

andykaylor · May 9, 2013, 5:58pm

Can you try it without the MAP_32BIT part? It won't be as reliable, but if the memory addresses it is asking for are available it could work.

I agree that there are good reasons not to lock in on a single memory address, but I'm curious as to what other obstacles might be lurking behind the ones we know about. If the patch works when memory is loaded below 2GB then it would be possible to write a sophisticated memory manager that surveys the available memory in that space and selects an appropriate block in some non-deterministic manner.
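A rough sketch of the non-deterministic idea, under stated assumptions: rather than surveying the address space, it simply probes random page-aligned hints below 2GB and keeps the first mapping the kernel places there. The function name `allocateRandomLow`, the 4096-byte page size, and the 1MB floor are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <sys/mman.h>

// Hypothetical sketch: try a few random hints below 2GB. The hint is only
// advisory (no MAP_FIXED), so existing mappings are never overwritten.
// Returns nullptr if no attempt produced a block below the limit.
void *allocateRandomLow(std::size_t size, unsigned attempts = 16) {
  assert(size > 0 && size < (std::size_t{1} << 30));
  constexpr std::uintptr_t kPage = 4096;                      // assumed page size
  constexpr std::uintptr_t kLimit = std::uintptr_t{1} << 31;  // 2GB boundary
  constexpr std::uintptr_t kFloor = std::uintptr_t{1} << 20;  // keep clear of the lowest pages
  std::mt19937_64 rng{std::random_device{}()};
  std::uniform_int_distribution<std::uintptr_t> pick(
      kFloor / kPage, (kLimit - size) / kPage - 1);
  for (unsigned i = 0; i < attempts; ++i) {
    void *hint = reinterpret_cast<void *>(pick(rng) * kPage);
    void *p = mmap(hint, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
      continue;
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    if (addr + size <= kLimit)
      return p;          // the kernel placed the block below 2GB
    munmap(p, size);     // hint was ignored; release and try another one
  }
  return nullptr;
}
```

Because the kernel may ignore a hint, this can fail even when low memory is free, which is the reliability caveat mentioned above.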

-Andy

davidchisnall · May 10, 2013, 6:05pm

Without the MAP_32BIT part, I consistently hit this assertion:

Assertion failed: ((Type == ELF::R_X86_64_32 && (Value <= UINT32_MAX)) || (Type == ELF::R_X86_64_32S && ((int64_t)Value <= INT32_MAX && (int64_t)Value >= INT32_MIN))), function resolveX86_64Relocation, file ../lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp, line 222.

David

andykaylor · May 15, 2013, 12:17am

Hi David,

I believe that assertion indicates that something didn't get loaded into the lower 2GB of address space. That is, the memory manager isn't allocating memory in that range.
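The range check behind that assertion can be restated as a small standalone function. This is a sketch, not the RuntimeDyldELF code: the enum `Reloc32` and the function `fitsIn32` are illustrative names standing in for the two relocation types named in the assertion.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the two relocation kinds in the assertion;
// these are not the LLVM ELF:: enumerators.
enum class Reloc32 {
  ZeroExtended,  // R_X86_64_32: field is zero-extended to 64 bits
  SignExtended   // R_X86_64_32S: field is sign-extended to 64 bits
};

// True if `value` can be stored in the 4-byte relocation field and still
// reproduce the full 64-bit value after extension.
bool fitsIn32(Reloc32 kind, std::uint64_t value) {
  switch (kind) {
  case Reloc32::ZeroExtended:
    return value <= UINT32_MAX;
  case Reloc32::SignExtended: {
    auto s = static_cast<std::int64_t>(value);
    return s >= INT32_MIN && s <= INT32_MAX;
  }
  }
  assert(false && "unknown relocation kind");
  return false;
}
```

An address such as 0x00007f0000001000, typical of an unconstrained mmap on a 64-bit system, fails both checks, which is consistent with the assertion firing when nothing forces the allocation low.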

I'm sure there must be a way to allocate memory in that range on FreeBSD. The system loader has to do it, right? I just don't know what makes it happen.

-Andy

rnk · May 15, 2013, 12:46pm

Can you elaborate on why MCJIT TLS support needs code in the low 2 GB? What piece of data do you need to be reachable? It sounds like this was discussed on IRC, but I’m curious.

Does the MCJIT even have the reachability problems of the old JIT? If you build an object file in memory, presumably you can measure it and then allocate +x memory for it all at once, instead of the old model of not knowing how big it was going to be.

If we build a module at a time, presumably separate modules don’t need to be reachable w.r.t. each other, since they can use PLT-style stubs.

andykaylor · May 15, 2013, 5:11pm

I don’t think this is actually a TLS-specific problem. The TLS case just exposed a couple of other shortcomings in the current code base.

The problem is two-fold. First, MCJIT doesn’t support the PIC relocation model for most platforms. Second, the MC code generation doesn’t work with large code model and the static relocation model.

Because of these two issues, to try to get TLS working, we wanted to generate code with the static relocation model and the small code model. It’s the small code model that requires code to be loaded in the lower 2GB. In particular, when you use small code model with static relocation model MC generates relocations that assume 32-bit addresses (R_X86_64_32). Once this relocation is generated, the RuntimeDyld doesn’t have enough information to be able to fake it if the address it needs to write into the relocation is bigger than 32-bits.

For PC-relative relocations, we can just rely on everything being loaded in proximity, and in fact that happens even with the large memory model. For “absolute” 32-bit relocations that doesn’t work.
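To make the contrast concrete, here is a minimal sketch of the PC-relative case. The function names are made up for the example; the arithmetic is the usual "target plus addend minus place" form of a 32-bit PC-relative relocation.

```cpp
#include <cassert>
#include <cstdint>

// Displacement for a 32-bit PC-relative relocation: target + addend - place.
// Unsigned wraparound keeps the subtraction well defined.
std::int64_t pcRelDisplacement(std::uint64_t target, std::int64_t addend,
                               std::uint64_t place) {
  return static_cast<std::int64_t>(target + static_cast<std::uint64_t>(addend) -
                                   place);
}

// True if the displacement fits the signed 4-byte field.
bool fitsPC32(std::uint64_t target, std::int64_t addend, std::uint64_t place) {
  std::int64_t d = pcRelDisplacement(target, addend, place);
  return d >= INT32_MIN && d <= INT32_MAX;
}

// Value written into the field. Callers are expected to check fitsPC32 first.
std::int32_t encodePC32(std::uint64_t target, std::int64_t addend,
                        std::uint64_t place) {
  assert(fitsPC32(target, addend, place) && "target out of PC-relative range");
  return static_cast<std::int32_t>(pcRelDisplacement(target, addend, place));
}
```

Two sections loaded 1MB apart at 0x00007f0000001000 and 0x00007f0000101000 pass `fitsPC32` even though neither address fits in 32 bits. An absolute 32-bit relocation against either address has no such escape, because only the address itself is encoded.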

-Andy

davidchisnall · May 22, 2013, 10:22am

I've asked around, and we don't seem to have anything that can do it. Checking the code for rtld, it explicitly asks for memory at a specific address and keeps track of the regions it has used.

David

Rafael_Avila_de_Espi · May 22, 2013, 12:01pm

>> I believe that assertion indicates that something didn't get loaded into the lower 2GB of address space. That is, the memory manager isn't allocating memory in that range.

>> I'm sure there must be a way to allocate memory in that range on FreeBSD. The system loader has to do it, right? I just don't know what makes it happen.

> I've asked around, and we don't seem to have anything that can do it. Checking the code for rtld, it explicitly asks for memory at a specific address and keeps track of the regions it has used.

I was under the impression that, in the small memory model, each .so
had to be small, but because of the use of GOTs and PLTs they could be
anywhere in memory. If we allocate the tls memory in the same
allocator call that allocates space for the text section this would
work, no?

Cheers,
Rafael

davidchisnall · May 22, 2013, 1:19pm

> Why the private message? If unintentional, please forward this to the list.

Ooops, forgot to hit reply-all. Didn't the LLVM lists used to default to reply-to-list behaviour?

> So, the JIT is analogous to dlopen, so it should be using general
> dynamic and local dynamic models. It is only the initial exec and
> local exec that require the dynamic linker to allocate memory at
> startup.

The dynamic linker will have allocated the memory because the TLS variable in question is provided by libc. It is already allocated before the JIT'd code runs. The JIT'd code just needs to refer to it.

Rafael_Avila_de_Espi · May 22, 2013, 1:28pm

>> So, the JIT is analogous to dlopen, so it should be using general
>> dynamic and local dynamic models. It is only the initial exec and
>> local exec that require the dynamic linker to allocate memory at
>> startup.

> The dynamic linker will have allocated the memory because the TLS variable in question is provided by libc. It is already allocated before the JIT'd code runs. The JIT'd code just needs to refer to it.

OK. Are we generating generic dynamic code to do so? It will look like

.byte 0x66
leaq x@tlsgd(%rip),%rdi    # R_X86_64_TLSGD to symbol x (MCJIT has to create a GOT entry)
.word 0x6666
rex64
call __tls_get_addr@plt    # R_X86_64_PLT32 to __tls_get_addr (MCJIT has to create a GOT and a PLT entry)

This should work from any place in memory. I wouldn't be surprised if
these relocations are not implemented yet, but that should be all that
is needed to get tls working.
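As a toy model of why this sequence is position independent: the code only hands a (module, offset) pair to a runtime function, which finds the calling thread's copy of the variable. Everything below is a simplified stand-in with made-up names (`ToyTlsIndex`, `toyTlsGetAddr`, a fixed 64-byte block per module); it is not how a real dynamic linker implements __tls_get_addr.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Same shape as the argument __tls_get_addr receives: a module id and an
// offset inside that module's TLS block.
struct ToyTlsIndex {
  std::size_t module;
  std::size_t offset;
};

constexpr std::size_t kToyBlockSize = 64;  // assumed fixed block size per module

// Toy per-thread table: module id -> this thread's copy of the TLS block.
thread_local std::map<std::size_t, std::vector<unsigned char>> toyBlocks;

// Toy stand-in for __tls_get_addr: lazily create this thread's block for the
// module, then return the address of the variable inside it.
void *toyTlsGetAddr(const ToyTlsIndex &ti) {
  assert(ti.offset < kToyBlockSize && "offset outside the toy TLS block");
  auto &block = toyBlocks[ti.module];
  if (block.empty())
    block.resize(kToyBlockSize);
  return block.data() + ti.offset;
}
```

In the real sequence the (module, offset) pair lives in the GOT entry that the R_X86_64_TLSGD relocation refers to, which is why the JIT needs GOT support before this model can work.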

Cheers,
Rafael

davidchisnall · May 22, 2013, 1:37pm

That was, indeed, where this discussion started. Andrew's suggestion was to use the small code model, in the hope that this would fix some things. The lack of support for these relocations is what is stopping my code from working with MCJIT, and your removal of EH is stopping it working with the legacy JIT.

David

Rafael_Avila_de_Espi · May 22, 2013, 1:42pm

Well, these relocations are there because of the general dynamic tls
model, so they would be present on all code models.

Cheers,
Rafael

Konstantin_Tokarev · May 22, 2013, 1:58pm

http://www.unicom.com/pw/reply-to-harmful.html

andykaylor · May 22, 2013, 4:30pm

To clarify, MCJIT currently has no GOT support whatsoever for ELF with x86-64 and ARM (and probably others). My experimental patch was meant as an attempt to get TLS working with static relocation model and small code model. It's the combination of these two that requires memory in the lower 2GB. MCJIT works with static and large, but the MC code generator has a problem with TLS and large code model.

Obviously we just need to get PIC support in place for MCJIT.

-Andy

Rafael_Avila_de_Espi · May 22, 2013, 5:28pm

> To clarify, MCJIT currently has no GOT support whatsoever for ELF with x86-64 and ARM (and probably others).

No, I added a bare minimum to get EH working...

> My experimental patch was meant as an attempt to get TLS working with static relocation model and small code model. It's the combination of these two that requires memory in the lower 2GB. MCJIT works with static and large, but the MC code generator has a problem with TLS and large code model.

I see. Yes, on that model codegen would produce local exec TLS model
and we would only need R_X86_64_TPOFF32 (and making sure the code was
close to the tls block).
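For reference, a sketch of what an R_X86_64_TPOFF32 value encodes in the local exec model, assuming the usual x86-64 layout in which the thread pointer sits at the end of the executable's TLS block. The function names and the simplified alignment handling are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Round v up to a multiple of align (align must be a power of two).
std::uint64_t alignUp(std::uint64_t v, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "alignment must be a power of two");
  return (v + align - 1) & ~(align - 1);
}

// Offset of a TLS variable from the thread pointer: the variable's offset in
// the TLS block minus the aligned block size, so the result is negative.
std::int64_t tpoff(std::uint64_t symOffset, std::uint64_t tlsSize,
                   std::uint64_t tlsAlign) {
  return static_cast<std::int64_t>(symOffset) -
         static_cast<std::int64_t>(alignUp(tlsSize, tlsAlign));
}

// True if the offset fits the signed 4-byte TPOFF32 field.
bool fitsTPOFF32(std::int64_t off) {
  return off >= INT32_MIN && off <= INT32_MAX;
}
```

For a 0x20-byte block with 0x10 alignment, a variable at offset 8 gets the value -0x18, which is what an access such as `movq %fs:x@tpoff, %rax` would use.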

> Obviously we just need to get PIC support in place for MCJIT.

Agreed.

Thanks,
Rafael

Keno_Fischer2 · January 20, 2014, 5:30am

I’d like to take a crack at this. Was there any more progress or work I should be aware of?

Thanks,
Keno

andykaylor · January 20, 2014, 6:38pm

Hi Keno,

I believe that PIC/GOT support is working for x86-64 ELF targets. I’m not sure what effect that had on the proposed TLS implementation.

-Andy
