TLS with MCJIT (an experimental patch)

Hi David,

Following up on the problems we discussed yesterday on IRC regarding TLS with MCJIT, I’ve put together the attached experimental patch.

This patch makes three changes:

  1. SectionMemoryManager is changed to request memory below the 2GB boundary by default.

  2. sys::Memory::allocateMappedMemory is changed to set the MAP_32BIT flag if the requested “near” block is below the 2GB boundary.

  3. RuntimeDyldELF is changed to recognize the possibility of external data symbols.

Of these changes, items 2 and 3 are probably reasonable things to commit into trunk, and depending on how this turns out I will do so. Item 1 is a bit heavy-handed as presented here, but it suggests the type of thing that subclasses of SectionMemoryManager could do to make this work. If we had a way to communicate the code model to the memory manager from RuntimeDyld/MCJIT (and we obviously should!) then SectionMemoryManager could do something like this when the small or medium code model is selected on applicable platforms.
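For readers without the attachment, the idea behind item 2 can be sketched roughly as follows. This is not the code from the patch: `allocateLowMemory` and `isBelow2GB` are hypothetical names, and the flag is guarded with `#if` on the assumption that it is not defined on every platform.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): map `size` bytes of read/write
// memory, asking for a low address where the platform offers MAP_32BIT.
void *allocateLowMemory(std::size_t size) {
  assert(size > 0 && "mmap rejects zero-length mappings");
  int flags = MAP_PRIVATE | MAP_ANON;
#if defined(MAP_32BIT)
  flags |= MAP_32BIT;  // where available: place the mapping in the low 2 GB
#endif
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// True if the address lies below the 2 GB boundary.
bool isBelow2GB(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) < (std::uintptr_t{1} << 31);
}
```

Where the flag is missing, the helper falls back to an ordinary mapping, so callers would still need to check the result with `isBelow2GB`.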

When I tried this patch with the test case you provided yesterday it got through the compilation phase with lli using the small code model and the static relocation model, but it ultimately failed (but failed gracefully) because it couldn’t resolve the ‘_ThreadRuneLocale’ symbol. Resolution of external symbols is meant to be handled by the memory manager, so I thought perhaps you could get something working with this patch.

Please give this a try and let me know how it works.

Thanks,

Andy

tls-experimental.patch (2.41 KB)

Hi,

Unfortunately, I can't compile this patch. MAP_32BIT is a Linuxism that doesn't work on FreeBSD (or OS X, or, as far as I can tell, anywhere except Linux). We can consider adding something similar to FreeBSD (although I'm hesitant to encourage anything that increases the determinism of the memory layout of JITed code, for security reasons), but it doesn't seem ideal.

David

Can you try it without the MAP_32BIT part? It won't be as reliable, but if the memory addresses it is asking for are available it could work.

I agree that there are good reasons not to lock in on a single memory address, but I'm curious as to what other obstacles might be lurking behind the ones we know about. If the patch works when memory is loaded below 2GB then it would be possible to write a sophisticated memory manager that surveys the available memory in that space and selects an appropriate block in some non-deterministic manner.
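One way such a manager might pick a block non-deterministically is sketched below. It is only an illustration under assumptions: `allocateRandomLow` is a hypothetical name, the address is passed to `mmap` purely as a hint (no `MAP_FIXED`), and the result is thrown away if the kernel placed it above the boundary.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper: try a few random page-aligned hints below 2 GB.
// Returns nullptr if no attempt produced a block entirely below 2 GB.
void *allocateRandomLow(std::size_t size, int attempts = 16) {
  const std::uintptr_t limit = std::uintptr_t{1} << 31;   // 2 GB boundary
  const std::uintptr_t lowest = std::uintptr_t{1} << 20;  // skip the lowest 1 MB
  const auto page = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
  assert(size > 0 && size <= (std::size_t{1} << 30) && "sketch handles up to 1 GB");

  std::mt19937_64 rng{std::random_device{}()};
  std::uniform_int_distribution<std::uintptr_t> pick(lowest / page,
                                                     (limit - size) / page);
  for (int i = 0; i < attempts; ++i) {
    void *hint = reinterpret_cast<void *>(pick(rng) * page);
    void *p = mmap(hint, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
      continue;
    // The hint is advisory, so verify where the mapping actually landed.
    if (reinterpret_cast<std::uintptr_t>(p) + size <= limit)
      return p;
    munmap(p, size);
  }
  return nullptr;
}
```

On systems that never honour low hints this simply returns `nullptr`, which matches the "won't be as reliable" caveat above.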

-Andy

Without the MAP_32BIT part, I consistently hit this assertion:

Assertion failed: ((Type == ELF::R_X86_64_32 && (Value <= UINT32_MAX)) || (Type == ELF::R_X86_64_32S && ((int64_t)Value <= INT32_MAX && (int64_t)Value >= INT32_MIN))), function resolveX86_64Relocation, file ../lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp, line 222.

David

Hi David,

I believe that assertion indicates that something didn't get loaded into the lower 2GB of address space. That is, the memory manager isn't allocating memory in that range.
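The condition in that assertion can be restated as a small standalone check. The sketch below mirrors the two cases from the assertion text; the `Reloc` enum is a hypothetical stand-in for `ELF::R_X86_64_32` and `ELF::R_X86_64_32S`, and `applyReloc32` is an illustrative helper rather than RuntimeDyld code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for ELF::R_X86_64_32 and ELF::R_X86_64_32S.
enum class Reloc { X86_64_32, X86_64_32S };

// R_X86_64_32 needs a value that zero-extends from 32 bits;
// R_X86_64_32S needs one that sign-extends from 32 bits.
bool fitsIn32BitReloc(Reloc type, std::uint64_t value) {
  if (type == Reloc::X86_64_32)
    return value <= UINT32_MAX;
  auto s = static_cast<std::int64_t>(value);
  return s >= INT32_MIN && s <= INT32_MAX;
}

// Write the low 32 bits of `value` at `place`, refusing values that do not fit.
void applyReloc32(std::uint8_t *place, Reloc type, std::uint64_t value) {
  assert(fitsIn32BitReloc(type, value) && "value does not fit in 32 bits");
  auto truncated = static_cast<std::uint32_t>(value);
  std::memcpy(place, &truncated, sizeof truncated);
}
```

An address such as 0x7f0000001000, typical of an ordinary high mapping on x86-64, fails both checks, which is consistent with the failure reported above.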

I'm sure there must be a way to allocate memory in that range on FreeBSD. The system loader has to do it, right? I just don't know what makes it happen.

-Andy

Can you elaborate on why MCJIT TLS support needs code in the low 2 GB? What piece of data do you need to be reachable? It sounds like this was discussed on IRC, but I’m curious.

Does the MCJIT even have the reachability problems of the old JIT? If you build an object file in memory, presumably you can measure it and then allocate +x memory for it all at once, instead of the old model of not knowing how big it was going to be.

If we build a module at a time, presumably separate modules don’t need to be reachable w.r.t. each other, since they can use PLT-style stubs.

I don’t think this is actually a TLS-specific problem. The TLS case just exposed a couple of other shortcomings in the current code base.

The problem is two-fold. First, MCJIT doesn’t support the PIC relocation model for most platforms. Second, the MC code generation doesn’t work with large code model and the static relocation model.

Because of these two issues, to try to get TLS working, we wanted to generate code with the static relocation model and the small code model. It’s the small code model that requires code to be loaded in the lower 2GB. In particular, when you use the small code model with the static relocation model, MC generates relocations that assume 32-bit addresses (R_X86_64_32). Once this relocation is generated, the RuntimeDyld doesn’t have enough information to be able to fake it if the address it needs to write into the relocation is wider than 32 bits.

For PC-relative relocations, we can just rely on everything being loaded in proximity, and in fact that happens even with the large code model. For “absolute” 32-bit relocations that doesn’t work.
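The difference between the two cases can be illustrated with the usual relocation arithmetic, where S is the symbol address, A the addend and P the address being patched. The function names below are hypothetical and the sketch only checks ranges; it is not how RuntimeDyld is structured.

```cpp
#include <cassert>
#include <cstdint>

// PC-relative value: S + A - P.
std::int64_t pcRelDelta(std::uint64_t symbol, std::int64_t addend,
                        std::uint64_t place) {
  return static_cast<std::int64_t>(symbol + static_cast<std::uint64_t>(addend) -
                                   place);
}

// A 32-bit PC-relative field only needs the *distance* to fit in int32.
bool fitsPC32(std::uint64_t symbol, std::int64_t addend, std::uint64_t place) {
  std::int64_t d = pcRelDelta(symbol, addend, place);
  return d >= INT32_MIN && d <= INT32_MAX;
}

// An absolute 32-bit field (S + A) needs the *address itself* to fit.
bool fitsAbs32(std::uint64_t symbol, std::int64_t addend) {
  return symbol + static_cast<std::uint64_t>(addend) <= UINT32_MAX;
}

// Encode the PC-relative field, refusing out-of-range distances.
std::int32_t encodePC32(std::uint64_t symbol, std::int64_t addend,
                        std::uint64_t place) {
  assert(fitsPC32(symbol, addend, place) && "target too far from patch site");
  return static_cast<std::int32_t>(pcRelDelta(symbol, addend, place));
}
```

With a symbol at 0x7f0000002000 and a patch site at 0x7f0000001000, `fitsPC32` holds while `fitsAbs32` does not: proximity rescues the PC-relative case but does nothing for the absolute one.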

-Andy

I've asked around, and we don't seem to have anything that can do it. Checking the code for rtld, it explicitly asks for memory at a specific address and keeps track of the regions it has used.

David

I was under the impression that, in the small memory model, each .so
had to be small, but because of the use of GOTs and PLTs they could be
anywhere in memory. If we allocate the tls memory in the same
allocator call that allocates space for the text section this would
work, no?

Cheers,
Rafael

Why the private message? If unintentional, please forward this to the list.

Oops, forgot to hit reply-all. Didn't the LLVM lists use to default to reply-to-list behaviour?

So, the JIT is analogous to dlopen, so it should be using general
dynamic and local dynamic models. It is only the initial exec and
local exec that require the dynamic linker to allocate memory at
startup.

The dynamic linker will have allocated the memory because the TLS variable in question is provided by libc. It is already allocated before the JIT'd code runs. The JIT'd code just needs to refer to it.

OK. Are we generating general dynamic code to do so? It will look like:

.byte 0x66
leaq x@tlsgd(%rip),%rdi    # R_X86_64_TLSGD to symbol x (MCJIT has to create a GOT entry)
.word 0x6666
rex64
call __tls_get_addr@plt    # R_X86_64_PLT32 to __tls_get_addr (MCJIT has to create a GOT and a PLT entry)

This should work from any place in memory. I wouldn't be surprised if
these relocations are not implemented yet, but that should be all that
is needed to get tls working.

Cheers,
Rafael

That was, indeed, where this discussion started. Andrew's suggestion was to use the small code model, in the hope that this would fix some things. The lack of support for these relocations is what is stopping my code from working with MCJIT, and your removal of EH is stopping it working with the legacy JIT.

David

Well, these relocations are there because of the general dynamic tls
model, so they would be present on all code models.

Cheers,
Rafael

http://www.unicom.com/pw/reply-to-harmful.html

To clarify, MCJIT currently has no GOT support whatsoever for ELF with x86-64 and ARM (and probably others). My experimental patch was meant as an attempt to get TLS working with static relocation model and small code model. It's the combination of these two that requires memory in the lower 2GB. MCJIT works with static and large, but the MC code generator has a problem with TLS and large code model.

Obviously we just need to get PIC support in place for MCJIT.

-Andy

To clarify, MCJIT currently has no GOT support whatsoever for ELF with x86-64 and ARM (and probably others).

No, I added the bare minimum to get EH working...

My experimental patch was meant as an attempt to get TLS working with static relocation model and small code model. It's the combination of these two that requires memory in the lower 2GB. MCJIT works with static and large, but the MC code generator has a problem with TLS and large code model.

I see. Yes, with that model codegen would use the local exec TLS model
and we would only need R_X86_64_TPOFF32 (and making sure the code was
close to the TLS block).
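As a rough illustration of what resolving R_X86_64_TPOFF32 would involve: on x86-64 the static TLS block sits immediately below the thread pointer, so the value is a negative 32-bit offset. The sketch below assumes the TLS segment starts at an address aligned to its own alignment; `tpOffset` and its parameters are hypothetical names, not MCJIT code.

```cpp
#include <cassert>
#include <cstdint>

// Offset of a TLS symbol relative to the thread pointer on x86-64.
//   symOffset: offset of the symbol from the start of the TLS segment
//   tlsSize:   memory size of the TLS segment
//   tlsAlign:  alignment of the TLS segment (power of two)
// Assumes the segment start is itself aligned to tlsAlign.
std::int64_t tpOffset(std::uint64_t symOffset, std::uint64_t tlsSize,
                      std::uint64_t tlsAlign) {
  assert(tlsAlign != 0 && (tlsAlign & (tlsAlign - 1)) == 0);
  assert(symOffset <= tlsSize);
  // Round the block size up so the thread pointer stays aligned.
  std::uint64_t padded = (tlsSize + tlsAlign - 1) & ~(tlsAlign - 1);
  return static_cast<std::int64_t>(symOffset) -
         static_cast<std::int64_t>(padded);
}

// The relocation field is a signed 32-bit value.
bool fitsTPOFF32(std::int64_t offset) {
  return offset >= INT32_MIN && offset <= INT32_MAX;
}
```

For a 12-byte TLS segment with 8-byte alignment, a variable at offset 4 would resolve to -12 under these assumptions.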

Obviously we just need to get PIC support in place for MCJIT.

Agreed.

Thanks,
Rafael

I’d like to take a crack at this. Was there any more progress or work I should be aware of?

Thanks,
Keno

Hi Keno,

I believe that PIC/GOT support is working for x86-64 ELF targets. I’m not sure what effect that had on the proposed TLS implementation.

-Andy