Position independent code writes absolute pointer

Hello everyone,

I have an issue with some code that I JIT and load as position-independent code (PIC). I have a feeling that the issue cannot be solved, but I wanted to give it a try.

    #include <stdio.h>

    int magicValue = 123;
    int magicValue2 = 321;

    volatile int *pValue = &magicValue;

    void printMagicValue(void)
    {
        printf("Planschi…\n");
        printf("The magic value is %i 0x%p && 0x%p\n",
               magicValue, (void *)&magicValue, (void *)pValue);
    }

    void setMagicValue(int value)
    {
        magicValue = value;
    }

This is the code that I load as PIC. For the JTMB (JITTargetMachineBuilder) I use the following settings:

    JTMB->setRelocationModel(llvm::Reloc::PIC_);
    JTMB->setCodeModel(llvm::CodeModel::Small);

The code is loaded into shared memory. Two processes execute it from there, calling "printMagicValue", "setMagicValue(120)" and "printMagicValue" again. Only the first process JITs the code; every other process accesses it from the shared memory.

The first process prints:

    Planschi…
    The magic value is 123 0x00000270BB090038 && 0x00000270BB090038
    Planschi…
    The magic value is 120 0x00000270BB090038 && 0x00000270BB090038

The second process prints:

    Planschi…
    The magic value is 120 0x00000237A5DE0038 && 0x00000270BB090038
    Planschi…
    The magic value is 120 0x00000237A5DE0038 && 0x00000270BB090038

The values are read correctly! Hurray! But my problem is that the pointer 'pValue' was written as an absolute address and not in a position-independent form, so the second process prints the address from the first process. I hoped that, since the code is PIC, the pointers would also be written in a PIC-like way. I think I understand why this is not the case, but can I somehow change this behaviour without calculating the offset myself? My overall goal is to share the entire code between two processes.

I hope my question is somewhat understandable and I hope even more, that there is a solution to this…

Thank you for any help in advance and kind greetings

Björn

I wanted to add a thought to this:

Could it be possible to modify the code at the IR level so that it stores PIC/offset addresses instead of absolute addresses? I'm not familiar with LLVM IR, so I don't know what is possible and how it affects the code at all.

Hi Gaier,

There's no way to do this automatically in LLVM at the moment. It
sounds kind of related to pointer compression techniques (also not
supported right now).

> Could it be possible to modify the code at the IR level so that it stores PIC/offset addresses instead of absolute addresses? I'm not familiar with LLVM IR, so I don't know what is possible and how it affects the code at all.

It depends how much control you have over the code. You could
instrument code so that it converted all stores of pointers to be
relative to some fixed global (PC-relative doesn't work there because
it will be loaded at a different address, and "relative to the address
it's being stored to" would break memcpy). But that has some major
issues:

1. It's an ABI break, so you have to be able to recompile all code,
including any system libraries you make use of.
2. LLVM can only convert the pointers it knows about, so it would
still be broken by someone storing a pointer via an intptr_t cast and
probably other things I haven't thought of.
3. There probably isn't even a relocation for any statically
initialized pointers. You might be able to convert all of them to use
a dynamic module initializer instead though.
4. I'd expect debugging to go horribly wrong.

Cheers.

Tim.

Hey Tim,

Thank you for the answer! I expected something like that sadly :<

However...

> It depends how much control you have over the code. You could instrument code so that it converted all stores of pointers to be relative to some fixed global (PC-relative doesn't work there because it will be loaded at a different address, and "relative to the address it's being stored to" would break memcpy). But that has some major

This sounds interesting from a learning perspective, because I have never done anything like that. Is it difficult to do? Also, why only convert the stores? Shouldn't I also convert the reads so that they stay valid?

Kind greetings
Björn

Hi Bjoern,

>> It depends how much control you have over the code. You could instrument code so that it converted all stores of pointers to be relative to some fixed global (PC-relative doesn't work there because it will be loaded at a different address, and "relative to the address it's being stored to" would break memcpy). But that has some major

> This sounds interesting from a learning perspective, because I have never done anything like that. Is it difficult to do? Also, why only convert the stores? Shouldn't I also convert the reads so that they stay valid?

Sorry, I meant to say you'd have to undo the transformation on the
loads (and atomicrmw, cmpxchg) too. I think getting something that
sometimes works would actually be quite easy. You'd want to make it a
ModulePass to handle the globals, then you'd iterate through each
function, turning a store like:

    store %type* %val, %type** %ptr

into:

    %val.int = ptrtoint %type* %val to i64
    %val.int.new = sub i64 %val.int, ptrtoint(i8* @__GLOBAL_ANCHOR to i64)
    %val.new = inttoptr i64 %val.int.new to %type*
    store %type* %val.new, %type** %ptr

The corresponding load side would add back @__GLOBAL_ANCHOR. At the
Module level you'd add some kind of tentative definition for
@__GLOBAL_ANCHOR so it can be merged if needed, and convert a definition
like:

    @var = global i8* @other_global

into

    @var = global i8* null
    define void @__MODULE_INIT() {
      ; Duplicate the store code above to put a relative value for
      ; @other_global into @var
      ret void
    }
    %0 = type { i32, void ()*, i8* }
    @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @__MODULE_INIT, i8* null }]

Unfortunately I've also thought of a couple more nasty problems while
writing this out:
1. Things like target-specific vector intrinsics that do loads and
stores might obscure the fact that they're storing a pointer by
casting it to an i64 or something.
2. You'd have to make sure the stack for both programs was in the
shared region, or that no-one ever used a pointer to a local variable.

Cheers.

Tim.

Hello Tim,

Thank you a lot for the code! Seems like I have to learn more about the LLVM assembly to understand everything in detail.

It is still kinda sad that this behaviour is not achievable, but I understand more and more why. Thank you a lot!

Kind greetings
Björn