Transferring information from IR to the final binary

Hi,

Is it possible to pass information (such as a constant value) from the IR to the final binary?

I have an IR module pass that makes some transformations and, once compilation is finished, I need to apply a post-processing step.

The post-processing step needs information from the IR stage.

Greetings,

Johan

What kind of constant: what type and value, and how is it created?

You could emit public symbols and use a linker script to place them in a special section, from which they can be extracted later.
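To illustrate the symbol-in-a-section idea, here is a minimal sketch in C++ using the GCC/Clang `section` and `used` attributes. The record layout, the symbol name `integrity_record`, and the section name `integrity_meta` are all made up for the example; nothing in this thread prescribes them.

```cpp
#include <cstdint>

// Hypothetical record layout: three 64-bit slots a post-link tool could rewrite.
struct IntegrityRecord {
    std::uint64_t begin;
    std::uint64_t end;
    std::uint64_t hash;
};

// Mach-O wants "segment,section"; ELF takes a bare section name.
// The section name itself is an assumption made for this sketch.
#if defined(__APPLE__)
#define INTEGRITY_SECTION "__DATA,integrity_meta"
#else
#define INTEGRITY_SECTION "integrity_meta"
#endif

extern "C" {
// `used` asks the compiler to keep the object even if nothing references it.
// `volatile` stops the compiler from folding the placeholder values into code.
__attribute__((used, section(INTEGRITY_SECTION)))
volatile IntegrityRecord integrity_record = {
    0x1111111111111111ULL,  // placeholder for begin
    0x2222222222222222ULL,  // placeholder for end
    0x3333333333333333ULL   // placeholder for hash
};
}
```

On ELF targets, a post-link tool could then find the section by name (for example with `readelf -S`) and read or rewrite its bytes. Whether a linker script is needed on top of this depends on where the section has to end up.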

Or perhaps you want some metadata that a special late-stage (machine instruction) pass extracts and adds to the output.

The “best” solution really depends on what you are trying to achieve overall and what kind of data you are working with.

I will try to explain better what I do.

The main goal is to verify that a part of the code has not been modified by someone else (it is an integrity check).

To do this, I create a function in IR that takes two parameters, a begin and an end value.

This function computes a hash over the code area (from begin to end) and returns it.

At first, I don’t know the addresses or the hash value, so I put in random values (each is a 64-bit integer).

The function looks like uint32_t isModified(uint64_t* begin, uint64_t* end).
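As a rough sketch of what such a check could look like in C++: the hash algorithm is not specified here, so 32-bit FNV-1a is used purely as a stand-in, and the names `hashRange` and the three-argument `isModified` are illustrative, not the actual function described above.

```cpp
#include <cstdint>

// Stand-in hash: 32-bit FNV-1a over the bytes in [begin, end).
// The real scheme may use a different algorithm (a CRC, for instance).
std::uint32_t hashRange(const unsigned char* begin, const unsigned char* end) {
    std::uint32_t h = 2166136261u;           // FNV offset basis
    for (const unsigned char* p = begin; p != end; ++p) {
        h ^= *p;                             // mix in one byte
        h *= 16777619u;                      // FNV prime, wraps modulo 2^32
    }
    return h;
}

// Hypothetical wrapper: true when the range no longer matches the
// expected hash that the post-processing step would have patched in.
bool isModified(const unsigned char* begin, const unsigned char* end,
                std::uint32_t expected) {
    return hashRange(begin, end) != expected;
}
```

In the real setup, begin, end, and the expected hash would all be placeholders until the post-processing step fills them in.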

Once the compilation is over, I need to update the begin address, end address and the hash value.

When I say the compilation is over, I mean the clang driver has finished all of its actions (compiling, linking, etc.).
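To make the post-processing step concrete, here is a minimal C++ sketch that looks for a known 64-bit placeholder in a binary image held in memory and overwrites it. The function name `patchPlaceholder` is made up, and the sketch assumes that the placeholder appears as eight contiguous bytes in the file and that host and target share the same byte order. That holds for data and for an x86-64 `movabs` immediate, but not for targets that build a 64-bit constant from several instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Replaces every occurrence of `placeholder` in `image` with `value`.
// Returns how many occurrences were patched, so the caller can insist on exactly one.
std::size_t patchPlaceholder(std::vector<unsigned char>& image,
                             std::uint64_t placeholder, std::uint64_t value) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i + sizeof placeholder <= image.size()) {
        if (std::memcmp(&image[i], &placeholder, sizeof placeholder) == 0) {
            std::memcpy(&image[i], &value, sizeof value);  // overwrite in place
            ++count;
            i += sizeof placeholder;                       // skip the patched bytes
        } else {
            ++i;
        }
    }
    return count;
}
```

A count other than one would suggest the placeholder was not unique, or was not emitted in the expected form, which is one reason to pick placeholder values carefully.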

Greetings,

Johan

And you want this for only SOME bits of code, and that’s why you need to have the IR report what sections are “sensitive”?

It would be fairly easy if the code you want to check is a normal function: just store the start address of the function. The length should be doable too at the machine code level, but not at the IR level. If you want to check only the middle of a function, it’s a bit harder.

How are you dealing with the fact that code gets relocated during loading?

[I’m always curious as to how these types of designs cope with someone modifying the checksumming code itself, but that’s another problem. Or is this one of those things where the checksum is stored in special hardware-protected memory?]

Hi Mats,

> And you want this for only SOME bits of code, and that’s why you need to have the IR report what sections are “sensitive”?

Exactly, and I want to choose this once the compilation is over.

> It would be fairly easy if the code you want to check is a normal function, just store the start address of the function, and the length should be doable too at machine code level, but not IR level. If you want to check only the middle of the function, it’s a bit harder.

Sadly, I need to be able to check arbitrary parts of the code. One of the problems I have is that even if I get the addresses at the IR level, I still need the CRC value.

So normally, you still have to pass these values to the post-processing step in order to compute the hash.

> How are you dealing with the fact that code gets relocated during loading?

Do you mean the loading phase during the link? If so, this is why I do the post-processing after the link.

If you mean something like the -fpie option, I use a small trick.

The function that calls isModified calculates the offset dynamically.

To do this, I take the address in IR (in C++ this would be like (uint64_t)std::addressof(main);) and subtract a constant value.

During the post-processing (again), I update this constant value with the address in the binary.

This gives an offset, which is then used to update the addresses.
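The arithmetic behind this trick can be sketched in C++ as follows. The names `Range`, `loadSlide`, and `relocate` are hypothetical, and the addresses in the usage note are invented numbers chosen only to show the calculation.

```cpp
#include <cstdint>

// A code range expressed as addresses in the linked binary.
struct Range {
    std::uint64_t begin;
    std::uint64_t end;
};

// Offset between where an anchor symbol really is at run time and where
// the linked binary says it is. `linkTimeAnchor` plays the role of the
// constant that the post-processing step patches in.
inline std::uint64_t loadSlide(std::uint64_t runtimeAnchor,
                               std::uint64_t linkTimeAnchor) {
    return runtimeAnchor - linkTimeAnchor;
}

// Shifts a link-time range by the slide to get run-time addresses.
inline Range relocate(Range linkTime, std::uint64_t slide) {
    return Range{linkTime.begin + slide, linkTime.end + slide};
}
```

For example, if the anchor sits at 0x1140 in the file and is observed at 0x555555555140 at run time, the slide is 0x555555554000, and a range recorded as 0x1200 to 0x1280 becomes 0x555555555200 to 0x555555555280.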

> [I’m always curious as to how these type of designs cope with someone modifying the checksumming code itself, but that’s another problem - or is this one of these things where the checksum is stored in special hardware-protected memory?].

For now, I store it in a special place.

My first solution to this problem was to use a temporary file, but this is highly impractical.

Maybe I can create a temporary section in the binary, but I didn’t find much information about it.

Thanks,

Johan

I don't think there is a simple way to achieve this - even less so if you
want it to be portable across multiple processor architectures.

For a given processor architecture, you could add a pseudo-instruction that
is some unusual form of no-op (e.g. one of the "does nothing" instructions
in x86, with an unusual combination of (redundant) prefix bytes, or some
such) and then scan the generated code for that and store the relevant
information. But this is highly dependent on architecture [and may be
sensitive to false positives]. (Alternatively, emit some illegal instruction
and replace it with a no-op during the post-processing.)
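A minimal sketch of the scanning half of this idea, in C++: it searches a buffer of machine code for a marker byte sequence and reports the offsets. The function name `findMarkers` is made up, and the choice of marker is left to the caller; the test values used while writing this were the 6-byte x86 NOP `66 0F 1F 44 00 00`, picked only as an example. As noted above, a plain byte search can hit false positives inside immediates or data.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the offset of every non-overlapping occurrence of `marker` in `code`.
std::vector<std::size_t> findMarkers(const std::vector<unsigned char>& code,
                                     const std::vector<unsigned char>& marker) {
    std::vector<std::size_t> offsets;
    if (marker.empty()) {
        return offsets;  // an empty marker would match everywhere
    }
    auto it = code.begin();
    while ((it = std::search(it, code.end(), marker.begin(), marker.end()))
           != code.end()) {
        offsets.push_back(static_cast<std::size_t>(it - code.begin()));
        it += static_cast<std::ptrdiff_t>(marker.size());  // continue after the match
    }
    return offsets;
}
```

With a pair of markers around the sensitive region, the bytes between the first match and the second would be the range to hash.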

But I don't think that's a particularly good solution long term.

Maybe someone else has a better idea...

Thank you for the input.

Yeah, maybe someone else will have an idea :)

Johan