Custom Binary Format Challenges


I hope you are all doing well, and thanks in advance. I need to write a transformation over a set of LLVM bitcode files that weaves in a few techniques. In particular, I need to resolve a computed target address to one of several functions, in the same way that a function in a dynamic library is resolved, but I need this resolution to happen in a binary target of my choice, at the points I specify. It is essentially the same facility you get when you compile a group of files as a shared library target. The only difference is that it has to happen under my control: I choose the candidate function targets, and I choose the argument value that serves as the ordinal used to look them up.

I think I may need to write a compiler pass to do this, but part of the problem is that 1) I don’t know how to make this happen at the bitcode level, and 2) the ordinal is calculated from the instruction pointer.

Can anybody help? Is there a library or function call for calculating lookup tables from a given set of targets given an ordinal? Is there a way to obtain the instruction pointer in llvm bitcode?
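For readers following the thread, the lookup being asked about can be pictured as a plain table of function pointers indexed by an ordinal. The sketch below is only an illustration in C++; the names target_a, target_b, target_c, kTargets and resolve are hypothetical and do not come from LLVM or from the original post. Compiled with clang, a table like this typically becomes a constant global array of function pointers, and the lookup becomes an indexed load followed by an indirect call, which is roughly the shape a pass would have to emit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical candidate targets; any functions sharing one signature would do.
int target_a(int x) { return x + 1; }
int target_b(int x) { return x * 2; }
int target_c(int x) { return -x; }

using Target = int (*)(int);

// The lookup table: ordinal i selects kTargets[i].
static const std::array<Target, 3> kTargets = {{target_a, target_b, target_c}};

// Resolve an ordinal to a callable target. The assert guards against
// out-of-range ordinals in debug builds.
Target resolve(std::size_t ordinal) {
  assert(ordinal < kTargets.size());
  return kTargets[ordinal];
}
```

A call site would then look like resolve(ordinal)(argument), with the ordinal coming from whatever computation is chosen.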


You can write it as if you are writing an optimization pass:

It sounds like your highest level is a module, hence you should write a module pass. There is example code in the LLVM Programmer’s Manual showing how to do a function pass:

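#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Pass.h"
using namespace llvm;
Function* targetFunc = ...;

class OurFunctionPass : public FunctionPass {
public:
    static char ID;  // Pass identification required by the legacy pass manager.
    OurFunctionPass(): FunctionPass(ID), callCounter(0) { }

    bool runOnFunction(Function& F) override {
      for (BasicBlock &B : F) {
        for (Instruction &I: B) {
          if (auto *CI = dyn_cast<CallInst>(&I)) {
            // We know we've encountered a call instruction, so we
            // need to determine if it's a call to the
            // function pointed to by targetFunc or not.
            if (CI->getCalledFunction() == targetFunc)
              ++callCounter;
          }
        }
      }
      return false;  // The IR was not modified.
    }
private:
    unsigned callCounter;
};

char OurFunctionPass::ID = 0;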

Turning the FunctionPass into a ModulePass should be pretty easy with the linked guide: instead of inheriting from FunctionPass, you inherit from ModulePass. Afterwards, you can build your new pass against your LLVM source tree and run it with the opt tool.

I hope I didn’t misunderstand your question. If you have any more, let me know!


Thank you so much!

What about discovering the instruction pointer value?
Also, does anybody know how to embed an artifact as a resource in a binary? I’d like to have two text sections, and have one copied in from another binary.

Hi Kenneth,

Can you elaborate on what you mean by instruction pointer value? Like the actual instruction with opcode and operands? With the sample code that I showed you, the Instruction reference (I) in the innermost for loop will have access to the following functions:

Alternatively, you can use the dump() operation to dump the instructions out.

Unfortunately I don’t know how to address your second question. That’s stretching my knowledge in LLVM.


Program counter - EIP, RIP for x86/64. I need to obtain it and pass it as an argument to the function that calculates an ordinal from it.
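One way to get hold of a program-counter value without writing assembly, sketched here under the assumption that the code is built with GCC or Clang, is the __builtin_return_address builtin: called from a non-inlined helper, it yields the address of the instruction following the call, which is a code address inside the caller. To my knowledge clang lowers this builtin to the llvm.returnaddress intrinsic, so it is something a pass could emit at the IR level. The helper names caller_pc and ordinal_from_pc, and the shift-and-modulo mapping, are hypothetical and only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the address of the instruction that follows
// the call to this function, i.e. a program-counter value inside the caller.
// noinline keeps a real call in place so that a return address exists.
__attribute__((noinline)) std::uintptr_t caller_pc() {
  return reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
}

// Hypothetical mapping from a code address to an ordinal in [0, table_size).
// The shift and modulo are placeholders for whatever calculation is needed.
std::size_t ordinal_from_pc(std::uintptr_t pc, std::size_t table_size) {
  assert(table_size > 0);
  return static_cast<std::size_t>((pc >> 4) % table_size);
}
```

The value returned by ordinal_from_pc(caller_pc(), n) could then be used as the index into a table of n targets. Note that the address depends on code layout, so it changes between builds and, with address space randomization, possibly between runs.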

I think that there must be some way to use the bitcode language to place byte values at a designated offset. Or use the command line to specify the section and offset for the data.

If you can write what you want to output in C with asm statements, clang can show you what the IR should look like.
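As a rough illustration of that suggestion, the sketch below reads the program counter with one inline asm statement on x86-64 and falls back to the return address on other targets. It assumes GCC or Clang extended asm syntax; the function name current_pc and the file names in the comment are hypothetical. In the emitted IR, the asm statement should show up as a call to an inline asm expression rather than as ordinary instructions.

```cpp
#include <cassert>
#include <cstdint>

// To inspect the IR, save this as pc.cpp (hypothetical name) and run:
//   clang++ -S -emit-llvm pc.cpp -o pc.ll
//
// Hypothetical helper: returns a code address near the point of execution.
std::uintptr_t current_pc() {
  std::uintptr_t pc = 0;
#if defined(__x86_64__)
  // RIP-relative lea with zero displacement yields the address of the
  // instruction that follows this one.
  asm volatile("leaq 0(%%rip), %0" : "=r"(pc));
#else
  // Fallback for other targets: the return address in the caller.
  pc = reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
#endif
  assert(pc != 0);
  return pc;
}
```

The asm string is target specific, so a pass that inserts it would have to pick the right string for each architecture it supports.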

The bitcode is only a representation of the IR, which is in SSA form. SSA form assumes an unlimited number of virtual registers, which x86 does not offer. When bitcode is compiled to machine code, the backend lowers the SSA form to a non-SSA form and allocates physical registers. Personally, I don’t know how to use the bitcode language to achieve what you want to do.

The closest things I can think of are the LLVM MC library and the Keystone and Capstone projects.

In fact, I’m also looking for something similar: to be able to specify the machine instructions based solely on the IR. If you find anything, let me know!


Well, position independent code has to be woven into the final assembler, and at least one technique uses call ret sequences to eject the instruction pointer value. If that happens, then isn’t it provided for somewhere in the bitcode? I imagine so, but I don’t know where to dig for it. Then again, it may be something that is abstracted away from the bitcode, so that it’s woven in by some lower level pass that’s right next to the assembler selection.

Brenda, could you explain your challenges/objectives to me further?