Question About inserting Instruction?

Hi,

I am working on a project with LLVM. What I need to do is generate and insert some dummy/dead basic blocks that will never be executed, and put some instructions in those dummy/dead basic blocks.

So far, the dummy/dead BB insertion is done, and I am now trying to insert instructions into those BBs. I can already insert legal instructions into a dummy/dead BB; however, what I really want is to insert illegal instructions (or anything at all) into those BBs, since those instructions will never be executed. I am not sure whether this can be done.

For example,
Correct way:
Instruction *NewInst = new LoadInst(…);
NewBB->getInstList().push_back(NewInst);

What I need is just to put some junk data in the BB, not instructions. At the assembly level, it would look like the following.

A piece of code made of correct instructions, obtained by disassembling the object code:

:00000009 0533709283 add eax, 83927033
:0000000E 05A2B78135 add eax, 3581B7A2
:00000013 C1C819 ror eax, 19
:00000016 05E5167711 add eax, 117716E5
:0000001B 0542F7A8DC add eax, DCA8F742

What I want instead, with junk bytes in place of valid instructions:

:00000009 0533709283 add eax, 83927033
:0000000E 7878787878 ??? <<<<<< here is the illegal instruction.
:00000013 23232 ??? <<<<<<
:00000016 05E5167711 add eax, 117716E5
:0000001B 0542F7A8DC add eax, DCA8F742

What I tried was to make NewInst point to random memory (cast to an Instruction pointer) and push_back it onto the instruction list, but that failed.

Instruction *NewInst = ;
NewBB->getInstList().push_back(NewInst);

So I was wondering: is this allowed in LLVM, and if so, how can I do it?

Let me know if my question is not clear.

Thanks
Qiuyu

Actually, it's not clear. Why do you want to put bogus Instruction* pointers
into basic blocks? If you are not going to ever use those instructions, you
can create/insert instances of UnreachableInst, or any other instruction you
like.

If you do want to use the inserted instructions in any way, then inserting
bogus Instruction* pointers would just crash your application.

- Volodya

I am working on a project with LLVM. What I need to do is to
generate/insert some dummy/dead basic blocks which never have a chance
to be executed and put some instructions in those dummy/dead basic
blocks.

OK.

So far, the dummy/dead BB insertion is done. I am trying to insert
instructions in those dummy/dead BB. Actually, I can insert the legal
instructions to dummy/dead BB, however, I really want to insert some
illegal instructions/anything into those BB since those instructions
have no chance to be executed. I am not sure if it could be done or
not.

First of all, you should be aware that the native code generators will
*REMOVE* all dead code before generating native code, so you might want
to comment out "createUnreachableBlockEliminationPass" in
llvm/lib/<Target>/<Target>TargetMachine.cpp, located in method
<Target>TargetMachine::addPassesToEmitAssembly() .

You should also be careful not to run any of the passes that gccas
and/or gccld run automatically and that will delete these blocks for
you (after all, they are dead!). These passes include unreachable
block elimination (run by the code generators) and the CFG
simplification passes, among others.
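
To make it concrete why such passes delete the dummy blocks, here is a
minimal sketch of unreachable block elimination on a toy CFG. It does
not use any LLVM API: the Block struct and the eliminateUnreachable
function are hypothetical names made up for this illustration, and the
real passes work on LLVM's own data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy CFG node (hypothetical, not an LLVM class): a block name plus the
// indices of its successor blocks.
struct Block {
    std::string name;
    std::vector<std::size_t> succs;
};

// Returns the names of the blocks that survive, i.e. those reachable
// from block 0, which is treated as the function entry.
std::vector<std::string> eliminateUnreachable(const std::vector<Block> &cfg) {
    std::vector<bool> seen(cfg.size(), false);
    std::vector<std::size_t> worklist;
    if (!cfg.empty()) {
        seen[0] = true;
        worklist.push_back(0);
    }
    // Depth-first walk over successor edges starting at the entry block.
    while (!worklist.empty()) {
        std::size_t cur = worklist.back();
        worklist.pop_back();
        for (std::size_t s : cfg[cur].succs) {
            assert(s < cfg.size() && "successor index out of range");
            if (!seen[s]) {
                seen[s] = true;
                worklist.push_back(s);
            }
        }
    }
    // Anything never visited has no path from the entry and is dropped.
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < cfg.size(); ++i)
        if (seen[i])
            kept.push_back(cfg[i].name);
    return kept;
}
```

For a CFG of entry -> body -> exit plus a block "dead" that branches to
exit but has no predecessors, this keeps entry, body and exit and drops
"dead". A dummy block that nothing branches to is exactly this case.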

For example,
Correct way:
    Instruction *NewInst = new LoadInst(...);
    NewBB->getInstList().push_back(NewInst);

what I need is just to put some junk data in the BB, not instructions. From
assembly code level, it looks like the following,

a piece of code from correct instructions by disassembling object code.
   
:00000009 0533709283 add eax, 83927033
:0000000E 05A2B78135 add eax, 3581B7A2
:00000013 C1C819 ror eax, 19
:00000016 05E5167711 add eax, 117716E5
:0000001B 0542F7A8DC add eax, DCA8F742

:00000009 0533709283 add eax, 83927033
:0000000E 7878787878 ??? <<<<<< here is the illegal instruction.
:00000013 23232 ??? <<<<<<
:00000016 05E5167711 add eax, 117716E5
:0000001B 0542F7A8DC add eax, DCA8F742

    what I tried is to make *NewInst point to random memory (cast to
    Instruction pointer) and push_back to instList. But I failed to do
    it.
    
            Instruction *NewInst = ;
            NewBB->getInstList().push_back(NewInst);

So I was wondering if it is allowed in LLVM or not, if so, how to do that?

LLVM code must not have any dangling pointers, and hence, this is not
valid LLVM.

If you want to generate "invalid native code", the way I would suggest
doing it is to create some LLVM instruction in the dead basic block that
you can easily identify, such as:

* create a new external function, do not define it
* call it from the dead basic block
* then, modify the native code generator for your chosen platform to
  look for the call(s) to the fake external function and create some
  "new instruction", i.e. one that's invalid for the real target but one
  that gives you the bit pattern you want
* you will want to add a new instruction definition to the .td file,
  and then generate it in the instruction selector
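
The steps above can be sketched with a toy emitter. This is only an
illustration of the idea, not LLVM code: ToyInst, kJunkMarker, emit and
emitBlock are all hypothetical names, and the real change would live in
the target's .td file and instruction selector. The sketch assumes an
x86-like target where "add eax, imm32" is encoded as opcode 0x05
followed by a little-endian immediate, as in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a selected machine instruction (hypothetical; this is
// not LLVM's MachineInstr).
struct ToyInst {
    enum Kind { AddEaxImm32, Call } kind;
    std::uint32_t imm;   // used by AddEaxImm32
    std::string callee;  // used by Call
};

// Hypothetical name of the external function that is declared but never
// defined, and is called only from the dead block.
const std::string kJunkMarker = "__junk_marker";

// Emit the bytes for one instruction. A call to the marker function is
// replaced by the raw junk bytes; everything else is encoded normally.
void emit(const ToyInst &inst, const std::vector<std::uint8_t> &junk,
          std::vector<std::uint8_t> &out) {
    switch (inst.kind) {
    case ToyInst::AddEaxImm32:
        out.push_back(0x05);  // add eax, imm32
        for (int shift = 0; shift < 32; shift += 8)  // little-endian immediate
            out.push_back(static_cast<std::uint8_t>(inst.imm >> shift));
        break;
    case ToyInst::Call:
        if (inst.callee == kJunkMarker) {
            assert(!junk.empty() && "no junk pattern supplied");
            out.insert(out.end(), junk.begin(), junk.end());
        } else {
            out.push_back(0xE8);  // call rel32, displacement left as zeros
            out.insert(out.end(), std::size_t{4}, std::uint8_t{0});
        }
        break;
    }
}

// Emit a whole block by concatenating the bytes of each instruction.
std::vector<std::uint8_t> emitBlock(const std::vector<ToyInst> &block,
                                    const std::vector<std::uint8_t> &junk) {
    std::vector<std::uint8_t> out;
    for (const ToyInst &inst : block)
        emit(inst, junk, out);
    return out;
}
```

Emitting an add, a call to the marker, and another add with a junk
pattern of five 0x78 bytes gives 05 33 70 92 83, then 78 78 78 78 78,
then the second add. The point of the marker call is that the code
stays well-formed everywhere above the emitter.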

However, the question is what is your bigger goal? What you're doing
here is hacking around the optimizers, trying to trick them to not
delete the dead code. Perhaps there is another way to achieve your end
goal, if you could tell us what the big picture is.

This isn't going to work. The LLVM code always has to be well-defined. The way to get the machine code to contain garbage like this is to add an intrinsic, then have the code generator expand it to the garbage you want.

-Chris