Re:RE: Question about inserting instructions

Hi,

Thanks Volodya, Misha and Chris,

> For example,
> Correct way:
> Instruction *NewInst = new LoadInst(...);
> NewBB->getInstList().push_back(NewInst);
>
> What I need is just to put some junk data in the BB, not instructions. At
> the assembly level, it looks like the following.
>
> A piece of code with correct instructions, from disassembling object code:
>
> :00000009 0533709283 add eax, 83927033
> :0000000E 05A2B78135 add eax, 3581B7A2
> :00000013 C1C819 ror eax, 19
> :00000016 05E5167711 add eax, 117716E5
> :0000001B 0542F7A8DC add eax, DCA8F742
>
>
> :00000009 0533709283 add eax, 83927033
> :0000000E 7878787878 ??? <<<<<< here is the illegal instruction.
> :00000013 23232 ??? <<<<<<
> :00000016 05E5167711 add eax, 117716E5
> :0000001B 0542F7A8DC add eax, DCA8F742
>
> What I tried was to make *NewInst point to random memory (cast to an
> Instruction pointer) and push_back it onto the instList. But I failed to
> do it.
>
> Instruction *NewInst = ;
> NewBB->getInstList().push_back(NewInst);
>
> So I was wondering whether this is allowed in LLVM and, if so, how to do it.

LLVM code must not have any dangling pointers, and hence, this is not
valid LLVM.

If you want to generate "invalid native code", the way I would suggest
doing it is to create some LLVM instruction in the dead basic block that
you can easily identify, such as:

* create a new external function, do not define it
* call it from the dead basic block
* then, modify the native code generator for your chosen platform to
  look for the call(s) to the fake external function and create some
  "new instruction", i.e. one that's invalid for the real target but one
  that gives you the bit pattern you want
* you will want to add a new instruction definition to the .td file,
  and then generate it in the instruction selector

However, the question is what is your bigger goal? What you're doing
here is hacking around the optimizers, trying to trick them to not
delete the dead code. Perhaps there is another way to achieve your end
goal, if you could tell us what the big picture is.

    Let's say that, at the IR level, the following regular IR code
       %tmp.0 = getelementptr [10 x sbyte]* %str1, int 0, int 0 ; <sbyte*> [#uses=1]
       store sbyte 116, sbyte* %tmp.0
       %tmp.1 = getelementptr [10 x sbyte]* %str1, int 0, int 1 ; <sbyte*> [#uses=1]
       store sbyte 101, sbyte* %tmp.1
       %tmp.2 = getelementptr [10 x sbyte]* %str1, int 0, int 2 ; <sbyte*> [#uses=1]
       store sbyte 115, sbyte* %tmp.2
       %tmp.3 = getelementptr [10 x sbyte]* %str1, int 0, int 3 ; <sbyte*> [#uses=1]
       store sbyte 116, sbyte* %tmp.3
will be assembled to
        movb $116, 18(%esp)
        movb $101, 19(%esp)
        movb $115, 20(%esp)
        movb $116, 21(%esp)
But for me, in the dummy BB, we'd like to put some meaningless or illegal code. At the assembly level, it looks like

                push %eax
                push %ecx
                pop %edx
                pusha
                sahf
                cltd
                das
                clc

All of them are legal one-byte x86 machine instructions. Since those instructions can never be executed, they will not affect the original code. I thought the above machine code cannot be inserted with new Instruction(....) because that works at the IR level. So maybe we can control the MachineInstr generator to emit the above code in the dummy BB. By the way, the dummy BBs' names include the string " dummy ", so we can identify which BBs are dummy at the IR level.
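
To make the one-byte claim concrete, here is a minimal sketch of a lookup table for the instructions listed above. The opcode values are the 32-bit x86 encodings as I understand them and are worth checking against the Intel manual (pusha and das, for instance, are not valid in 64-bit mode). kOneByteOpcodes and encodeOneByte are hypothetical names, not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// AT&T mnemonic -> single opcode byte (32-bit x86; assumed values).
const std::map<std::string, std::uint8_t> kOneByteOpcodes = {
    {"push %eax", 0x50}, {"push %ecx", 0x51}, {"pop %edx", 0x5A},
    {"pusha", 0x60},     {"sahf", 0x9E},      {"cltd", 0x99},
    {"das", 0x2F},       {"clc", 0xF8},
};

// Translate a sequence of mnemonics into raw bytes.
// Throws std::out_of_range for a mnemonic that is not in the table.
std::vector<std::uint8_t>
encodeOneByte(const std::vector<std::string> &mnemonics) {
    std::vector<std::uint8_t> bytes;
    for (const std::string &m : mnemonics)
        bytes.push_back(kOneByteOpcodes.at(m));
    return bytes;
}
```

Because each entry is exactly one byte, any sequence drawn from the table keeps later instructions aligned for a disassembler, which is what makes these instructions convenient filler.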

If there is a way to do that, I suppose it would look like the following:

1. generate some dummy BB on IR level ( working on *.bc by writing a pass)
2. llc *.bc ( generate machine code)
3. as -o *.o *.s ( generate object file, or use gcc )
4. ld -o *.out *.o ( generate executable file)

During step 2, we read the *.bc code, find the dummy BBs, and put some meaningless machine code into them. Here we cannot put in illegal machine code; otherwise step 3 will fail. So is it possible to insert arbitrary machine code into a BB? If so, how could we change llc? I took a look at MachineInstr.c, CodeGenerator.c, etc., but I still don't know how to do it.

Here is something that may help explain what I want to do. Some virus writers code a virus in assembly and insert meaningless code into it; since they work at the assembly level, that is easy for them. I don't know whether I could do the same thing another way.

--
This isn't going to work. The LLVM code always has to be well-defined.
The way to get the machine code to contain garbage like this is to add an
intrinsic, then have the code generator expand it to the garbage you want.
    
    So we cannot use LLVM code to do this, but I am not clear about the approach you mentioned.

Thanks

[snip]

                push %eax
                das
                clc

all of them are legal one-byte x86 machine instructions.

[snip]

If there is a way to do that, I suppose it would look like the
following:

1. generate some dummy BB on IR level ( working on *.bc by writing a pass)
2. llc *.bc ( generate machine code)
3. as -o *.o *.s ( generate object file, or use gcc )
4. ld -o *.out *.o ( generate executable file)

During step 2, we read the *.bc code, find the dummy BBs, and put some
meaningless machine code into them. Here we cannot put in illegal machine
code; otherwise step 3 will fail.

Yes, you are correct -- if you want to create illegal code you need to
not use system as. What you need is the ability for llc to create
object files with native code directly, without using the system
assembler. I think someone is working on it, but I'm not sure as to the
status. Otherwise, you will just have some random one-byte
instructions.

So is it possible to insert arbitrary machine code into a BB?
If so, how could we change llc? I took a look at MachineInstr.c,
CodeGenerator.c, etc., but I still don't know how to do it.

The CodeEmitter would have to be enhanced to allow outputting standard
format object files that ld can process. If you are interested in doing
this, someone can point you in the right direction as to what needs to
be done.

During step 2, we read the *.bc code, find the dummy BBs, and put some
meaningless machine code into them. Here we cannot put in illegal machine
code; otherwise step 3 will fail.

Yes, you are correct -- if you want to create illegal code you need to
not use system as. What you need is the ability for llc to create
object files with native code directly, without using the system
assembler. I think someone is working on it, but I'm not sure as to the
status. Otherwise, you will just have some random one-byte
instructions.

Actually that's not true. You can make instructions with an asmstring of:

   ".byte 123\n .byte 56\n .byte 86" and those bytes will get emitted to the code stream.

-Chris
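
A minimal sketch of building such an asmstring from a list of byte values. makeByteAsmString is a hypothetical helper, not an LLVM function; the string it returns has the same shape as the example above and would be placed in the asmstring of an instruction definition, so that the system assembler copies the bytes into the code stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Join the byte values into ".byte N" directives separated by "\n ",
// matching the format of the asmstring quoted above.
std::string makeByteAsmString(const std::vector<std::uint8_t> &bytes) {
    std::string s;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0)
            s += "\n ";
        // Cast so the value is formatted as a number, not a character.
        s += ".byte " + std::to_string(static_cast<unsigned>(bytes[i]));
    }
    return s;
}
```

Calling makeByteAsmString({123, 56, 86}) reproduces the three-directive string from the message above. Since the bytes pass through the assembler as data, they do not need to decode as valid instructions.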
