converting x86 instructions to LLVM instructions

- In security-testing you sometimes apply black boxing.

I've had a similar idea lately.
http://www.crazylazy.info/blog/content/x86-differently-vine-and-llvm-klee

x86 in general isn't very useful for reverse engineering purposes.
If you could use LLVM-qemu to get an intermediate representation of a
specific binary and selectively execute functions symbolically, you'd
have a "fuzzer" that reaches code paths in any case. That's a much
deeper verification. If you read the KLEE research paper and take a look
at the number of overlooked bugs they were able to identify, this could
be very effective.

I don't know how to modify llvm-qemu to translate x86 to LLVM IR. This
is not trivial: qemu is a very limited "emulation". The "target" x86
won't have MSRs and specific instructions. The abstraction level is
higher. However for unspecific targets it might scale. Marking variables
as symbolic in LLVM bytecode however...

In any case it would be interesting to be able to translate x86 to LLVM
IR. If somebody wants to give that a try let's make a plan ;).

Have fun,
Marius

Hi Marius,

> > Hi,
> >
> > Alexandre Gouraud <alexandre.gouraud@enst-bretagne.fr> writes:
> > > if it does not already exists, could it mean it is a nonsense, then why?
> >
> > Why don't you compile your program directly to LLVM bitcode?
>
> - In security-testing you sometimes apply black boxing.

Once you use the structure of the machine code of the system under
test to generate test cases, it is no longer black box testing though
:)

> I've had a similar idea lately.
> http://www.crazylazy.info/blog/content/x86-differently-vine-and-llvm-klee
>
> x86 in general isn't very useful for reverse engineering purposes.
> If you could use LLVM-qemu to get an intermediate representation of a
> specific binary and selectively execute functions symbolically, you'd
> have a "fuzzer" that reaches code paths in any case. That's a much
> deeper verification. If you read the KLEE research paper and take a look
> at the number of overlooked bugs they were able to identify, this could
> be very effective.

I agree, this is an interesting idea.

> I don't know how to modify llvm-qemu to translate x86 to LLVM IR. This
> is not trivial: qemu is a very limited "emulation". The "target" x86
> won't have MSRs and specific instructions. The abstraction level is
> higher.

Actually, quite the opposite is true :) The emulation is very accurate;
otherwise it would not be possible to take an arbitrary operating system
and run it without modification in full system emulation mode. This
requires an accurate emulation of other things as well, e.g. the
MMU. After all, the authors of the "Selective Symbolic Execution"
paper have shown that llvm-qemu is suited for this purpose.

Essentially, what happens when llvm-qemu translates a basic block of
machine code is that you get a semantically equivalent version of your
machine code in the form of LLVM IR, with the LLVM IR operating on a
structure which represents the machine state (a bunch of registers and
some additional state). Regardless of how you translate machine code
to LLVM IR, you somehow need to model the machine state. I highly
doubt that the LLVM IR generated by llvm-qemu looks much different from
the LLVM IR generated by a hand-written frontend which goes directly from
machine code to LLVM IR.
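
To make the "code operating on a machine-state structure" idea concrete, here is a rough sketch in Python rather than LLVM IR. It is not what llvm-qemu emits; the `CPUState` class, the `block_0x1000` function, the addresses and the three-instruction block are all made up for illustration, and only the zero flag is modelled.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # 32-bit registers wrap around


@dataclass
class CPUState:
    """Hypothetical stand-in for the emulator's machine-state structure."""
    eax: int = 0
    ebx: int = 0
    ecx: int = 0
    zf: bool = False  # zero flag (other flags omitted in this sketch)
    eip: int = 0      # address of the next block to run


def block_0x1000(state: CPUState) -> None:
    """Semantics of a made-up basic block at 0x1000:

        add eax, ebx
        dec ecx
        jnz 0x1000
    """
    # add eax, ebx
    state.eax = (state.eax + state.ebx) & MASK32
    state.zf = state.eax == 0
    # dec ecx
    state.ecx = (state.ecx - 1) & MASK32
    state.zf = state.ecx == 0
    # jnz 0x1000: the block ends by choosing its successor
    state.eip = 0x1000 if not state.zf else 0x1005


# A trivial dispatch loop: keep running the block while it jumps to itself.
state = CPUState(eax=1, ebx=2, ecx=3, eip=0x1000)
while state.eip == 0x1000:
    block_0x1000(state)
print(hex(state.eax), state.ecx, hex(state.eip))
```

Every instruction becomes plain loads, arithmetic and stores on the state structure, which is why the result should look similar no matter which frontend produced it.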

> However for unspecific targets it might scale. Marking variables
> as symbolic in LLVM bytecode however...

Well, as your input is machine code you somehow need to specify in
which register you want to put your symbolic value (or at which memory
address). Then you need to map it to LLVM IR, which at least in the
register case is rather straightforward.
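
A minimal sketch of the register case, again in Python and under stated assumptions: the `Sym` class, the `REG_SLOTS` table and `make_symbolic` are hypothetical names for illustration, loosely in the spirit of KLEE's `klee_make_symbolic`, and 32-bit wraparound is ignored. The point is only that a register name maps to a fixed slot in the state structure, so "make ebx symbolic" becomes "put a symbolic value into that slot".

```python
from dataclasses import dataclass


def _show(value):
    """Render a concrete int or a symbolic value as text."""
    return value.expr if isinstance(value, Sym) else hex(value)


@dataclass(frozen=True)
class Sym:
    """A symbolic value kept as an expression string (illustration only)."""
    expr: str

    def __add__(self, other):
        return Sym(f"({self.expr} + {_show(other)})")

    def __radd__(self, other):
        return Sym(f"({_show(other)} + {self.expr})")


# Hypothetical layout: register name -> slot in the machine-state array.
REG_SLOTS = {"eax": 0, "ebx": 1, "ecx": 2}


def make_symbolic(regs, reg_name, label):
    """Replace the concrete content of one register with a symbolic value."""
    regs[REG_SLOTS[reg_name]] = Sym(label)


def translated_add_eax_ebx(regs):
    """Stand-in for the translated form of `add eax, ebx` (flags omitted)."""
    regs[REG_SLOTS["eax"]] = regs[REG_SLOTS["eax"]] + regs[REG_SLOTS["ebx"]]


regs = [0x10, 0x20, 0x0]             # concrete eax, ebx, ecx
make_symbolic(regs, "ebx", "input")  # the user picks the register
translated_add_eax_ebx(regs)
print(regs[REG_SLOTS["eax"]].expr)
```

The memory case would need the same kind of mapping from a guest address to wherever the emulator keeps that byte, which is less direct than a fixed register slot.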

> In any case it would be interesting to be able to translate x86 to LLVM
> IR. If somebody wants to give that a try let's make a plan ;).

Cheers,

Tilmann