Converting x86 instructions to LLVM instructions

Dear all,

I am studying a paper (see below) whose authors say they have written code to translate x86 instructions to LLVM IR. I am interested in this and would like to write something similar, but using my own Pin tool rather than QEMU, which is what the paper uses.

From what I have read so far about LLVM, and given my relatively poor knowledge of assembly, I have the feeling that this is a huge task (many months of work). That is holding me back from digging into it, and I would be encouraged to hear that this is easy for some of you experts. Of course, if there were already a public piece of code that does it, that would be awesome. The paper mentions llvm-qemu (http://code.google.com/p/llvm-qemu/), but I am not sure it would lead me where I want.

Any comments are welcome.

(http://wwwse.inf.tu-dresden.de/hotdep/S2-1-candea-cameraReady_HotDep_2009.pdf)

Alexandre Gouraud <alexandre.gouraud@enst-bretagne.fr> writes:

> ... would like to write something similar, but using my own Pin tool
> rather than QEMU, which is what the paper uses.

You could also use Valgrind to convert x86 to Valgrind's IR (VEX) and then
write a tool to convert that IR to LLVM IR.

Hi Timo,

Thanks for your comment. I feel I should justify why I don't want to use QEMU, which is fine, since my choice is actually not final.

QEMU does much more than I need for dynamically instrumenting software. My goal is automated testing to find bugs, which can quickly become computationally intensive, so I am trying to use the smallest (and fastest) tool possible.
Even with QEMU, I am not sure this code already exists. And if it does, I can still extract it and put it where I want. My real question is: is this a long job (several months), or just a matter of two weeks? And if it does not already exist, does that mean the idea is nonsense, and if so, why?

Finally, regarding your suggestion to use Valgrind: I think I will stick to Windows tools, because that is what I know best. But you were being ironic, weren't you?

Alexandre.

2009/9/29 Timo Juhani Lindfors <timo.lindfors@iki.fi>

Hi,

Alexandre Gouraud <alexandre.gouraud@enst-bretagne.fr> writes:

> And if it does not already exist, does that mean the idea is nonsense,
> and if so, why?

Why don't you compile your program directly to LLVM bitcode?

Alexandre Gouraud wrote:

> Finally, regarding your suggestion to use Valgrind: I think I will stick to
> Windows tools, because that is what I know best.

The problem is that x86 has a very complex instruction set, and decoding it
all takes a lot of software. Valgrind already has the tools to do this
decoding, so it's definitely something I would consider. After all,
Valgrind already does much of what you're trying to do.
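To give a rough sense of why x86 decoding takes so much code, here is a minimal Python sketch of only the very first decoding step. It peels off the optional legacy prefixes and, in 64-bit mode, a REX prefix. The helper name split_prefixes is hypothetical. A real decoder must also handle the 0x0F escape bytes, ModRM/SIB bytes, displacements, immediates, and VEX/EVEX encodings. It must also account for the fact that 0x66/0xF2/0xF3 double as mandatory prefixes for SSE instructions.

```python
# Legacy x86 prefix bytes and a descriptive name for each (see the Intel SDM, Vol. 2).
LEGACY_PREFIXES = {
    0xF0: "lock",
    0xF2: "repne",
    0xF3: "rep",
    0x2E: "cs",
    0x36: "ss",
    0x3E: "ds",
    0x26: "es",
    0x64: "fs",
    0x65: "gs",
    0x66: "opsize",
    0x67: "addrsize",
}


def split_prefixes(code: bytes, long_mode: bool = True):
    """Hypothetical helper: split leading prefixes off one x86 instruction.

    Returns (legacy_prefix_names, rex_byte_or_None, remaining_bytes).
    Only the prefix layer is handled; opcode, ModRM/SIB, displacement,
    immediate and VEX/EVEX decoding are all left to the caller.
    """
    prefixes = []
    i = 0
    # Legacy prefixes may appear in any order before the opcode.
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        prefixes.append(LEGACY_PREFIXES[code[i]])
        i += 1
    rex = None
    # 0x40-0x4F is a REX prefix only in 64-bit mode; in 32-bit mode the same
    # bytes are the one-byte INC/DEC instructions.
    if long_mode and i < len(code) and 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        i += 1
    return prefixes, rex, code[i:]
```

For example, split_prefixes(bytes([0x48, 0x89, 0xC8])) yields ([], 0x48, b'\x89\xc8'), where 0x48 is the REX.W prefix of mov rax, rcx. With long_mode=False, 0x48 is left in place, because in 32-bit mode that byte is dec eax. The same bytes mean different things depending on the CPU mode, which is part of why decoders get large.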

Andrew.

Hi Alexandre,

> The paper mentions llvm-qemu (http://code.google.com/p/llvm-qemu/), but I
> am not sure it would lead me where I want.

Changing llvm-qemu to handle x86 is a matter of a few hours (only trivial
changes to the source code are required). Nevertheless, I believe
Valgrind is a better choice for you if your goal is dynamic binary
instrumentation (simply because it was designed for this purpose).
However, it also depends on whether you want to generate LLVM IR or
not. If you do, llvm-qemu might be the better choice.

Are there particular reasons why you want to translate to LLVM IR?
(E.g. the authors of the paper wanted to be able to use KLEE with
machine code)

Cheers,

Tilmann

Hi Tilmann,

I want to do the same: use KLEE with machine code. With such a framework, I could try to do what is described here: http://research.microsoft.com/en-us/um/people/pg/public_psfiles/ndss2008.pdf
But as you can guess from the URL, none of that is open source. For this I need an IR I can work on easily, and I think LLVM is a good candidate.

What about your llvm-qemu implementation? You are the author, aren't you? I could not tell from the progress status whether this x86-to-LLVM translation worked or not.

> Finally, regarding your suggestion to use Valgrind: I think I will stick to
> Windows tools, because that is what I know best.

You might want to have a look at DynamoRIO:
http://code.google.com/p/dynamorio/

It is also available for Windows.

Martin

Thanks Martin,

I know DynamoRIO, but I think it is essentially the same kind of tool as Pin.

Hi Alexandre,

> What about your llvm-qemu implementation? You are the author, aren't you? I
> could not tell from the progress status whether this x86-to-LLVM
> translation worked or not.

Yeah, I wrote llvm-qemu in 2007. At the moment it only supports ARM,
but due to the architecture of qemu it's very easy to change it to any
other target architecture supported by qemu (in qemu terminology:
target = architecture to emulate). I haven't really worked on it for a
long time though, so it has bitrotted quite a bit (e.g. it needs to be
built against LLVM 2.1). Still, it shouldn't be too much work to bring
it up to date (much less than implementing a translator from some
other IR to LLVM IR). I assume all of this has already been done by
the authors of the "Selective Symbolic Execution" paper, and maybe
they're willing to contribute their stuff back? Technically it's all
GPL :)

Another thing I have in the back of my mind is retargeting the current
version of qemu, which uses a new code generator called TCG (back when
llvm-qemu was written, qemu used a code generator called dyngen), to LLVM.
Translating from TCG IR to LLVM IR seems to be rather
straightforward. Depending on how much time you want to invest this
might be an option for you too. I guess using qemu as a base for your
system is also interesting because it allows you to do full system
simulation.
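The claim that TCG IR maps fairly directly onto LLVM IR can be illustrated with a toy translator. The sketch below is hypothetical. It works on a tiny list of ops whose names are only loosely modeled on TCG's (movi_i32, add_i32, sub_i32, ...), not on QEMU's actual data structures. It emits textual LLVM IR in the opaque-pointer syntax of LLVM 15 and later. Each TCG temporary becomes a stack slot (alloca) with explicit loads and stores. TCG temporaries can be reassigned, while LLVM values are in SSA form, so this design sidesteps SSA construction and leaves it to LLVM's mem2reg pass.

```python
# Hypothetical toy ops, loosely modeled on QEMU TCG op names; not QEMU's real API.
BINOPS = {
    "add_i32": "add",
    "sub_i32": "sub",
    "and_i32": "and",
    "or_i32": "or",
    "xor_i32": "xor",
}


def translate_block(ops, ret_temp):
    """Translate toy TCG-style ops into an LLVM IR function returning ret_temp."""
    # Collect every temporary in first-use order so each gets one stack slot.
    temps = []
    for op in ops:
        names = op[1:2] if op[0] == "movi_i32" else op[1:]
        for t in names:
            if t not in temps:
                temps.append(t)
    if ret_temp not in temps:
        temps.append(ret_temp)

    lines = ["define i32 @block() {", "entry:"]
    # One alloca per temporary; the ".addr" suffix avoids clashing with %vN names.
    for t in temps:
        lines.append(f"  %{t}.addr = alloca i32")

    n = 0  # counter for fresh SSA value names %v0, %v1, ...
    for op in ops:
        name = op[0]
        if name == "movi_i32":
            _, dst, imm = op
            lines.append(f"  store i32 {imm}, ptr %{dst}.addr")
        elif name in BINOPS:
            _, dst, a, b = op
            lines.append(f"  %v{n} = load i32, ptr %{a}.addr")
            lines.append(f"  %v{n + 1} = load i32, ptr %{b}.addr")
            lines.append(f"  %v{n + 2} = {BINOPS[name]} i32 %v{n}, %v{n + 1}")
            lines.append(f"  store i32 %v{n + 2}, ptr %{dst}.addr")
            n += 3
        else:
            raise ValueError(f"unsupported op: {name}")

    # Return the final value of the requested temporary.
    lines.append(f"  %v{n} = load i32, ptr %{ret_temp}.addr")
    lines.append(f"  ret i32 %v{n}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    block = [
        ("movi_i32", "t0", 2),
        ("movi_i32", "t1", 3),
        ("add_i32", "t2", "t0", "t1"),
    ]
    print(translate_block(block, "t2"))
```

One could save the printed function to a .ll file and run it through opt's mem2reg pass, which promotes the allocas to SSA registers. A real translator would also have to model the guest CPU state, memory accesses, and QEMU's helper calls, which is where most of the work lies.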

Cheers,

Tilmann