disassembly/decompiling

Hi, I just read the LLVM 2.6 release announcement, and the bit about llvm-mc caught my attention. I've been looking for a tool to disassemble x86 object files into an IR and then reassemble them into x86_64 object code. The immediate use for it would be to convert driver blobs that some vendors provide for their hardware (e.g. the Lucent modem driver) so they can be used in a 64-bit kernel. From the release announcement it looks like llvm-mc isn't ready for this purpose yet; I was just curious whether this kind of task is anywhere on its roadmap. Thanks...

We don't have anything like that planned, but we do plan to do an assembler and disassembler. The disassembler (for x86-16/32/64) is iterating on review comments before it goes in. The assembler is currently being built out and will initially support Mach-O. Translating x86-32 to x86-64 sounds tricky, but it could probably be built on some of this infrastructure.

-Chris

Thanks for the response. I guess the real question is how much functionality the disassembler will have. If it only disassembles to assembly source files, that's one thing. If it can go all the way to LLVM IR, that should make going to anything else pretty trivial.

Howard Chu wrote:

By the way, another obvious use for this feature would be to re-optimize packaged binaries. E.g. in the Microsoft world there are a lot of apps out there that are optimized specifically for Intel CPUs and don't run as well as they ought to on AMD CPUs. Disassembling the binary into LLVM IR and then re-optimizing it would allow consumers to extract the maximum performance out of software they've purchased, even if the vendor has no interest in supporting them in this way.

Another obvious next step would be to allow re-compiling one platform's binaries to run on another CPU architecture. For most POSIX-based OSes it would only require a thin wrapper library to map the runtime environment from one platform to the other. Going down this direction would kind of reduce the need for the Fat-ELF project underway; if you can easily retarget a binary from one platform to another there's no reason to ship multiple binary formats in a single executable/object file.

How would one deal with things like sizeof(void *) being different on
different platforms, or platform-specific hacks in preprocessor
conditionals? I suppose you could use that thin POSIX wrapper to pass
the MAP_32BIT flag (at least on Linux) to mmap, and then what you're really
getting is not the 64-bit address space, but the extra registers and
target-specific optimizations.
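
To make the sizeof(void *) concern concrete, here is a rough sketch of how the same C struct lays out under a 32-bit and a 64-bit data model. The `ILP32` and `LP64` tables, the `layout` helper, and the `node` struct are all made up for illustration, and the sketch assumes plain natural alignment with no packing pragmas. The point is only that offsets and sizes baked into compiled code stop matching once pointers grow.

```python
# Illustrative data models: (size, alignment) in bytes for each C type.
# ILP32 is typical of 32-bit x86; LP64 is typical of x86-64 Linux.
ILP32 = {"int": (4, 4), "pointer": (4, 4)}
LP64 = {"int": (4, 4), "pointer": (8, 8)}


def layout(fields, model):
    """Return ({field name: offset}, total size) using natural alignment."""
    offsets = {}
    offset = 0
    max_align = 1
    for name, ctype in fields:
        size, align = model[ctype]
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, total


# Hypothetical struct: struct node { int id; void *next; };
NODE = [("id", "int"), ("next", "pointer")]

offsets32, size32 = layout(NODE, ILP32)
offsets64, size64 = layout(NODE, LP64)
print("ILP32:", offsets32, "size", size32)  # next at offset 4, size 8
print("LP64: ", offsets64, "size", size64)  # next at offset 8, size 16
```

A 32-bit binary that loads `next` from offset 4 has that constant folded into its instructions, so a translator would have to either keep the 32-bit layout (which is what the MAP_32BIT idea amounts to) or recover enough type information to rewrite every such access.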

I'm pretty sure there aren't plans to use llvm-mc to disassemble binaries
to LLVM IR. Recently some folks were discussing other ways of doing a
similar x86 -> LLVM translation on this list, though, so you might
check the archives.

Reid

It will disassemble encoded instruction bytes (CD 21) into an MCInst, which can then be printed to a string such as "int $0x21".

-Chris
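
As a rough illustration of that bytes-to-instruction-to-string flow, here is a toy decoder. It is not LLVM's API: `ToyInst`, `decode_one`, and `to_att` are hypothetical names, and the decoder only knows two one-byte opcodes (CD for INT imm8 and 90 for NOP). It shows the shape of the output (a small structured record rather than IR), which is why this stage alone does not get you to a retargetable form.

```python
from dataclasses import dataclass


@dataclass
class ToyInst:
    """Hypothetical stand-in for a decoded instruction (not LLVM's MCInst)."""
    mnemonic: str
    operands: tuple
    length: int  # number of bytes consumed from the input


def decode_one(code, pos=0):
    """Decode one instruction at code[pos]; only two opcodes are known."""
    opcode = code[pos]
    if opcode == 0xCD:  # INT imm8: opcode byte followed by a 1-byte immediate
        if pos + 1 >= len(code):
            raise ValueError("truncated INT instruction")
        return ToyInst("int", (code[pos + 1],), 2)
    if opcode == 0x90:  # NOP: single byte, no operands
        return ToyInst("nop", (), 1)
    raise ValueError(f"unknown opcode 0x{opcode:02x}")


def to_att(inst):
    """Render a ToyInst in AT&T-style syntax, immediates written as $0x.."""
    if not inst.operands:
        return inst.mnemonic
    ops = ", ".join(f"${op:#x}" for op in inst.operands)
    return f"{inst.mnemonic} {ops}"


inst = decode_one(bytes([0xCD, 0x21]))
print(to_att(inst))  # int $0x21
```

The decoded record carries a mnemonic and operands but nothing about control flow, types, or memory layout, so lifting from this level to LLVM IR would be a separate and much larger job.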

Even if you could disassemble to IR, you could not recompile it for another platform, for the same reasons that LLVM does not give you platform-independent C and C++. Code that is excluded by preprocessor conditionals never gets into the generated code, and sizes that are hardcoded by optimisations and the like can be platform dependent. Also, transforming machine code, or even assembler, into LLVM IR would be non-trivial…