LLVM for binary analysis


I'm thinking of using LLVM to translate x86 binaries into LLVM IR and
perform further analysis on the IR. The llvm-mc tool works well for
disassembling instruction bytes given as hex values. However, is there any
way to translate the disassembled machine code into LLVM IR and analyze it?
Any suggestions or help would be greatly appreciated. Thanks!

- Beng

Interesting problem! Two suggestions that may be part of your solution:

  1. Use profile information to compose basic blocks out of the stream of assembly instructions (or does llvm-mc already provide you with basic block information?)

  2. Map your target's assembly instructions back to LLVM IR opcodes. I think this will not be precise, but it is probably enough for further analysis. However, I'm not sure how to handle memory locations: you would need a way to associate the accessed memory locations and the registers used with pre-register-allocation labels, and to keep track of this association as it changes throughout the code.
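The first suggestion, composing basic blocks from profile information, could look roughly like this. It is a minimal Python sketch that assumes you already have an execution trace as a list of (address, size) pairs, one per executed instruction. The `TraceEntry` type, the trace format and the function names are hypothetical; llvm-mc itself does not produce traces. Any address reached non-sequentially is treated as a block leader, as is the fall-through address of the instruction that made the jump.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    """Hypothetical trace record: an executed instruction's address and size in bytes."""
    address: int
    size: int


def find_leaders(trace):
    """Return the addresses that must start a basic block.

    A leader is the first executed instruction, any instruction reached
    by a non-sequential transfer (taken branch, call or return), or the
    fall-through address of the instruction that made such a transfer.
    """
    if not trace:
        return set()
    leaders = {trace[0].address}
    for prev, cur in zip(trace, trace[1:]):
        fallthrough = prev.address + prev.size
        if cur.address != fallthrough:
            leaders.add(cur.address)   # target of the transfer
            leaders.add(fallthrough)   # code after the transferring instruction
    return leaders


def build_blocks(trace):
    """Group the distinct executed instructions into basic blocks.

    Returns {block start address: [instruction addresses in the block]}.
    """
    leaders = find_leaders(trace)
    sizes = {entry.address: entry.size for entry in trace}
    blocks = {}
    prev_end = None
    for addr in sorted(sizes):
        # Start a new block at a leader, or where executed code is not
        # contiguous with the previous executed instruction.
        if addr in leaders or addr != prev_end:
            current = addr
            blocks[current] = []
        blocks[current].append(addr)
        prev_end = addr + sizes[addr]
    return blocks


if __name__ == "__main__":
    # A tiny loop: mov at 0x00, add at 0x02, jnz 0x02 at 0x05, ret at 0x07.
    # The jnz is taken once, then falls through to the ret.
    trace = [TraceEntry(0x00, 2), TraceEntry(0x02, 3), TraceEntry(0x05, 2),
             TraceEntry(0x02, 3), TraceEntry(0x05, 2), TraceEntry(0x07, 1)]
    print(build_blocks(trace))  # {0: [0], 2: [2, 5], 7: [7]}
```

Because the blocks come from observed execution, a conditional branch that was never taken in the trace is invisible, so two blocks that a static CFG would separate may be merged, and code that never ran is missing entirely. Merging traces from several runs narrows this gap but does not close it.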
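For the second suggestion, the register bookkeeping can be pictured with a toy lifter. This is a hypothetical Python sketch, not an LLVM API: it takes simplified Intel-syntax instructions as (mnemonic, destination, source) tuples rather than real llvm-mc output, and emits textual IR in the opaque-pointer (`ptr`) syntax of recent LLVM releases. Each new definition of a machine register gets a fresh SSA name, and a dictionary records which value each register holds at every point, which is one way to keep the association described above. Memory is handled crudely: only `[reg]` addressing is assumed, and the register is turned into a pointer with `inttoptr`. Flags, control flow and other addressing modes are left out.

```python
import re

# Hypothetical mapping from a few two-operand x86 mnemonics to LLVM IR
# binary opcodes. EFLAGS side effects are dropped, which is one reason
# the translation is only approximate.
BINOPS = {"add": "add", "sub": "sub", "imul": "mul",
          "and": "and", "or": "or", "xor": "xor"}

MEM = re.compile(r"\[(\w+)\]")  # only "[reg]" addressing is handled


class Lifter:
    """Lift straight-line, simplified 32-bit x86 into LLVM-IR-like text.

    `regs` records which SSA value each machine register currently holds;
    this association changes as the instructions are processed.
    """

    def __init__(self):
        self.regs = {}     # machine register -> current SSA value or constant
        self.version = {}  # machine register -> last SSA version number
        self.tmp = 0       # counter for pointer temporaries
        self.lines = []    # emitted IR instructions

    def read(self, operand):
        """Return the IR value for a register or decimal immediate."""
        if operand.lstrip("-").isdigit():
            return operand
        # First use of a register: treat it as a live-in value "%reg.0".
        return self.regs.setdefault(operand, f"%{operand}.0")

    def define(self, reg):
        """Give `reg` a fresh SSA name for a new definition."""
        n = self.version.get(reg, 0) + 1
        self.version[reg] = n
        self.regs[reg] = f"%{reg}.{n}"
        return self.regs[reg]

    def address(self, base):
        """Turn the integer held in register `base` into a pointer."""
        ptr = f"%p{self.tmp}"
        self.tmp += 1
        self.lines.append(f"{ptr} = inttoptr i32 {self.read(base)} to ptr")
        return ptr

    def lift(self, op, dst, src):
        dst_mem, src_mem = MEM.fullmatch(dst), MEM.fullmatch(src)
        if op == "mov" and src_mem and not dst_mem:       # load
            ptr = self.address(src_mem.group(1))
            self.lines.append(f"{self.define(dst)} = load i32, ptr {ptr}")
        elif op == "mov" and dst_mem and not src_mem:     # store
            value = self.read(src)
            ptr = self.address(dst_mem.group(1))
            self.lines.append(f"store i32 {value}, ptr {ptr}")
        elif op == "mov" and not (dst_mem or src_mem):
            # Register/immediate copy: no IR is emitted; the destination
            # register simply becomes an alias of the source value.
            self.regs[dst] = self.read(src)
        elif op in BINOPS and not (dst_mem or src_mem):
            lhs, rhs = self.read(dst), self.read(src)
            self.lines.append(f"{self.define(dst)} = {BINOPS[op]} i32 {lhs}, {rhs}")
        else:
            raise NotImplementedError(f"unsupported: {op} {dst}, {src}")


if __name__ == "__main__":
    lifter = Lifter()
    for insn in [("mov", "eax", "[ecx]"),   # eax = *ecx
                 ("add", "eax", "ebx"),
                 ("mov", "edx", "eax"),
                 ("imul", "edx", "3"),
                 ("mov", "[ecx]", "edx")]:  # *ecx = edx
        lifter.lift(*insn)
    print("\n".join(lifter.lines))
```

Here `mov edx, eax` emits no IR at all; `edx` simply starts referring to `%eax.2`, so the following `imul` becomes `%edx.1 = mul i32 %eax.2, 3`. The `inttoptr` treatment views all memory as one flat address space, so it says nothing about which accesses alias, which is exactly the hard part of handling memory locations.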

This problem interests me, and if I come up with ideas about handling memory locations, I will let you know.