Binary to LLVM IR lifter?

Hello Fredi,

This is my experience with McSema, a publicly available tool that can convert x86 machine code to functional LLVM IR.

  • Pluses of McSema:

  • Well documented

  • Fully functional LLVM IR, i.e. the recovered LLVM IR can be compiled back into a binary and executed.

  • Pluggable control flow graph recovery: the tool has two independent phases. In the first phase, it extracts control flow graph (CFG) information from the binary using a tool called bin_descend and writes the recovered CFG to a Google Protocol Buffer serialized file. There is also an IDAPython script that recovers the CFG from within IDA Pro (a commercial tool). Since the CFG is handed over as a serialized file, any CFG recovery solution can be plugged in. In the second phase, McSema converts this CFG into LLVM IR.

  • Minuses of the LLVM IR recovered from McSema:

  • One downside of the recovered LLVM IR is that variable (scalar/aggregate) and type information is not recovered. In our group, we are actively working on recovering the variable and type information.
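
To make the "pluggable" point in the pluses above more concrete, here is a rough Python sketch of a two-phase design in which the phases share nothing but a serialized CFG file. This is not McSema's code or file format: the `Block` and `Cfg` classes, all function names, and the toy instructions are hypothetical, and JSON stands in for the Protocol Buffer file only to keep the example self-contained.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class Block:
    address: int        # start address of the basic block
    instructions: list  # textual stand-ins for decoded instructions
    successors: list    # start addresses of successor blocks


@dataclass
class Cfg:
    entry: int
    blocks: list = field(default_factory=list)


def recover_cfg_toy() -> Cfg:
    """Phase 1 stand-in. Any front end (a recursive-descent
    disassembler, an IDA script, ...) could produce the same structure."""
    return Cfg(entry=0x1000, blocks=[
        Block(0x1000, ["cmp eax, 0", "je 0x1010"], [0x1004, 0x1010]),
        Block(0x1004, ["add eax, 1"], [0x1010]),
        Block(0x1010, ["ret"], []),
    ])


def save_cfg(cfg: Cfg, path: str) -> None:
    """Serialize the CFG; JSON is an assumption made for this sketch."""
    with open(path, "w") as f:
        json.dump(asdict(cfg), f)


def load_cfg(path: str) -> Cfg:
    """Rebuild the CFG from the file written by any phase-1 tool."""
    with open(path) as f:
        raw = json.load(f)
    return Cfg(entry=raw["entry"], blocks=[Block(**b) for b in raw["blocks"]])


def translate(cfg: Cfg) -> str:
    """Phase 2 stand-in. It only sees the loaded CFG and emits one label
    per block; a real lifter would emit LLVM IR for each instruction."""
    lines = []
    for block in sorted(cfg.blocks, key=lambda b: b.address):
        lines.append(f"block_{block.address:x}:")
        for ins in block.instructions:
            lines.append(f"  ; {ins}")
        targets = ", ".join(f"block_{s:x}" for s in block.successors) or "none"
        lines.append(f"  ; successors: {targets}")
    return "\n".join(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = os.path.join(tmp, "cfg.json")
        save_cfg(recover_cfg_toy(), cfg_path)  # phase 1 writes the file
        print(translate(load_cfg(cfg_path)))   # phase 2 reads only the file
```

Because `translate` depends only on the file contents, replacing `recover_cfg_toy` with another recovery tool that writes the same format would not require touching the second phase.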

Thanks and Regards,
Sandeep Dasgupta
PhD Student, University of Illinois Urbana Champaign