Reverse engineering for LLVM bit-code

Hi,

I would like to know whether LLVM bitcode is ready to be used as a distribution format (stored in a software distribution package). Is it as easy to convert LLVM IR back to C/C++ source code as it is to decompile Java bytecode? My understanding is:
1. LLVM IR is more like assembly code, so it is not easy to reverse engineer.
2. If it is easy to reverse engineer, does that mean it is not suitable as a distribution format unless code obfuscation is added at the IR level?

Thanks
Wan Xiaofei

LLVM IR is at a higher level than assembly code. It keeps some names, and it is easier to convert the IR back to source code than it is to convert a native binary.

The main purpose of LLVM IR is code generation. I don’t think adding obfuscation to it is particularly worthwhile; those who need it can use tools and approaches specifically aimed at obfuscation. Even simply renaming the identifiers in the source code makes a C/C++ file very difficult to analyze. In other cases one might use anti-debugger tricks or execute the code in a virtual machine. Everything depends on the level of obfuscation required, and it is impractical to make LLVM IR a tool for that.

Thanks,
–Serge

Thanks. I just wanted to confirm the conclusion that LLVM IR is easy to convert back into source code.

Code obfuscation is not worth discussing here; at least, it is outside the scope of the IR, haha.

One more question, though: some optimization passes are applied in the frontend before the bitcode (BC) is generated, so might it not be easy to convert the IR back to source code after all?

Thanks
Wan Xiaofei

1. LLVM IR is more like assembly code, so it is not easy to reverse engineer.

IDA and Hex-Rays show that it is entirely possible to reverse engineer assembly code (at least the kind that comes out of a C/C++ compiler) back to C/C++ code. But even though that's the question you asked, it's not what you meant to ask. What makes Java easy to reverse engineer is that it retains the full type structure and the names of the original program [1]. LLVM IR lacks names for the fields of struct types, although it does retain struct names and global names. Optimization passes will also render all SSA names completely illegible, and they often appear to destroy a fair amount of the type structure too.
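
As a rough illustration of which names survive in textual IR, the following Python sketch scans a small hand-written IR fragment for struct, global, and function names. The fragment IR_TEXT is an assumption made up for this example, not actual compiler output, and the helper surviving_names is hypothetical. The regular expressions only cover simple unquoted identifiers.

```python
import re

# Hand-written IR fragment for illustration only (not real compiler output).
IR_TEXT = """
%struct.Point = type { i32, i32 }

@origin = global %struct.Point zeroinitializer

define i32 @coord_sum(ptr %p) {
entry:
  %x.addr = getelementptr inbounds %struct.Point, ptr %p, i32 0, i32 0
  %x = load i32, ptr %x.addr
  %y.addr = getelementptr inbounds %struct.Point, ptr %p, i32 0, i32 1
  %y = load i32, ptr %y.addr
  %sum = add nsw i32 %x, %y
  ret i32 %sum
}
"""


def surviving_names(ir_text):
    """Collect struct, global, and defined-function names from textual IR.

    Hypothetical helper: handles only simple unquoted identifiers that
    start at the beginning of a line.
    """
    structs = re.findall(r"^%([-\w$.]+) = type", ir_text, flags=re.M)
    globals_ = re.findall(r"^@([-\w$.]+) = ", ir_text, flags=re.M)
    functions = re.findall(r"^define .*?@([-\w$.]+)\(", ir_text, flags=re.M)
    return {"structs": structs, "globals": globals_, "functions": functions}


names = surviving_names(IR_TEXT)
```

In this fragment the struct type keeps its name, but its two i32 fields are identified only by index in the getelementptr instructions. The local names such as %x and %sum are the kind of SSA names that optimization or stripping would replace.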

2. If it is easy to reverse engineer, does that mean it is not suitable as a distribution format unless code obfuscation is added at the IR level?

If you are super-paranoid about reverse engineering, replace all function names with garbage names and all types with equivalent i8 arrays. The resulting IR will be about as informative about the original source code as the resulting assembly would be.
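
The function-renaming half of that suggestion can be sketched in Python over textual IR. This is a toy under stated assumptions: the sample IR is hand-written, the helper rename_defined_functions and the f0, f1, ... naming scheme are hypothetical, and a plain regular expression does not understand quoted identifiers, string constants, or existing names that collide with the new ones. A real tool would work through the LLVM API rather than on text. Only functions defined in the module are renamed, and main and external declarations are left alone so that the program still links. Replacing types with i8 arrays is not attempted here.

```python
import re

# Hand-written IR fragment for illustration only (not real compiler output).
SAMPLE_IR = """
declare i32 @puts(ptr)

define internal i32 @check_license(i32 %key) {
  %ok = icmp eq i32 %key, 42
  %r = zext i1 %ok to i32
  ret i32 %r
}

define i32 @main() {
  %v = call i32 @check_license(i32 42)
  ret i32 %v
}
"""


def rename_defined_functions(ir_text, keep=("main",), prefix="f"):
    """Replace the names of functions defined in ir_text with garbage names.

    Hypothetical helper: declarations (external functions) and any name
    listed in keep are left untouched.
    """
    defined = re.findall(r"^define .*?@([-\w$.]+)\(", ir_text, flags=re.M)
    to_rename = [name for name in defined if name not in keep]
    mapping = {name: f"{prefix}{i}" for i, name in enumerate(to_rename)}

    def substitute(match):
        # The pattern matches a whole identifier, so lookups never hit
        # a mere prefix of a longer name.
        name = match.group(1)
        return "@" + mapping.get(name, name)

    renamed = re.sub(r"@([-\w$.]+)", substitute, ir_text)
    return renamed, mapping


renamed_ir, name_map = rename_defined_functions(SAMPLE_IR)
```

Keeping name_map private lets the distributor translate crash reports back to the original names, while the shipped IR no longer carries them.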

[1] Due to a version control bug, I ended up losing the source code to my C++ project while retaining the resulting library. I found the library much easier to decompile than the obfuscated Java bytecode I once set out to decompile as a project (where the only obfuscation that provided a meaningful barrier to comprehension was name obfuscation).