Newcomer question

Hello LLVMers,

  I am considering LLVM for a project, but I am not sure how it handles
(C/C++ or 3rd-party) library code used by the user program. My
understanding is that if that library code comes in native format, then
all the benefits of LLVM are lost on it. Can it be decompiled into
LLVM bytecode, or what options are there if I want to write a runtime
pass that has to analyze native code?

Thanks,
-bx

Bin Xin wrote:

Hello LLVMers,

  I am considering LLVM for a project, but I am not sure how it handles
(C/C++ or 3rd-party) library code used by the user program. My
understanding is that if that library code comes in native format, then
all the benefits of LLVM are lost on it.

More accurately, the LLVM analyzers and optimizers cannot analyze/transform code that is not in LLVM intermediate representation. Calls to functions in native code libraries will appear as calls to external functions.

Can it be decompiled into
LLVM bytecode, or what options are there if I want to write a runtime
pass that has to analyze native code?

I think you have several options:

1) If you have the source to the library, you can compile the library to LLVM bitcode and link it into the program.

2) If you don't have source code but the library has simple, well-known interfaces (e.g., libc), you can write your analysis passes to handle calls to the library's functions as special cases. As an example, our work on memory safety cannot insert run-time checks into libc functions like memset(), but it can recognize calls to libc functions and place run-time checks before the calls to prevent the functions from violating memory safety.

3) You can try to convert binary code to LLVM intermediate code either ahead of time or at run-time. There was some work to incorporate LLVM into Qemu (a dynamic binary translation system), and such a system may be able to do this sort of thing for you. I do not know of any static binary-to-LLVM translator.

I cannot vouch for the viability of binary translation (I've never used LLVM-Qemu), but the other two have worked successfully in our projects at Illinois.
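
To make option 2 a little more concrete, here is a minimal sketch in plain C++ rather than against the real LLVM pass API. Everything named in it (LibSummary, lookupSummary, writeFits, checkedMemset, and the contents of the table) is hypothetical and chosen only for illustration; it is not taken from any existing tool. The idea is that a pass keeps a hand-written summary of each well-known library function, looks up the callee at every call site, and places a bounds check in front of calls it recognizes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Hypothetical summary of a library function whose body is not available
// as LLVM IR: which argument is the buffer written to, and which argument
// is the number of bytes written.
struct LibSummary {
    unsigned dstArg;
    unsigned lenArg;
};

// Hand-written table of a few well-known libc interfaces (illustrative,
// not exhaustive).
inline const std::unordered_map<std::string, LibSummary>& knownLibFunctions() {
    static const std::unordered_map<std::string, LibSummary> table = {
        {"memset",  {0, 2}},
        {"memcpy",  {0, 2}},
        {"strncpy", {0, 2}},
    };
    return table;
}

// What a pass would ask at each call to an external function: is this a
// callee we have a summary for? Returns nullptr for unknown functions.
inline const LibSummary* lookupSummary(const std::string& callee) {
    const auto& table = knownLibFunctions();
    auto it = table.find(callee);
    return it == table.end() ? nullptr : &it->second;
}

// The run-time check that would be placed before the call: the write must
// fit inside the destination object.
inline bool writeFits(std::size_t dstCapacity, std::size_t len) {
    return len <= dstCapacity;
}

// Hand-instrumented equivalent of memset(dst, value, len). The library
// function itself is untouched; the check runs before the call.
inline bool checkedMemset(void* dst, std::size_t dstCapacity,
                          int value, std::size_t len) {
    if (!writeFits(dstCapacity, len)) {
        return false;  // a real system would report a violation or abort
    }
    std::memset(dst, value, len);
    return true;
}
```

In an actual pass the capacity of the destination object would have to come from whatever metadata the memory-safety system tracks; here it is simply passed in by the caller, which is an assumption of the sketch.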

-- John T.

Thanks for the pointers.

-bx