Consistency of Memory Layout?

Hello, first post here!

I have been in the #llvm IRC a few times wondering the following: Suppose I have a C++ program compiled with an unknown compiler, I have compiled llvm+clang with that same compiler and linked them in as libraries, and I then compile some C++ code with clang+llvm and inject it into said “host” program. How can I obtain/manipulate data from the “guest” program from the “host” program? How do I know that the memory layout of an int in the guest program (for example) is the same as the memory layout of an int in the “host” program? Are the LLVM and Clang APIs ABI-aware (of how LLVM+Clang will compile), and can they provide access to the data: ints, floats, function pointers, etc.?

Thank you

  • TFB

Strictly speaking, if you have an opaque binary, you have no guarantees about which ABI it expects. In practice, though, most binaries are built according to the platform's standardized ABI, so if the program exports symbols and header files to link against, you should (barring bugs or missing features) be able to communicate effectively using that common ABI.
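As a rough illustration of what "communicating over a common ABI" can look like, here is a minimal sketch. It assumes a typical target where `float` is 4 bytes and `double` is 8 bytes; the names `GuestRecord` and `guest_score` are hypothetical and chosen only for this example. The idea is to restrict the shared interface to fixed-width, standard-layout types and C linkage, and to have both sides check the layout at compile time rather than trusting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical record shared between host and guest. Only fixed-width and
// fundamental types are used, so the platform C ABI determines the layout.
struct GuestRecord {
    std::int32_t id;
    float value;
    double weight;
};

// Compile-time checks that both sides can repeat in their own builds.
// The expected numbers are assumptions about a typical target, not guarantees.
static_assert(std::is_standard_layout<GuestRecord>::value,
              "GuestRecord must be standard-layout to be shared safely");
static_assert(std::is_trivially_copyable<GuestRecord>::value,
              "GuestRecord must be trivially copyable");
static_assert(sizeof(GuestRecord) == 16, "unexpected size for GuestRecord");
static_assert(offsetof(GuestRecord, value) == 4, "unexpected offset of value");
static_assert(offsetof(GuestRecord, weight) == 8, "unexpected offset of weight");

// C linkage avoids C++ name mangling, so the host can look the symbol up
// by its plain name regardless of which C++ compiler built the guest.
extern "C" double guest_score(const GuestRecord* r) {
    assert(r != nullptr);
    return r->id * r->weight + r->value;
}
```

If one of the `static_assert` lines fails in either build, the two compilers disagree about the layout and the struct should not be passed across the boundary as-is.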

Given your word choice, however, I suspect that you are interested in the case where the program isn’t cooperatively exporting symbols and header files. In that case, recovering the ABI may be possible by parsing debug information and making some educated guesses (if debug symbols are available), or by outright decompilation. LLVM and Clang do not have facilities to expedite this kind of decompilation.