Hi, we are a couple of student researchers at Stanford building a tool that maps basic blocks in a final binary to LLVM IR basic blocks. The motivation is to map paths taken in LLVM-IR-based symbolic execution tools to paths through the actual assembly instructions, even when the real execution happens on an embedded device (e.g. a Cortex-M4) while the symbolic execution has to run on a more resource-rich machine.
We are wondering which tools provide debug output that would be most useful for this task. We have discovered that llc annotates the assembly it generates with basic block numbers, but we cannot find any documentation on how to interpret those numbers, and we have not found a clear mapping from those annotations to basic blocks in the actual LLVM IR.
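For concreteness, here is a minimal, hypothetical sketch of how such annotations could be paired with IR block names. It assumes llc's verbose assembly marks each machine basic block either with a label such as `.LBB0_2:` or, when no label is needed, with a comment such as `# %bb.1:`, and appends the originating IR block's name (e.g. `# %if.then`) only when that IR block is named. It also assumes `@` replaces `#` as the comment character on ARM. `map_blocks` is an illustrative helper, not part of any LLVM tool.

```python
import re
from typing import List, Optional, Tuple

# Assumed header formats (x86 shown; ARM uses "@" as the comment character):
#   "# %bb.1:            # %if.then"   block without a real label
#   ".LBB0_2:            # %if.end"    labelled block (function 0, block 2)
# The trailing "%name" comment is assumed to appear only for named IR blocks.
BLOCK_RE = re.compile(
    r"^(?:\.LBB(?P<func>\d+)_(?P<num>\d+):|[#@] %bb\.(?P<fnum>\d+):)"
    r"\s*(?:[#@]\s*%(?P<ir>[\w.$-]+))?"
)


def map_blocks(asm: str) -> List[Tuple[int, Optional[str]]]:
    """Return (machine block number, IR block name or None) for each block header."""
    result: List[Tuple[int, Optional[str]]] = []
    for line in asm.splitlines():
        m = BLOCK_RE.match(line.strip())
        if not m:
            continue  # ordinary instruction, directive, or function label
        num = m.group("num") or m.group("fnum")
        result.append((int(num), m.group("ir")))
    return result


if __name__ == "__main__":
    sample = (
        "foo:                  # @foo\n"
        "# %bb.0:              # %entry\n"
        "\tje\t.LBB0_2\n"
        "# %bb.1:              # %if.then\n"
        "\tretq\n"
        ".LBB0_2:              # %if.end\n"
        "\tretq\n"
    )
    print(map_blocks(sample))
```

On the sample above this prints `[(0, 'entry'), (1, 'if.then'), (2, 'if.end')]`. Note that such a comment can only name the IR block a machine block was originally created from, so instructions later hoisted, sunk, merged, or inlined across blocks would presumably not be reflected in it.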
We have also tried an approach similar to the one suggested in this post, but found that the source location information in the IR debug info and the source information reported by objdump often do not match well, especially in the presence of inlining, function outlining, and other optimizations.
Is there a way to get llc to output additional information showing all of the source IR basic blocks that were used to generate each assembly instruction? We realize that in certain cases (e.g. function outlining, conditional instructions) the mapping may be one-to-many or many-to-one, but even that would be very helpful to us.