Map asm instructions back to source code

I work in the embedded sector, where certifying our software can involve juxtaposing specific lines of assembly with their respective source code.

For example, would it be possible, using llvm/clang, to dump all basic blocks containing conditional instructions and their corresponding source lines?

I don’t think there are any built-in ways to do this, but I can think of two things that might be helpful when writing a tool that does it:

Firstly, objdump (both GNU objdump and llvm-objdump) can print the disassembly and map it back to the original source code, as long as you compile with debug information. As an example, for llvm-objdump this is the command line I often use for that purpose: llvm-objdump -C --line-numbers --x86-asm-syntax=intel --no-leading-addr --no-show-raw-insn <executable>. It outputs the assembly code and interleaves comments stating which line in which file the subsequent instructions refer to. One could likely parse this format automatically.
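To illustrate the parsing idea, here is a minimal Python sketch. It assumes the location comments look like "; /path/file.c:12" and that each instruction line starts with its mnemonic (which is what --no-leading-addr and --no-show-raw-insn should give you). The exact output format is an assumption you should check against your llvm-objdump version. The names conditional_branches and BranchSite are made up for this example, and only x86 j* mnemonics are treated as conditional branches.

```python
import re
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

# Assumed shape of a source-location comment, e.g. "; /src/main.c:12".
LOCATION = re.compile(r"^;\s*(?P<file>.+):(?P<line>\d+)\s*$")
# x86 conditional jumps: any j* mnemonic that does not start with "jmp".
COND_JUMP = re.compile(r"^j(?!mp)[a-z]+$")


@dataclass(frozen=True)
class BranchSite:
    file: str
    line: int
    instruction: str


def conditional_branches(listing: str) -> Iterator[BranchSite]:
    """Yield every conditional jump together with the last seen source location."""
    current: Optional[Tuple[str, int]] = None
    for raw in listing.splitlines():
        text = raw.strip()
        if not text:
            continue
        match = LOCATION.match(text)
        if match:
            current = (match["file"], int(match["line"]))
            continue
        # Skip other comments (e.g. "; main():") and symbol labels ("<main>:").
        if text.startswith(";") or text.endswith(":"):
            continue
        mnemonic = text.split()[0]
        if current is not None and COND_JUMP.match(mnemonic):
            yield BranchSite(current[0], current[1], text)
```

You would feed it the captured stdout of the command above, e.g. via subprocess.run(..., capture_output=True, text=True). Other architectures or AT&T syntax would need their own mnemonic pattern, and instructions such as loop or cmov are not covered here.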

Possibly easier would be to get the LLVM IR produced by clang. When compiling with debug information, LLVM IR instructions carry debug location metadata. LLVM’s C++ API (and, I assume, the C API) provides many ways to walk through the IR as well as to read the line location of an instruction.

The disadvantage of the LLVM IR approach is that a branch in LLVM IR might not map cleanly to a branch in the generated assembly. Generally speaking, the IR at that point does not reflect any optimizations or instruction selection done by the target-specific backend (imagine, e.g., a branch in LLVM IR being compiled to a cmov on x86).

The disadvantage of the llvm-objdump approach is that, at the very least, you need to parse the output yourself to get the source locations from the comments. I sadly have no clue how reusable the LLVM assembler API is, or whether it can give you a list of instructions that you could iterate over to search for, e.g., branch instructions. That is something to check if you take that approach.

Hope this helps

Thanks for the reply, Zero.

Today, we use a regex-based parser on interleaved src/asm. But the implementation is unreadable, unmaintainable, and untested. It also has to support different “flavors” of asm.

LLVM IR
As you said, it is not guaranteed to map 1:1 with the emitted assembly. We require all assembly branches to be tracked, unfortunately.

I will look at the LLVM assembler API to see if it can be reused somehow. Ideally there exists some library that simply takes asm as input, and exposes a “walkable” list of sections/instructions.