Map asm instructions back to source code

tony · November 4, 2021, 1:16pm

I work in the embedded sector, where certifying our software can involve juxtaposing specific lines of assembly with their respective source code.

For example, would it be possible, using llvm/clang, to dump all basic blocks containing conditional instructions and their corresponding source lines?

zero9178 · November 4, 2021, 5:21pm

I don’t think there are any built-in ways to do this, but I can think of two things that might be helpful when writing a tool that does it:

Firstly, objdump (both GNU objdump and llvm-objdump) can print disassembly annotated with the original source locations, as long as you compile with debug information. For llvm-objdump, this is the command line I often use for that purpose: llvm-objdump -C --line-numbers --x86-asm-syntax=intel --no-leading-addr --no-show-raw-insn <executable>. It outputs the assembly and interleaves comments stating which file and line the subsequent instructions belong to. One could likely parse this format automatically.
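To make the parsing idea concrete, here is a minimal Python sketch. It assumes the listing marks source locations with comment lines of the form "; <file>:<line>" and indents instruction lines; that shape, the sample listing, and the names parse_listing, branches, and is_branch are illustrative assumptions to check against your own llvm-objdump output. The branch test is a crude x86-only heuristic, not a general classifier.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Optional

# Assumed shape of a source-location comment, e.g. "; /src/main.c:12".
# Verify this against the output of your llvm-objdump version.
LOC_RE = re.compile(r"^;\s*(?P<file>.+):(?P<line>\d+)\s*$")


@dataclass
class Insn:
    text: str             # instruction text as printed, whitespace-trimmed
    file: Optional[str]   # source file of the most recent location comment
    line: Optional[int]   # source line of the most recent location comment


def is_branch(mnemonic: str) -> bool:
    # Hypothetical x86-only heuristic: jmp and jcc mnemonics start with "j".
    return mnemonic.startswith("j")


def parse_listing(listing: str) -> Iterator[Insn]:
    """Yield each instruction tagged with the last source location seen."""
    file, line = None, None
    for raw in listing.splitlines():
        m = LOC_RE.match(raw)
        if m:
            file, line = m.group("file"), int(m.group("line"))
            continue
        # Assumption: instruction lines are indented; labels, headers,
        # and other comments are not, so they are skipped here.
        if raw[:1] in (" ", "\t") and raw.strip():
            yield Insn(raw.strip(), file, line)


def branches(listing: str) -> List[Insn]:
    return [i for i in parse_listing(listing) if is_branch(i.text.split()[0])]


# Hand-written sample in the assumed format, not real tool output.
SAMPLE = """\
0000000000001130 <main>:
; main():
; /src/main.c:3
        push    rbp
        cmp     edi, 1
; /src/main.c:4
        jle     0x1140 <main+0x10>
; /src/main.c:5
        mov     eax, 1
        ret
"""

for insn in branches(SAMPLE):
    print(f"{insn.file}:{insn.line}: {insn.text}")
```

Keeping the location tracking separate from the branch test means only is_branch needs to change per architecture or asm flavor.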

Possibly easier would be to get the LLVM IR produced by clang. When compiling with debug information, LLVM IR instructions have debug location metadata attached to them. LLVM's C++ API (and, I assume, the C API) provides many ways to walk through the IR as well as to read the line location of an instruction.
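As a rough illustration of what that debug location looks like, the Python sketch below reads textual IR (for instance from clang -g -S -emit-llvm) and resolves the !dbg reference on each conditional br to a source line. It is a text-level approximation, not the C++ API mentioned above; the function name conditional_branch_lines and the trimmed IR excerpt are made up for the example, and multi-line instructions such as switch are deliberately not handled.

```python
import re
from typing import Dict, List, Optional

# Matches e.g. "!15 = !DILocation(line: 4, column: 7, scope: !10)".
DILOC_RE = re.compile(r"^!(\d+) = (?:distinct )?!DILocation\(line: (\d+)")
# Matches a conditional branch that carries a !dbg attachment.
# Unconditional "br label ..." and multi-line "switch" are not covered.
COND_BR_RE = re.compile(r"^\s*br i1\b.*!dbg !(\d+)")


def conditional_branch_lines(ir_text: str) -> List[Optional[int]]:
    """Return the source line of each conditional br, in IR order."""
    locations: Dict[str, int] = {}
    refs: List[str] = []
    for line in ir_text.splitlines():
        m = DILOC_RE.match(line)
        if m:
            locations[m.group(1)] = int(m.group(2))
            continue
        m = COND_BR_RE.match(line)
        if m:
            refs.append(m.group(1))
    # Metadata nodes usually appear after the functions that use them,
    # so references are resolved only once the whole text has been read.
    return [locations.get(ref) for ref in refs]


# Trimmed, hand-written excerpt for illustration; not a complete module.
IR = """\
define i32 @f(i32 %x) !dbg !10 {
entry:
  %cmp = icmp sgt i32 %x, 0, !dbg !14
  br i1 %cmp, label %then, label %end, !dbg !15
then:
  br label %end, !dbg !16
end:
  ret i32 0, !dbg !17
}
!14 = !DILocation(line: 4, column: 9, scope: !10)
!15 = !DILocation(line: 4, column: 7, scope: !10)
!16 = !DILocation(line: 5, column: 5, scope: !10)
!17 = !DILocation(line: 7, column: 3, scope: !10)
"""

print(conditional_branch_lines(IR))
```

For anything beyond a quick experiment, walking the IR through LLVM's own API would be more robust than matching text.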

The disadvantage of the LLVM IR approach is that, e.g., a branch in LLVM IR might not map cleanly to a branch in the generated assembly. Generally speaking, the IR at that point does not reflect any optimizations or instruction selection done by the target-specific backend (imagine a branch in LLVM IR being compiled to a cmov on x86, for example).

The disadvantage of the llvm-objdump approach is that, at the very least, you need to parse the output yourself to get the source locations from the comments. I sadly have no clue how reusable the LLVM assembler API is, or whether it can give you a list of instructions to iterate over and search for, e.g., branch instructions. That is something to check if you take that approach.

Hope this helps

tony · November 13, 2021, 5:51pm

Thanks for the reply, Zero.

Today, we use a regex-based parser on interleaved src/asm. But the implementation is unreadable, unmaintainable, and untested. It also supports different “flavors” of asm.

LLVM IR
As you said, it is not guaranteed to map 1:1 with the emitted assembly. We require all assembly branches to be tracked, unfortunately.

I will look at the LLVM assembler API to see if it can be reused somehow. Ideally there exists some library that simply takes asm as input, and exposes a “walkable” list of sections/instructions.
