Annotating output assembly with input C statements

Hi,
I'm trying to annotate the final assembly output of my LLVM codegen with the corresponding input C statements.
It would've been super easy if the source text were included in the IR debug info. But obviously it is not, and there are good reasons why not!
So I'm forced to collect all my information in the back end from the existing debug pseudo instructions. As you know, these only carry file name and line number information. Without actually parsing the input file to collect the start and end line numbers of each C statement and function, it would be non-trivial to emit proper source-line annotations in the output assembly. Using clang objects in the back end is not recommended either, so I was wondering whether anyone in the LLVM community has attempted this before. Currently I'm planning to write my own parser, but if someone has a better solution, please let me know.
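A minimal sketch of how the interleaving step could look, modeled in Python outside LLVM. It assumes the (line, instruction text) pairs have already been extracted from the debug locations (for example via `MachineInstr::getDebugLoc()` while printing) and that the source file has been read into a list of lines. It swaps in a simpler heuristic than the parser proposed above: before each instruction whose line hasn't been shown yet, print every not-yet-shown line up to it, so the leading lines of a multi-line statement come out too. `annotate`, `first_line` and the `#` comment prefix are hypothetical; the real comment character depends on the target.

```python
def annotate(instrs, lines, first_line=1, comment="#"):
    """Interleave source lines with assembly text (hypothetical helper).

    instrs:     (line, asm_text) pairs in emission order; `line` is the 1-based
                source line from the instruction's debug location, 0 if unknown.
    lines:      the source file as a list of strings; lines[0] is line 1.
    first_line: first source line that may be printed (e.g. the function's
                starting line), so text before the function is skipped.
    """
    out = []
    shown = max(first_line, 1) - 1  # highest source line printed so far
    for line, asm in instrs:
        if 0 < line <= len(lines) and line > shown:
            # Print every not-yet-shown line up to this one, so the leading
            # lines of a multi-line statement appear before its code.
            for n in range(shown + 1, line + 1):
                out.append(f"{comment} {n}: {lines[n - 1]}")
            shown = line
        # Lines reached again by a backward jump (e.g. a loop) are not reprinted.
        out.append(f"\t{asm}")
    return out
```

Trailing lines of a statement that extend past the instruction's own line only appear before the next statement's code, so this does not fully replace knowing the statement's extent.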

Thanks
Ali

What, specifically, do you need a parser for? Specifying file+line pairs in
debug info is standard practice. The debugger should then have access to
the original C source file to show the code. Including the input C code in
every binary with debug info would blow up its size considerably (C files
include a lot of code from include files...) and is unnecessary. Why is
displaying a line of C code corresponding to some file+line pair more than
just reading a specific line from a given file?
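For the single-line case described here, the lookup really is just reading one line from a file. A minimal sketch (the names `load_source` and `source_line` are hypothetical) that caches each file so repeated lookups don't re-read it, and treats line 0, which LLVM uses for "no source location", as empty:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_source(path):
    """Read a source file once and cache its lines (newlines stripped)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return tuple(f.read().splitlines())


def source_line(path, line):
    """Return 1-based line `line` of `path`, or "" if it is out of range."""
    lines = load_source(path)
    return lines[line - 1] if 1 <= line <= len(lines) else ""
```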

Eli

This is a backward-compatibility feature, matching an older compiler, for those who do not have (or can't use) a debugger but want an easy way to see the generated instructions for each C statement in simple text form.

Of course, this problem could also be handled by a separate utility, but since it is traditionally done in the compiler, I wanted to see whether anyone in the LLVM community has a solution other than the trivial one.

(file + line) is enough for a debugger, but in my case I need to know whether a statement spans multiple lines, in which case I would need to emit all of its lines before the instructions for that statement are emitted; so some parsing would be needed to figure that out.
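The parsing needed here may not have to be a full C parser. As a rough sketch, assuming the debug location points at the first line of the statement, a hypothetical `statement_end` helper could scan forward until it finds a `;`, `{` or `}` outside parentheses, string/character literals and comments. It is a heuristic, not a C parser: macros and preprocessor lines are not handled, and for a brace-less `if`/`for` header it runs on into the body.

```python
def statement_end(lines, start):
    """Heuristically find the last 1-based line of the C statement that
    begins on 1-based line `start` (hypothetical helper, not a C parser).

    Stops at the first ';', '{' or '}' found outside parentheses,
    string/character literals and comments; returns the last line of the
    file if no terminator is found.
    """
    depth = 0                 # parenthesis nesting depth
    in_block_comment = False  # inside /* ... */, which may span lines
    for n in range(start, len(lines) + 1):
        text = lines[n - 1]
        quote = None  # current literal delimiter, if any
        i = 0
        while i < len(text):
            c = text[i]
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if in_block_comment:
                if c == "*" and nxt == "/":
                    in_block_comment = False
                    i += 1  # also skip the '/'
            elif quote:
                if c == "\\":
                    i += 1  # skip the escaped character
                elif c == quote:
                    quote = None
            elif c == "/" and nxt == "/":
                break  # rest of the line is a comment
            elif c == "/" and nxt == "*":
                in_block_comment = True
                i += 1  # also skip the '*'
            elif c in "\"'":
                quote = c
            elif c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
            elif c in ";{}" and depth <= 0:
                return n
            i += 1
    return len(lines)
```

With the start line from the debug location and the end line from this helper, the annotator can print the whole `start..end` range before the statement's first instruction.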

A.