For a project that I’m working on, I carry the following patch to AsmWriter. It prints each instruction’s debug info location and in-memory address inline, as trailing comments.
diff --git a/llvm/lib/IR/AsmWriter.cpp b/llvm/lib/IR/AsmWriter.cpp
index 0bf8be9ac55f..fd7357741ede 100644
--- a/llvm/lib/IR/AsmWriter.cpp
+++ b/llvm/lib/IR/AsmWriter.cpp
@@ -4191,6 +4191,14 @@ void AssemblyWriter::printInfoComment(const Value &V) {
   if (AnnotationWriter) {
     AnnotationWriter->printInfoComment(V, Out);
   }
+
+  if (auto *I = dyn_cast<Instruction>(&V)) {
+    if (I->getDebugLoc()) {
+      Out << " ; ";
+      I->getDebugLoc().print(Out);
+    }
+  }
+  Out << " ; " << &V;
 }
 
 static void maybePrintCallAddrSpace(const Value *Operand, const Instruction *I,
This is super useful for certain tasks. If I want to find the instruction(s) for a particular line of code, I can build with debug info and search for the file:line in the output. (It’s also possible to work out the file:line from the raw debug info metadata that is printed as well, but that is much more tedious.)
The instruction addresses are useful for determining where a specific instruction came from. For example, I can compile with -print-after-all and search the log for the address to find the pass that introduced it. And if the address disappears from the log, I’ve identified the pass that deleted it.
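The log search described above can be automated. The following is a minimal C++ sketch, not part of LLVM. It assumes that every dump in the -print-after-all log starts with a banner line containing "IR Dump After " followed by the pass name, and that instruction lines carry the address comment produced by the patch. The names AddressLifetime, traceAddress, and traceAddressInText are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Result of scanning a -print-after-all log for one instruction address.
struct AddressLifetime {
  bool found = false;
  std::string firstSeenAfter; // banner text of the first dump containing it
  std::string lastSeenAfter;  // banner text of the last dump containing it
};

// Assumption: each dump begins with a banner line that contains
// "IR Dump After <pass name>". Adjust Marker if your log looks different.
AddressLifetime traceAddress(std::istream &Log, const std::string &Addr) {
  assert(!Addr.empty() && "need an address such as 0x55d0c8a1b2c0");
  const std::string Marker = "IR Dump After ";
  AddressLifetime Result;
  std::string Line, CurrentPass;
  while (std::getline(Log, Line)) {
    std::string::size_type Pos = Line.find(Marker);
    if (Pos != std::string::npos) {
      // Keep the rest of the banner verbatim (pass name plus any suffix).
      CurrentPass = Line.substr(Pos + Marker.size());
      continue;
    }
    // Plain substring match: pass the full address so that it cannot
    // match a prefix of a longer one.
    if (CurrentPass.empty() || Line.find(Addr) == std::string::npos)
      continue;
    if (!Result.found) {
      Result.found = true;
      Result.firstSeenAfter = CurrentPass;
    }
    Result.lastSeenAfter = CurrentPass;
  }
  return Result;
}

// Convenience wrapper for a log that is already held in memory.
AddressLifetime traceAddressInText(const std::string &Text,
                                   const std::string &Addr) {
  std::istringstream Log(Text);
  return traceAddress(Log, Addr);
}
```

firstSeenAfter names the pass that introduced the instruction. The pass that deleted it is the next one after lastSeenAfter that dumps the same function, since dumps of other functions never contain the address.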
Sometimes I need more information than that: I need to know the line of code that created the instruction. For that, I run the compiler under rr record. Then I start a replay session, set a breakpoint at the beginning of the responsible pass, let the compiler run up to the breakpoint, and set a watchpoint on the instruction’s address. Because rr replay is deterministic, the instruction receives the same address as in the recorded run, so the watchpoint triggers in the instruction’s constructor. From there I can unwind the stack to get a backtrace.
Would it make sense to add this output to mainline LLVM (presumably under a flag)?
By all means, send a PR to add that mode, but I think you’ve pointed to a major deficiency in the textual format. The debug metadata was not engineered to be readable, and we should revisit that over a longer term. Note that MLIR embeds source locations inline with every instruction (loc("/filepath/foo.c":12:1)), similar to how we print DIExpression today.
Maybe we can improve on that by storing the full path out of line, something like: loc(!f"1/file.c":12:1) so that we can embed a compact base name with a line and column. Later on in the metadata we can have a block of !f"1/file.c" = DIFile(...)
On the other hand, this has been annoying enough for readability that it is disabled by default (which is a major flaw in MLIR, in my opinion).
(I thought there was a flag to print these at the end of the file instead of inline, but I can’t find it just now.)
I think LLVM’s style of instruction ... !dbg !123 is short, but also super unreadable, and the DILocation block at the end is also pretty ugly.
Do you think it would make a meaningful difference if we could compress the filenames to some kind of semi-human-readable identifier, kind of like DOS 8.3 names, so we can keep a short, readable, searchable identifier inline, but punt the big filenames to the debug info block at the end of the file?
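To make the idea concrete, here is a minimal C++ sketch of how such compact identifiers could be assigned. Nothing here is existing LLVM or MLIR behavior: the class CompactFileTable, the "N/basename" numbering scheme, and the single-argument DIFile("...") definition line are all assumptions made for illustration, loosely following the loc(!f"1/file.c":12:1) syntax floated above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical table mapping full paths to short "N/basename" identifiers.
class CompactFileTable {
  std::map<std::string, std::string> IdByPath;
  std::vector<std::string> Paths; // in order of first use

public:
  // Returns the short identifier for Path, creating one on first use.
  const std::string &idFor(const std::string &Path) {
    assert(!Path.empty() && "expected a file path");
    auto It = IdByPath.find(Path);
    if (It != IdByPath.end())
      return It->second;
    std::string::size_type Slash = Path.find_last_of('/');
    std::string Base =
        Slash == std::string::npos ? Path : Path.substr(Slash + 1);
    // The numeric prefix keeps files with the same base name distinct.
    std::string Id = std::to_string(Paths.size() + 1) + "/" + Base;
    Paths.push_back(Path);
    return IdByPath.emplace(Path, Id).first->second;
  }

  // Inline form attached to an instruction, e.g. loc(!f"1/file.c":12:1).
  std::string loc(const std::string &Path, unsigned Line, unsigned Col) {
    return "loc(!f\"" + idFor(Path) + "\":" + std::to_string(Line) + ":" +
           std::to_string(Col) + ")";
  }

  // Out-of-line definitions for the metadata block at the end of the
  // file. The DIFile argument format is a placeholder, not real syntax.
  std::vector<std::string> definitions() const {
    std::vector<std::string> Out;
    for (const std::string &Path : Paths)
      Out.push_back("!f\"" + IdByPath.at(Path) + "\" = DIFile(\"" + Path +
                    "\")");
    return Out;
  }
};
```

With this scheme, /src/a/file.c and /src/b/file.c would become 1/file.c and 2/file.c, so the inline text stays short and searchable while the full paths appear only once, in the trailing block.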
One option is to use an AssemblyAnnotationWriter to add this support out of tree, though there would likely be a benefit to being able to enable it with a command-line flag as well. For example, Julia’s code_llvm function annotates the IR at each place where the file, line, or inlining context changes, using this adaptor class: julia/src/disasm.cpp at b79856e7a84b7c945590cafae74efbeaf4d9d8f9 · JuliaLang/julia · GitHub. Like you, I find this annotation indispensable for finding the original line of code associated with an instruction. LLVM has a couple of open issues (and partial PRs) that may be relevant as well, such as 17465 – assembler output customization.
Since my original post, I’ve been doing some work on the backend and found that the same kind of annotation would be useful in MachineInstr dumps, for the same reasons, so I added that as well.