Bug 37728 - [meta] Make llvm passes debug info invariant
Further discussion on methods.
https://groups.google.com/g/llvm-dev/c/yvbWr4azdh0/m/gy1tQIzIDwAJ
Neil Nelson
Thanks for the links:)
Hi folks, this is my first post to the llvm-dev mailing list, and definitely not the last.
Recently, I found that an ELF file contains different machine code depending on whether it is built with or without debug info. Sadly, I cannot reproduce it with a small piece of code. Here is my investigation.
clang -S -emit-llvm foo.cc -O3 -ggdb3 -o dbg.ll
clang -S -emit-llvm foo.cc -O3 -o rel.ll
Where foo.cc is a source file at my company with 10k+ LOC that depends on tons of third-party libraries.
The differences between dbg.ll and rel.ll are the LLVM debug intrinsics. Emmmm, looks fine.
llc dbg.ll -o dbg.s
llc rel.ll -o rel.s
And the asm instructions are the same. Emmm, fine again.
llvm-mc -filetype=obj dbg.s -o dbg.o
llvm-mc -filetype=obj rel.s -o rel.o
The two object files generated by the LLVM assembler contain DIFFERENT machine code.
74 19 je f20
The object compiled with debug info uses 0x74 to encode the JE instruction, while
0f 84 15 00 00 00 je f20
the object compiled without debug info uses 0x0f 0x84 instead.
What? Why does debug info affect the generated machine code? As an LLVM beginner, I'm willing to dive deeper to find the root cause.
Thanks in advance.
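For readers comparing the two byte sequences quoted above: 0x74 is the short form of JE (8-bit displacement) and 0x0f 0x84 is the near form (32-bit displacement). The sketch below is a standalone illustration, not LLVM's assembler code; encodeJe, jumpTarget, and the forceNear flag are hypothetical names introduced here. It shows that both quoted encodings reach the same target, so only the instruction length differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode x86 `je` for a displacement measured from the end of the instruction.
// Short form: 0x74 + signed 8-bit displacement (2 bytes total).
// Near form:  0x0f 0x84 + signed 32-bit little-endian displacement (6 bytes total).
// `forceNear` is a hypothetical knob modelling an assembler that keeps the long form.
std::vector<std::uint8_t> encodeJe(std::int32_t disp, bool forceNear) {
    if (!forceNear && disp >= -128 && disp <= 127)
        return {0x74, static_cast<std::uint8_t>(disp & 0xff)};
    std::vector<std::uint8_t> bytes{0x0f, 0x84};
    const auto u = static_cast<std::uint32_t>(disp);
    for (int i = 0; i < 4; ++i)
        bytes.push_back(static_cast<std::uint8_t>((u >> (8 * i)) & 0xff));
    return bytes;
}

// The displacement is relative to the address of the next instruction.
std::int64_t jumpTarget(std::int64_t insnAddr, std::size_t insnLen, std::int32_t disp) {
    return insnAddr + static_cast<std::int64_t>(insnLen) + disp;
}

// Usage: the two encodings from the post, placed at an arbitrary address.
void checkQuotedEncodings() {
    const std::int64_t addr = 0x1000;  // arbitrary address chosen for illustration
    assert(encodeJe(0x19, false).size() == 2);  // 74 19
    assert(encodeJe(0x15, true).size() == 6);   // 0f 84 15 00 00 00
    assert(jumpTarget(addr, 2, 0x19) == jumpTarget(addr, 6, 0x15));
}
```

The displacement shrinks by 4 (0x19 to 0x15) exactly because the near form is 4 bytes longer, which is why the disassembly shows the same label f20 in both objects.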
llvm.dbg.* are intrinsics (subset of Instruction).
DbgInfoIntrinsic
  DbgLabelInst
  DbgVariableIntrinsic
    DbgValueInst: llvm.dbg.value
    DbgAddrIntrinsic: llvm.dbg.addr
    DbgDeclareInst: llvm.dbg.declare (similar to llvm.dbg.addr, but not control-dependent)
It is very easy to forget accounting for their existence in an optimization pass.
for (Instruction &I : BB) {
  if (isa<DbgInfoIntrinsic>(I))
    continue;
  ...
}
for (Instruction &I : instructions(F)) {
  if (isa<DbgInfoIntrinsic>(I))
    continue;
  ...
}
If an optimization pass does not skip llvm.dbg.* intrinsics and lets their presence affect its heuristics (for example, when counting the number of instructions in a basic block), the transformation result may differ with and without llvm.dbg.*.
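To make the instruction-counting example concrete, here is a minimal toy model in standalone C++. ToyInst, ToyBlock, shouldDuplicate, and the threshold of 3 are all hypothetical and are not LLVM types or real heuristics. It only shows how a size-based decision flips when debug records are counted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an IR instruction; not an LLVM type.
struct ToyInst {
    bool isDebug;  // true models an llvm.dbg.* intrinsic call
};
using ToyBlock = std::vector<ToyInst>;

// Debug-sensitive size: debug records inflate the count.
std::size_t sizeWithDebug(const ToyBlock &bb) { return bb.size(); }

// Debug-invariant size: debug records are skipped, as in the loops above.
std::size_t sizeWithoutDebug(const ToyBlock &bb) {
    std::size_t n = 0;
    for (const ToyInst &inst : bb)
        if (!inst.isDebug)
            ++n;
    return n;
}

// Hypothetical heuristic: only transform blocks of at most `threshold` instructions.
bool shouldDuplicate(std::size_t size, std::size_t threshold = 3) {
    return size <= threshold;
}

// Usage: the same block with and without two debug records.
void checkHeuristic() {
    const ToyBlock rel{ToyInst{false}, ToyInst{false}, ToyInst{false}};
    const ToyBlock dbg{ToyInst{false}, ToyInst{true}, ToyInst{false},
                       ToyInst{true}, ToyInst{false}};
    // Counting debug records changes the decision...
    assert(shouldDuplicate(sizeWithDebug(rel)) != shouldDuplicate(sizeWithDebug(dbg)));
    // ...while skipping them keeps it identical.
    assert(shouldDuplicate(sizeWithoutDebug(rel)) == shouldDuplicate(sizeWithoutDebug(dbg)));
}
```

In LLVM itself, BasicBlock::instructionsWithoutDebug() offers a similar filtered view, so a pass does not have to repeat the isa<DbgInfoIntrinsic> check by hand.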
GCC has -fcompare-debug, and it seems that in the past they fought diligently with debug-affecting-codegen problems as well. (I am happy to take a stab at implementing it if others think it is mildly useful.)
It is not clear how serious the problem is in LLVM. If, for example, the llvm-project codebase can be fixed relatively easily, we could probably add a build bot to detect new problems.
Thanks for diving into this. Fwiw, we already have some tooling for identifying and investigating debug-affecting-codegen issues [1][2][3]. I'm not familiar with gcc's -fcompare-debug: while it could be better than what we've got, imho it makes sense to focus on addressing issues we already know about or can trivially detect. (To find lots more of these issues, simply build LNT [4] with the Os and Os-g profiles and diff the object files, or run [3] on the tests for your backend of choice.)
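The diffing approach described above boils down to one check: running a transformation on the input with debug info and then stripping the debug info should give the same result as running it on the stripped input. Below is a rough sketch of that check on a toy representation in standalone C++. ToyFunction, ToyPass, and both example passes are hypothetical; this is not how the linked scripts or -fcompare-debug are implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A toy "function" is a list of instruction strings; lines starting with
// "dbg." model debug records.
using ToyFunction = std::vector<std::string>;
using ToyPass = std::function<ToyFunction(const ToyFunction &)>;

bool isDebugLine(const std::string &line) { return line.rfind("dbg.", 0) == 0; }

ToyFunction stripDebug(ToyFunction f) {
    f.erase(std::remove_if(f.begin(), f.end(), isDebugLine), f.end());
    return f;
}

// The invariance check: strip(pass(f)) must equal pass(strip(f)).
bool isDebugInvariant(const ToyPass &pass, const ToyFunction &withDebug) {
    return stripDebug(pass(withDebug)) == pass(stripDebug(withDebug));
}

ToyFunction dropNops(const ToyFunction &f) {
    ToyFunction out;
    for (const std::string &line : f)
        if (line != "nop")
            out.push_back(line);
    return out;
}

// Hypothetical buggy pass: its size limit counts debug records.
ToyFunction buggyPass(const ToyFunction &f) {
    return f.size() >= 4 ? f : dropNops(f);
}

// Hypothetical fixed pass: its size limit ignores debug records.
ToyFunction fixedPass(const ToyFunction &f) {
    return stripDebug(f).size() >= 4 ? f : dropNops(f);
}

// Usage: one input with two debug records.
void checkPasses() {
    const ToyFunction f{"add", "dbg.value x", "nop", "dbg.value y", "ret"};
    assert(!isDebugInvariant(buggyPass, f));
    assert(isDebugInvariant(fixedPass, f));
}
```

A real build bot would compare object files or IR rather than strings, but the pass/fail condition has the same shape.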
To elaborate on [3] a bit: there appears to be a long tail of codegen difference bugs lurking around in the various backends, but not many (if any? -- it's been a while since I looked) at the IR level. I believe one of the root causes for this is that IR-level use-def chains ignore llvm.dbg.* uses by default (thanks to the ValueAsMetadata abstraction), while MIR-level use-def chains _include_ debug uses by default (see MachineRegisterInfo::use_*). It appears to be way too easy to write backend code that incorrectly assumes that debug uses are not there.
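As an illustration of the use-list pitfall described above, the following toy model in standalone C++ shows a "single use" query that changes its answer once a debug use is attached. ToyUse and the two helper functions are hypothetical and only mirror the idea, not the MachineRegisterInfo implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for one entry in a register's use list; not an LLVM type.
struct ToyUse {
    bool isDebug;  // true models a use by a debug-value instruction
};
using ToyUseList = std::vector<ToyUse>;

// Debug-sensitive query: every use counts, including debug uses.
bool hasOneUse(const ToyUseList &uses) { return uses.size() == 1; }

// Debug-invariant query: only non-debug uses count.
bool hasOneNonDebugUse(const ToyUseList &uses) {
    return std::count_if(uses.begin(), uses.end(),
                         [](const ToyUse &u) { return !u.isDebug; }) == 1;
}

// Usage: the same register with and without an attached debug use.
void checkUseQueries() {
    const ToyUseList rel{ToyUse{false}};
    const ToyUseList dbg{ToyUse{false}, ToyUse{true}};
    // A fold guarded by hasOneUse fires only in the build without debug info.
    assert(hasOneUse(rel) && !hasOneUse(dbg));
    // Guarding on non-debug uses gives the same answer in both builds.
    assert(hasOneNonDebugUse(rel) && hasOneNonDebugUse(dbg));
}
```

In LLVM, MachineRegisterInfo exposes both flavors (for example hasOneUse and hasOneNonDBGUse, or use_* and use_nodbg_*), and backend code that guards a transformation usually wants the non-debug one.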
I went on a bit of a spree trying to fix some of those issues in the AArch64 backend, starting with [5]. For a brief moment it was possible to add debug info to all the tests in test/CodeGen/AArch64 and still have all of them pass. Alas, that's no longer true. Adding a buildbot could help with this. It could also be valuable to change the MachineRegisterInfo default to ignore debug uses -- that's a larger change that would require a fair amount of community review and buy-in.
[1] Object file level diffing: https://github.com/vedantk/scripts/blob/master/objdiff_driver.sh
[2] IR-level debug-affecting-codegen detection: https://github.com/vedantk/scripts/blob/master/opt-check-dbg-invar.sh
[3] MIR-level debug-affecting-codegen detection: "How to Update Debug Info: A Guide for LLVM Pass Authors" in the LLVM documentation (e.g. `llvm-lit test/CodeGen/AArch64 -Dllc="llc -debugify-and-strip-all-safe"`)
[4] llvm/llvm-test-suite on GitHub
[5] rG5c04274dab48
vedant
Really appreciate the links:) I'll study them. A build bot will definitely be helpful.