Hi all,
Every year at things like the debug-info round table, people (including me) point out a few large problems in LLVM’s debug-info that no-one ever gets around to solving. Most often it’s the presence of debug instructions such as dbg.value getting in the way of optimisations and slowing the compiler down [0,2], with another firm favourite being better line tables / placement of the DWARF is_stmt flag.
Long story short, we (the Sony bunch) got around to prototyping a replacement for dbg.values (as described in [1]) so that they don’t need to be in the instruction stream. The eventual benefit would be faster compile times with -g and fewer debug-info-affects-codegen bugs. Alas, it’s not a simple transformation: once debug instructions cease to exist, certain positions “between” real instructions can no longer be described by iterators, which loses some of the intention / meaning behind certain function calls LLVM makes. The most obvious is that it’s not clear whether moveBefore / moveAfter should also move variable location information: some callers simply hoist / sink individual instructions, while other callers use the same functions to move entire blocks at a time, debug instructions included.
We’ve got a prototype workaround for this and similar flaws, which is to:
- Make BasicBlock’s instruction list inaccessible (i.e. C++ private),
- Provide instruction moving APIs that require specifying the disposition of the movement of an instruction, i.e., does its movement preserve the order that instructions execute in, or does it re-order instructions?
- Re-write the (few hundred) call sites in LLVM that use existing API calls to move instructions to indicate their disposition.
We think that doing so communicates enough information to BasicBlock so that it can Do The Right Thing ™ when shuffling debug-info around, without using debug instructions. It would conveniently make the existing rules for updating source locations [3] part of the instruction API too. The obvious downside is that it places an additional burden on optimisation writers by requiring a disposition when moving instructions around. IMHO, this is a worthy tradeoff for an improved design.
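To make the idea concrete, here is a toy sketch of what a disposition-aware move could look like. It is a standalone model, not LLVM code and not our prototype: MoveDisposition, Inst, Block, DbgRecords and the policy of leaving variable locations behind on a re-ordering move are all names and choices invented for illustration.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: how a move relates to the original execution order.
enum class MoveDisposition {
  PreservesOrder, // e.g. moving a whole run of instructions, order intact
  Reorders,       // e.g. hoisting or sinking a single instruction
};

// Toy instruction. DbgRecords stands in for variable locations that take
// effect just before this instruction, stored out of the instruction stream.
struct Inst {
  std::string Name;
  std::vector<std::string> DbgRecords;
};

class Block {
  std::list<Inst> Insts; // private: callers must use the API below

public:
  using iterator = std::list<Inst>::iterator;

  iterator begin() { return Insts.begin(); }
  iterator end() { return Insts.end(); }
  iterator append(Inst I) { return Insts.insert(Insts.end(), std::move(I)); }

  // Move the instruction at From so that it sits immediately before To.
  void moveBefore(iterator From, iterator To, MoveDisposition D) {
    if (From == To || std::next(From) == To)
      return; // already in position, nothing to do

    if (D == MoveDisposition::Reorders) {
      // Assumed policy: the variable locations describe a point in the
      // original program order, so they stay put by being handed to the
      // instruction that used to follow From.
      iterator Next = std::next(From);
      if (Next != Insts.end()) {
        Next->DbgRecords.insert(Next->DbgRecords.begin(),
                                From->DbgRecords.begin(),
                                From->DbgRecords.end());
        From->DbgRecords.clear();
      }
      // Toy simplification: with no following instruction, the records
      // simply travel with From.
    }
    // PreservesOrder: the records travel with the instruction.

    Insts.splice(To, Insts, From); // std::list keeps From valid
  }
};
```

In this model, hoisting "b" above "a" with MoveDisposition::Reorders leaves b's variable locations attached to whatever followed it, whereas MoveDisposition::PreservesOrder carries them along. The caller has to say which one it means, which is exactly the extra burden mentioned above.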
tl;dr, we could make debug-info a “first class” thing in LLVM by making pass authors tell us what’s happening to the “original” control flow of the program when they move instructions around. Doing so would let us move debug-info out of the instruction stream, to somewhere it would cause less trouble. We don’t have a real RFC for exactly what to change, but if this kind of thing is relevant to your interests, please come to the optimised debug-info round-table where we’d be delighted to talk about it.
~
[0] In Reid’s 2018 experiment [1] they almost doubled the amount of time spent optimising.
[1] [llvm-dev] [RFC] Moving llvm.dbg.value out of the instruction stream
[2] A large C++ project I have to hand takes 47% more time to LTO-link when using -g versus -gmlt, i.e. when dbg.values and variable locations are present.
[3] How to Update Debug Info: A Guide for LLVM Pass Authors — LLVM 16.0.0git documentation
–
Thanks,
Jeremy