Prototyping a not-an-instruction dbg.value

jmorse · October 27, 2022, 10:10pm

Hi all,

Every year at things like the debug-info round table, people (including me) point out a few large problems in LLVM’s debug-info that no-one ever gets around to solving. Most often it’s the presence of debug instructions such as dbg.value getting in the way of optimisations and slowing the compiler down [0,2], with another firm favourite being better line tables / placement of the DWARF is_stmt flag.

Long story short, we (the Sony bunch) got around to prototyping a replacement for dbg.values (as described in [1]) so that they don’t need to be in the instruction stream. The eventual benefit would be faster compile times with -g and fewer debug-info-affects-codegen bugs. Alas, it’s not a simple transformation because once debug instructions cease to exist, certain positions “between” real instructions cannot be described by iterators, removing some of the intention / meaning behind some function calls LLVM makes. The most obvious is that it’s not clear whether moveBefore / moveAfter should also move variable location information: some callers simply hoist / sink etc individual instructions, while other callers use the same functions to move entire blocks at a time, debug instructions included.

We’ve got a prototype workaround for this and similar flaws, which is to:

1. Make BasicBlock’s instruction list inaccessible (i.e. C++ private),
2. Provide instruction moving APIs that require specifying the disposition of the movement of an instruction, i.e., does its movement preserve the order that instructions execute in, or does it re-order instructions?
3. Re-write the (few hundred) call sites in LLVM that use existing API calls to move instructions to indicate their disposition.
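
To make the idea concrete, here is a toy sketch of what a disposition-taking move API could look like. It is not LLVM code and not the prototype itself: `MoveDisposition`, `Inst`, `Block` and `describe` are hypothetical names invented for illustration, and the policy shown (debug records travel with an order-preserving move, but stay behind on a re-ordering move) is only one plausible reading of the steps above.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of this is LLVM's actual API.
enum class MoveDisposition {
  PreservesOrder, // e.g. moving a run of instructions wholesale
  Reorders        // e.g. hoisting or sinking one instruction
};

struct Inst {
  std::string Name;
  // Variable-location records that take effect just before this
  // instruction, kept on the side instead of in the instruction stream.
  std::vector<std::string> DebugRecords;
};

class Block {
  std::list<Inst> Insts; // private: callers cannot splice it directly

public:
  using iterator = std::list<Inst>::iterator;
  iterator begin() { return Insts.begin(); }
  iterator end() { return Insts.end(); }
  iterator append(Inst I) { return Insts.insert(Insts.end(), std::move(I)); }

  // Move the instruction at From so that it sits immediately before To.
  // The caller must say whether the move keeps execution order intact.
  void moveBefore(iterator From, iterator To, MoveDisposition D) {
    if (From == To || std::next(From) == To)
      return; // already in place
    if (D == MoveDisposition::Reorders) {
      // Assumed policy: the records describe a position in the original
      // program, so they stay put and attach to the next instruction.
      // Toy simplification: if From is last, its records stay on it.
      iterator Next = std::next(From);
      if (Next != Insts.end()) {
        Next->DebugRecords.insert(Next->DebugRecords.begin(),
                                  From->DebugRecords.begin(),
                                  From->DebugRecords.end());
        From->DebugRecords.clear();
      }
    }
    // For PreservesOrder the records simply travel with the instruction.
    Insts.splice(To, Insts, From);
  }
};

// Flatten a block into a readable sequence for inspection.
std::vector<std::string> describe(Block &B) {
  std::vector<std::string> Out;
  for (Inst &I : B) {
    for (const std::string &R : I.DebugRecords)
      Out.push_back("dbg " + R);
    Out.push_back(I.Name);
  }
  return Out;
}
```

With a block holding `a`, `mul` (carrying a record for `x`) and `c`, hoisting `mul` above `a` as `Reorders` leaves the record for `x` in front of `c`, whereas the same call with `PreservesOrder` carries the record along with `mul`. The point of the sketch is only that the call site, not the block, knows which of the two is intended.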

We think that doing so communicates enough information to BasicBlock so that it can Do The Right Thing ™ when shuffling debug-info around, without using debug instructions. It would conveniently make the existing rules for updating source locations [3] part of the instruction API too. The obvious downside is that it places an additional burden on optimisation writers by requiring a disposition when moving instructions around. IMHO, this is a worthy tradeoff for an improved design.

tl;dr, we could make debug-info a “first class” thing in LLVM by making pass authors tell us what’s happening to the “original” control flow of the program when they move instructions around. Doing so would let us move debug-info out of instructions, to somewhere it would cause less trouble. We don’t have a real RFC for exactly what to change, but if this kind of thing is relevant to your interests, please come to the optimised debug-info round-table where we’d be delighted to talk about it.

~

[0] In Reid’s 2018 experiment [1] they almost doubled the amount of time spent optimising.
[1] [llvm-dev] [RFC] Moving llvm.dbg.value out of the instruction stream
[2] A large C++ project I have to hand takes 47% more time to LTO-link when using -g versus -gmlt, i.e. when dbg.values and variable locations are present.
[3] How to Update Debug Info: A Guide for LLVM Pass Authors — LLVM 16.0.0git documentation

–
Thanks,
Jeremy

jryans · October 28, 2022, 10:38am

(I am perhaps a bit biased here, since my focus is on making debug info more reliable, and I am not an optimisation pass author.)

From a debug info perspective, this sounds like a big improvement to me. Sure, it will improve compile time, but that’s not even the best part…! By changing APIs like you’ve described, we’ll be able to capture more of the optimisation author’s intent at the call site of each instruction move, and that will help unlock further improvements in debug info reliability.

I suspect that adapting to this API change would not really be that much more work for optimisation authors, as they should already have the type of move they intend in mind anyway, so you’re only asking them to effectively record that in code.

Overall, this sounds great to me, and I hope it will move forward!

jryans · October 28, 2022, 10:52am

Unfortunately, I won’t be able to make it to the US dev meeting this time around, but I am very interested in this topic. I look forward to participating in discussions here, and I am happy to help with code changes / reviews for this idea as well.

As an aside, if someone could take notes of the discussion during the round table and share them on Discourse after the dev meeting, that would be wonderful.

jmorse · November 17, 2022, 5:44pm

Just to close the loop, here are the notes from the round tables: Debug-info round table notes
