[RFC] Instruction API changes needed to eliminate debug intrinsics from IR

Hi,

The EuroLLVM talk I gave on this topic is up [0]; the tl;dr is “let’s capture more information about the intention of optimisation passes moving instructions around, so that we can maintain debug-info automatically”. I’ve now uploaded the core parts of these changes, elaborated below. The net effect is that the position of debug-info records can be preserved as if they were instructions, without them actually being stored as instructions. It’s now time to talk about the /other/ hard part of removing debug intrinsics, which is trading slightly increased memory consumption in normal builds for lower consumption + faster compilation in -g builds. (I’ll talk about that in this topic to avoid fragmenting discussion.)

In the patches linked below, we’ve added a new pointer field to class Instruction to reference any debug-info “on” the Instruction’s position. There’s also a new field in BasicBlock to handle scenarios where blocks transiently contain only debug-info [1]. The memory-cost impact of this over builds of an older clang is (a median of) 0.6% more max-RSS for release builds (-O3, no debug-info). There’s broadly no change in memory usage for debug-info builds. I’m confident we’ll be able to make improvements there in time, but we haven’t tried yet. In return, there are compile-time improvements for debug-info builds, see further below. To me and my use case, this is an obviously beneficial trade-off: debug-info builds always take longer and use more memory, and developers always end up debugging their code, so that’s always the eventual worst case for all software. Trading more memory on a release build to reduce the worst-case debug-info compile time is great. I imagine not everyone agrees though, so I’d like to draw attention to it and ask… is this alright?

In terms of compile-time costs: right now on the compile-time-tracker [2] there’s a ~2.5% geomean speedup for -g -flto. Some speedups are disproportionately large: tramp3d-v4 is almost 10% faster. The large C++ code-base mentioned in the talk, where we attributed 30% of LTO compile-time to variable locations, speeds up by about 6%. As mentioned elsewhere, this is a naive / simple implementation, and I think in the mid term we can do a lot better than a 2.5% speedup. There’s a whole bunch of other things we could do once debug-info connections are no longer in the Value/Metadata hierarchy, such as:

  • Hard-coding constant-value variable assignments: there’s no need for them to be part of a large metadata use list.
  • Allocating blocks of debug-info records together: they should never be deleted during optimisation, and nothing should ever be inserted in the middle, so they don’t need to be individually re-orderable or freeable.
  • Deleting dead variables earlier: almost one-quarter of variables are totally undef by the end of compilation, and if we could index them better, we could delete them earlier.
  • Pooling metadata references (maybe, unsure), as debug-info records typically move in packs.

This is only the first step towards exploring that design space, though.

There’s also a 0.7% slowdown for normal builds. Some of this is because we’re paying the branching cost of debug-info maintenance twice: we repeatedly check whether instructions are debug-info or not, even when there aren’t any. A previous experiment disabling getNextNonDebugInfoInstruction and debug-info-iterators demonstrated a 0.3% speedup for normal builds [3]. There may be more improvements to recover; we still need to dig through that. Exactly how to stage this into the repo to avoid everyone paying that cost isn’t clear yet.
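As a rough illustration of the branching cost described above, here is a minimal C++ sketch. Everything in it is hypothetical: `ToyInst`, `IsDebug` and both functions are toy stand-ins, not LLVM APIs. It only contrasts a walk that must filter debug instructions on every visit with one over a list that never contains them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction; IsDebug models "this is a dbg.value intrinsic".
struct ToyInst {
  int Id;
  bool IsDebug;
};

// Intrinsic-style walk: every visit pays a branch to skip debug instructions,
// even when the block happens to contain none.
std::size_t countRealFiltering(const std::vector<ToyInst> &Block) {
  std::size_t N = 0;
  for (const ToyInst &I : Block) {
    if (I.IsDebug)
      continue;
    ++N;
  }
  return N;
}

// Out-of-line style: debug-info is assumed to live elsewhere, so the list
// holds only real instructions and no per-instruction filter is needed.
std::size_t countRealDirect(const std::vector<ToyInst> &Block) {
  // Precondition (checked only in assert-enabled builds): no debug instructions.
  assert(std::none_of(Block.begin(), Block.end(),
                      [](const ToyInst &I) { return I.IsDebug; }));
  return Block.size();
}
```

The sketch says nothing about the real magnitude of the cost; it only shows where the extra branch comes from.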

For the patches themselves, here’s a quick summary of the main five:

Here’s an illustration of the connections between these data structures:

    Instruction>>>>>>>>>>Instruction>>>>>>>>>>Instruction
         |                                         |
      DPMarker                                  DPMarker
       /   \                                     /   \
      /     \                                   /     \
  DPValue  DPValue                          DPValue  DPValue

Instructions (which are now never dbg.values) point at an optional DPMarker, which itself contains a list of DPValues. The markers represent a position in the program, while the DPValues record a Value and details about source variables, much like dbg.values.
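To make the diagram concrete, here is a minimal C++ sketch of the same shape. The type names mirror the ones in this post, but the members and the two helper functions (`attachDebugRecord`, `countDebugRecords`) are assumptions made for illustration and are not the layout or API used in the actual patches.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins only: names follow the post, members are assumed.
struct DPValue {
  std::string Variable; // source variable being described
  std::string Location; // the Value the variable currently lives in
};

struct DPMarker {
  std::vector<DPValue> Values; // records attached at this program position
};

struct Instruction {
  std::string Opcode;
  std::unique_ptr<DPMarker> DbgMarker; // null when nothing is attached here
};

using BasicBlock = std::list<Instruction>;

// Attach a record at I's position, creating the optional marker on demand.
void attachDebugRecord(Instruction &I, DPValue V) {
  assert(!V.Variable.empty() && "record must name a source variable");
  if (!I.DbgMarker)
    I.DbgMarker = std::make_unique<DPMarker>();
  I.DbgMarker->Values.push_back(std::move(V));
}

// Count records in a block; instructions without a marker cost one null check.
std::size_t countDebugRecords(const BasicBlock &BB) {
  std::size_t N = 0;
  for (const Instruction &I : BB)
    if (I.DbgMarker)
      N += I.DbgMarker->Values.size();
  return N;
}
```

In this toy model an instruction with no debug-info pays only for one null pointer, which matches the "new pointer field" cost discussed earlier.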

There are a bunch of additional, small(er) patches: every place that updates dbg.value intrinsics needs additional instrumentation. These are pretty straightforward though, and aren’t as important as thinking about the data structures.

Patch reviews would be most welcome, or feedback / opinions on the overall direction we’re taking. If we can get some or all of this in-tree and controlled by a cmake flag, it’ll enable other developers to make their own evaluations of the costs and benefits.

Is anyone opposed to these trade-offs? To summarise, according to CTMark there’s a reduction of 2.5% in compile time for LTO builds with debug info, at the cost of an increase of 0.6% max RSS in release builds. We also remove debug instructions entirely (which will ease maintenance of all optimisation passes), at the cost of maintaining both systems until the work is completed.

[0] 2023 EuroLLVM - What would it take to remove debug intrinsics? - YouTube
[1] This can probably be shuffled somewhere else due to its rare use, something like a DenseMap in Function.
[2] LLVM Compile-Time Tracker
[3] LLVM Compile-Time Tracker (don’t look at the -g runs, it was hard-coded off).


Thanks,
Jeremy
