[RFC] Instruction API changes needed to eliminate debug intrinsics from IR

Hi,

This is a proposal for some changes to LLVM’s instruction movement/insertion APIs to better describe the movement of debug-info. In order to remove debug intrinsics from IR we need to first improve the expressiveness of the instruction-moving APIs. This RFC proposes a way to achieve that, with a couple of remaining question marks.

A while back in Prototyping a not-an-instruction dbg.value I suggested a pathway towards storing variable location debug-info in something that wasn’t an intrinsic. The benefits are faster -g compiles, and more robust compilation as the presence of debug-intrinsics can interfere with optimisations. We’ve got a prototype at [0] that can build a clang-3.4 binary with no dbg.value intrinsics and produces identical object files [1] – it’s not a complete solution, but we reckon it exposes all of the important design problems. Today I’d like to focus on a single issue: the fact that debug intrinsics have instruction iterators (BasicBlock::iterator) which are functionally important to debug-info, and they won’t exist if we cease using intrinsics. We currently consider any instruction to be “a position in the function”, where a debug-info intrinsic is a position in between two “real” instructions. Without debug intrinsics, we won’t be able to describe “in between” positions any more, we would have to attach the debug-info to instructions themselves which raises more issues. To illustrate, consider this block:

  block:
    dbg.value(...)
    %1 = add...
    dbg.value(...)
    %2 = sub...
    br label %exit

In today’s LLVM, each instruction has an iterator, therefore we can describe any range of “real” instructions and debug-info instructions to be spliced from one block to another. However, if the debug-info was attached to instructions, we can only have iterators that point at the add, sub, and br. That’s strictly less expressive than what we have today, and this causes meaningful differences in debug-info.

I’ve got two problems to illustrate with examples: the movement of instructions with debug-info attached, and inserting new instructions at the start of blocks.

Instruction::moveBefore can be used to move any instruction from one place to another, without any consideration for debug-info. This isn’t a problem today: if code moves a single instruction then it usually doesn’t want debug-info to move too, while code like moveBBContents [2] just moves every instruction from one place to another including debug instructions. However if we attach debug-info to instructions, moveBefore would become responsible for moving the debug-info, and whether it should move or not can be ambiguous. Consider today’s IR:

  bb1:
      dbg.value(...)
      %foo = add %1, %2
      %bar = sub %3, %4

Assume that we can attach the dbg.value information to the add instruction in some way. Now imagine that we called moveBefore, moving the add instruction to another block. If for example we are hoisting the add during CSE then the debug-info should not move with the instruction, it should stay in the source block. However if we’re in a loop like moveBBContents [2] moving all instructions in a block, then it’s necessary for moveBefore to move the debug-info to the destination block. Currently, there’s no way in LLVM to distinguish the two use cases.

The other scenario is inserting one instruction in front of an existing instruction. Should debug-info attached to the existing instruction come before or after the new instruction? We can describe both when debug-info has iterators, but will be forced to pick one if debug-info is attached to instructions. Unfortunately there is no one-size-fits-all solution: in the majority of situations, LLVM today inserts instructions after nearby debug-info intrinsics. However there are other situations where the opposite is needed: for example in MergeBasicBlockIntoOnlyPred [3] where one block is spliced into another, the predecessors block’s instructions should always come before the successor’s debug-info: the blocks are effectively concatenated. Today, this is unambiguous because BasicBlock::begin and getFirstInsertionPt will return iterators to debug-info intrinsics. However, if we don’t use intrinsics for debug-info, there are no means to describe inserting before the debug-info.

~

There’s a fairly simple solution for the first problem, which is to introduce a specialised moveBefore method that moves debug-info too, and update all the call sites in LLVM that need that behaviour. This is achievable as there’s only about twenty call sites that behave in that way. In the prototype linked below the relevant method is called moveBeforePreserving, we renamed moveBefore to moveBeforeBreaking to make the differences explicit. I think it’s useful to have the developer express the “disposition” of the move they’re making, whether it preserves the flow of instructions and thus should transfer debug-info, or whether it breaks the flow.

However, it’s much harder for the insert-at-block-begin example: it’s a much more common code pattern, with between 100 to 200 call sites where this distinction has an actual effect on variable locations. We’ve explored and implemented a solution: store whether an iterator came from begin / getFirstInsertionPt in a bit inside the BasicBlock::iterator object, aka ilist_iterator [4]. This feels bad because it’s taking a value type that’s a single pointer right now and adding more data; however I think it’s inescapable. Iterators are exactly the right design pattern for storing information about positions in a collection. We could:

  • Add the distinction of an iterator being returned by begin or getFirstInsertionPt to the type returned: however that feels ugly and will make ugly compile errors,
  • Add a flag parameter to instruction insertion methods, which is unergonomic and error prone,
  • Have the fact that an iterator was intended to be at the start of the block (before debug-info) signalled at runtime, which requires information in the iterator object.

It’s the latter that we’ve used in our prototype – a variety of instruction insertion methods have had overloads added that take iterators, and then call sites that use begin or getFirstInsertionPt (or getFirstNonPHI) now pass an iterator for the position to insert instructions. This is straightforwards; for additional safety the instruction-accepting insertion APIs could be removed, forcing everyone to use iterators when inserting. That’s invasive, but IMHO a reasonable price to pay for clearer management of debug-info. We’re using two bits in the prototype: because a pair of instruction iterators are a half-open range of instructions, we use the extra bits to signal whether the iterators are inclusive of the debug info attached at the start and end. Coming back to our earlier example:

  bb2:
    dbg.value(...)
    %1 = add...
    dbg.value(...)
    %2 = sub...
    br label %exit

Imagine we have an iterator range from %1 to %2. If we were to splice that range into another block, today LLVM would transfer the add instruction and the following dbg.value. If we attached debug-info to instructions and didn’t have debug intrinsics, that would be the only transfer we could describe. Thus, the two iterator bits signal:

  • For the “first” iterator, whether debug-info attached to the first instruction should be transferred too,
  • For the “last” iterator, whether debug-info attached to the last position should be transferred too,

respectively. That allows us to describe almost all the transfers that could be performed when debug-info is an intrinsic: both dbg.values and the add instruction, or just the add instruction by itself. I think it’s noteworthy that with accessors like begin and getFirstInsertionPt returning iterators with the bits set already, we only found two sites in LLVM where those bits need to be manually updated.

In terms of performance: adding the bit to BasicBlock::iterator has a small compile-time cost to no-debug-info builds (0.05%) according to [5]. However this is going to be paid for by eliminating calls to functions like getNextNonDebugInstruction, which recovers almost 0.3% of compile time [6]. This is definitely not the final word on compile-time performance, but I want to indicate the costs aren’t going to be overwhelming. Change in memory usage is negligible too as we typically don’t store iterators in data structures.

As proof that this technique can work, I’d like to present our prototype [0] (based on llvm-15) as evidence – please ignore the implementation details of what we’re doing to blocks, instructions and storage of debug-info, those are all very changeable. My major concern is how the instruction API changes, how this fans out into the rest of the compiler, and whether it’s a tolerable burden for other compiler developers. Thus, I think it’s noteworthy that we’re “only” touching 200 files, and the vast majority of changes are changing the spelling of moveBefore or passing an iterator into a splice/insertion method.

There’s an additional problem with eliminating iterators, blocks can be temporarily empty (no terminator) but still contain debug-instructions, that should then go in front of any terminator added later. Our solution to that so far is to special-case the insertion of a terminator into a block, which is sufficient to fixup any out-of-place debug-info.

Feedback most welcome: specifically for the two changes that would be needed for the instruction API:

  • The need to use a moveBefore method that explicitly states the intention of the developer wrt. whether debug-info should be moved too.
  • Sticking some extra bits in BasicBlock::iterator to signal at runtime whether a position in a block comes before or after the nearby debug-info.

To provide context, the other things we’re working on are the storage model for variable-location information, and what kind of maintenance would need to be applied during optimisations. We haven’t thought about representing these things in bitcode / textual IR and I’d prefer to avoid thinking about it for now.

Cheers to @StephenTozer for ploughing through a lot of this. CC the usual debug-info folks: @adrian.prantl @dblaikie @rnk @pogo59 @cmtice @OCHyams @jryans @slinder1 , although I think this is probably relevant to the interests of all pass-authors.

[0] Publish our prototype "killing debug-intrinsics" diff by jmorse · Pull Request #1 · jmorse/llvm-project · GitHub
[1] CommandLine.cpp in clang-3.4 bakes the build time into itself, all the other files are identical.
[2] llvm-project/IROutliner.cpp at c42eda5d3692fcd67fc3d043ab37f1950ea653b9 · llvm/llvm-project · GitHub
[3] llvm-project/Local.cpp at a33f018b89c07e0728539b34c158e88a7db49982 · llvm/llvm-project · GitHub
[4] Cue gasps of horror from the audience!
[5] LLVM Compile-Time Tracker
[6] LLVM Compile-Time Tracker NOTE: the -g modes are not meaningful as all debug-info is dropped.

6 Likes

This sounds great, thank you for the write-up!

To recap (so you can tell me if I’m missing something):

  • The current API essentially has this distinction between “preserving”/“breaking” already, it just is not explicitly recorded anywhere because the debug markers are actually Instructions, and so the behavior we want essentially falls out: you generally move all instructions when you are “preserving” and a single instruction when you are “breaking”.
    • An example where debug-info should move with instructions is e.g. inlining. When we move many instructions at once, we naturally choose to bring along the intersperced debug-info instructions. This corresponds to the “preservering” disposition.
    • An example where debug-info should not move with instructions is e.g. CSE. The debug-info describing when e.g. an assignment occurs doesn’t want to be moved relative to the rest of the instructions just because CSE hoisted one of them. This corresponds to the “breaking” disposition.

Assuming I’m following so far, the choice to move the high-level disposition into the iterator seems reasonable to me. If there is not a significant runtime performance hit then I agree doing this in the type system in C++ doesn’t sound appealing.

I think I do get a little lost when trying to understand the piece about begin/getFirstInsertionPt. In the example of:

bb2:
    dbg.value(A, ...)
    %1 = add...
    dbg.value(B, ...)
    %2 = sub...
    br label %exit

Does this translate to something like:

bb2:
    %1 = add... !dbg.attachments {dbg.value(A, ...)}
    %2 = sub... !dbg.attachments {dbg.value(B, ...)}
    br label %exit

?

And so the debug-info “of” a particular instruction takes effect before it?

If so, then I don’t understand the example of MergeBasicBlockIntoOnlyPred. It seems like it would naturally fall out that DestBB->splice(DestBB->begin(), PredBB) would always result in the whole PredBB including its debug info coming before the first instruction and first debug info of DestBB. Maybe that’s your point, as it is using begin, but then I don’t understand in what situation another behavior is possible.

I admittedly have not spent the time to look at the code yet, so sorry if this is an obvious question!

The current API essentially has this distinction between “preserving”/“breaking” already, it just is not explicitly recorded anywhere because the debug markers are actually Instruction s, and so the behavior we want essentially falls out:

Yup, that’s exactly it, and the two examples are correct too,

Does this translate to something like […] and so the debug-info “of” a particular instruction takes effect before it?

That’s correct – I’ve been trying to avoid any textual representation because that’s a discussion I want to avoid, but what you wrote is a fair illustration of the information that would be stored.

If so, then I don’t understand the example of MergeBasicBlockIntoOnlyPred . It seems like it would naturally fall out that DestBB->splice(DestBB->begin(), PredBB) would always result in the whole PredBB including its debug info coming before the first instruction and first debug info of DestBB . Maybe that’s your point, as it is using begin , but then I don’t understand in what situation another behavior is possible.

Taking a quick survey of the calls to “blockSplice” in the prototype code, half of the call sites insert at either begin() or end(), while the other half insert at a specific instruction. There are a few examples in InlineFunction.cpp where groups of allocas are shuffled around, or single blocks are directly inlined into the call-site. In those scenarios, the splice function needs to insert instructions after any attached debug-info, if we had:

bb1:
  %2 = add i32 %1, %0
  call void @foo(%2), !dbg.attachments {dbg.value(A,...)}
  br label %bar

And inlined the contents of foo into the block at the position of the call instruction, the attachments should be moved onto the first instruction inlined so that they’re visible before the call site. If that didn’t happen, they would sink down to the branch instruction when the call was deleted, and this wouldn’t accurately represent the source program. Hence, there’s a need for the splice function know whether the caller wants the moving instructions to come before or after the debug-info at the insertion point.

That being said: there are only about 15 sites where the blockSplice method is actually needed, it wouldn’t be a huge burden to expect callers to express this information as an argument, instead of as information baked into the iterator. The larger problem is several hundred call sites where instructions are inserted at the start of a block with something like moveBefore, and the pass author has made a decision about whether the instruction should come before or after debug intrinsics by calling begin/getFirstInsertionPt for inserting before, or getFirstNonPHIOrDbg for inserting after. When we have debug intrinsics with iterators, that distinction is communicated to the inserting function by the position being a debug-instruction (or not), in the prototype it’s communicated by the bits in the iterator object.

These decisions (insert at the start of the block, before or after debug-info?) have to be preserved to get an identical binary, however it’s not clear whether they’re necessary for preserving the quality of debug-info. Perhaps it doesn’t matter whether a new instruction generated by SCEV goes before or after variable assignments; on the other hand if we sink an instruction from a parent to a successor block then we might care what order it has with the variable assignments in the successor. Baking this information into the iterator avoids having to answer those questions which may seem like an evasion, but on the other hand it guarantees no regressions.

Ah, that makes sense, thank you! I think I just needed to spend a bit more time to get to the cases you were referring to, but took the lazy way out and just asked. Thank you for humoring me :slight_smile:

I definitely see the benefit of retaining the information implied at the creation of the iterator. It seems the most elegant and least error-prone option.

I really like the approach of starting from true “NFC”, and I’m a bit shocked you were able to achieve it! I also particularly enjoy the “inhale”/“exhale” approach around serialization to defer updating tests. I will have to spend a bit more time reading the code, but I’m fairly convinced of the approach.