Hi,
This is a proposal for some changes to LLVM’s instruction movement/insertion APIs to better describe the movement of debug-info. In order to remove debug intrinsics from IR we need to first improve the expressiveness of the instruction-moving APIs. This RFC proposes a way to achieve that, with a couple of remaining question marks.
A while back in Prototyping a not-an-instruction dbg.value I suggested a pathway towards storing variable location debug-info in something that wasn’t an intrinsic. The benefits are faster -g compiles, and more robust compilation as the presence of debug-intrinsics can interfere with optimisations. We’ve got a prototype at [0] that can build a clang-3.4 binary with no dbg.value
intrinsics and produces identical object files [1] – it’s not a complete solution, but we reckon it exposes all of the important design problems. Today I’d like to focus on a single issue: the fact that debug intrinsics have instruction iterators (BasicBlock::iterator
) which are functionally important to debug-info, and they won’t exist if we cease using intrinsics. We currently consider any instruction to be “a position in the function”, where a debug-info intrinsic is a position in between two “real” instructions. Without debug intrinsics, we won’t be able to describe “in between” positions any more, we would have to attach the debug-info to instructions themselves which raises more issues. To illustrate, consider this block:
block:
dbg.value(...)
%1 = add...
dbg.value(...)
%2 = sub...
br label %exit
In today’s LLVM, each instruction has an iterator, therefore we can describe any range of “real” instructions and debug-info instructions to be spliced from one block to another. However, if the debug-info was attached to instructions, we can only have iterators that point at the add
, sub
, and br
. That’s strictly less expressive than what we have today, and this causes meaningful differences in debug-info.
I’ve got two problems to illustrate with examples: the movement of instructions with debug-info attached, and inserting new instructions at the start of blocks.
Instruction::moveBefore
can be used to move any instruction from one place to another, without any consideration for debug-info. This isn’t a problem today: if code moves a single instruction then it usually doesn’t want debug-info to move too, while code like moveBBContents
[2] just moves every instruction from one place to another including debug instructions. However if we attach debug-info to instructions, moveBefore
would become responsible for moving the debug-info, and whether it should move or not can be ambiguous. Consider today’s IR:
bb1:
dbg.value(...)
%foo = add %1, %2
%bar = sub %3, %4
Assume that we can attach the dbg.value
information to the add
instruction in some way. Now imagine that we called moveBefore
, moving the add
instruction to another block. If for example we are hoisting the add
during CSE then the debug-info should not move with the instruction, it should stay in the source block. However if we’re in a loop like moveBBContents
[2] moving all instructions in a block, then it’s necessary for moveBefore
to move the debug-info to the destination block. Currently, there’s no way in LLVM to distinguish the two use cases.
The other scenario is inserting one instruction in front of an existing instruction. Should debug-info attached to the existing instruction come before or after the new instruction? We can describe both when debug-info has iterators, but will be forced to pick one if debug-info is attached to instructions. Unfortunately there is no one-size-fits-all solution: in the majority of situations, LLVM today inserts instructions after nearby debug-info intrinsics. However there are other situations where the opposite is needed: for example in MergeBasicBlockIntoOnlyPred
[3] where one block is spliced into another, the predecessors block’s instructions should always come before the successor’s debug-info: the blocks are effectively concatenated. Today, this is unambiguous because BasicBlock::begin
and getFirstInsertionPt
will return iterators to debug-info intrinsics. However, if we don’t use intrinsics for debug-info, there are no means to describe inserting before the debug-info.
~
There’s a fairly simple solution for the first problem, which is to introduce a specialised moveBefore
method that moves debug-info too, and update all the call sites in LLVM that need that behaviour. This is achievable as there’s only about twenty call sites that behave in that way. In the prototype linked below the relevant method is called moveBeforePreserving
, we renamed moveBefore
to moveBeforeBreaking
to make the differences explicit. I think it’s useful to have the developer express the “disposition” of the move they’re making, whether it preserves the flow of instructions and thus should transfer debug-info, or whether it breaks the flow.
However, it’s much harder for the insert-at-block-begin example: it’s a much more common code pattern, with between 100 to 200 call sites where this distinction has an actual effect on variable locations. We’ve explored and implemented a solution: store whether an iterator came from begin
/ getFirstInsertionPt
in a bit inside the BasicBlock::iterator
object, aka ilist_iterator
[4]. This feels bad because it’s taking a value type that’s a single pointer right now and adding more data; however I think it’s inescapable. Iterators are exactly the right design pattern for storing information about positions in a collection. We could:
- Add the distinction of an iterator being returned by
begin
orgetFirstInsertionPt
to the type returned: however that feels ugly and will make ugly compile errors, - Add a flag parameter to instruction insertion methods, which is unergonomic and error prone,
- Have the fact that an iterator was intended to be at the start of the block (before debug-info) signalled at runtime, which requires information in the iterator object.
It’s the latter that we’ve used in our prototype – a variety of instruction insertion methods have had overloads added that take iterators, and then call sites that use begin
or getFirstInsertionPt
(or getFirstNonPHI
) now pass an iterator for the position to insert instructions. This is straightforwards; for additional safety the instruction-accepting insertion APIs could be removed, forcing everyone to use iterators when inserting. That’s invasive, but IMHO a reasonable price to pay for clearer management of debug-info. We’re using two bits in the prototype: because a pair of instruction iterators are a half-open range of instructions, we use the extra bits to signal whether the iterators are inclusive of the debug info attached at the start and end. Coming back to our earlier example:
bb2:
dbg.value(...)
%1 = add...
dbg.value(...)
%2 = sub...
br label %exit
Imagine we have an iterator range from %1 to %2. If we were to splice that range into another block, today LLVM would transfer the add instruction and the following dbg.value. If we attached debug-info to instructions and didn’t have debug intrinsics, that would be the only transfer we could describe. Thus, the two iterator bits signal:
- For the “first” iterator, whether debug-info attached to the first instruction should be transferred too,
- For the “last” iterator, whether debug-info attached to the last position should be transferred too,
respectively. That allows us to describe almost all the transfers that could be performed when debug-info is an intrinsic: both dbg.value
s and the add
instruction, or just the add
instruction by itself. I think it’s noteworthy that with accessors like begin
and getFirstInsertionPt
returning iterators with the bits set already, we only found two sites in LLVM where those bits need to be manually updated.
In terms of performance: adding the bit to BasicBlock::iterator
has a small compile-time cost to no-debug-info builds (0.05%) according to [5]. However this is going to be paid for by eliminating calls to functions like getNextNonDebugInstruction
, which recovers almost 0.3% of compile time [6]. This is definitely not the final word on compile-time performance, but I want to indicate the costs aren’t going to be overwhelming. Change in memory usage is negligible too as we typically don’t store iterators in data structures.
As proof that this technique can work, I’d like to present our prototype [0] (based on llvm-15) as evidence – please ignore the implementation details of what we’re doing to blocks, instructions and storage of debug-info, those are all very changeable. My major concern is how the instruction API changes, how this fans out into the rest of the compiler, and whether it’s a tolerable burden for other compiler developers. Thus, I think it’s noteworthy that we’re “only” touching 200 files, and the vast majority of changes are changing the spelling of moveBefore
or passing an iterator into a splice/insertion method.
There’s an additional problem with eliminating iterators, blocks can be temporarily empty (no terminator) but still contain debug-instructions, that should then go in front of any terminator added later. Our solution to that so far is to special-case the insertion of a terminator into a block, which is sufficient to fixup any out-of-place debug-info.
Feedback most welcome: specifically for the two changes that would be needed for the instruction API:
- The need to use a
moveBefore
method that explicitly states the intention of the developer wrt. whether debug-info should be moved too. - Sticking some extra bits in
BasicBlock::iterator
to signal at runtime whether a position in a block comes before or after the nearby debug-info.
To provide context, the other things we’re working on are the storage model for variable-location information, and what kind of maintenance would need to be applied during optimisations. We haven’t thought about representing these things in bitcode / textual IR and I’d prefer to avoid thinking about it for now.
Cheers to @StephenTozer for ploughing through a lot of this. CC the usual debug-info folks: @adrian.prantl @dblaikie @rnk @pogo59 @cmtice @OCHyams @jryans @slinder1 , although I think this is probably relevant to the interests of all pass-authors.
[0] Publish our prototype "killing debug-intrinsics" diff by jmorse · Pull Request #1 · jmorse/llvm-project · GitHub
[1] CommandLine.cpp in clang-3.4 bakes the build time into itself, all the other files are identical.
[2] llvm-project/IROutliner.cpp at c42eda5d3692fcd67fc3d043ab37f1950ea653b9 · llvm/llvm-project · GitHub
[3] llvm-project/Local.cpp at a33f018b89c07e0728539b34c158e88a7db49982 · llvm/llvm-project · GitHub
[4] Cue gasps of horror from the audience!
[5] LLVM Compile-Time Tracker
[6] LLVM Compile-Time Tracker NOTE: the -g modes are not meaningful as all debug-info is dropped.