tl;dr: Following on from previous discussions about replacing debug intrinsics with non-instruction equivalents, we are seeking input on what this should look like in IR.
Background
We (Sony) are currently working on a prototype that changes the representation of debug values so that they no longer use instructions. This topic has previously been raised on Discourse[0][1] and at LLVM conferences[2]; this RFC discusses the textual IR design for the new model. To briefly summarize our goals in removing debug intrinsics: we aim to improve performance by reducing the number of instructions, to reduce the amount of pointer chasing when searching for non-debug instructions, and to reduce the surface area for -g to affect CodeGen.
Since removing debug intrinsics involves a substantial change to the developer-facing output of the compiler, I expect there to be many concerns regarding the technical limitations of the design, and many more about its legibility, ease of editing, and other ergonomic issues. The design presented here is not final; it is intended to start a discussion about what this new debug info should look like, establish what different developers and vendors want from a new design, and hopefully reach a general consensus on the next steps. So without further ado:
Design
Here is an example IR function in current LLVM:
define dso_local i32 @f(i32 %a) !dbg !7 {
entry:
  call void @llvm.dbg.value(metadata i32 %a, metadata !10, metadata !DIExpression()), !dbg !20
  %b = alloca i32, !dbg !20, !DIAssignID !30
  call void @llvm.dbg.declare(metadata ptr %b, metadata !11, metadata !DIExpression()), !dbg !21
  %add = add i32 %a, 5, !dbg !21
  call void @llvm.dbg.value(metadata i32 %add, metadata !10, metadata !DIExpression()), !dbg !20
  call void @llvm.dbg.assign(metadata i32 %add, metadata !12, metadata !DIExpression(), metadata !30, metadata ptr %b, metadata !DIExpression()), !dbg !22
  store i32 %add, ptr %b, !dbg !22
  ret i32 %add, !dbg !23
}
And here is the same function with our proposed change:
define dso_local i32 @f(i32 %a) !dbg !7 {
entry:
    #dbgrecord value { i32 %a, !10, !DIExpression(), !20 }
  %b = alloca i32, !dbg !20, !DIAssignID !30
    #dbgrecord declare { ptr %b, !11, !DIExpression(), !21 }
  %add = add i32 %a, 5, !dbg !21
    #dbgrecord value { i32 %add, !10, !DIExpression(), !20 }
    #dbgrecord assign { i32 %add, !12, !DIExpression(), !30, ptr %b, !DIExpression(), !22 }
  store i32 %add, ptr %b, !dbg !22
  ret i32 %add, !dbg !23
}
In this design, debug variable intrinsics are replaced by debug records, which have a different syntax but identical meaning. Debug records are printed in the position at which they become live, as with current intrinsics, but with extra indentation and a different syntax to clearly distinguish them from instructions: each begins with #dbgrecord, followed by a type corresponding to the equivalent debug variable intrinsic, e.g. “value” for “llvm.dbg.value”. The contents of the curly braces are the same as the arguments to the debug intrinsic, except that the metadata prefix is dropped and the DILocation metadata that would normally be attached to the intrinsic is included as an extra, final argument. Besides those differences, nothing fundamentally changes about how to read, write, or edit debug info compared with the existing design.
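The mapping described above is mechanical, and can be sketched as a small text transformation. The following Python snippet is only an illustration of the proposed syntax, not part of the prototype: the function names intrinsic_to_record and split_top_level are hypothetical, and the regular expression assumes the exact single-line shape of the intrinsic calls shown in the example (a void call followed by a single !dbg attachment).

```python
import re

# Assumption: each debug intrinsic call fits on one line and ends with a
# single "!dbg !N" attachment, as in the example function above.
_INTRINSIC = re.compile(
    r"^\s*call void @llvm\.dbg\.(?P<kind>value|declare|assign)"
    r"\((?P<args>.*)\), !dbg (?P<loc>!\d+)\s*$"
)


def split_top_level(args):
    """Split an argument string on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in args:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def intrinsic_to_record(line):
    """Rewrite one debug intrinsic call in the proposed #dbgrecord syntax.

    Returns None for lines that are not debug intrinsic calls.
    """
    m = _INTRINSIC.match(line)
    if m is None:
        return None
    prefix = "metadata "
    # Drop the "metadata" prefix from every argument.
    args = [a[len(prefix):] if a.startswith(prefix) else a
            for a in split_top_level(m.group("args"))]
    # The attached DILocation becomes the final argument.
    args.append(m.group("loc"))
    return "#dbgrecord {} {{ {} }}".format(m.group("kind"), ", ".join(args))
```

Applied to the first llvm.dbg.value call in the example, this produces the corresponding #dbgrecord value line; ordinary instructions such as the add are left alone (the function returns None for them).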
Design Goals
One of the main aims of this design is to make sure that debug records don’t look like instructions, so that the IR faithfully represents the API to minimize surprising behaviours. For example, if a developer is debugging an opt pass and they print the IR, it should be obvious to them that if they have a pointer to %inst1, calling next() on that pointer should yield a pointer to %inst2:
  %inst1 = op1 %a, %b
    #dbgrecord value { %inst1, !10, !DIExpression(), !11 }
  %inst2 = op2 %inst1, %c
If instead we used a syntax comparable to an instruction, or even just converted debug records back to instructions for IR, this becomes less clear:
  %inst1 = op1 %a, %b
  llvm.dbg.value(%inst1, !10, !DIExpression(), !11)
  %inst2 = op2 %inst1, %c
In summary, debug records are not instructions, so it should be apparent even to developers unfamiliar with IR that they are a distinct language element. Aside from that change, this format retains most of the benefits of the current model: it is easy to read debug info, to see where a debug record becomes live, and to manually edit, move, add, or delete debug values. As an additional benefit, the indentation should make it easier to skim over debug info when it isn’t of interest, or to skim over regular instructions in the opposite case.
There is one disadvantage to this model, however: it does not make obvious the API relationship between instructions and debug values. In the instruction API, a debug value is a child of the following instruction (i.e. the instruction at which its live range begins); in this representation, debug values only appear between instructions, so the relationship is unclear, or even misleading, since the indent would normally imply that the debug value is a child of the instruction above it. Moving debug values outside of the instruction list altogether would resolve this issue, but would complicate human reading and editing of IR too much; good API design and documentation should be enough to clear up any developer confusion.
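To make the ownership relationship concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions and does not reflect the actual LLVM classes or the prototype's API: the Instruction and BasicBlock classes, the debug_records attribute, and the dump() method are all hypothetical names chosen for illustration. It shows the two properties discussed above: debug records are not part of the instruction list, so next() steps straight from %inst1 to %inst2, and each record is owned by the instruction that follows it even though it is printed after the instruction above it.

```python
class Instruction:
    """Hypothetical stand-in for an instruction node (not the LLVM class)."""

    def __init__(self, text):
        self.text = text
        # Records whose live range begins at this instruction, i.e. the
        # records printed immediately before it.
        self.debug_records = []
        self._next = None

    def next(self):
        """Return the following instruction; debug records are never visited."""
        return self._next


class BasicBlock:
    """Hypothetical block that attaches each record to the next instruction."""

    def __init__(self):
        self.instructions = []
        self._pending = []  # records seen since the last instruction

    def add_debug_record(self, text):
        self._pending.append(text)

    def add_instruction(self, text):
        inst = Instruction(text)
        # Pending records become children of the instruction that follows them.
        inst.debug_records, self._pending = self._pending, []
        if self.instructions:
            self.instructions[-1]._next = inst
        self.instructions.append(inst)
        return inst

    def dump(self):
        """Print records before their owning instruction, with extra indent."""
        lines = []
        for inst in self.instructions:
            lines.extend("    " + rec for rec in inst.debug_records)
            lines.append("  " + inst.text)
        return "\n".join(lines)
```

In this sketch, adding %inst1, then a record, then %inst2 leaves the record in %inst2's debug_records list, while inst1.next() returns %inst2 directly. The toy model assumes every record is eventually followed by an instruction, which holds for any block that ends in a terminator.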
Next Steps
The prototype implementation for removing debug intrinsics is still in progress, but some patches that move us towards this goal have already landed, and more will be submitted with time. The textual IR representation for non-instruction debug values is likely the most visible developer-facing change, and so demands a healthy amount of input before a design is finalized. All suggestions, comments, and critiques are welcome and desired, but answers to two questions in particular would be useful: Is this design clear and legible to you? And does the design have any apparent limitations (whether present in current LLVM or not) that would interfere with the way you use LLVM?
A patch for LLVM to test out this change will be available soon.
[0] Prototyping a not-an-instruction dbg.value
[1] [RFC] Instruction API changes needed to eliminate debug intrinsics from IR
[2] Debug-info round table notes