Assignment tracking: A better way of specifying variable locations in IR
In optimised code, the debug locations of variables that live in memory are often either inaccurate, because they optimistically point at memory the variable no longer lives in, or accurate but with poor coverage, because they pessimistically use register locations even while the value is live in memory. This RFC proposes a way to maximise both accuracy and coverage for such variables and presents findings from implementing a prototype. I’m hoping to get feedback on the idea and to discuss the next steps from this point.
The core idea involves tracking more information about source assignments. Rather than just tracking either the memory location (dbg.declare) or the set of values the variable takes (dbg.value), we track both, as well as which instruction (if any remains after optimisation) performs the assignment and its original (pre-optimisation) position in the IR. I’ve written a prototype that achieves this with a new intrinsic called dbg.assign, a new metadata attachment called DIAssignID, and a small dataflow analysis pass.
I’ll be giving a 10-minute talk on this work at EuroLLVM 2022 titled “Improving debug locations for variables in memory”.
Background
Presently in LLVM, a variable backed by an unpromotable alloca will either have its locations tracked with a dbg.declare or a set of dbg.values, depending on the type of the alloca. Converting the dbg.declare to a set of dbg.values happens in a function called LowerDbgDeclare that is run early in the optimisation pipeline by InstCombine. A dbg.declare states that the variable has a single location in memory for its lifetime. dbg.values mark out where the value of the variable changes and what that value is, mostly avoiding referring to the memory locations.
An alloca that can’t be promoted by mem2reg may have stores to it that still have optimisation opportunities. If LLVM uses a dbg.declare to describe the variable location, then at points in the program where the stores are moved, modified, or deleted, the value in memory and the value of the variable in the source program will not match.
When dbg.values are used we generally get correct locations because there are dbg.values marking out where the assignments occurred before any optimisations. However, because dbg.values generally avoid referring to memory locations, in areas of the program where the memory location is valid and the value of the variable is not live elsewhere (register, constant), LLVM will end up stating that the variable has no location rather than use the memory location.
LLVM must choose between preserving the dbg.declare or using a set of dbg.values early because it uses stores to determine where in the IR the variable changes value - if LLVM generated the set of dbg.values after optimisation, it would cause the same problem that we get when a dbg.declare is used. After optimisation, stores no longer necessarily faithfully represent all the source assignments.
The canonical ticket for this problem is PR34136, “Improve optimized debug info for escaped variables that live in memory”.
Here are two simple C++ examples to demonstrate the issues occurring with clang-14, starting with an incorrect optimistic variable location:
1 struct Nums { int a, b, c; } glob;
2 __attribute__((__noinline__)) void useAddress(Nums *n) { glob = *n; }
3 __attribute__((__noinline__)) void useValue(int a) { glob.a = a; }
4
5
6 int main() {
7 struct Nums nums = {1, 2, 3}; //< Store to nums.c is eliminated by DSE after GVN propagates 3 to useValue().
8 useValue(nums.c);
9
10 nums.c = nums.a; //< Killing store to nums.c.
11 useValue(nums.c);
12
13 useAddress(&nums); //< nums' address escapes.
14 return 0;
15 }
$ clang++ test.cpp -O2 -g -o test
$ gdb test
GNU gdb (GDB) 10.2
...
Breakpoint 1, main () at test.cpp:7
7 struct Nums nums = {1, 2, 3}; //< Store to nums.c is eliminated by DSE after GVN propagates 3 to useValue().
(gdb) n
8 useValue(nums.c);
(gdb) p nums
$1 = {a = 1, b = 2, c = 0}
The variable location tells the debugger to read the value of nums.c from memory, which contains 0. This is misleading: the value passed to useValue was 3, and the value of nums.c at this point according to the source code is 3.
The next example demonstrates unnecessarily pessimistic location coverage (no location for x for part of this function):
1 __attribute__((__optnone__)) int get(int x) { return x; }
2 __attribute__((__optnone__)) void esc(int* c) {}
3 int main() {
4 int x = get(5);
5 get(0);
6 get(0);
7 get(0);
8 esc(&x);
9 return 0;
10 }
$ clang++ test.cpp -O2 -g -o test
$ gdb test
GNU gdb (GDB) 10.2
...
Breakpoint 1, main () at test.cpp:4
4 int x = get(5);
(gdb) n
5 get(0);
(gdb) display x
1: x = 5
(gdb) n
6 get(0);
1: x = <optimised out>
(gdb)
7 get(0);
1: x = <optimised out>
(gdb)
8 esc(&x);
1: x = 5
(gdb) disassemble main
Dump of assembler code for function main():
0x00000000004004c0 <+0>: push %rax
0x00000000004004c1 <+1>: mov $0x5,%edi
0x00000000004004c6 <+6>: call 0x4004a0 <_Z3geti>
0x00000000004004cb <+11>: mov %eax,0x4(%rsp) ;; Store value of x.
0x00000000004004cf <+15>: xor %edi,%edi
0x00000000004004d1 <+17>: call 0x4004a0 <_Z3geti>
0x00000000004004d6 <+22>: xor %edi,%edi
0x00000000004004d8 <+24>: call 0x4004a0 <_Z3geti>
0x00000000004004dd <+29>: xor %edi,%edi
0x00000000004004df <+31>: call 0x4004a0 <_Z3geti>
0x00000000004004e4 <+36>: lea 0x4(%rsp),%rdi
0x00000000004004e9 <+41>: call 0x4004b0 <_Z3escPi>
0x00000000004004ee <+46>: xor %eax,%eax
0x00000000004004f0 <+48>: pop %rcx
0x00000000004004f1 <+49>: ret
End of assembler dump.
We don’t have a location on lines 6 and 7 despite the fact that the result of get(5) is immediately copied to memory. The compiler had to choose between a memory location and a register location, and this decision was made early in the optimisation pipeline. In this example it chose the register location; it’s difficult to know the optimal decision in advance. That’s where the proposed solution comes in. We retain enough information to be able to delay this decision until after optimisations.
Solution summary
The core of the proposed solution is to delay choosing between whether to use register or memory locations until after optimisations so that we can make optimal decisions.
The frontend marks each assignment to each variable with a new intrinsic. That intrinsic tracks the stored value, store location, which instruction performs the store, and the original IR position of the store (as it may move during optimisation). That information is carried through optimisations, set up in such a way as to minimise effort required from pass writers. After optimisation, LLVM performs a dataflow analysis to calculate locations that maximise coverage correctly by choosing between memory and register (and implicit) locations at each instruction.
Tracking “which instruction performs the store” is achieved by tagging such instructions with a metadata attachment (DIAssignID, more detail in the next section) which is maintained through optimisations. There is an existing analogue: tbaa access tags are preserved through optimisations in order to enable those optimisations, and we attach and preserve this new metadata in the same way.
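As a small sketch of what this looks like in IR (the two lines are lifted from the prototype output shown later in Example 2; the metadata numbers only have meaning there), the DIAssignID rides on the store as an ordinary metadata attachment alongside things like !tbaa, and the dbg.assign intrinsic that describes the assignment refers to the same node:
  store i32 %call, i32* %x, align 4, !dbg !43, !tbaa !15, !DIAssignID !44
  call void @llvm.dbg.assign(metadata i32 %call, metadata !38, metadata !DIExpression(), metadata !44, metadata i32* %x), !dbg !40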
Prototype implementation
I’ve prototyped an implementation which can be found here if you want to try it out. The prototype involves the following changes to the compiler:
Add a new IR intrinsic, llvm.dbg.assign, which has 5 parameters. The first 3 parameters are identical to an llvm.dbg.value’s and the next two are new.
llvm.dbg.assign(value, expression, variable, id, destination)
0 value (Value) : The assigned value.
1 expression (DIExpression) : Expression to apply to `value`.
2 variable (DILocalVariable) : The user variable that this location describes.
3 id (DIAssignID) : The id of the instruction performing the source assignment.
4 destination (pointer type Value) : Store destination.
Add a new metadata kind, DIAssignID, which has no fields; it is just used as an ID (each instance is distinct, since we rely on location - pointer - equality to distinguish them). This metadata is used both as an attachment and, wrapped in MetadataAsValue, as an argument to dbg.assign intrinsics. If a DIAssignID used by a dbg.assign is attached to one or more instructions, it means that those instructions perform the assignment that the intrinsic is describing. The instruction(s) and intrinsic are “linked”. It is legal (and frequently the case) for a dbg.assign to use a DIAssignID that is not attached to any instructions. I call this an “unlinked” dbg.assign. In this case, it’s likely that the store to memory has been eliminated, but it is also possible that a pass has failed to update metadata on an instruction after optimisation. The latter case is a safe fallback (no wrong locations are introduced - we will use the value instead of the destination) because it is pessimistic (there may actually be a valid memory location if the store does exist).
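Here is an illustrative sketch (hand-written, not prototype output) of the linked and unlinked states, reusing the made-up metadata numbers from the sketch above:
; Linked: the store and the dbg.assign share the DIAssignID !44.
  store i32 %call, i32* %x, align 4, !DIAssignID !44
  call void @llvm.dbg.assign(metadata i32 %call, metadata !38, metadata !DIExpression(), metadata !44, metadata i32* %x), !dbg !40

; Unlinked: if DSE later deletes the store, the dbg.assign still uses !44 but
; no instruction carries that attachment any more, so the memory location is
; not trusted and the value operand (%call) is used instead.
  call void @llvm.dbg.assign(metadata i32 %call, metadata !38, metadata !DIExpression(), metadata !44, metadata i32* %x), !dbg !40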
The clang frontend no longer emits a dbg.declare stating that a variable’s alloca is its stack home. Instead, it emits dbg.assigns for each variable after each store-like, or “assigning”, instruction (stores and memory-setting intrinsics). Each of these assigning instructions gets a distinct DIAssignID attachment which is used by the dbg.assign that follows - the instruction and the dbg.assign are linked. An alloca is treated as an assignment of undef and gets a DIAssignID and dbg.assign accordingly. For situations that the prototype cannot yet handle - see the limitations section - the frontend falls back to emitting a dbg.declare for the variable.
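For concreteness, this is roughly what the frontend emits for int x = get(5); (trimmed from the prototype IR shown in Example 2 below): the alloca is treated as an assignment of undef, and the store is linked to a second dbg.assign carrying the stored value.
  %x = alloca i32, align 4, !DIAssignID !39
  call void @llvm.dbg.assign(metadata i1 undef, metadata !38, metadata !DIExpression(), metadata !39, metadata i32* %x), !dbg !40
  %call = tail call noundef i32 @_Z3geti(i32 noundef 5), !dbg !42
  store i32 %call, i32* %x, align 4, !dbg !43, !DIAssignID !44
  call void @llvm.dbg.assign(metadata i32 %call, metadata !38, metadata !DIExpression(), metadata !44, metadata i32* %x), !dbg !40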
There is a new dataflow analysis “pass” (it’s just a function called by SelectionDAGISel at the moment). It converts dbg.assigns to conventional debug intrinsics just before ISel so that the prototype only needs to work in IR-land. It is kind of a “LiveDebugValues lite”. It is a fixed-point dataflow analysis that uses debug intrinsics to iteratively determine the location for each variable at every point in the program. It uses a join-semilattice to describe the locations, where Unknown > { Memory, Value }. After computing the locations for each variable, it replaces all of the dbg.assign intrinsics with either a dbg.declare or a new set of dbg.value intrinsics (which may include DW_OP_deref expressions - i.e. may include memory locations).
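To illustrate the output of that analysis, here is a hand-written sketch in the spirit of the lowered IR shown in the examples below (metadata numbers are made up): where the memory location is chosen, the dbg.assign is lowered to a dbg.value of the alloca with a DW_OP_deref (or, if memory is valid for the whole lifetime, a single dbg.declare); where the value location is chosen, it becomes an ordinary dbg.value.
; Input: a linked dbg.assign whose store survived optimisation.
  store i32 %call, i32* %x, align 4, !DIAssignID !44
  call void @llvm.dbg.assign(metadata i32 %call, metadata !38, metadata !DIExpression(), metadata !44, metadata i32* %x), !dbg !40

; Output if the memory location is chosen at this point:
  call void @llvm.dbg.value(metadata i32* %x, metadata !38, metadata !DIExpression(DW_OP_deref)), !dbg !40
; Output if the value location is chosen instead:
  call void @llvm.dbg.value(metadata i32 %call, metadata !38, metadata !DIExpression()), !dbg !40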
Considerations for pass writers and maintainers
cloning an instruction: nothing new to do. Cloning automatically clones a DIAssignID attachment. Multiple instructions may have the same DIAssignID attachment; in this case, the assignment is considered to take place in multiple positions in the program.
moving a non-debug instruction: nothing new to do. Instructions linked to a dbg.assign have their initial IR position marked by the position of the dbg.assign.
moving a debug intrinsic: avoid moving a dbg.assign intrinsic at all costs, unless not moving it would result in the intrinsic being lost (e.g. a block only contains debug intrinsics and is going to be deleted: hoist or, preferably, sink the intrinsics and make them undef).
deleting a non-debug instruction: nothing new to do. Simple DSE does not require any change; it’s safe to delete an instruction with a DIAssignID attachment. A dbg.assign that uses a DIAssignID that is not attached to any instruction indicates that the memory location isn’t valid.
deleting a debug intrinsic: nothing new to do. Just like for conventional debug intrinsics, unless it is unreachable, it’s almost always incorrect to delete a dbg.assign intrinsic.
merging stores: in many cases no change is required as DIAssignID attachments are automatically merged if combineMetadata is called. One way or another, the DIAssignID attachments must be merged such that the new store becomes linked to all the dbg.assign intrinsics that the merged stores were linked to. This can be achieved simply by calling a helper function, Instruction::mergeDIAssignID (see the sketch after this list).
inlining stores: as stores are inlined we must generate dbg.assign intrinsics and DIAssignID attachments as if the stores represent source assignments, just like in the frontend. This isn’t perfect, as stores may have been moved, modified, or deleted before inlining, but it does at least keep the information about the variable correct within the non-inlined scope. See the limitations & future work section.
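Here is the store-merging case sketched in IR (a hypothetical example; the metadata numbers and values are invented). Two stores to adjacent fields of a variable, each linked to its own dbg.assign, are merged into one wide store; afterwards that single store must be linked to both intrinsics:
; Before: two stores, each linked to its own dbg.assign.
  store i32 1, i32* %a, align 8, !DIAssignID !10
  call void @llvm.dbg.assign(metadata i32 1, metadata !5, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32), metadata !10, metadata i32* %a), !dbg !7
  store i32 2, i32* %b, align 4, !DIAssignID !11
  call void @llvm.dbg.assign(metadata i32 2, metadata !5, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32), metadata !11, metadata i32* %b), !dbg !7

; After merging (e.g. via the prototype's Instruction::mergeDIAssignID helper):
; the wide store carries a DIAssignID that both dbg.assign intrinsics now use,
; so it is linked to both of them.
  store i64 8589934593, i64* %ab, align 8, !DIAssignID !12
  call void @llvm.dbg.assign(metadata i32 1, metadata !5, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32), metadata !12, metadata i32* %a), !dbg !7
  call void @llvm.dbg.assign(metadata i32 2, metadata !5, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32), metadata !12, metadata i32* %b), !dbg !7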
I made minor changes to some passes that were not already obeying the above rules, and had to put in some extra work for these specific passes:
SROA and passes that split stores: similarly to how dbg.declares are handled, clone the dbg.assign intrinsics linked to the store, update the fragment part of the expression, and give the split stores (and cloned intrinsics) a new DIAssignID. In other words, treat the split stores as separate assignments. This is also done for partial DSE (e.g. shortening a memset). There is a sketch of the split-store case after this list.
mem2reg inserts dbg.value intrinsics for the PHIs of variables described by dbg.assign intrinsics, as it would for dbg.declares (it’s safe to mix dbg.assign and dbg.value), but doesn’t add intrinsics at store sites as it otherwise would for dbg.declares because the store is already tracked by the dbg.assign. The fact that we need to track PHIs is worth its own separate discussion IMO, but is mostly immaterial to this proposal. It needs to be done here to match existing behaviour for fully promoted variables.
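As promised above, here is an IR sketch of the split-store case (hypothetical; the metadata numbers and value names are invented). A 64-bit store covering two 32-bit halves of a variable is split by SROA into two stores, each treated as a separate assignment with a new DIAssignID and a cloned dbg.assign whose fragment is updated:
; Before splitting: one store linked to one dbg.assign covering the whole variable.
  store i64 %v, i64* %p, align 8, !DIAssignID !20
  call void @llvm.dbg.assign(metadata i64 %v, metadata !5, metadata !DIExpression(), metadata !20, metadata i64* %p), !dbg !7

; After splitting: each half is its own assignment with a new DIAssignID and a
; cloned dbg.assign with an updated fragment.
  store i32 %v.lo, i32* %p.lo, align 8, !DIAssignID !21
  call void @llvm.dbg.assign(metadata i32 %v.lo, metadata !5, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32), metadata !21, metadata i32* %p.lo), !dbg !7
  store i32 %v.hi, i32* %p.hi, align 4, !DIAssignID !22
  call void @llvm.dbg.assign(metadata i32 %v.hi, metadata !5, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32), metadata !22, metadata i32* %p.hi), !dbg !7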
Examples
Here are the examples from the beginning, built with the prototype, showing them working in a debugger and comparing the IR. Note: if you build the first example with the prototype you should debug it with GDB (rather than LLDB, which will run into this issue) if you want to see the fix in action.
Example 1
Running the prototype-built version in gdb:
$ clang++ test.cpp -O2 -g -o test -Xclang -debug-coffee-chat
$ gdb test
GNU gdb (GDB) 10.2
...
Breakpoint 1, main () at test.cpp:7
7 struct Nums nums = {1, 2, 3}; //< Store to nums.c is eliminated by DSE after GVN propagates 3 to useValue().
(gdb) n
8 useValue(nums.c);
(gdb) p nums
$1 = {a = 1, b = 2, c = <synthetic pointer>}
Here’s the IR:
; llvm-14: clang++ -emit-llvm -S -O2 -g -o -
define dso_local noundef i32 @main() local_unnamed_addr #4 !dbg !40 {
entry:
%nums = alloca %struct.Nums, align 8
%0 = bitcast %struct.Nums* %nums to i8*, !dbg !45
call void @llvm.lifetime.start.p0i8(i64 12, i8* nonnull %0) #7, !dbg !45
call void @llvm.dbg.declare(metadata %struct.Nums* %nums, metadata !44, metadata !DIExpression()), !dbg !46
%1 = bitcast %struct.Nums* %nums to i64*, !dbg !46
store i64 8589934593, i64* %1, align 8, !dbg !46
%c = getelementptr inbounds %struct.Nums, %struct.Nums* %nums, i64 0, i32 2, !dbg !47
tail call void @_Z8useValuei(i32 noundef 3), !dbg !48
store i32 1, i32* %c, align 8, !dbg !49, !tbaa !50
tail call void @_Z8useValuei(i32 noundef 1), !dbg !51
call void @_Z10useAddressP4Nums(%struct.Nums* noundef nonnull %nums), !dbg !52
call void @llvm.lifetime.end.p0i8(i64 12, i8* nonnull %0) #7, !dbg !53
ret i32 0, !dbg !53
}
; prototype enabled: clang++ -Xclang -debug-coffee-chat -emit-llvm -S -O2 -g -o -
define dso_local noundef i32 @main() local_unnamed_addr #4 !dbg !44 {
entry:
%nums = alloca %struct.Nums, align 8, !DIAssignID !49
call void @llvm.dbg.assign(metadata i1 undef, metadata !48, metadata !DIExpression(), metadata !49, metadata %struct.Nums* %nums), !dbg !50
%0 = bitcast %struct.Nums* %nums to i8*, !dbg !51
call void @llvm.lifetime.start.p0i8(i64 12, i8* nonnull %0) #6, !dbg !51
%1 = bitcast %struct.Nums* %nums to i64*, !dbg !52
store i64 8589934593, i64* %1, align 8, !dbg !52, !DIAssignID !53
call void @llvm.dbg.assign(metadata i1 undef, metadata !48, metadata !DIExpression(), metadata !53, metadata i8* %0), !dbg !50
call void @llvm.dbg.assign(metadata i1 undef, metadata !48, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 32), metadata !53, metadata i8* undef), !dbg !50
%c = getelementptr inbounds %struct.Nums, %struct.Nums* %nums, i64 0, i32 2, !dbg !54
tail call void @_Z8useValuei(i32 noundef 3), !dbg !55
store i32 1, i32* %c, align 8, !dbg !56, !tbaa !57, !DIAssignID !58
call void @llvm.dbg.assign(metadata i32 1, metadata !48, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 32), metadata !58, metadata i32* %c), !dbg !50
tail call void @_Z8useValuei(i32 noundef 1), !dbg !59
call void @_Z10useAddressP4Nums(%struct.Nums* noundef nonnull %nums), !dbg !60
call void @llvm.lifetime.end.p0i8(i64 12, i8* nonnull %0) #6, !dbg !61
ret i32 0, !dbg !61
}
; IR after the dataflow that converts the assignment tracking debug info to "normal" debug intrinsics for SelectionDAG
define dso_local noundef i32 @main() local_unnamed_addr #4 !dbg !40 {
entry:
%nums = alloca %struct.Nums, align 8
call void @llvm.dbg.value(metadata %struct.Nums* %nums, metadata !44, metadata !DIExpression(DW_OP_deref)), !dbg !45
%0 = bitcast %struct.Nums* %nums to i8*, !dbg !46
call void @llvm.lifetime.start.p0i8(i64 12, i8* nonnull %0) #7, !dbg !46
%1 = bitcast %struct.Nums* %nums to i64*, !dbg !47
store i64 8589934593, i64* %1, align 8, !dbg !47
call void @llvm.dbg.value(metadata i8* undef, metadata !44, metadata !DIExpression(DW_OP_deref, DW_OP_LLVM_fragment, 64, 32)), !dbg !45
call void @llvm.dbg.value(metadata %struct.Nums* %nums, metadata !44, metadata !DIExpression(DW_OP_deref, DW_OP_LLVM_fragment, 0, 64)), !dbg !45
%c = getelementptr inbounds %struct.Nums, %struct.Nums* %nums, i64 0, i32 2, !dbg !48
tail call void @_Z8useValuei(i32 noundef 3), !dbg !49
store i32 1, i32* %c, align 8, !dbg !50, !tbaa !51
call void @llvm.dbg.value(metadata %struct.Nums* %nums, metadata !44, metadata !DIExpression(DW_OP_plus_uconst, 8, DW_OP_deref, DW_OP_LLVM_fragment, 64, 32)), !dbg !45
tail call void @_Z8useValuei(i32 noundef 1), !dbg !52
call void @_Z10useAddressP4Nums(%struct.Nums* noundef nonnull %nums), !dbg !53
call void @llvm.lifetime.end.p0i8(i64 12, i8* nonnull %0) #7, !dbg !54
ret i32 0, !dbg !54
}
Example 2
$ clang++ test.cpp -O2 -g -o test -Xclang -debug-coffee-chat
$ gdb test
GNU gdb (GDB) 10.2
...
Breakpoint 1, main () at test.cpp:4
4 int x = get(5);
(gdb) n
5 get(0);
(gdb) display x
1: x = 5
(gdb) n
6 get(0);
1: x = 5
(gdb) n
7 get(0);
1: x = 5
(gdb) n
8 esc(&x);
1: x = 5
Here’s the IR:
; clang++ -emit-llvm -S -O2 -g -o -
define dso_local noundef i32 @main() local_unnamed_addr #2 !dbg !30 {
entry:
%x = alloca i32, align 4
%0 = bitcast i32* %x to i8*, !dbg !35
call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #5, !dbg !35
%call = tail call noundef i32 @_Z3geti(i32 noundef 5), !dbg !36
call void @llvm.dbg.value(metadata i32 %call, metadata !34, metadata !DIExpression()), !dbg !37
store i32 %call, i32* %x, align 4, !dbg !38, !tbaa !13
%call1 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !39
%call2 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !40
%call3 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !41
call void @llvm.dbg.value(metadata i32* %x, metadata !34, metadata !DIExpression(DW_OP_deref)), !dbg !37
call void @_Z3escPi(i32* noundef nonnull %x), !dbg !42
call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5, !dbg !43
ret i32 0, !dbg !44
}
; clang++ -Xclang -debug-coffee-chat -emit-llvm -S -O2 -g -o -
define dso_local noundef i32 @main() local_unnamed_addr #2 !dbg !34 {
entry:
%x = alloca i32, align 4, !DIAssignID !39
call void @llvm.dbg.assign(metadata i1 undef, metadata !38, metadata !DIExpression(), metadata !39, metadata i32* %x), !dbg !40
%0 = bitcast i32* %x to i8*, !dbg !41
call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #4, !dbg !41
%call = tail call noundef i32 @_Z3geti(i32 noundef 5), !dbg !42
store i32 %call, i32* %x, align 4, !dbg !43, !tbaa !15, !DIAssignID !44
call void @llvm.dbg.assign(metadata i32 %call, metadata !38, metadata !DIExpression(), metadata !44, metadata i32* %x), !dbg !40
%call1 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !45
%call2 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !46
%call3 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !47
call void @_Z3escPi(i32* noundef nonnull %x), !dbg !48
call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #4, !dbg !49
ret i32 0, !dbg !50
}
; IR after the dataflow that converts the assignment tracking debug info to "normal" debug intrinsics for SelectionDAG
define dso_local noundef i32 @main() local_unnamed_addr #2 !dbg !30 {
entry:
%x = alloca i32, align 4
call void @llvm.dbg.declare(metadata i32* %x, metadata !34, metadata !DIExpression()), !dbg !35
%0 = bitcast i32* %x to i8*, !dbg !36
call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #5, !dbg !36
%call = tail call noundef i32 @_Z3geti(i32 noundef 5), !dbg !37
store i32 %call, i32* %x, align 4, !dbg !38, !tbaa !14
%call1 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !39
%call2 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !40
%call3 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !41
call void @_Z3escPi(i32* noundef nonnull %x), !dbg !42
call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5, !dbg !43
ret i32 0, !dbg !44
}
Prototype results
This table shows the average of the percentage of each variable’s scope that the variable has locations for (essentially llvm-locstats’s “%PC bytes covered” stat) in the CTMark projects when compiling with assignment tracking disabled and enabled. The CTMark suite is built in the RelWithDebInfo (-O2 -g) configuration, with the addition of “-fno-omit-frame-pointer” to side-step PR50285.
+-------------------+--------------------------------------+
|                   | Average of the % of each variable's  |
|                   | scope covered by its locations       |
|                   | (average variable availability)      |
+-------------------+----------+------------+--------------+
| CTMark project    | llvm-14  | prototype  | difference   |
+-------------------+----------+------------+--------------+
| 7zip              | 72.883   | 74.701     | 1.818        |
| ClamAV            | 64.028   | 65.875     | 1.846        |
| kimwitu++         | 75.5     | 75.696     | 0.196        |
| SPASS             | 64.779   | 67.954     | 3.174        |
| tramp3d-v4        | 87.091   | 87.284     | 0.193        |
| Bullet            | 63.209   | 63.126     | -0.083       |
| consumer-typeset  | 44.939   | 55.407     | 10.468       |
| lencod            | 54.916   | 56.874     | 1.957        |
| mafft             | 65.943   | 66.149     | 0.205        |
| sqlite3           | 67.316   | 68.596     | 1.28         |
| average           | 66.060   | 68.166     | 2.105        |
+-------------------+----------+------------+--------------+
On average, coverage sees a net increase of 2.1 percentage points across CTMark, or 1.2 if we exclude consumer-typeset as an outlier.
Two additional notes about this statistic. Firstly, due to the way that llvm-dwarfdump collects these stats, each variable need only have a location for some part of it (a fragment) - some number of bits not necessarily equal to the size of the variable - for the covered instructions to count as having a location for that variable. This means that “incorrect locations for variable fragments eliminated by assignment tracking” are not represented here. Unfortunately getting that second stat will be more challenging, but on the bright side, having those statistics separate is useful.
Secondly, this stat only represents whether there is or isn’t a location, not whether that location is correct.
Limitations & future work
Results
The results section above already covers the limitations of the statistics presented. As an additional note on validation, I have inspected the locations generated by the assignment tracking prototype for Bullet fairly deeply (before the stats were known; the low coverage is a coincidence), but for the other projects I only skimmed the DWARF briefly. Some valid locations for variables in Bullet are dropped by the prototype, in part due to some of the limitations outlined below, but it does also remove incorrect locations (as advertised). I’ve run simple examples through debuggers manually, and I’ve used dexter over a private suite and also the memvars folder upstream. Note that the memvars folder upstream includes some buggy tests - I’ve fixed them in the prototype patch.
Inlining
The current implementation results in accurate locations for a variable in the function it’s declared in. However, the locations throughout an inlined callee that stores to it may be inaccurate. This is because we only track the assignments to the caller’s variable once inlining occurs - the already-optimised stores are considered assignments and we don’t know if any have been eliminated. An earlier iteration of the prototype tracked stores through pointer arguments which were matched up to caller local variables upon inlining, but I opted for the simpler solution presented in this iteration as a more reliable first step.
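To illustrate with a hypothetical sketch (the callee name and metadata numbers are invented): before inlining, nothing describes the write the callee makes through the escaped pointer; after inlining, the callee's store is treated as a source assignment to the caller's variable and is given a fresh DIAssignID and a linked dbg.assign.
; Caller before inlining: the call writes to x through the escaped pointer,
; but no dbg.assign describes that write.
  call void @_Z6calleePi(i32* noundef nonnull %x), !dbg !30

; After inlining the callee body, its (already optimised) store is treated as a
; source assignment to the caller's variable: it gets a new DIAssignID and a
; linked dbg.assign.
  store i32 7, i32* %x, align 4, !dbg !31, !DIAssignID !32
  call void @llvm.dbg.assign(metadata i32 7, metadata !38, metadata !DIExpression(), metadata !32, metadata i32* %x), !dbg !40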
PHIs
We don’t have PHIs while assignments are in non-SSA form, so we don’t have dbg.assign intrinsics describing merged values. This is fine in non-SSA form because the value is implicitly merged in memory. When an alloca is promoted by mem2reg, the prototype adds dbg.values for inserted PHIs to match existing behaviour. When LowerDbgDeclare generates sets of dbg.values for variables backed by unpromoted allocas, PHIs don’t exist (yet), but it does insert dbg.values after loads, which is behaviour that doesn’t make sense with assignment tracking (I don’t think!). Stores and loads may be promoted to SSA over some parts of the program, such that loads are replaced by PHIs; in this case the dbg.value added by LowerDbgDeclare to track the value after the load will now track the PHI value. We don’t get the same with assignment tracking, and I’m not sure how to resolve that. In theory, we could work out where “debug PHIs” occur after optimisation, though the prototype doesn’t currently do this. I am not sure what to do here - my current line of thinking is that we should implement that, but then there’s one more problem. Any values that were propagated along edges that get deleted before the analysis runs won’t be represented, leading to incorrect debug info in that case. See this ticket for an example of this occurring in LLVM outside of the prototype.
Latent issues in LLVM
The prototype relies on describing assignments to parts of variables (fragments).
There are several issues with the way that variable fragments (DW_OP_LLVM_fragment) are handled and interpreted in LLVM. Debug intrinsics describing overlapping fragments are often interpreted incorrectly, leading to incorrect locations (one known offender is LiveDebugVariables). As well as being interpreted incorrectly, they can also be interpreted in such a way that unnecessarily eliminates locations for parts of the variable - for more info, check out AggregateFragmentsPass in llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp in the prototype patch.
LLDB doesn’t appear to interpret variable locations made up of DW_OP_piece (generated in expressions using DW_OP_LLVM_fragment) correctly if parts of the variable have no location (bug report). I mentioned this earlier already but it’s pertinent to the fragment discussion.
I’ve found a small number of other latent issues in LLVM, including that SelectionDAG can spuriously and incorrectly re-order debug intrinsics occasionally.
Unhandled situations
There are some situations that the prototype can’t yet handle, for which the clang frontend currently falls back to using dbg.declares. Solving these shouldn’t alter the design, just the implementation: variables sharing an alloca (NRVO), and byval and sret double-indirect variables. One unsolved problem that may result in adjusting the design is that the dbg.assigns currently encode the offset and size of the store as fragment info. Non-const GEPs cause trouble here since we don’t know which fragment of the variable is being assigned to. This hole in the design needs addressing.
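To illustrate the non-constant GEP problem, here is a hypothetical sketch (names and metadata numbers invented): the store writes one element of a local array, but because the index %i isn’t a compile-time constant, a linked dbg.assign would need a fragment whose offset depends on %i, which a DIExpression fragment cannot express.
  %arr = alloca [4 x i32], align 16, !DIAssignID !20
  call void @llvm.dbg.assign(metadata i1 undef, metadata !21, metadata !DIExpression(), metadata !20, metadata [4 x i32]* %arr), !dbg !22
  %elt = getelementptr inbounds [4 x i32], [4 x i32]* %arr, i64 0, i64 %i
  ; The assigned fragment is bits [32*%i, 32*%i + 32) - not expressible as a
  ; constant DW_OP_LLVM_fragment, so the prototype cannot describe this
  ; assignment precisely.
  store i32 %v, i32* %elt, align 4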
Next steps
If this direction is desirable, I think the next steps are to work on resolving the “PHIs”, “Unhandled situations”, and “Latent issues in LLVM” limitations discussed above.
Please let me know what you think.
Related & previous work
The main inspiration for this comes from a chat with Reid Kleckner, Jeremy Morse, and Stephen Tozer, which is summarised in Reid’s email “Notes from dbg.value coffee chat”.
Keno Fischer proposed a way we could track multiple locations for variables in “Proposal for multi location debug info support in LLVM IR” (RFC gist). While that RFC doesn’t try to address the same issues, the multi-location tracking mechanism did inspire a feature in this prototype. The assignment tracking prototype tracks exactly two locations per variable: an SSA value and a memory location (either of which may be unavailable), and does so with a single intrinsic rather than one intrinsic per location as in Keno’s RFC.
Thanks for reading!
Orlando