[RFC] Assignment tracking: A better way of specifying variable locations in IR

Assignment tracking: A better way of specifying variable locations in IR

In optimised code, debug locations for variables in memory can be inaccurate by optimistically specifying memory the variable no longer lives in, or may be accurate with poor coverage by pessimistically using register locations while the value is live in memory. This RFC proposes a way to maximise accuracy and coverage for such variables and presents findings from implementing a prototype. I’m hoping to get feedback on the idea and to discuss the next steps from this point.

The core idea involves tracking more information about source assignments. Rather than just tracking either the memory location (dbg.declare) or the set of values the variable takes (dbg.value) we track both, as well as which instruction (if any after optimisation) performs the assignment, and its original (pre-optimisation) position in the IR. I’ve written a prototype that achieves this with a new intrinsic called dbg.assign, a new metadata attachment called DIAssignID, and a small dataflow analysis pass.

I’ll be giving a 10-minute talk on this work at EuroLLVM 2022 titled “Improving debug locations for variables in memory”.

Background

Presently in LLVM, a variable backed by an unpromotable alloca will either have its locations tracked with a dbg.declare or a set of dbg.values depending on the type of the alloca. Converting the dbg.declare to a set of dbg.values happens in a function called LowerDbgDeclare that is run early in the optimisation pipeline by InstCombine. A dbg.declare states the variable has a single location in memory for its lifetime. dbg.values mark out where the value of the variable changes and what that value is, mostly avoiding referring to the memory locations.

An alloca that can’t be promoted by mem2reg may have stores to it that still have optimisation opportunities. If LLVM uses a dbg.declare to describe the variable location, then at points in the program where the stores are moved, modified, or deleted the value in memory and the value of the variable in the source program will not match.

When dbg.values are used we generally get correct locations because there are dbg.values marking out where the assignments occurred before any optimisations. However, because dbg.values generally avoid referring to memory locations, in areas of the program where the memory location is valid and the value of the variable is not live elsewhere (register, constant), LLVM will end up stating that the variable has no location rather than use the memory location.

LLVM must choose between preserving the dbg.declare or using a set of dbg.values early because it uses stores to determine where in the IR the variable changes value - if LLVM generated the set of dbg.values after optimisation, it would cause the same problem that we get when a dbg.declare is used. After optimisation stores no longer necessarily faithfully represent all the source assignments.

The canonical ticket for this problem is PR34136 Improve optimized debug info for escaped variables that live in memory
.

Here are two simple C++ examples to demonstrate the issues occurring with clang-14, starting with an incorrect optimistic variable location:

     1	struct Nums { int a, b, c; } glob;
     2	__attribute__((__noinline__)) void useAddress(Nums *n) { glob = *n; }
     3	__attribute__((__noinline__)) void useValue(int a) { glob.a = a; }
     4	
     5	
     6	int main() {
     7	  struct Nums nums = {1, 2, 3}; //< Store to nums.c is eliminated by DSE after GVN propagates 3 to useValue().
     8	  useValue(nums.c);
     9	
    10	  nums.c = nums.a;              //< Killing store to nums.c.
    11	  useValue(nums.c);
    12	  
    13	  useAddress(&nums);            //< nums' address escapes.
    14	  return 0;
    15	}
$ clang++ test.cpp -O2 -g -o test
$ gdb test
GNU gdb (GDB) 10.2
...
Breakpoint 1, main () at test.cpp:7
7	  struct Nums nums = {1, 2, 3}; //< Store to nums.c is eliminated by DSE after GVN propagates 3 to useValue().
(gdb) n
8	  useValue(nums.c);
(gdb) p nums
$1 = {a = 1, b = 2, c = 0}

The variable location tells the debugger to read the value of nums.c from memory, which contains 0. This is misleading because the value passed to useValue and the value of nums.c at this point according to the source code is 3.

The next example demonstrates unnecessarily pessimistic location coverage (no location for x for part of this function):

     1	__attribute__((__optnone__)) int  get(int x)  { return x; }
     2	__attribute__((__optnone__)) void esc(int* c) {}
     3	int main() {
     4	  int x = get(5);
     5	  get(0);
     6	  get(0);
     7	  get(0);
     8	  esc(&x);
     9	  return 0;
    10	}
$ clang++ test.cpp -O2 -g -o test
$ gdb test
GNU gdb (GDB) 10.2
...
Breakpoint 1, main () at test.cpp:4
4	  int x = get(5);
(gdb) n
5	  get(0);
(gdb) display x
1: x = 5
(gdb) n
6	  get(0);
1: x = <optimised out>
(gdb) 
7	  get(0);
1: x = <optimised out>
(gdb) 
8	  esc(&x);
1: x = 5
(gdb) disassemble main
Dump of assembler code for function main():
   0x00000000004004c0 <+0>:	push   %rax
   0x00000000004004c1 <+1>:	mov    $0x5,%edi
   0x00000000004004c6 <+6>:	call   0x4004a0 <_Z3geti>
   0x00000000004004cb <+11>:	mov    %eax,0x4(%rsp)           ;; Store value of x.
   0x00000000004004cf <+15>:	xor    %edi,%edi
   0x00000000004004d1 <+17>:	call   0x4004a0 <_Z3geti>
   0x00000000004004d6 <+22>:	xor    %edi,%edi
   0x00000000004004d8 <+24>:	call   0x4004a0 <_Z3geti>
   0x00000000004004dd <+29>:	xor    %edi,%edi
   0x00000000004004df <+31>:	call   0x4004a0 <_Z3geti>
   0x00000000004004e4 <+36>:	lea    0x4(%rsp),%rdi
   0x00000000004004e9 <+41>:	call   0x4004b0 <_Z3escPi>
   0x00000000004004ee <+46>:	xor    %eax,%eax
   0x00000000004004f0 <+48>:	pop    %rcx
   0x00000000004004f1 <+49>:	ret    
End of assembler dump.

We don’t have a location on lines 6 and 7 despite the fact that the result of get(5) is immediately copied to memory. The compiler had to choose between a memory and register location, and this decision was made early in the optimisation pipeline. In this example it chose the register location; it’s difficult to know the optimal decision in advance. That’s where the proposed solution comes in. We retain enough information to be able to delay this decision until after optimisations.

Solution summary

The core of the proposed solution is to delay choosing between whether to use register or memory locations until after optimisations so that we can make optimal decisions.

The frontend marks each assignment to each variable with a new intrinsic. That intrinsic tracks the stored value, store location, which instruction performs the store, and the original IR position of the store (as it may move during optimisation). That information is carried through optimisations, set up in such a way as to minimise effort required from pass writers. After optimisation, LLVM performs a dataflow analysis to calculate locations that maximise coverage correctly by choosing between memory and register (and implicit) locations at each instruction.

Tracking “which instruction performs the store” is achieved by tagging such instructions with a metadata attachment (DIAssignID, more detail in the next section) which is maintained through optimisations. Looking at an existing example, tbaa access tags are preserved to enable optimisations, and we are attaching and preserving metadata in the same way.

Prototype implementation

I’ve prototyped an implementation which can be found here if you want to try it out. The prototype involves the following changes to the compiler:

Add a new IR intrinsic, llvm.dbg.assign, which has 5 parameters. The first 3 parameters are identical to an llvm.dbg.value and the next two are new.

llvm.dbg.assign(value, expression, variable, id, destination)

0 value (Value) : The assigned value.
1 expression (DIExpression) : Expression to apply to `value`.
2 variable (DILocalVariable) : The user variable that this location describes.
3 id (DIAssignID) : The id of the instruction performing the source assignment.
4 destination (pointer type Value) : Store destination.

Add a new metadata kind, DIAssignID, which has no fields; it is just used as an ID (each instance is distinct since we are relying on location - pointer - equality to distinguish). This metadata is used as both an attachment and, wrapped in MetadataAsValue, as an argument to dbg.assign intrinsics. If a DIAssignID used by a dbg.assign is attached to one or more instructions, it means that those instructions perform the assignment that the intrinsic is describing. The instruction(s) and intrinsic are “linked”. It is legal (and frequently the case) for a dbg.assign to use a DIAssignID that is not attached to any instructions. I call this an “unlinked” dbg.assign. In this case, it’s likely that the store to memory has been eliminated, but it is also possible that a pass has failed to update metadata on an instruction after optimisation. The latter case is a safe fallback (no wrong locations introduced - we will use the value instead of the destination) because it is pessimistic (there may actually be a valid memory location if the store does exist).

The clang frontend no longer emits a dbg.declare stating that a variable’s alloca is its stack home. Instead, it emits dbg.assigns for each variable after each store-like, or “assigning”, instruction (stores and memory setting intrinsics). Each of these assigning instructions gets a distinct DIAssignID attachment which is used by the dbg.assign that follows - the instruction and the dbg.assign are linked. An alloca is treated as an assignment of undef and gets a DIAssignID and dbg.assign accordingly. For situations that the prototype cannot yet handle - see limitations section - the front end falls back to emitting a dbg.declare for the variable.

There is a new dataflow analysis “pass” (it’s just a function called by SelectionDAGISel at the moment). It converts dbg.assigns to conventional debug intrinsics just before ISel so that the prototype only needs to work in IR-land. It is kind of a “LiveDebugValues lite”. It is a fixed-point dataflow analysis that uses debug intrinsics to iteratively determine the location for each variable at every point in the program. It uses a join-semilattice to describe the locations where Unknown > { Memory, Value }. After computing the locations for each variable, it replaces all of the dbg.assign intrinsics with either a dbg.declare or new set of dbg.value intrinsics (which may include DW_OP_deref expressions - i.e. may include memory locations).

Considerations for pass writers and maintainers

cloning an instruction: nothing new to do. Cloning automatically clones a DIAssignID attachment. Multiple instructions may have the same DIAssignID instruction. In this case, the assignment is considered to take place in multiple positions in the program.

moving a non-debug instruction: nothing new to do. Instructions linked to a dbg.assign have their initial IR position marked by the position of the dbg.assign.

moving a debug intrinsic: avoid moving a dbg.assign intrinsic at all costs, unless not doing so would result in the intrinsic being lost (e.g. a block only contains debug intrinsics and is going to be deleted: hoist or, preferably, sink the intrinsics and make them undef).

deleting a non-debug instruction: nothing new to do. Simple DSE does not require any change; it’s safe to delete an instruction with a DIAssignID attachment. A dbg.assign that uses a DIAssignID that is not attached to any instruction indicates that the memory location isn’t valid.

deleting a debug intrinsic: Nothing new to do. Just like for conventional debug intrinsics, unless it is unreachable, it’s almost always incorrect to delete a dbg.assign intrinsic.

merging stores: In many cases no change is required as DIAssignID attachments are automatically merged if combineMetadata is called. One way or another, the DIAssignID attachments must be merged such that new store becomes linked to all the dbg.assign intrinsics that the merged stores were linked to. This can be achieved simply by calling a helper function Instruction::mergeDIAssignID.

inlining stores: As stores are inlined we must generate dbg.assign intrinsics and DIAssignID attachments as if the stores represent source assignments, just like the in frontend. This isn’t perfect, as stores may have been moved, modified or deleted before inlining, but it does at least keep the information about the variable correct within the non-inlined scope. See the limitations & future work section.

I made minor changes to some passes that were not already obeying the above rules, and had to put in some extra work for these specific passes:

SROA and passes that split stores: Similarly to how dbg.declares are handled, clone the dbg.assign intrinsics linked to the store, update the fragment part of the expression, and give the split stores (and cloned intrinsics) a new DIAssignID. In other words, treat the split stores as separate assignments. This is also done for partial DSE (e.g. shortening a memset).

mem2reg inserts dbg.value intrinsics for variables described by dbg.assign intrinsics PHIs as it would for dbg.declares (it’s safe to mix dbg.assign and dbg.value) but doesn’t add intrinsics at store sites as it otherwise would for dbg.declares because the store is already tracked by the dbg.assign. The fact that we need to track PHIs is worth its own separate discussion IMO, but is mostly immaterial to this proposal. It needs to be done here to match existing behaviour for fully promoted variables.

Examples

Here are the examples from the beginning using the prototype to build them to show them working in a debugger and to compare the IR. Note: If you build the first example with the prototype you should debug it with GDB (rather than LLDB, which will run into this issue) if you want to see the fix in action.

Example 1

Running the prototype-built version in gdb:

$ clang++ test.cpp -O2 -g -o test -Xclang -debug-coffee-chat 
$ gdb test
GNU gdb (GDB) 10.2
...
Breakpoint 1, main () at test.cpp:7
7	  struct Nums nums = {1, 2, 3}; //< Store to nums.c is eliminated by DSE after GVN propagates 3 to useValue().
(gdb) n
8	  useValue(nums.c);
(gdb) p nums
$1 = {a = 1, b = 2, c = <synthetic pointer>}

Here’s the IR:

; llvm-14: clang++ -emit-llvm -S -O2 -g -o - 
define dso_local noundef i32 @main() local_unnamed_addr #4 !dbg !40 {
entry:
  %nums = alloca %struct.Nums, align 8
  %0 = bitcast %struct.Nums* %nums to i8*, !dbg !45
  call void @llvm.lifetime.start.p0i8(i64 12, i8* nonnull %0) #7, !dbg !45
  call void @llvm.dbg.declare(metadata %struct.Nums* %nums, metadata !44, metadata !DIExpression()), !dbg !46
  %1 = bitcast %struct.Nums* %nums to i64*, !dbg !46
  store i64 8589934593, i64* %1, align 8, !dbg !46
  %c = getelementptr inbounds %struct.Nums, %struct.Nums* %nums, i64 0, i32 2, !dbg !47
  tail call void @_Z8useValuei(i32 noundef 3), !dbg !48
  store i32 1, i32* %c, align 8, !dbg !49, !tbaa !50
  tail call void @_Z8useValuei(i32 noundef 1), !dbg !51
  call void @_Z10useAddressP4Nums(%struct.Nums* noundef nonnull %nums), !dbg !52
  call void @llvm.lifetime.end.p0i8(i64 12, i8* nonnull %0) #7, !dbg !53
  ret i32 0, !dbg !53
}

; prototype enabled: clang++ -Xclang -debug-coffee-chat -emit-llvm -S -O2 -g -o -
define dso_local noundef i32 @main() local_unnamed_addr #4 !dbg !44 {
entry:
  %nums = alloca %struct.Nums, align 8, !DIAssignID !49
  call void @llvm.dbg.assign(metadata i1 undef, metadata !48, metadata !DIExpression(), metadata !49, metadata %struct.Nums* %nums), !dbg !50
  %0 = bitcast %struct.Nums* %nums to i8*, !dbg !51
  call void @llvm.lifetime.start.p0i8(i64 12, i8* nonnull %0) #6, !dbg !51
  %1 = bitcast %struct.Nums* %nums to i64*, !dbg !52
  store i64 8589934593, i64* %1, align 8, !dbg !52, !DIAssignID !53
  call void @llvm.dbg.assign(metadata i1 undef, metadata !48, metadata !DIExpression(), metadata !53, metadata i8* %0), !dbg !50
  call void @llvm.dbg.assign(metadata i1 undef, metadata !48, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 32), metadata !53, metadata i8* undef), !dbg !50
  %c = getelementptr inbounds %struct.Nums, %struct.Nums* %nums, i64 0, i32 2, !dbg !54
  tail call void @_Z8useValuei(i32 noundef 3), !dbg !55
  store i32 1, i32* %c, align 8, !dbg !56, !tbaa !57, !DIAssignID !58
  call void @llvm.dbg.assign(metadata i32 1, metadata !48, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 32), metadata !58, metadata i32* %c), !dbg !50
  tail call void @_Z8useValuei(i32 noundef 1), !dbg !59
  call void @_Z10useAddressP4Nums(%struct.Nums* noundef nonnull %nums), !dbg !60
  call void @llvm.lifetime.end.p0i8(i64 12, i8* nonnull %0) #6, !dbg !61
  ret i32 0, !dbg !61
}

; IR after the dataflow that converts the assignment tracking debug info to "normal" debug intrinsics for SelectionDAG
define dso_local noundef i32 @main() local_unnamed_addr #4 !dbg !40 {
entry:
  %nums = alloca %struct.Nums, align 8
  call void @llvm.dbg.value(metadata %struct.Nums* %nums, metadata !44, metadata !DIExpression(DW_OP_deref)), !dbg !45
  %0 = bitcast %struct.Nums* %nums to i8*, !dbg !46
  call void @llvm.lifetime.start.p0i8(i64 12, i8* nonnull %0) #7, !dbg !46
  %1 = bitcast %struct.Nums* %nums to i64*, !dbg !47
  store i64 8589934593, i64* %1, align 8, !dbg !47
  call void @llvm.dbg.value(metadata i8* undef, metadata !44, metadata !DIExpression(DW_OP_deref, DW_OP_LLVM_fragment, 64, 32)), !dbg !45
  call void @llvm.dbg.value(metadata %struct.Nums* %nums, metadata !44, metadata !DIExpression(DW_OP_deref, DW_OP_LLVM_fragment, 0, 64)), !dbg !45
  %c = getelementptr inbounds %struct.Nums, %struct.Nums* %nums, i64 0, i32 2, !dbg !48
  tail call void @_Z8useValuei(i32 noundef 3), !dbg !49
  store i32 1, i32* %c, align 8, !dbg !50, !tbaa !51
  call void @llvm.dbg.value(metadata %struct.Nums* %nums, metadata !44, metadata !DIExpression(DW_OP_plus_uconst, 8, DW_OP_deref, DW_OP_LLVM_fragment, 64, 32)), !dbg !45
  tail call void @_Z8useValuei(i32 noundef 1), !dbg !52
  call void @_Z10useAddressP4Nums(%struct.Nums* noundef nonnull %nums), !dbg !53
  call void @llvm.lifetime.end.p0i8(i64 12, i8* nonnull %0) #7, !dbg !54
  ret i32 0, !dbg !54
}

Example 2

$ clang++ test.cpp -O2 -g -o test -Xclang -debug-coffee-chat 
$ gdb test
GNU gdb (GDB) 10.2
...
Breakpoint 1, main () at test.cpp:4
4	  int x = get(5);
(gdb) n
5	  get(0);
(gdb) display x
1: x = 5
(gdb) n
6	  get(0);
1: x = 5
(gdb) n
7	  get(0);
1: x = 5
(gdb) n
8	  esc(&x);
1: x = 5

Here’s the IR:

; clang++ -emit-llvm -S -O2 -g -o -
define dso_local noundef i32 @main() local_unnamed_addr #2 !dbg !30 {
entry:
  %x = alloca i32, align 4
  %0 = bitcast i32* %x to i8*, !dbg !35
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #5, !dbg !35
  %call = tail call noundef i32 @_Z3geti(i32 noundef 5), !dbg !36
  call void @llvm.dbg.value(metadata i32 %call, metadata !34, metadata !DIExpression()), !dbg !37
  store i32 %call, i32* %x, align 4, !dbg !38, !tbaa !13
  %call1 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !39
  %call2 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !40
  %call3 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !41
  call void @llvm.dbg.value(metadata i32* %x, metadata !34, metadata !DIExpression(DW_OP_deref)), !dbg !37
  call void @_Z3escPi(i32* noundef nonnull %x), !dbg !42
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5, !dbg !43
  ret i32 0, !dbg !44
}

; clang++ -Xclang -debug-coffee-chat -emit-llvm -S -O2 -g -o -
define dso_local noundef i32 @main() local_unnamed_addr #2 !dbg !34 {
entry:
  %x = alloca i32, align 4, !DIAssignID !39
  call void @llvm.dbg.assign(metadata i1 undef, metadata !38, metadata !DIExpression(), metadata !39, metadata i32* %x), !dbg !40
  %0 = bitcast i32* %x to i8*, !dbg !41
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #4, !dbg !41
  %call = tail call noundef i32 @_Z3geti(i32 noundef 5), !dbg !42
  store i32 %call, i32* %x, align 4, !dbg !43, !tbaa !15, !DIAssignID !44
  call void @llvm.dbg.assign(metadata i32 %call, metadata !38, metadata !DIExpression(), metadata !44, metadata i32* %x), !dbg !40
  %call1 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !45
  %call2 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !46
  %call3 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !47
  call void @_Z3escPi(i32* noundef nonnull %x), !dbg !48
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #4, !dbg !49
  ret i32 0, !dbg !50
}

; IR after the dataflow that converts the assignment tracking debug info to "normal" debug intrinsics for SelectionDAG
define dso_local noundef i32 @main() local_unnamed_addr #2 !dbg !30 {
entry:
  %x = alloca i32, align 4
  call void @llvm.dbg.declare(metadata i32* %x, metadata !34, metadata !DIExpression()), !dbg !35
  %0 = bitcast i32* %x to i8*, !dbg !36
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #5, !dbg !36
  %call = tail call noundef i32 @_Z3geti(i32 noundef 5), !dbg !37
  store i32 %call, i32* %x, align 4, !dbg !38, !tbaa !14
  %call1 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !39
  %call2 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !40
  %call3 = tail call noundef i32 @_Z3geti(i32 noundef 0), !dbg !41
  call void @_Z3escPi(i32* noundef nonnull %x), !dbg !42
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5, !dbg !43
  ret i32 0, !dbg !44
}

Prototype results

This table shows the average of the percentage of each variable’s scope that the variable has locations for (essentially llvm-locstats's “%PC bytes covered” stat) in the CTMark projects when compiling with assignment tracking disabled and enabled. The CTMark suit is built in the RelWithDebInfo (-O2 -g) configuration, with the addition of “-fno-omit-frame-pointer” to side-step PR50285.

+-------------------+--------------------------------------+ 
|                   | Average of the % of each variable's  |
|                   | scope covered by its locations       |
|                   | (average variable availability)      |
+───────────────────+──────────+────────────+──────────────+
| CTMark project    | llvm-14  | prototype  | difference   |
+───────────────────+──────────+────────────+──────────────+
| 7zip              | 72.883   | 74.701     | 1.818        |
| ClamAV            | 64.028   | 65.875     | 1.846        |
| kimwitu++         | 75.5     | 75.696     | 0.196        |
| SPASS             | 64.779   | 67.954     | 3.174        |
| tramp3d-v4        | 87.091   | 87.284     | 0.193        |
| Bullet            | 63.209   | 63.126     | -0.083       |
| consumer-typeset  | 44.939   | 55.407     | 10.468       |
| lencod            | 54.916   | 56.874     | 1.957        |
| mafft             | 65.943   | 66.149     | 0.205        |
| sqlite3           | 67.316   | 68.596     | 1.28         |
| average           | 66.060   | 68.166     | 2.105        |
+-------------------+----------+------------+--------------+

On average, the coverage sees a net increase of 2.1 across CTMark, or 1.2 if we exclude consumer-typeset as an outlier.

Two additional notes about this statistic. Firstly, due to the way that llvm-dwarfdump collects these stats, each variable need only have a location for some part of it (fragment) - some number of bits not necessarily equal to the size of the variable - to qualify that the instructions covered as having a location for that variable. This means that “incorrect locations for variable fragments eliminated by assigment tracking” are not represented here. Unfortunately getting that second stat will be more challenging, but on the bright side, having those statistics seperate is useful.

Secondly, this stat only represents whether there is or isn’t a location, not whether that location is correct.

Limitations & future work

Results

The results section covers the limitations with the statistics presented here already. As an additional note on validation, I have inspected the locations generated by the assignment tracking prototype for Bullet fairly deeply (before the stats were known; the low coverage is a coincidence), but for the other examples I only skimmed the DWARF briefly. Some valid locations for variables in Bullet are dropped by the prototype, in part due to some of the limitations outlined below, but it does also remove incorret locations (as advertised). I’ve run simple examples through debuggers manually, and I’ve used dexter over a private suite and also the memvars folder upstream. Note that the memvars folder upstream includes some buggy tests - I’ve fixed them in the prototype patch.

Inlining

The current implementation results in accurate locations for a variable in the function it’s declared in. However, the locations throughout an inlined callee that stores to it may be inaccurate. This is because we only track the assignments to the caller’s variable once inlining occurs - the already-optimised stores are considered assignments and we don’t know if any have been eliminated. An earlier iteration of the prototype tracked stores through pointer arguments which were matched up to caller local variables upon inlining, but I opted for the simpler solution presented in this iteration as a more reliable first step.

PHIs

We don’t have PHIs while assignments are in non-SSA form, so we don’t have dbg.assign intrinsics describing merged values. This is fine in non-SSA form because the value is implicitly merged in memory. When an alloca is promoted by mem2reg, the prototype adds dbg.values for inserted PHIs to match existing behaviour. When LowerDbgDeclare generates sets of dbg.values for variables backed by unpromoted allocas, PHIs don’t exist (yet), but it does insert dbg.values after loads, which is behaviour that doesn’t make sense with assignment tracking (I don’t think!). Stores and loads may be promoted to SSA over some parts of the program, such that loads are replaced by PHIs; in this case the dbg.value added by LowerDbgDeclare to track the value after the load will now track the PHI value. We don’t get the same with assignment tracking, and I’m not sure how to resolve that. In theory, we could work out where “debug PHIs” occur after optimisation, though the prototype doesn’t currently do this. I am not sure what to do here - my current line of thinking is that we should implement that, but then there’s one more problem. Any values that were propagated along edges that get deleted before the analysis runs won’t be represented, leading to incorrect debug info in that case. See this ticket for an example of this occurring in LLVM outside of the prototype.

Latent issues in LLVM

The prototype relies on describing assignments to parts of variables (fragments).

There are several issues with the way that variable fragments (DW_OP_LLVM_fragment) are handled and interpreted in LLVM. Debug intrinsics describing overlapping fragments are often interpreted incorrectly leading to incorrect locations (one known offender is LiveDebugVariables). As well as being interpreted incorrectly, they can also be interpreted in such a way that unecessarily eliminates locations for parts of the variable - for more info, check out AggregateFragmentsPass in llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp in the prototype patch.

LLDB doesn’t appear to interpret variable locations made up of DW_OP_piece (generated in expressions using DW_OP_LLVM_fragment) correctly if parts of the variable have no location (bug report). I mentioned this earlier already but it’s pertinent to the fragment discussion.

I’ve found a small number of other latent issues in LLVM, including that SelectionDAG can spuriously and incorrectly re-order debug intrinsics occasionally.

Unhandled situations

There are some situations that the prototype can’t yet handle for which the clang frontend falls back to using dbg.declares currently. Solving these shouldn’t alter the design, just the implementation: variables sharing an alloca (NRVO), and byval and sret double-indirect variables. One unsolved problem that may result in adjusting the design is that the dbg.assigns currently encode the offset and size of the store as fragment info. Non-const GEPs cause trouble here since we don’t know which fragment of the variable is being assigned to. This hole in the design needs addressing.

Next steps

If this direction is desirable, I think the next steps are to work on resolving the “PHI”, “Unhandled situations”, and “Latent issues in LLVM” limitations discussed above.

Please let me know what you think.

Related & previous work

The main inspiration for this comes from a chat with Reid Kleckner, Jeremy Morse, and Stephen Tozer. which is summarised in Reid’s email “Notes from dbg.value coffee chat”.

Keno Fischer proposed a way we could track multiple locations for variables in “Proposal for multi location debug info support in LLVM IR” (RFC gist). While that RFC doesn’t try to address the same issues, the multi-location tracking mechanism did inspire a feature in this prototype. The assignment tracking prototype tracks exactly two locations per variable; an SSA value and a memory location (either of which may be unavailable), and does so with a single intrinsic rather than one intrinsic per location as in Keno’s RFC.

Thanks for reading!
Orlando

2 Likes

This is great, I am really happy to see this idea moving along. I’m glad if I was able to help in any way, I feel like I’ve had a lot of ideas in this area, and taken very few concrete steps to move things forward.

The core idea of tracking all the assignments separately from stores starting in the frontend seems good.

I want to try to clarify which assignments the frontend should try to describe, and how it should describe the LHS of those assignments. Fully describing anything that can appear as an LValue in C/C++ is really complicated. Currently we try to cover that with a combination of DILocalVariable + DIExpression fragments, but that is very incomplete. Consider assignments like:

struct Bar { int b; };
Bar *wash(Bar *p) { return p; }
void foo() {
  Bar o;
  wash(&o)->b = 1; // can DSE
  wash(&o)->b = 2;
}

What kind of assignment would clang produce for these assignments? What local variable are we updating, what goes on the left hand side? DSE fires here today, and the user can observe stale values in the debugger.

I’d like to clarify what values we promise to be accurate, and I think our promise should focus on assignments where the LHS is a local variable or a field of one.

I think you’ve alluded to this in the section on inlining: “An earlier iteration of the prototype tracked stores through pointer arguments which were matched up to caller local variables upon inlining, but I opted for the simpler solution presented in this iteration as a more reliable first step.” I worry a little bit here about keeping track of assignments to fields of value types, like C++ iterators, which almost always happen in inline wrapper functions.

I think what I’m saying is that, we should dig into this more in the future, and try to come up with some principles to help us decide when debug info is inaccurate (allows the user to see a program state not possible in the source) but still acceptable.

2 Likes

Thank you for the excellent write-up, I can’t wait to watch a recording of your talk!
I have two questions:

  • From a syntax point of view, it looks like DIAssign is used to tie a llvm.dbg.assign intrinsic to exactly one LLVM instruction. Would we be better off if we designed syntax & IR to attach a list of assign intrinsics to an LLVM instruction?
  • Can you explain how this approach works (or could be extended to work) with constants? Particularly with constants that are no longer being materialized and only exist in debug info?

Otherwise, fantastic work!

Forgive me if the answer to this is obvious, but how does this relate to llvm.dbg.addr?

Thanks @rnk, this is really useful feedback.

I’d like to clarify what values we promise to be accurate, and I think our promise should focus on assignments where the LHS is a local variable or a field of one.

This is more or less the promise that the prototype achieves in its current state, but I think it needs to be carefully evaluated.

Here is another simple but philosophically difficult example.

__attribute__((__optnone__)) void step() {}
void useAddress(int *n);

void fun() {
  int local = 5;
  int* local_ptr = &local;

  step();

  *local_ptr = 6;

  step();

  useAddress(&local);
}

SROA/mem2reg removes the local_ptr indirection and then later the first store (local = 5) is eliminated. Assignment tracking currently doesn’t track the indirect store (6) which results in the constant value 5 variable location extending to the rest of the function (the memory location isn’t even re-instated upon reaching useAddress, but I think we can consider that a bug and it should be easy to fix). Using -instcombine-lower-dbg-declare=1 (default) both assignments are recorded because LowerDbgDeclare is run after mem2reg removes the indirection. Using -instcombine-lower-dbg-declare=0, the memory location is always used which results in seeing unitialised memory until the indirect assignment.

For Assignment Tracking, we could recognise that there’s a store to local’s alloca (6) that has no attached DIAssignID metadata during the dataflow. At that point defensively state that the location is unknown and re-instate the memory location upon reaching the escaping function. That would “fix” the issue, but I don’t particularly like it since both other existing methods manage to provide locations capturing the indirect assignment. Alternatively, at the point where the second store (which has no metadata) is encountered, we could optimistically use the memory location. I’m not sure I like that either though, because it goes against our historic preference for higher accuracy over higher coverage.

I think what I’m saying is that, we should dig into this more in the future, and try to come up with some principles to help us decide when debug info is inaccurate (allows the user to see a program state not possible in the source) but still acceptable

This is a really good point, and that would definitely be a useful conversation. I will get back to you in more depth after I’ve had some time to consider this carefully and collect some data.

Thank you @adrian.parntl!

From a syntax point of view, it looks like DIAssign is used to tie a llvm.dbg.assign intrinsic to exactly one LLVM instruction

We can actually end up with multiple instructions linked to an llvm.dbg.assign intrinsic with the metadata. For example, LICM may sink a store into all of the exit blocks of loop, which involves duplicating the store into different blocks. All of the stores retain the DIAssignID metadata tag linking them to the llvm.dbg.assign intrinsic (which stays in the loop).

bb.loop:
  dbg.asssign !1
| \ 
|  \
|   bb.exit.2:
|     store !1
|
bb.exit.1:
  store !1

Would we be better off if we designed syntax & IR to attach a list of assign intrinsics to an LLVM instruction?

We still need to record a position in IR for the llvm.dbg.assign, but I think that we could implement something that is cleaner than using metadata attachments.

Can you explain how this approach works (or could be extended to work) with constants? Particularly with constants that are no longer being materialized and only exist in debug info?

If I understand the question correctly, it handles constants similarly to how llvm.dbg.value handles them. For example:

...
store i32 1, i32* %a, align 1, !DIAssignID !1
call @llvm.dbg.assign(metadata i32 1, !myvar, !DIExpression(), metadata !1, metadata i32* %a)
...

Then after DSE:

...
; No store with attachment !DIAssignID !1 exists so memory location is ignored. The value
; (1st operand) is still known, so we can emit `DW_OP_lit1, DW_OP_stack_value` from here.
call @llvm.dbg.assign(metadata i32 1, !myvar, !DIExpression(), metadata !1, metadata i32* %a)
...

That is equivalent to:

...
call @llvm.dbg.value(metadata i32 1, !myvar, !DIExpression())
...

Did I understand your question correctly?

Forgive me if the answer to this is obvious, but how does this relate to llvm.dbg.addr?

Hi @dmlloyd, that’s a good question. The llvm.dbg.assign intrinsic is similar to llvm.dbg.addr in that both specify a memory location for a variable and both are control-flow dependant. However unlike llvm.dbg.addr, llvm.dbg.assign uses DIAssignID metadata attachments in order to link (or point to) instructions that are performing the assignment that the intrinsic describes. This gives us more information about the relationship between the unoptimised and optimised program which we use in the dataflow.

More abstractly, llvm.dbg.addr specifies that the variable is live in memory at that point in the IR, whereas llvm.dbg.assign specifies there was an assignment at that point. Does that help?

Thanks Orlando! Your example

call @llvm.dbg.assign(metadata i32 1, !myvar, !DIExpression(), metadata !1, metadata i32* %a)

answers both of my questions. It wasn’t clear to me that llvm.dbg.assign would accept constant values as the value parameter. Since it does, that also explains why it may not be a good idea to attach them to other instructions, since there may not be one, such as in the constant or the undef case.

Out of curiosity, would LLVM IR allow us to use a MetadataFromValue(StoreInst) as the first parameter? Maybe by using an assembler syntax like:

%token = store i32 0, i32* %var
call void @llvm.dbg.assign(metadata store_token %token, ...)

?

Ah I see, it’s starting to click. :slight_smile: Thanks!

Out of curiosity, would LLVM IR allow us to use a MetadataFromValue(StoreInst) as the first parameter? Maybe by using an assembler syntax like:

%token = store i32 0, i32* %var
call void @llvm.dbg.assign(metadata store_token %token, ...)

I did consider attempting to use tokens but decided to use metadata for the prototype since it was less work and more flexible (I wasn’t sure exactly how everything was going to work).

One of my original reasons for not using tokens is that llvm.dbg.assign intrinsics reference more than just store instructions: alloca, store, mem* intrinsics. I’m still not sure whether to consider user calls to the mem* functions as assignments - if we do, then llvm.dbg.assign intrinsics will also need to reference those calls (and those functions return a value). However looking back, I don’t think this stops us using tokens. Memory intrinsics don’t return a value so we could use tokens for those, and allocas (and user mem* function calls) could just be referenced via their value.

In any case, using the metadata approach instead of tokens has these two benefits:

  1. Using metadata requires fewer code changes (I think). Stores and mem* intrinsics are not replaced using RAUW. In order to keep @llvm.dbg.assigns up to date we’d need to add many RAUW calls.

  2. Tokens introduce some admin overhead because an llvm.dbg.assign can refer to multiple instructions (like in the LICM example), which means we need to manage a list of references (possibly using DIArgList).

It may be worth dealing with these issues in order to use tokens for the real implementation. I think using tokens would make the IR more readable, and referencing instructions in that way feels more familiar. I’m not sure that those reasons alone are enough though. What do you think?

First, I don’t think it makes sense to involve token types in this. Token types have weird restrictions on IR movement. Store instructions claim to have type void, and we don’t want to change that:
https://github.com/llvm/llvm-project/blob/6e23cd2bf073ab5eb5a103b935af54046d007c52/llvm/lib/IR/Instructions.cpp#L1546

memcpy intrinsic calls are also values of type void.

We could potentially allow metadata to reference instructions of type void. That could potentially remove the need for metadata ids on the side. But, we would need to invent textual IR syntax for this, since void values do not receive names or numbers in an LLVM symbol table.

Regarding RAUW, my understanding is that metadata uses are not replaced by regular RAUW operations anyway. We should ask ourselves, what happens when someone calls eraseFromParent on the store to delete it? Can we live with that behavior? If so, maybe we can do it. I think ValueAsMetadata objects are effectively weak pointers, so maybe this could work.

Changing the textual IR of LLVM is pretty disruptive, despite our disclaimer of its stability. Here are some options:

  1. Just number all void instructions with the usual % sigil anyway. Drastically changes output. Potentially less human readable.
  2. Number only void values with live uses from metadata. Less disruptive, but somewhat non-local.
  3. Use some new sigil, put in the RHS, or something like that.

The more I think about it, the more 2 seems reasonable. An instruction must have a number if ValueAsMetadata exists for it. Also, I think these in-memory data structures are closer to what we want: we really just want to refer to the assigning instruction with as little indirection as possible.

That makes sense, avoiding tokens sounds sensible. Option 2 certainly looks like a good avenue to explore if we go down this route.

What I was trying to get at regarding RAUW is that metadata attachments on instructions are already preserved by the compiler in various circumstances so little work is required to preserve the link between the llvm.dbg.assign and the store(s). Whereas if llvm.dbg.assign referenced stores directly (numbered void values, tokens, etc) we’d need to do additional work in the compiler to preserve the relationship between the instruction and llvm.dbg.assign.

e.g., when a new store is created to replace an existing store it’s likely metadata is already being copied to that new store - the llvm.dbg.assign is linked to that new store automatically. If the llvm.dbg.assign references the store via an explicit Use then we’d need to add a call to RAUW to preserve the relationship (and we’d have to do this all over the compiler).

Maybe it isn’t a big problem? I just think it’s worth noting that the (existing) metadata attachment approach seems be less invasive in terms of changes to the compiler.

It’s also easier to refer to multiple instructions from one llvm.dbg.assign because multiple instructions can have the same metadata attachment.

What benefits do we get by modelling the relationship via an explicit Use (numbered void values, tokens, etc) rather than the current metadata attachment approach?


Regarding RAUW, my understanding is that metadata uses are not replaced by regular RAUW operations anyway.

AFAIK metadata uses are replaced by RAUW by default (code).

What I was trying to get at regarding RAUW is that metadata attachments on instructions are already preserved by the compiler in various circumstances so little work is required to preserve the link between the llvm.dbg.assign and the store(s).

Right, I see, the transforms to keep in mind here are duplications, such as inlining, unswitching, unrolling, etc. Most of these probably only remap non-void values. That’s pretty compelling.

What benefits do we get by modelling the relationship via an explicit Use (numbered void values, tokens, etc) rather than the current metadata attachment approach?

I think there is a simplicity benefit to avoiding indirection when possible. Consider how LLVM IR SSA uses direct pointers instead of some indirection through a side table.

The other benefit is a minor efficiency gain that we do not need another heap allocation for every tracked source level assignment.

We also often overlook the fact that metadata allocations live forever and are not garbage collected, whereas IR instructions have clear ownership. That’s already it’s own problem though, and it may deserve its own solution.


In the end, I think you should go with your plan. Adding metadata uses of void values may be premature optimization. It seems better to make forward progress first and optimize later.

But, we really do need to come back and optimize the memory usage of the compiler’s debug info representation. We’ve done a lot of work on it, but I think there’s still room for improvement.