IR @llvm.dbg.value entries for variables when a phi node has been created

I have been investigating missing variables and incorrect variable values when debugging code compiled at -O1 (and above), and I believe I have tracked the issue down to the interaction between the generation of IR @llvm.dbg.value entries and phi nodes. I would welcome help from someone more familiar with debug-information generation, both to confirm whether my diagnosis is correct and to advise on the best way to fix the issue.

[I was using the ARM and AArch64 targets in my investigation, but I believe that this issue is target independent]

The following simple C code reproducer:

int func(int a)
{
    int c = 1;
    if (a < 0) {
        c = 2;
    }
    return c;
}

Compiled at -O0, it generates the following IR:

define i32 @func(i32 %a) #0 !dbg !8 {
entry:
  %a.addr = alloca i32, align 4
  %c = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  call void @llvm.dbg.declare(metadata i32* %a.addr, metadata !12, metadata !13), !dbg !14
  call void @llvm.dbg.declare(metadata i32* %c, metadata !15, metadata !13), !dbg !16
  store i32 1, i32* %c, align 4, !dbg !16
  %0 = load i32, i32* %a.addr, align 4, !dbg !17
  %cmp = icmp slt i32 %0, 0, !dbg !19
  br i1 %cmp, label %if.then, label %if.end, !dbg !20

if.then: ; preds = %entry
  store i32 2, i32* %c, align 4, !dbg !21
  br label %if.end, !dbg !23

if.end: ; preds = %if.then, %entry
  %1 = load i32, i32* %c, align 4, !dbg !24
  ret i32 %1, !dbg !25
}

This generates sensible DWARF location information for the variable c, so debugging the code compiled at -O0 works fine: when stopped on the return statement, the debugger reports the correct value of c.

However, if I pass this IR through the -mem2reg optimisation phase, the debug information for the variable c becomes incorrect for the return statement:

define i32 @func(i32 %a) #0 !dbg !8 {
entry:
  call void @llvm.dbg.value(metadata i32 %a, i64 0, metadata !12, metadata !13), !dbg !14
  call void @llvm.dbg.value(metadata i32 1, i64 0, metadata !15, metadata !13), !dbg !16
  %cmp = icmp slt i32 %a, 0, !dbg !17
  br i1 %cmp, label %if.then, label %if.end, !dbg !19

if.then: ; preds = %entry
  call void @llvm.dbg.value(metadata i32 2, i64 0, metadata !15, metadata !13), !dbg !16
  br label %if.end, !dbg !20

if.end: ; preds = %if.then, %entry
  %c.0 = phi i32 [ 2, %if.then ], [ 1, %entry ]
  ret i32 %c.0, !dbg !22
}

The value of the variable c on the return statement is always incorrectly reported as 2. The generated DWARF location list for c looks something like this (the offset 00000038 is beyond the end of the function):

    00000013 00000004 00000024 (DW_OP_consts: 1; DW_OP_stack_value)
    00000020 00000024 00000038 (DW_OP_consts: 2; DW_OP_stack_value)

I thought I knew what was wrong! After the phi instruction there should be an @llvm.dbg.value call describing the variable c as having the value produced by the phi, so I manually altered the IR as follows, expecting that the return statement would then get correct location information for c:

define i32 @func(i32 %a) #0 !dbg !8 {
entry:
  call void @llvm.dbg.value(metadata i32 %a, i64 0, metadata !12, metadata !13), !dbg !14
  call void @llvm.dbg.value(metadata i32 1, i64 0, metadata !15, metadata !13), !dbg !16
  %cmp = icmp slt i32 %a, 0, !dbg !17
  br i1 %cmp, label %if.then, label %if.end, !dbg !19

if.then: ; preds = %entry
  call void @llvm.dbg.value(metadata i32 2, i64 0, metadata !15, metadata !13), !dbg !16
  br label %if.end, !dbg !20

if.end: ; preds = %if.then, %entry
  %c.0 = phi i32 [ 2, %if.then ], [ 1, %entry ]
  call void @llvm.dbg.value(metadata i32 %c.0, i64 0, metadata !15, metadata !13), !dbg !16
  ret i32 %c.0, !dbg !22
}

Unfortunately, adding this line makes no difference to the generated debug information, and the value of c is still incorrectly reported as 2 on the return statement.

So my question is whether:

  • The above addition to the IR is the correct thing to do (but if so, there is possibly a further issue in SelectionDAGBuilder::visitIntrinsicCall()'s handling of this additional line, which currently discards it), or

  • Some other @llvm.dbg.value entry should be produced to generate the correct debug information.

Keith

+Adrian Prantl, who might have some ideas about the representation choices and current/future behavior here

David,

Thanks for forwarding this for Adrian’s attention.

Just a little more information about how I produced these results: to narrow down the issue, I used the following series of commands:

clang -O0 --target=arm-arm-none-eabi -emit-llvm -g -S test.c
opt -mem2reg -S test.ll -o test-mem2reg.ll
llc -O0 --filetype=obj test-mem2reg.ll -o test.o

It then occurred to me that I might need to pass -O1 to llc if I am using the -mem2reg optimisation phase. I tried that, but unfortunately it optimised the code too well. However, if I amend the reproducer to

int func(int a)
{
    int c = 1;
    if (a < 0) {
        c = 2;
    }
    c++;
    return c;
}

then the problem manifests itself on the “c++” line, where the debug information states that c has the value 2 at that statement.

Keith

> However, if I pass this IR through the -mem2reg optimisation phase, the debug information for the variable c becomes incorrect for the return statement:
>
> define i32 @func(i32 %a) #0 !dbg !8 {
> entry:
>   call void @llvm.dbg.value(metadata i32 %a, i64 0, metadata !12, metadata !13), !dbg !14
>   call void @llvm.dbg.value(metadata i32 1, i64 0, metadata !15, metadata !13), !dbg !16
>   %cmp = icmp slt i32 %a, 0, !dbg !17
>   br i1 %cmp, label %if.then, label %if.end, !dbg !19
>
> if.then: ; preds = %entry
>   call void @llvm.dbg.value(metadata i32 2, i64 0, metadata !15, metadata !13), !dbg !16
>   br label %if.end, !dbg !20
>
> if.end: ; preds = %if.then, %entry
>   %c.0 = phi i32 [ 2, %if.then ], [ 1, %entry ]
>   ret i32 %c.0, !dbg !22
> }

This looks correct to me. I realize that this isn't documented in SourceLevelDebugging.rst, but a dbg.value only describes the starting point of a location range. It is the job of LiveDebugVariables, LiveDebugValues, and DbgValueHistoryCalculator to compute the correct ranges:

- LiveDebugVariables inserts DBG_VALUE instructions into the MIR representation.
- LiveDebugValues performs a dataflow analysis to propagate them across basic block boundaries.
- DbgValueHistoryCalculator (together with DwarfDebug) determines the end of each range.

Note that there is currently some overlap in functionality between these three, and as LiveDebugValues matures we will be able to simplify the other two further. The end goal is to have three passes that each do one thing well and are simple to understand and extend.

> The value of the variable c when on the return statement is always incorrectly reported as being the value 2.

This sounds like a bug in LiveDebugVariables or SelectionDAG. LiveDebugValues hasn't been taught about constants yet, so it *should* not insert a dbg.value into the return BB. To diagnose this further, we should figure out whether a DBG_VALUE is being inserted into the return MBB, and if so, by whom.

> The generated DWARF location list for the variable c looks something like (the offset 00000038 is beyond the end of the function):
>
>     00000013 00000004 00000024 (DW_OP_consts: 1; DW_OP_stack_value)
>     00000020 00000024 00000038 (DW_OP_consts: 2; DW_OP_stack_value)

> I know what is wrong, I thought! After the phi instruction there should be an @llvm.dbg.value call which describes the variable c as having the value returned by the phi, so I manually altered the IR to the following, thinking that the return statement would now be able to generate the correct location information for the variable c:
>
> define i32 @func(i32 %a) #0 !dbg !8 {
> entry:
>   call void @llvm.dbg.value(metadata i32 %a, i64 0, metadata !12, metadata !13), !dbg !14
>   call void @llvm.dbg.value(metadata i32 1, i64 0, metadata !15, metadata !13), !dbg !16
>   %cmp = icmp slt i32 %a, 0, !dbg !17
>   br i1 %cmp, label %if.then, label %if.end, !dbg !19
>
> if.then: ; preds = %entry
>   call void @llvm.dbg.value(metadata i32 2, i64 0, metadata !15, metadata !13), !dbg !16
>   br label %if.end, !dbg !20
>
> if.end: ; preds = %if.then, %entry
>   %c.0 = phi i32 [ 2, %if.then ], [ 1, %entry ]
>   call void @llvm.dbg.value(metadata i32 %c.0, i64 0, metadata !15, metadata !13), !dbg !16
>   ret i32 %c.0, !dbg !22
> }
>
> Unfortunately adding this additional line makes no difference to the generated debug information, and the value of the variable c is still incorrectly reported to be 2 when on the return statement.

You may have found a second bug there! I usually start by looking at the output of -print-after-all to find the point where things go wrong.

> So my question is whether:
> - The above addition to the IR is the correct thing to do [but if so then there is possibly a further issue in SelectionDAGBuilder::visitIntrinsicCall()’s handling of this additional line (where it is currently being discarded)]
> - Some other @llvm.dbg.value entry should be produced to generate the correct debug information.

thanks for digging into this!
Adrian