Auto-undef debug uses of a deleted value

At the moment, debug uses of an Instruction are replaced with Undef in Instruction’s destructor (Instruction.cpp). My question is: is there a reason that we don’t do this in Value’s destructor instead, since Instruction’s dtor doesn’t cover cases such as use of a dead global?

(Alternative suggestion at end of post)


Here a contrived example that shows how we end up with misleading debug info (alternatively view in Compiler Explorer):

static void empty() {}
void ext();
void fun() {
  auto f = &ext;    // dbg.value(ext ptr, ...)
  f();
  f = ∅       // dbg.value(empty ptr, ...)
  f();
  ext();
}

Compiled with -O2 -g we end up with:

define dso_local void @_Z3funv() local_unnamed_addr #0 !dbg !9 {
  call void @llvm.dbg.value(metadata ptr @_Z3extv, metadata !14, metadata !DIExpression()), !dbg !16
  tail call void @_Z3extv(), !dbg !17
  ; NOTE: Ideally there should be a dbg.value(undef) here.
  tail call void @_Z3extv(), !dbg !18
  ret void, !dbg !19
}

The llvm.dbg.value intrinsic for the second assignment to f is dropped due to this sequence of events:

  1. SROA promotes the function pointer store-loads with the function pointer value (the indirect calls become direct calls)
  2. The call to empty is removed by the inliner.
  3. empty has internal linkage and no users so is deleted.
  4. The llvm.dbg.value describing the second assignment has its value (ptr to empty) replaced with empty metadata.
  5. The empty metadata operand signals that the debug intrinsic is fair game to be cleaned up, so it is deleted by EarlyCSEPass .

An alternative strategy could be to redefine the semantics of empty metadata in debug intrinsics. We could say that empty metadata in place of a value means what Undef currently means (the value / location of the variable is unknown). Then we don’t need to perform this RAUW at all. It’s just a thought though and I’ve not done any experimenting for that - opinions on this welcome too.

2 Likes

And here is a simpler example that demonstrates a dropped intrinsic but without the additional scaffolding required to produce incorrect debug info (since an undef dbg.value here is essentially equivalent to no dbg.value):

static void empty() {}
void fun() { auto e = ∅ }

becomes:

define dso_local void @_Z3funv() local_unnamed_addr #0 !dbg !7 {
entry:
  ret void, !dbg !13
}

Thanks for highlighting this case! After looking through your example, I agree it seems best to move the auto-undef to Value to cover cases such as this.

Hmm, I am still relatively new to the debug info portion of LLVM, so it’s hard for me to imagine the full impact of this potential change in semantics… Are there any cases where we do want to allow empty debug value inputs to still be cleaned up? If yes, then it suggests to me that we should leave the semantics as there are, since it shows that the different meanings of empty vs. undef are useful.

Based on recent discussions I think we’re probably going to need to change the auto-undef mechanism to auto-poison instead. So, if we do choose to go down that route (rather than change empty-metadata semantics) that change should happen first, i.e. change Instruction’s dtor to auto-poison, fixup tests etc, then move the behaviour from Instruction to Value’s dtor.

I get the feeling that the behaviour is just a historical artefact of the codebase, but I can’t say that with certainty. Going down the redefining-empty-metadata-semantics route would be more disruptive, so maintaining the auto-undef/poison route for now makes sense until someone has time to have a deeper look IMO (even though I do slightly prefer that idea from a design standpoint at the moment).

1 Like