A few basic questions about DBG_INSTR_REF

I am investigating the feasibility of using DBG_INSTR_REF for the WebAssembly backend. I have a few basic questions:

  1. The “Instruction referencing for debug info” page of the LLVM 17.0.0git documentation says we should use MachineFunction::substituteDebugValuesForInst() when an instruction is replaced and MachineInstr::dropDebugNumber() when an instruction computing a value is dropped. But it looks like there are very few usages of these methods in the codebase, and close to none within lib/CodeGen:
aheejin@aheejin:~/llvm-project/llvm/lib$ grep substituteDebugValuesForInst * -R
CodeGen/InlineSpiller.cpp:    MF.substituteDebugValuesForInst(*MI, *FoldMI, Ops[0].second);
CodeGen/MachineFunction.cpp:void MachineFunction::substituteDebugValuesForInst(const MachineInstr &Old,
Target/X86/X86FixupLEAs.cpp:  MBB.getParent()->substituteDebugValuesForInst(*AluI, *NewMI2, 1);
Target/X86/X86FixupLEAs.cpp:  MBB.getParent()->substituteDebugValuesForInst(*I, *NewMI, 1);
Target/X86/X86FixupLEAs.cpp:      MBB.getParent()->substituteDebugValuesForInst(*MBI, *NewMI, 1);
Target/X86/X86FixupLEAs.cpp:    MBB.getParent()->substituteDebugValuesForInst(*I, *NewMI, 1);
Target/X86/X86FixupLEAs.cpp:    MBB.getParent()->substituteDebugValuesForInst(*I, *NewMI, 1);
Target/X86/X86FixupLEAs.cpp:    MBB.getParent()->substituteDebugValuesForInst(*I, *NewMI, 1);
Target/X86/X86FixupLEAs.cpp:    MBB.getParent()->substituteDebugValuesForInst(*I, *NewMI, 1);
Target/X86/X86FixupLEAs.cpp:  MBB.getParent()->substituteDebugValuesForInst(*I, *NewMI, 1);

aheejin@aheejin:~/llvm-project/llvm/lib$ grep dropDebugNumber * -R
Target/X86/X86InstrInfo.cpp:    CmpInstr.dropDebugNumber();
Target/X86/X86FloatingPoint.cpp:    MI.dropDebugNumber();
Target/X86/X86FloatingPoint.cpp:    I->dropDebugNumber();
Target/X86/X86FloatingPoint.cpp:  MI.dropDebugNumber();
Target/X86/X86FloatingPoint.cpp:  MI.dropDebugNumber();
Target/X86/X86FloatingPoint.cpp:  MI.dropDebugNumber();
Target/X86/X86FloatingPoint.cpp:  MI.dropDebugNumber();
Target/X86/X86FloatingPoint.cpp:  MI.dropDebugNumber();

I was mostly looking for usages of these methods to learn how I should do the required maintenance in each of our WebAssembly backend passes, but did not find much. Is this because very little maintenance is required in the optimization passes to use DBG_INSTR_REF? Are there not many cases of instructions being replaced or dropped?

  2. The “Instruction referencing for debug info” page of the LLVM 17.0.0git documentation says cloning is not yet supported. Is this still the case?

  3. I was told in “How should I manage DBG_VALUEs when its def moves?” (#7, by StephenTozer) that I don’t need to move DBG_INSTR_REFs along with their defs in transformation passes. Is this true even if a DBG_INSTR_REF is no longer dominated by its def as a result of the def moving? Is this ‘not dominated’ relationship taken care of at the end, presumably in LiveDebugValues?

Thank you!

Hi,

Alas, all documentation is perpetually out of date, sorry.

I was mostly looking for usages of these methods to learn how I should do the required maintenance in each of our WebAssembly backend passes, but did not find much. Is this because very little maintenance is required in the optimization passes to use DBG_INSTR_REF? Are there not many cases of instructions being replaced or dropped?

The “substitute…” method is useful for scenarios where one identically formatted instruction is replaced with another. You might also look at makeDebugValueSubstitution, which is a more general method for situations where the positions of operands change.

As for the frequency of use, at such a late stage in compilation I don’t believe it’s common for instructions to be radically transformed, hence there aren’t a large number of call sites where debug-info has to be maintained. Typically instruction selection picks the most appropriate kind of instruction, and then later passes might make minor modifications.

However, it’s very likely that the set of instrumented sites is incomplete: when comparing X86 using DBG_VALUEs with X86 using DBG_INSTR_REF, there are some regressions in individual variable locations. In aggregate, though, total coverage is higher, and the remaining sites to instrument might form a long tail, so it isn’t something I’ve explored.

dropDebugNumber is only required when an instruction is mutated to compute a different value: the default action if an instruction is deleted is for the instruction number to vanish, which LiveDebugValues will interpret as the value having been optimised out. Thus, we get a bit of maintenance for free there.
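A minimal sketch of the bookkeeping described above, written as a standalone toy model rather than LLVM code. ToyFunction and all of its members are hypothetical names invented for illustration; they only mimic the roles of substituteDebugValuesForInst, dropDebugNumber and plain instruction deletion, and do not reflect how LLVM stores this information.

```cpp
// Toy model only: NOT the LLVM API. Every name below is hypothetical.
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>

struct ToyFunction {
  std::set<unsigned> labelled;                 // numbers still attached to an instruction
  std::map<unsigned, unsigned> substitutions;  // old number -> replacement number

  // An instruction carrying debug number `num` exists in the function.
  void define(unsigned num) { labelled.insert(num); }

  // The instruction is deleted: its number simply vanishes, with no
  // extra maintenance required from the pass.
  void erase(unsigned num) { labelled.erase(num); }

  // The instruction is replaced by an equivalent one: record old -> new
  // so that existing references can be redirected.
  void substitute(unsigned oldNum, unsigned newNum) {
    assert(oldNum != newNum && "a number cannot replace itself");
    labelled.erase(oldNum);
    labelled.insert(newNum);
    substitutions[oldNum] = newNum;
  }

  // The instruction stays but now computes a different value: remove the
  // label so references stop pointing at the wrong value.
  void dropNumber(unsigned num) { labelled.erase(num); }

  // Follow substitutions, then return the defining number, or nullopt,
  // which stands for "optimised out".
  std::optional<unsigned> resolve(unsigned num) const {
    // The step bound guards against accidental substitution cycles.
    for (std::size_t steps = 0; steps <= substitutions.size(); ++steps) {
      auto it = substitutions.find(num);
      if (it == substitutions.end()) break;
      num = it->second;
    }
    if (labelled.count(num) != 0) return num;
    return std::nullopt;
  }
};
```

In this model a pass only has to act in two cases: call substitute when it swaps one instruction for another, and call dropNumber when it changes what an instruction computes. Deleting an instruction needs nothing beyond the deletion itself, since resolve then reports the value as optimised out.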

  2. The “Instruction referencing for debug info” page of the LLVM 17.0.0git documentation says cloning is not yet supported. Is this still the case?

Hmmmm, I think it’s partially implemented: I discovered that tail duplication would clone DBG_PHI instructions, thus giving multiple definitions for the same instruction number. This is solvable by running (another) SSA computation, in InstrRefBasedImpl::resolveDbgPHIs. It might be extendable to deal with multiple definitions at instructions as well as multiple definitions at DBG_PHIs.

  3. I was told in “How should I manage DBG_VALUEs when its def moves?” (#7, by StephenTozer) that I don’t need to move DBG_INSTR_REFs along with their defs in transformation passes. Is this true even if a DBG_INSTR_REF is no longer dominated by its def as a result of the def moving? Is this ‘not dominated’ relationship taken care of at the end, presumably in LiveDebugValues?

That’s correct, you should be able to place a DBG_INSTR_REF anywhere in a function and then never think about it again. LiveDebugValues will work out which locations the value is available in and select one for any DBG_INSTR_REFs that refer to it, or will produce a DBG_VALUE $noreg if the value isn’t resident at the position of the DBG_INSTR_REF. Even better, if the value becomes available sometime later in the function and the instructions are still dominated by a DBG_INSTR_REF referring to it, LiveDebugValues will produce a location for that variable, effectively allowing us to tolerate use-before-defs in debug-info.
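The behaviour described above can be illustrated with a small standalone toy, restricted to straight-line code. This is a sketch under heavy assumptions, not the LiveDebugValues algorithm: Event and resolveLocations are hypothetical names, each value lives in at most one register, and emitting DBG_VALUE $noreg when the tracked value is overwritten is a simplification chosen here.

```cpp
// Toy model only: NOT LiveDebugValues. Every name below is hypothetical.
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Event {
  enum Kind { Def, Ref } kind;
  unsigned num;     // debug instruction number being defined or referenced
  std::string reg;  // register written by a Def (unused for Ref)
};

// Walk the events in program order and return the DBG_VALUEs that would be
// emitted for a single variable.
std::vector<std::string> resolveLocations(const std::vector<Event> &events) {
  std::map<unsigned, std::string> residentIn;  // value number -> register
  bool haveRef = false;                        // has a DBG_INSTR_REF been seen?
  unsigned wanted = 0;                         // number that reference names
  std::vector<std::string> emitted;

  for (const Event &e : events) {
    if (e.kind == Event::Ref) {
      haveRef = true;
      wanted = e.num;
      auto it = residentIn.find(wanted);
      // Value resident: use its location. Otherwise: no location yet.
      emitted.push_back(it == residentIn.end()
                            ? std::string("DBG_VALUE $noreg")
                            : "DBG_VALUE $" + it->second);
      continue;
    }

    assert(!e.reg.empty() && "a Def must name a register");
    // Writing a register evicts whatever value was resident there.
    for (auto it = residentIn.begin(); it != residentIn.end();) {
      if (it->second == e.reg) {
        if (haveRef && it->first == wanted)
          emitted.push_back("DBG_VALUE $noreg");  // assumption, see lead-in
        it = residentIn.erase(it);
      } else {
        ++it;
      }
    }
    residentIn[e.num] = e.reg;
    // Use-before-def: the value appears after the reference was seen.
    if (haveRef && e.num == wanted)
      emitted.push_back("DBG_VALUE $" + e.reg);
  }
  return emitted;
}
```

Feeding it a reference to number 1 followed by a definition of number 1 in rcx yields a $noreg location first and a $rcx location once the value exists, which is the use-before-def tolerance mentioned above.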


Thanks for the answers!

  2. The “Instruction referencing for debug info” page of the LLVM 17.0.0git documentation says cloning is not yet supported. Is this still the case?

Hmmmm, I think it’s partially implemented: I discovered that tail duplication would clone DBG_PHI instructions, thus giving multiple definitions for the same instruction number. This is solvable by running (another) SSA computation, in InstrRefBasedImpl::resolveDbgPHIs. It might be extendable to deal with multiple definitions at instructions as well as multiple definitions at DBG_PHIs.

Does it not work if we just assign a new instruction number every time we clone an instruction that already has a number? I thought that way all new instructions including DBG_PHIs would have non-overlapping numbers with the existing instructions.


That will only deliver a partial set of variable locations (example below). I suppose it depends on what we mean by “work”: assigning a new instruction number each time an instruction is cloned will deliver sound variable locations, so it’s a safe behaviour for optimisations, but it will not be as complete as the existing DBG_VALUE implementation. To illustrate, imagine tail duplication were to occur on the MIR-ish below:

bb1:
  $rax = someinst
  br label %join

bb2:
  $rax = someinst
  br label %join

join:
  $rcx = add $rax, some-constant, debug-instr-number 1
  br label %later

later:
  DBG_INSTR_REF 1, 0, ...
  [more code]

Tail duplication would transform this into the following; I’ve retained the same instruction number when the labelled add instruction gets cloned:

bb1:
  $rax = someinst
  $rcx = add $rax, some-constant, debug-instr-number 1
  br label %later

bb2:
  $rax = someinst
  $rcx = add $rax, some-constant, debug-instr-number 1
  br label %later

later:
  DBG_INSTR_REF 1, 0, [...]
  [more code]

Here, the PHI in the “join” block where $rax merges has effectively been eliminated and a new one installed for $rcx in %later. The value that the DBG_INSTR_REF refers to is still computed in the program, but now it’s computed by a PHI rather than an individual instruction. If, upon cloning the add instruction, we use a different instruction number, then LiveDebugValues will correctly determine that neither of the values is available/resident in the “later” block, and the DBG_INSTR_REF will produce a location that’s “optimised out”, so it won’t produce incorrect locations.

The resolveDbgPHIs method I mentioned can solve this in scenarios where a DBG_PHI gets duplicated into multiple parent blocks, by re-running a bit of SSA PHI placement and discovering that the correct values merge in $rcx. It could technically be extended to handle the above, although as it’s not common on x86 I haven’t bothered yet. The existing DBG_VALUE design doesn’t have this problem because it deals with locations rather than values: there would be a DBG_VALUE $rcx where there’s a DBG_INSTR_REF. (Of course, by solving this problem easily, DBG_VALUE introduces additional problems when instructions move around, so it’s all a trade-off.)
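The trade-off between reusing and renumbering a cloned instruction can be sketched with a standalone toy. This is an illustration only: LiveOut and locationAtJoin are hypothetical names, and the "every predecessor leaves the same number in the same register" rule is a drastic simplification standing in for the SSA PHI placement mentioned above, not what resolveDbgPHIs actually does.

```cpp
// Toy model only: NOT resolveDbgPHIs. Every name below is hypothetical.
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// For one predecessor block: debug instruction number -> register holding
// that value when control leaves the block.
using LiveOut = std::map<unsigned, std::string>;

// A reference to `num` in the successor block gets a location only if every
// predecessor computes `num` and all of them leave it in the same register.
std::optional<std::string> locationAtJoin(const std::vector<LiveOut> &preds,
                                          unsigned num) {
  assert(!preds.empty() && "a join needs at least one predecessor");
  std::optional<std::string> reg;
  for (const LiveOut &pred : preds) {
    auto it = pred.find(num);
    if (it == pred.end())
      return std::nullopt;  // some path never computes this number
    if (reg.has_value() && *reg != it->second)
      return std::nullopt;  // no single register holds it on all paths
    reg = it->second;
  }
  return reg;
}
```

With both clones keeping number 1 in rcx, the toy reports rcx for the reference in the later block. If one clone is given a fresh number instead, the lookup for number 1 fails on that path and the result is empty, which corresponds to the sound but less complete “optimised out” outcome.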


Thank you!