[RFC] Adding a new Machine Operand type to implement variadic DBG_INSTR_REF instructions

tl;dr: We (Sony) would like to add a new type of Machine Operand to represent instruction references, with the intent of expanding DBG_INSTR_REF to support variadic debug values.

Instruction references[0] are a feature of the MIR that allows for generally more correct and more available variable locations[1] by tracking the instruction that produces a variable’s value, instead of trying to track a specific machine location (i.e. a register) throughout the MIR pipeline. Currently this feature is enabled by default for x86_64 targets, and is represented by the DBG_INSTR_REF instruction - essentially a DBG_VALUE instruction that has an instruction reference instead of the normal value operand, and without an indirect flag.

In order to further improve the quality of debug information, we intend to allow instruction references to be used in variadic debug values[2], a form of debug value in which multiple machine locations can be used to compute the implicit value of a variable (e.g. a variable whose value is the sum of two registers). This will be implemented by modifying DBG_INSTR_REF to contain more than one machine operand for its location, using the same syntax as DBG_VALUE_LIST[2]. This would include allowing locations other than instruction references to appear in a DBG_INSTR_REF, as we may want to reference immediate value types or stack locations as part of the variable’s value.

The DBG_INSTR_REF instruction currently has the following syntax[0]:

DBG_INSTR_REF 1, 0, !123, !456

The instruction reference itself is defined by the first two operands. This representation needs to change in order to be compatible with a list of locations, since that list can also contain immediate values. The new representation is a single operand containing two immediate values, so the pair 1, 0 becomes dbg-instr-ref(1, 0). As a demonstration of the importance of the new operand, observe the following instruction without this operand:

DBG_INSTR_REF !123, !456, 0, 1, 2, %stack, 3, 4

This form makes it ambiguous whether each immediate operand is half of an instruction reference or a standalone immediate value. With the new operand, this would be represented instead as:

DBG_INSTR_REF !123, !456, dbg-instr-ref(0, 1), 2, %stack, dbg-instr-ref(3, 4)
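
To illustrate why a dedicated operand kind removes the ambiguity, here is a minimal C++ sketch. The types Imm, StackSlot, InstrRef and DebugOperand and the functions print and printLocations are hypothetical stand-ins invented for this example; they are not LLVM's MachineOperand API and say nothing about how the change would actually be implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical operand kinds for illustration only (not LLVM's MachineOperand).
struct Imm { int64_t Value; };                          // plain immediate, e.g. 2
struct StackSlot { std::string Name; };                 // e.g. %stack
struct InstrRef { unsigned InstrNum; unsigned OpNum; }; // dbg-instr-ref(InstrNum, OpNum)

// Each location operand carries its own kind, so no guessing is needed.
using DebugOperand = std::variant<Imm, StackSlot, InstrRef>;

// Render one operand in the textual form used in this post.
std::string print(const DebugOperand &Op) {
  if (const auto *I = std::get_if<Imm>(&Op))
    return std::to_string(I->Value);
  if (const auto *S = std::get_if<StackSlot>(&Op))
    return "%" + S->Name;
  assert(std::holds_alternative<InstrRef>(Op) && "unexpected operand kind");
  const auto &R = std::get<InstrRef>(Op);
  return "dbg-instr-ref(" + std::to_string(R.InstrNum) + ", " +
         std::to_string(R.OpNum) + ")";
}

// Render a whole location list, comma-separated.
std::string printLocations(const std::vector<DebugOperand> &Ops) {
  std::string Out;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (I != 0)
      Out += ", ";
    Out += print(Ops[I]);
  }
  return Out;
}
```

Building the list {InstrRef{0, 1}, Imm{2}, StackSlot{"stack"}, InstrRef{3, 4}} and passing it to printLocations yields exactly the location list shown above. It holds four operands, each with a known kind, whereas the flat form 0, 1, 2, %stack, 3, 4 holds six operands with no indication of which immediates pair up.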

Following this representational change, the DBG_INSTR_REF instruction would become semantically equivalent to a “DBG_VALUE_LIST that may contain instruction reference operands”.

Any thoughts and comments on this change are welcome!

[0] Machine IR (MIR) Format Reference Manual — LLVM 15.0.0git documentation
[1] [llvm-dev] Call for testing -- new variable location tracking solution
[2] Source Level Debugging with LLVM — LLVM 15.0.0git documentation