[DebugInfo][RFC] Enabling "instruction referencing" variable locations for x86_64

Hi llvm-dev@,

tl;dr: Is it okay to enable value tracking (aka "instruction
referencing") to preserve debug-info variable locations, by default,
for x86_64?

The high-level summary of instruction referencing is that, after
instruction selection, instead of describing variable locations as a
virtual register, we indirectly refer to the MachineInstr operand
where the variable's value is defined. Once compilation is done, some
standard SSA analysis is used to determine the locations that values
are in, and what locations contain variable values. The benefit is
improved variable location coverage because we no longer have to act
conservatively during register allocation. More in the original RFC
[0].
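
To make the idea above concrete, here is a toy Python sketch. It is not
LLVM code: the classes Def, Copy, Clobber and DebugRef and the function
resolve_locations are hypothetical names invented for illustration, and
it handles only a single basic block. A variable refers to a numbered
defining instruction, and a later walk works out which location holds
that value at each point.

```python
# Toy model only: none of these names are LLVM APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class Def:
    instr_num: int  # debug instruction number labelling the defined value
    dest: str       # location written, e.g. a register name


@dataclass(frozen=True)
class Copy:
    src: str        # the value in src is duplicated into dest (copy or spill)
    dest: str


@dataclass(frozen=True)
class Clobber:
    dest: str       # dest is overwritten with a value we do not track


@dataclass(frozen=True)
class DebugRef:
    var: str        # from here on, var has the value defined by instr_num
    instr_num: int


def resolve_locations(block):
    """Return (index, var, location) each time a variable's location changes."""
    loc_to_value = {}  # location -> instr_num of the value it holds
    var_to_value = {}  # variable -> instr_num it refers to
    var_to_loc = {}    # variable -> location last reported (None = missing)
    events = []

    def find_loc(value):
        # Pick any location currently holding the value (sorted for determinism).
        for loc in sorted(loc_to_value):
            if loc_to_value[loc] == value:
                return loc
        return None

    for idx, inst in enumerate(block):
        if isinstance(inst, Def):
            loc_to_value[inst.dest] = inst.instr_num
        elif isinstance(inst, Copy):
            if inst.src in loc_to_value:
                loc_to_value[inst.dest] = loc_to_value[inst.src]
            else:
                loc_to_value.pop(inst.dest, None)
        elif isinstance(inst, Clobber):
            loc_to_value.pop(inst.dest, None)
        elif isinstance(inst, DebugRef):
            var_to_value[inst.var] = inst.instr_num

        # Re-check every variable: keep its location if still valid, else move it.
        for var, value in var_to_value.items():
            current = var_to_loc.get(var)
            if current is not None and loc_to_value.get(current) == value:
                continue
            new = find_loc(value)
            if var not in var_to_loc or new != current:
                var_to_loc[var] = new
                events.append((idx, var, new))
    return events


block = [
    Def(1, "rax"),
    DebugRef("x", 1),
    Copy("rax", "stack0"),  # spill
    Clobber("rax"),         # x is now only on the stack
    Copy("stack0", "rbx"),  # restore into a different register
    Clobber("stack0"),
    Clobber("rbx"),         # no location holds the value any more
]
for event in resolve_locations(block):
    print(event)
```

The point of the sketch is that the variable follows its value through
the spill and restore without anything having to maintain the debug
instruction during register allocation.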

We've (Sony) tested this internally and it works well; Caroline kindly
tested it [1] on some Google benchmarks and it seemed to work well
there too. It's currently in-tree and can be enabled by passing
-Xclang -fexperimental-debug-variable-locations to clang, or
-experimental-debug-variable-locations in LLVM options.

Probably the biggest caveat right now is that I'd like to enable it
only for x86_64: this isn't due to any design limitation, it's just where
I've been doing all the testing, and it's largely untested on most
other architectures. I've run a stage2 cross compile to aarch64 for
clang as it's another popular arch, but that's all. There's a
migration guide in [2] of how other architectures can benefit: I would
suggest that other architectures opt into instruction referencing when
they're ready.

Here's a list of "remaining work": things I consider incomplete about
the new implementation:
* A bunch of tests need to be updated to check for the correct
outputs in this new mode, see D113194 [3].
* Variadic variable locations aren't implemented -- this is probably
a week or two of work.
* I've got unit tests for maybe 85% of the important parts of the
"new" LiveDebugValues, but there are some gaps.
* It can be non-trivial for humans to interpret variable locations
[4] in MIR; some printing improvements are needed.

There are some downsides that can be discussed too: most significantly
that it's slower on CTMark [5]. The relevant configs there are
NewPM-ReleaseLTO-g and LegacyPM-ReleaseLTO-g, which are the only
configs that get optimisations and debug-info. Strictly speaking, this
slowdown is unavoidable because there's more information and more
accurate information being produced (one CTMark binary is almost 20%
larger due to extra debug-info), but it's still unfortunate. There are
some optimisations that can still be applied, and with instruction
referencing we can trivially compress all sequences of debug
instructions into a single instruction. Reid's experiments [6] on
debug-instruction contribution to compile times (applied to IR not
MIR) suggest that could be a performance win. I should be able to
prototype this sometime soon.
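
As a rough illustration of what compressing runs of debug instructions
could look like, here is a toy Python sketch. It is only one possible
reading of the idea, not a design from this RFC: the tuple encoding and
the names compress_debug_runs and "dbg_pack" are made up for the example.

```python
# Toy model only: the tuple shapes and "dbg_pack" are hypothetical.
def compress_debug_runs(instrs):
    """Fold each maximal run of consecutive debug records into one record.

    instrs holds ("op", name) for real instructions and
    ("dbg", var, instr_num) for debug records.
    """
    out = []
    for inst in instrs:
        if inst[0] == "dbg":
            entry = (inst[1], inst[2])
            if out and out[-1][0] == "dbg_pack":
                out[-1][1].append(entry)          # extend the current run
            else:
                out.append(("dbg_pack", [entry]))  # start a new run
        else:
            out.append(inst)
    return out


seq = [("op", "add"), ("dbg", "x", 1), ("dbg", "y", 2), ("op", "mul"), ("dbg", "z", 3)]
print(compress_debug_runs(seq))
```

Passes that walk the instruction stream would then step over one record
per run rather than one per variable.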

Other downsides:
* Support for i686 FP registers is nonexistent; due to it being a
stack, it's extra effort to track, which I haven't bothered to do yet.
Normal DBG_VALUEs don't do particularly well either.
* In the original RFC I pointed out that we can define a debug
use-before-def as a missing location at the point of any instruction
not dominated by the def, and as a normal location at any instruction
dominated by the def. Turning on instruction referencing by default
means this interpretation is implicitly accepted.
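
The use-before-def rule in the last bullet can be sketched in a few
lines of Python. This is a toy model under stated assumptions, not how
LLVM implements it: the CFG is a plain dict in which every block appears
as a key, positions are (block, index) pairs, and the function names
dominators and def_dominates are hypothetical.

```python
# Toy model only: not LLVM's dominator tree or its APIs.
def dominators(cfg, entry):
    """Iteratively compute dom[b], the set of blocks dominating b (b included)."""
    blocks = set(cfg)
    preds = {b: set() for b in blocks}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].add(b)
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def def_dominates(dom, def_pos, use_pos):
    """True if the instruction at use_pos is dominated by the def at def_pos."""
    def_block, def_idx = def_pos
    use_block, use_idx = use_pos
    if def_block == use_block:
        return def_idx < use_idx
    return def_block in dom[use_block]


cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
dom = dominators(cfg, "entry")
value_def = ("then", 2)
# Earlier in the same block: not dominated, so the location is missing.
print(def_dominates(dom, value_def, ("then", 0)))
# Later in the same block: dominated, so the location is a normal one.
print(def_dominates(dom, value_def, ("then", 5)))
# In the join block: "then" does not dominate "exit", so it is missing.
print(def_dominates(dom, value_def, ("exit", 0)))
```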

In my opinion it's mature enough to turn on by default (for x86_64),
ideally in good time for LLVM14's branch date, and I'm confident that
the remaining work can be done by the branch date. What do other
people think?

[0] [llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel
[1] [llvm-dev] Call for testing -- new variable location tracking solution
[2] D113586 [DebugInfo][NFC] Add instr-ref documentation, migration guide
[3] D113194 [DebugInfo][NFC] Prevent some tests from running in instruction-referencing mode
[4] D111317 [DebugInfo][InstrRef] Track instructions that write-to-stack after having a spill fused into them
[5] LLVM Compile-Time Tracker
[6] [llvm-dev] [RFC] Moving llvm.dbg.value out of the instruction stream

Really broadly, I think Jeremy has done a lot of good work here, and if he thinks this is ready to enable for x86, we should go forward with it.

Previously, I have expressed concerns about this design direction because instructions do not necessarily remain in source order. However, I think the benefit of the increased variable availability probably outweighs the cost of variable values that don’t correspond to the current source location, and we should go forward with this.

Thanks for working on this, Jeremy!

Reid wrote:

Really broadly, I think Jeremy has done a lot of good work here, and if he thinks this is ready to enable for x86, we should go forward with it.

I agree with all of the above.

Thanks, Jeremy, for pushing this forward!
– adrian

I totally agree with this. Thank you, Jeremy, for this great work!

Djordje

Many thanks for all the support! The most likely timeline for this is
me rewriting the existing X86 tests to expect instruction referencing
(D113194) next week, then trying to land a patch changing the default.
If all goes well, there'll be a good month or two for it to soak in
before the branch date of the next release.

Reid wrote:

Previously, I have expressed concerns about this design direction because instructions do not necessarily remain in source order. However, I think the benefit of the increased variable availability probably outweighs the cost of variable values that don't correspond to the current source location, and we should go forward with this.

This is on my radar as a problem too, as an existing design issue,
where the source-location and variable-location information we track
are only lightly connected. I don't have any good ideas for solving
it, but I think we're in a better position for making educated
decisions about variable locations now.