[RFC] A value-tracking LiveDebugValues implementation

Hi debuginfo-cabal,

tl;dr: Let's please consider using a new implementation of LiveDebugValues
that produces richer information, might be slightly faster, but mostly will
support the instruction referencing and value tracking paradigm from my RFC [0]
rather than the location tracking that LiveDebugValues does today.

In that RFC, the main motivator is to treat variable locations a bit more like
SSA, having variable location metadata refer to the instructions that define a
value, and track the value moving around registers from there. This
prototype [1] is an implementation of the "track" mechanism, where we produce
value numbers for every 'def' in a function, and track them being moved around.
Variable locations are then resolved and extended by following where the value
numbers go.

This matches, to an extent, the "Register Data Flow" / RDF graph analysis
that recently moved from the Hexagon backend to CodeGen. I had a look, and
it appears RDF computes much richer information than LiveDebugValues needs
(specifically, all uses of values) and allocates for each value number, which
is quite a bit of overhead. I decided it'd be simpler to make a new
LiveDebugValues implementation, as we've had performance problems with
LiveDebugValues in the past, and we don't need to track additional metadata
for each value, only number them uniquely.

There are a few independent reasons to adopt this new implementation, and I'm
self-conscious about dropping a code-bomb on people. My questions for this
email are:
* Does the new implementation (described below) make sense, and
* If so, how would I go about landing this?

We could wait until there's a complete suite of patches for the instruction
referencing work (I've got a working prototype, but not production-quality).
On the other hand, this new LiveDebugValues implementation is probably ready
for review now, works independently, and is very much the hardest part of
instruction referencing, so it'd be nice to get it under-way earlier. My
preferred outcome would be adding it as a parallel implementation to the
current LiveDebugValues, enabled by a runtime switch, then start to land
the rest of the instruction referencing work around it. This reduces the
scope for further bit-rot, and as Vedant mentioned [2] this would all need
to sit under a runtime-switch for a transition period anyway.

# Overview

Today's LiveDebugValues (explained in the file docu-comment [3]) concerns
itself with variable /locations/ only, not the values that they contain. The
implementation is very good at its job: it detects when a location is
invalidated (i.e. clobbered), control flow joins are a simple matter of "Do
the predecessors agree on the location?", and not much more work is needed.

Switching to tracking variable values is more complicated. Following the
original proposal, for this pseudo-MIR code sequence:

    $rax = somedef
    DBG_INSTR_REF reference-to-somedef
    $rbx = COPY killed $rax
    ; Variable location is $rax, then moves to $rbx

We need to track that:
a) $rax contains a value defined by the "somedef" instruction,
b) the DBG_INSTR_REF needs a location for the value somedef defines, and
c) that value from "somedef" is copied to $rbx.

The new LiveDebugValues implementation doesn't have DBG_INSTR_REF yet, so
I've written it to work with DBG_VALUEs too, to prove that it works just as
well as the current implementation. It deals with code sequences such as:

    $rax = somedef
    DBG_VALUE $rax
    $rbx = COPY killed $rax
    ; Variable location is $rax, then moves to $rbx

In which the same tracking has to occur, but the DBG_VALUE "reads" a value
number from $rax. As the variable locations are determined by the value number
that the variable has, it doesn't matter whether that was specified by a
DBG_VALUE or DBG_INSTR_REF. The tracking is sufficiently good to produce an
identical binary to the output of current LiveDebugValues (details below).
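To make that tracking concrete, here's a minimal Python sketch (invented names and instruction encoding, not the actual implementation) of value numbering within a single block: defs mint fresh numbers, copies move them between locations, and a DBG_VALUE reads whichever number its register currently holds:

```python
# Hypothetical sketch of within-block value tracking. Instructions are
# modelled as tuples: ("def", reg), ("copy", dst, src), ("dbg_value", var, reg).
def track_block(instrs):
    locs = {}          # machine location -> value number
    var_values = {}    # variable -> value number it was assigned
    next_num = 0
    for inst in instrs:
        op = inst[0]
        if op == "def":            # a new value is defined in a register
            _, reg = inst
            locs[reg] = next_num
            next_num += 1
        elif op == "copy":         # the value moves; both locations hold it
            _, dst, src = inst
            locs[dst] = locs[src]
        elif op == "dbg_value":    # "read" the value number out of the reg
            _, var, reg = inst
            var_values[var] = locs[reg]
    return locs, var_values
```

For the sequence above, the variable resolves to value number 0, which is available in both $rax and $rbx after the copy; whether the assignment came from a DBG_VALUE or a DBG_INSTR_REF makes no difference to the tracking.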

There are some unrelated benefits to consider: because the new implementation
tracks all locations that a value may be in, we can handle the following
problem, that the current implementation cannot:

  entry:
    $rax = somedef
    DBG_VALUE $rax
    SPILL_TO_STACK %somestackslot, $rax
  loop:
    [... some code...]
    $rax = RESTORE_FROM_STACK %somestackslot
    do_things_with_rax
    brcc loop
  exit:
    ret

Here, there's a variable spilt to the stack that's temporarily restored
in the loop. At the end of the "loop" block, the value has two locations:
$rax and the stack slot. The current LiveDebugValues always follows the
restore from the stack: it decides the variable is on the stack at the end
of "entry", and in $rax at the end of "loop". Because these locations don't
agree on entry to "loop", the location in "loop" is dropped completely.
Having LiveDebugValues ignore or prefer the restore just changes what kind of
circumstances locations get dropped in: the underlying problem is that the
value can be in multiple places, but we can track only one of those places.

The new implementation has enough information to solve this, as it tracks all
values in all machine locations. Unfortunately it doesn't make much difference
to the location statistics for clang RelWithDebInfo builds. An additional
benefit is that we would be better placed to produce saner locations lists [4].
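A rough sketch of why tracking every location helps at that join, using an invented set-based representation (value number mapped to the set of locations holding it); the join intersects the predecessors' location sets, so a location common to all predecessors survives:

```python
# Hypothetical representation: {value_number: set of machine locations}.
# Join at a control-flow merge: a value is live-in only if every
# predecessor has it, and only in the locations they all agree on.
def join_locations(pred_liveouts):
    common_values = set.intersection(*(set(p) for p in pred_liveouts))
    return {v: set.intersection(*(p[v] for p in pred_liveouts))
            for v in common_values}
```

In the example, both predecessors of "loop" have the value in the stack slot, so the intersection keeps it; a single-location tracker comparing "stack" against "$rax" has nothing in common and drops the location entirely.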

In terms of performance, the new implementation takes on average just as long
as the current implementation. A full stage2 build of clang may be 0.1% slower,
but it's in the noise on my machine. All the algorithms are much more localised
than in the current implementation (see below), and I reckon there's still scope
for further optimisation, as it's easier to make decisions such as "Don't try to
join these variable locations in this block, they're immediately re-stated by
DBG_VALUEs".

# Implementation

The new implementation strictly does more work than the old one: it tracks every
value in the function, not just those that are associated with a variable
location. It also has to detect places where control flow merges and produces
PHI values, where the old implementation could just check that the live-out
variable location in predecessors agreed. To address this, I've separated the
implementation into two smaller problems that can be solved almost
independently and with much greater locality.

Both of the problems below are sort-of-like SSA construction (they have a set
of defs, and a large set of uses), but instead of calculating new PHI
placements, we have to infer where PHIs _were_ placed.

Problem 1: Machine value numbering. Every def in the function is given a
number, which is then condensed into a transfer function for each block,
specifying which registers (or spill locs) contain which values. It's
genuinely just a map of machine location to value. Using the same sort of
dataflow model that LiveDebugValues currently uses, values are propagated to
compute the live-ins and live-out values of each block, with one significant
modification: we assign a new value number when control flow merges different
values, essentially re-computing PHI nodes. This leads to some ungraceful
code: see "Ugly PHI stuff" below. This problem is kept localised by only
considering the registers and spill slots that are accessed, making the
problem size proportionate to the product of the max working set of the
function and the number of blocks. The number of variables used is irrelevant.

Problem 2: Variable value assignment. By using the solution for the first
problem, of "what value number does each machine location contain" at each
program point, every DBG_VALUE can be interpreted as assigning a value to a
variable, not a location. This gets turned into a transfer function too, and
we can then perform another dataflow analysis, propagating what _value_ a
variable has in a block, and computing the set of live-in variable values for
each block. A specific machine location can be picked after this problem is
solved, possibly from several options if multiple locations have the same
value. This isn't a completely clean abstraction: when predecessors hold
different values for a variable in the same location, the merge should still
be propagated, but the new implementation has to find and validate a PHI
value number that represents those merged values (see "More ugly PHI stuff"
below).
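A sketch of that join, again with invented names: assuming the Problem 1 solution tags block-local PHIs as ("phi", block, loc) in the machine live-in maps, disagreeing predecessor variable values can only merge into a PHI that actually exists at the block:

```python
# Hypothetical Problem 2 join. var_liveouts: per-pred {var: value};
# machine_liveins: per-block {location: value} from Problem 1.
def join_variable_value(var, preds, var_liveouts, machine_liveins, block):
    vals = {var_liveouts[p].get(var) for p in preds}
    if len(vals) == 1:
        return vals.pop()                 # predecessors agree; propagate
    # Otherwise, look for a machine location whose live-in value is a PHI
    # created at this block -- i.e. a genuine merge of incoming values.
    for loc, val in machine_liveins[block].items():
        if val == ("phi", block, loc):
            return val   # real code must still validate the PHI's inputs
    return None          # no location merges the values; location dropped
```

The comment about validating the PHI's inputs is exactly the fiddly part described in "More ugly PHI stuff" below: finding a candidate PHI is easy, checking it actually merges the right values is not.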

I'm keeping problem 2 localised by propagating variable values for one lexical
scope at a time, and only over the blocks that are in that scope. As well as
localising by turning the main problem into a large number of small problems,
this reduces the amount of work too. The current implementation filters out
variable locations that are out of lexical scope every time a block is
processed, which now no longer needs to happen. Variables in the outermost
scopes won't be re-processed when LiveDebugValues goes around an inner-scope
loop several times; likewise variables in inner scopes won't get re-analysed
when an outer scope variable needs another pass through the function. This also
means that all lexical scopes that cover a single block (or zero) require no
dataflow analysis, which should attenuate the effects of heavy inlining a bit.

The solution to these two problems (block live-in variable values and block
live-in machine location values) can then be combined in a final step through
the function, producing DBG_VALUEs for live-ins and DBG_VALUEs for any
location transfers within a block.
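The final step can be sketched like so (hypothetical names, and ignoring in-block transfers): for each live-in variable value, pick any machine location holding that value and emit a DBG_VALUE for it:

```python
# Combine the two solutions for one block: var_liveins is {var: value},
# loc_liveins is {location: value}. Emit (variable, location) pairs.
def emit_block_liveins(var_liveins, loc_liveins):
    dbg_values = []
    for var, val in var_liveins.items():
        locs = [loc for loc, v in loc_liveins.items() if v == val]
        if locs:                              # value is available somewhere
            dbg_values.append((var, sorted(locs)[0]))  # arbitrary choice
    return dbg_values
```

Because location selection happens last and per-block, this is also the point where smarter policies (e.g. preferring the location that instructions actually read) could slot in.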

# Testing and validation

The new LiveDebugValues implementation produces an identical AMD64 clang-3.4
binary to the old implementation, based on d481e59863a (Feb 2020). Well,
almost -- it proved too difficult to completely emulate the old version;
instead, with the following changes [5] applied to d481e59863a, the two
implementations produce an identical binary:
* Applied D67500 [6],
* Ignored identity copies,
* Forced any newly created DBG_VALUEs to be inserted according to the order
   in which DebugVariables appear in the input function, so that they're
   consistent between the two implementations, and
* Disabled all tracking of stack spills.

Ignoring the final point, I think this would demonstrate that the two
implementations are equivalent. Stack spills are tricky: because value
transfers within blocks are left until the final step of the new
implementation, it's hard to work out in advance when old LiveDebugValues
would have transferred a location. It was easier to just compare binaries
where spills hadn't been followed.

IMO, the non-spill (the vast majority) locations being identical should be
sufficient to show that the overall algorithm works, and dealing with spills
is a minor extension.

The tests pass, except for the ones covered by the limitations section, and
a few DBG_VALUE-change-order issues.

# Limitations and unimplemented features

There are a couple of things that I haven't gotten around to implementing
yet, for which the tests fail:

Overlapping fragments: this is trivial to do, but at the back of my job queue.

Entry values: much more interesting. As Djordje has pointed out elsewhere,
LiveDebugValues has been assuming that every DBG_VALUE assigns a new value
to a variable [7], which means if a DBG_VALUE re-states a variable location
when it doesn't have to, we risk dropping an entry value. This will be much
easier in this new model (I haven't had the time to do it yet), because the
machine value numbering means that each entry value has a specific number,
and we can easily identify when it's no longer available. This should just
be a matter of the in-block transfer-tracking code identifying that a
variable should have the entry value, noticing that the value isn't in any
machine location currently, and issuing an appropriate DBG_VALUE.

# Ugly PHI stuff

Above, I said that this problem is a lot closer to SSA construction than other
classic compiler problems. However, it's actually even weirder: instead of
having a set of uses and defs and computing the locations of PHIs, every
instruction with a DebugLoc in lexical scope is a use, and the PHIs are
already placed, just not recorded. Thus the problems above are more about
recovering the locations of PHIs after they've been eliminated.

This adds a lot of complexity to the lattice structure used for each variable
location compared to the current LiveDebugValues: there used to only ever be
one candidate location for each variable at every control flow merge, and over
the course of dataflow analysis this was found to be either true or false.
However because we're iteratively discovering new PHI locations as we explore,
the new implementation's lattice consists of:
* A single candidate def (\top),
* A PHI value at every control flow merge on the path between the def
   and the current block,
* A PHI value at the current block (\bot).
Each of which potentially needs one iteration of dataflow to see whether the
join should produce that value.
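Roughly, the descent can be pictured like this (a toy model, not the real code): propose the strongest lattice element first, and each dataflow iteration either confirms it at the join or steps one rung down:

```python
# Toy lattice descent for one location at a merge. lattice is ordered
# strongest-first: [single_def, phi_at_merge_1, ..., phi_at_this_block].
def step_down(lattice, current):
    i = lattice.index(current)
    return lattice[min(i + 1, len(lattice) - 1)]   # \bot is a fixpoint

def join(lattice, candidate, pred_values):
    if all(v == candidate for v in pred_values):
        return candidate                  # join confirmed this iteration
    return step_down(lattice, candidate)  # try a weaker value next pass
```

Each downgrade costs at most one extra dataflow iteration, which is where the "more iterations than the old implementation" observation below comes from.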

This only truly needs exploring at loop heads, and the maximum number of
iterations required is the depth of loops in the function, which is the same as
the old LiveDebugValues implementation. However, because all the PHI values
have to be discovered, that number of iterations is required more often than
in the old implementation.

I wrote more about the lattice in the docu-comment in the new implementation
[8]. I can produce a worked example too, but this email has gotten a bit
wordy already.

# More ugly PHI stuff

The section above holds true for the variable values analysis too, where
instead of defs we have variable assignments at DBG_VALUEs. Rather than
discovering where PHIs happen and propagating that information, variable
value analysis has to do the same to discover what machine-value PHI would be
needed to merge a variable location from all predecessors, and then examine
whether such a PHI is actually available, and has the right inputs. (This is
annoyingly fiddly).

# Summary

This new implementation goes fast enough, is accurate enough, and appears
to be sound. That isn't a reason to replace the current implementation, but
eventually the instruction referencing work would make use of this, so I'd
like to start review on it, if that's acceptable to people.

Seeing as you've just read a lot of words, a motivating reminder: this path
can lead, slowly, to there being no debug instructions for variable locations!
And as described in my RFC sitrep-email, it can be more accurate than
location-only tracking too.

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
[1] Branch is here: Commits · jmorse/llvm-project · GitHub; tip
is 11450def5bc592321ece2a64ea2a8cabbc614c39 at the time of sending. I
could open a PR containing the changes, but GitHub mucks them up; best
to just look at the bare file [8].
[2] [llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel
[3] llvm-project/LiveDebugValues.cpp at 3626eba11f2367da88ccb0280a34731243278034 · llvm/llvm-project · GitHub
[4] 42834 – Parameter location list unnecessarily complex when value available in multiple regs
[5] Re-juggle defs and copy processing · jmorse/llvm-project@720876f · GitHub
(four commits from here)
[6] D67500 [DebugInfo] LiveDebugValues: don't create transfer records for potentially invalid locations; I probably should push to land this
[7] It's not a bad view, and definitely something I've been preaching.
[8] llvm-project/LiveDebugValues.cpp at 11450def5bc592321ece2a64ea2a8cabbc614c39 · jmorse/llvm-project · GitHub

Jeremy,

thank you, that is quite a lot of work you put in there! Generally speaking, there is nothing wrong with replacing the current LiveDebugValues pass with a new implementation if the new implementation solves all the same problems, and is either simpler, faster, or more accurate. We just need to be sure to judge the "simpler" aspect only after all the ugly edge cases have been addressed.

The LLVM way of staging this is to land the new implementation in trunk, and when it's ready, make it the default, and then remove the old one. Unfortunately we have a track record of getting stuck after step 1, because it's always difficult to completely replace a matured implementation of anything.

I agree that tracking values instead of instructions is a very elegant way of dealing with the output of regalloc and spilling and that it would seem to have potential for longer live ranges for the individual debug values. I've skimmed your implementation and it looks nice and well-documented.

I had two thoughts while reading the proposal, but reflecting on it, I feel like these are not necessarily just problems with your proposal, but also problems with the current implementation. I'm going to include them here because I already typed them out, but honestly they shouldn't affect our decision :)

1. I was wondering about potential downsides of tracking values. I noticed that a surprising amount of developers like to modify variables in the debugger. DWARF allows this when you have memory or register locations, as opposed to (DW_OP) stack values. If we wanted to become better at this, tracking values sounds potentially dangerous, since we can't know if the same value in a different location has any correspondence with the variable any more. I suppose that even with the new algorithm we could insert a hypothetical DBG_DECLARE at the definition and a DBG_VALUE when the register is moved. I guess there is no theoretical limit to supporting this with the value tracking approach.

2. When tracking values, do we keep track of the original DBG_VALUE's !dbg location to know when we need to stop propagating? I was worried that with good value tracking we might carry a stale value forward too far. But writing down the example made me realize that this is more an issue for HowToUpdateDebugInfo.rst than one just local to LiveDebugValues.

If we have the following unoptimized program, with !dbg !1 ~ line 1, etc:

%x = insn1, !dbg !1
DBG_VALUE "v", %x, !dbg !1
insn2, !dbg !2
%y = insn3, !dbg !3
DBG_VALUE "v", %y, !dbg !3
insn4, !dbg !4
%z = insn5, !dbg !5
DBG_VALUE "v", %z, !dbg !5

and it gets optimized to

%y = insn3, !dbg !3
DBG_VALUE "v", %y, !dbg !3
%x = insn1, !dbg !1
DBG_VALUE "v", %x, !dbg !1
insn2, !dbg !2
insn4, !dbg !4 ; now has wrong DBG_VALUE for "v"
%z = insn5, !dbg !5
DBG_VALUE "v", %z, !dbg !5

then insn4 has a stale value for the variable "v", because at location !dbg !4 it should be %y and not %x. We can require all passes to insert a DBG_VALUE undef after moving instructions, or we could start giving debug intrinsics a range of !dbg locations, such as DBG_VALUE "v", %x, [!dbg !1, !dbg !3), and define an ordering of !dbg locations.
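For illustration only, here's a toy encoding of that idea, modelling !dbg locations as integers with a total order (something LLVM does not define today): a DBG_VALUE is considered current only while an instruction's location falls inside its half-open range.

```python
# Hypothetical: dbg_range is a half-open interval [start, end) of ordered
# !dbg locations attached to a DBG_VALUE; instr_loc is an instruction's
# !dbg location. The DBG_VALUE is stale outside the range.
def dbg_value_live_at(dbg_range, instr_loc):
    start, end = dbg_range
    return start <= instr_loc <= end - 1
```

Under this model, a DBG_VALUE with range [!1, !3) would not apply at insn4 (!dbg !4), avoiding the stale value in the reordered example.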

If we had such a facility: would it be possible to integrate this into the new pass implementation?

-- adrian

Hi Adrian,

> quite a lot of work

Large amounts of credit should go to llvm-reduce, which was fundamental to
getting any of this done.

> I've skimmed your implementation and it looks nice and well-documented.

The thing that worries me is the over-complicated lattices -- I didn't
anticipate the problem, and there's a risk that it's more complex than it
needs to be. As it sounds like getting this reviewed and landed is feasible,
I'll write about that (and a worked example) in the review.

> I was wondering about potential downsides of tracking values. I noticed that
> a surprising amount of developers like to modify variables in the debugger
> [...]

This is really interesting, and as you say it's something that is already
happening today. Consider this:

  #include <stdbool.h>
  extern void ext(int);
  extern bool extb(int, int);
  void foo(int bar, int baz) {
    ext(bar);
    bool cont = true;
    while (cont) {
      cont = extb(bar, baz);
    }
  }

Compiled with a recent master -O2 -g and main() / other functions in another
file, we get this disassembly and location list for 'bar':

   0x0000000000400483 <+3>: mov %esi,%ebx
   0x0000000000400485 <+5>: mov %edi,%ebp
=> 0x0000000000400487 <+7>: callq 0x4004d0 <ext>

DW_AT_location (0x00000000:
  [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI
  [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP)

Stepping through the function, we stop on the call to 'ext' and can set the
value of 'bar'; but because there are two locations (and LiveDebugValues picked
the long-term register $ebp rather than $edi), setting "bar" has
no effect on the call to 'ext'.

AFAIUI, this is never a problem at -O0 because there's only ever one location
for a variable. With optimisations, and without a way to describe multiple
locations for a variable in DWARF, I don't think it's generally solvable.
Plus, if you had two variables that have been CSE'd into being the same value,
setting one will modify the other. These are all consequences of optimisations
changing the structure of the program.

The best guarantee that I imagine we could give, is that for any given block,
the variable location in that block is the location that instructions read
from. That way, if you modify the variable in a block, it's very likely that
the next few statements will read that modified variable. It's eminently
do-able with the new implementation, as location selection is the last thing
that happens and is done on a per-block basis, although it wouldn't be free
(in terms of performance cost). I think the "value is only read from one
location" idea is true of SelectionDAG, but it might fall apart with other
optimisations.

> When tracking values, do we keep track of the original DBG_VALUE's !dbg
> location to know when we need to stop propagating? [...] insn4 has a stale
> value for the variable "v"

I don't believe there's any relationship between DBG_VALUE locations and !dbg
source locations right now. This is actually one of my pet peeves, that
variable locations and the line program don't necessarily line up into
something coherent. It's definitely something that leads to misleading program
states being presented today; there are at least two bugs.llvm.org reports that
are caused by such problems. [Can't find them right now as bugzilla is
throwing errors]. Defining an order on !dbg locations, and building DBG_VALUEs
into that order, would be really useful for ensuring correctness

> If we had such a facility: would it be possible to integrate this into the
> new pass implementation?

In the simple case, easily: in your example, there would just be an extra
layer of mapping from source location to DBG_VALUE; and each DBG_VALUE would
have a well defined value number. We could even have insn4 set the variable
location to %y's value if it's still available somewhere.

If variable locations are defined by source location however, this undermines
the meaning of how LiveDebugValues determines variable locations when control
flow merges: there wouldn't be a simple "variable location in predecessor
block" any more. And the merged "live in" location wouldn't necessarily mean
anything to the source locations in the block.

Stepping further back, we'd also lose one of the freebies that the current
design gives us: when variable values merge at a PHI node, but there's no
actual PHI instruction in the IR (because there's no subsequent IR use), we
don't create a "dbg.phi(...)" instruction, we just ignore it and let
LiveDebugValues patch it up later, if a location can be recovered. If every
source location needed to be connected to a variable location record, we would
need to represent such debug-only PHIs much earlier in compilation
(or drop them).

On the other hand, what you're describing (plus the instruction referencing
work) is something that doesn't require debug instructions in the IR or MIR,
which is highly desirable. We could just store a set of
{instruction references, variable / fragment / expr, source locations} and
build a location list out of that. Definitely worth pursuing, and value
tracking would definitely enable such designs.

While I don't want to put too much emphasis on being able to modify
variables in an optimized program (unoptimized programs are a different
story though), I feel obligated to jump in to say that DWARF already
does support this scenario (Section 2.6.2 of DWARF5, page 43):
"""
The address ranges defined by the bounded location descriptions of a
location list may overlap. When they do, they describe a situation in
which an object exists simultaneously in more than one place.
"""

Now, I don't know of any debugger which actually does this, but in
theory a debugger could go and update all locations of a variable for a
given address instead of just the first one.

Changing only one of the CSE'd variables is obviously impossible, but
that is also something that could be mitigated by a very careful
debugger -- by checking whether any other variable happens to refer to
the same location and warning or aborting the modification.

cheers,
pavel

> AFAIUI, this is never a problem at -O0 because there's only ever one location
> for a variable. With optimisations, and without a way to describe multiple
> locations for a variable in DWARF, I don't think it's generally solvable.

> While I don't want to put too much emphasis on being able to modify
> variables in an optimized program (unoptimized programs are a different
> story though), I feel obligated to jump in to say that DWARF already
> does support this scenario (Section 2.6.2 of DWARF5, page 43):
> """
> The address ranges defined by the bounded location descriptions of a
> location list may overlap. When they do, they describe a situation in
> which an object exists simultaneously in more than one place.
> """

Yup +1

> Now, I don't know of any debugger which actually does this, but in
> theory a debugger could go and update all locations of a variable for a
> given address instead of just the first one.
>
> Changing only one of the CSE'd variables is obviously impossible, but
> that is also something that could be mitigated by a very careful
> debugger -- by checking whether any other variable happens to refer to
> the same location and warning or aborting the modification.

That'd be tricky to do in practice & probably involve more than
checking other location descriptions - it's possible that the compiler
shared storage in some way (maybe not with another variable, maybe
with the return value or a parameter value) & modifying that storage
would produce a different return value / other changes in program
behavior that were unintended.

I'd thought DWARF had some, at least theoretical (like the
simultaneous location support), mechanism to describe storage a
consumer could realistically mutate & get something like the
observable behavior as distinct from just the immutable value. But at
a glance I don't find any wording like that. In the absence of that,
yeah, it's basically "if you use a debugger to mutate state in an
optimized program... don't bet on the behavior being in any way
reliable/sensible/consistent".

- Dave

Thanks for sharing this RFC, Jeremy.

Hi debuginfo-cabal,

tl;dr: Let’s please consider using a new implementation of LiveDebugValues
that produces richer information, might be slightly faster, but mostly will
support the instruction referencing and value tracking paradigm from my RFC [0]
rather than the location tracking that LiveDebugValues does today.

In that RFC, the main motivator is to treat variable locations a bit more like
SSA, having variable location metadata refer to the instructions that define a
value, and track the value moving around registers from there. This
prototype [1] is an implementation of the “track” mechanism, where we produce
value numbers for every ‘def’ in a function, and track them being moved around.
Variable locations are then resolved and extended by following where the value
numbers go.

Your earlier RFC sketches a plan for extending LiveDebugValues to handle instruction-referencing DBG_VALUEs. From http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html:

How then do we translate this new kind of machine location into
DWARF/CodeView variable locations, which need to know which register
to look in? The answer is: LiveDebugValues [3]. We already perform a
dataflow analysis in LiveDebugValues of where values “go” after
they’re defined: we can use that to take “defining instructions” and
determine register / stack locations. We would need to track values
from the defining instruction up to the DBG_VALUE where that value
becomes a variable location, after which it’s no different from the
LiveDebugValues analysis that we perform today.

I had interpreted this as meaning: introduce transfer functions to walk from the referenced instruction until we reach the referencing DBG_VALUE, then create a VarLoc. IIUC, at that point the unmodified LiveDebugValues algorithm could proceed as usual.

Reading this RFC, I get the sense that that has several drawbacks: it doesn’t handle situations where the referenced instruction is in a different basic block than its referencing DBG_VALUE (all the ‘ugly phi stuff’ is missing!), and it doesn’t track all the locations for a variable. Is that correct?

This matches, to an extent, the “Register Data Flow” / RDF graph analysis
that recently moved from the Hexagon backend to CodeGen. I had a look, and
it appears RDF computes much richer information than LiveDebugValues needs
(specifically, all uses of values) and allocates for each value number, which
is quite a bit of overhead. I decided it’d be simpler to make a new
LiveDebugValues implementation, as we’ve had performance problems with
LiveDebugValues in the past, and we don’t need to track additional metadata
for each value, only number them uniquely.

There are a few independent reasons to adopt this new implementation, and I’m
self-conscious about dropping a code-bomb on people. My questions for this
email are:

  • Does the new implementation (described below) make sense, and

I think your results speak for themselves :).

  • If so, how would I go about landing this?

One thing I’m not yet clear on: does your prototype essentially implement the minimal extension to LiveDebugValues needed to handle instruction-referencing DBG_VALUEs? If not, should it? Or should we take the opportunity to remove limitations from the current LiveDebugValues algorithm?

As for how to land this, I propose:

  • Move LiveDebugValues.cpp to lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
  • Split out common transfer functions and other helpers into lib/CodeGen/LiveDebugValues/Common.{h,cpp}
  • Finally, add lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
  • Add a cl::opt like -instr-ref-dbg-values to allow selecting a DBG_VALUE flavor

We could wait until there’s a complete suite of patches for the instruction
referencing work (I’ve got a working prototype, but not production-quality).

On the other hand, this new LiveDebugValues implementation is probably ready
for review now, works independently, and is very much the hardest part of
instruction referencing, so it’d be nice to get it under-way earlier. My
preferred outcome would be adding it as a parallel implementation to the
current LiveDebugValues, enabled by a runtime switch, then start to land
the rest of the instruction referencing work around it. This reduces the
scope for further bit-rot, and as Vedant mentioned [2] this would all need
to sit under a runtime-switch for a transition period anyway.

Yes, I’m in favor of landing work related to instruction-referencing DBG_VALUEs in small-ish independent patches, as they become ready. I’d very much like to keep the unmodified var-loc DBG_VALUE code paths intact and separated.

Overview

Today’s LiveDebugValues (explained in the file docu-comment [3]) concerns
itself with variable /locations/ only, not the values that they contain. The
implementation is very good at its job: it detects when a location is
invalidated (i.e. clobbered), control flow joins are a simple matter of “Do
the predecessors agree on the location?”, and not much more work is needed.
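For contrast, the current implementation’s join can be sketched in a few lines. This is illustrative standalone code with invented type names, not the actual VarLoc data structures: each variable has at most one tracked location, and it survives a merge only if every predecessor agrees.

```cpp
#include <cassert>
#include <map>
#include <vector>

using VarID = int;
using LocIdx = int;
// The var-loc model: each variable has (at most) one tracked location.
using VarLocMap = std::map<VarID, LocIdx>;

// Join at a control flow merge: keep a variable's location only if all
// predecessors' live-out maps carry the exact same location for it.
VarLocMap joinVarLocs(const std::vector<VarLocMap> &PredOuts) {
  VarLocMap Out = PredOuts.front();
  for (const VarLocMap &Pred : PredOuts)
    for (auto It = Out.begin(); It != Out.end();) {
      auto P = Pred.find(It->first);
      if (P == Pred.end() || P->second != It->second)
        It = Out.erase(It); // disagreement: drop the variable location
      else
        ++It;
    }
  return Out;
}
```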

Switching to tracking variable values is more complicated. Following the
original proposal, for this pseudo-MIR code sequence:

$rax = somedef
DBG_INSTR_REF reference-to-somedef
$rbx = COPY killed $rax
; Variable location is $rax, then moves to $rbx

We need to track that:
a) $rax contains a value defined by the “somedef” instruction,
b) the DBG_INSTR_REF needs a location for the value somedef defines, and
c) the value from “somedef” is copied to $rbx.
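As an illustration of the bookkeeping that (a) and (c) require, here is a minimal standalone sketch. The type names (LocIdx, ValueNum and so on) are invented for the example and are not the prototype’s actual data structures; the point is that tracking is genuinely just a map of machine location to value number.

```cpp
#include <cassert>
#include <map>

// Stand-ins for the prototype's real types: a machine location is a
// register or spill slot; a value number identifies the def that
// produced a value.
using LocIdx = int;
using ValueNum = int;
using LocValueMap = std::map<LocIdx, ValueNum>;

// (a) A def writes a fresh value number into its location, e.g.
// "$rax = somedef" makes $rax contain somedef's value.
void trackDef(LocValueMap &Live, LocIdx Loc, ValueNum FreshVal) {
  Live[Loc] = FreshVal;
}

// (c) A copy moves whatever value the source holds into the destination,
// e.g. "$rbx = COPY killed $rax"; a killed source stops holding it.
void trackCopy(LocValueMap &Live, LocIdx Dst, LocIdx Src, bool SrcKilled) {
  Live[Dst] = Live[Src];
  if (SrcKilled)
    Live.erase(Src);
}
```

Requirement (b) then falls out of the map: resolving a DBG_VALUE of $rax (or, later, a DBG_INSTR_REF) is a lookup of which value the named location holds at that point.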

The new LiveDebugValues implementation doesn’t have DBG_INSTR_REF yet, so
I’ve written it to work with DBG_VALUEs too, to prove that it works just as
well as the current implementation. It deals with code sequences such as:

$rax = somedef
DBG_VALUE $rax
$rbx = COPY killed $rax
; Variable location is $rax, then moves to $rbx

In which the same tracking has to occur, but the DBG_VALUE “reads” a value
number from $rax. As the variable locations are determined by the value number
that the variable has, it doesn’t matter whether that was specified by a
DBG_VALUE or DBG_INSTR_REF. The tracking is sufficiently good to produce an
identical binary to the output of current LiveDebugValues (details below).

There are some unrelated benefits to consider: because the new implementation
tracks all locations that a value may be in, we can handle the following
problem, which the current implementation cannot:

entry:
$rax = somedef
DBG_VALUE $rax
SPILL_TO_STACK %somestackslot, $rax
loop:
[… some code…]
$rax = RESTORE_FROM_STACK %somestackslot
do_things_with_rax
brcc loop
exit:
ret

Here, a variable is spilt to the stack and temporarily restored inside
the loop. At the end of the “loop” block, the value has two locations:
$rax and the stack slot. The current LiveDebugValues always follows the
restore from the stack: it decides the variable is on the stack at the end
of “entry”, and in $rax at the end of “loop”. Because these locations don’t
agree on entry to “loop”, the location in “loop” is dropped completely.
Having LiveDebugValues ignore or prefer the restore just changes what kind of
circumstances locations get dropped in: the underlying problem is that the
value can be in multiple places, but we can track only one of those places.

The new implementation has enough information to solve this, as it tracks all
values in all machine locations. Unfortunately it doesn’t make much difference
to the location statistics for clang RelWithDebInfo builds. An additional
benefit is that we would be better placed to produce saner locations lists [4].
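To make the spill/restore example concrete, here is a standalone sketch (again with invented types, not the prototype’s code) of why tracking every location a value occupies lets the join succeed where the single-location model drops it:

```cpp
#include <cassert>
#include <map>
#include <set>

using LocIdx = int;   // register or spill slot, as an opaque index
using ValueNum = int;
using LocValueMap = std::map<LocIdx, ValueNum>;

// All the locations that hold a given value at a program point. Because
// a spill leaves the value in both the register and the stack slot, this
// set can have more than one element.
std::set<LocIdx> locationsOf(const LocValueMap &M, ValueNum V) {
  std::set<LocIdx> Locs;
  for (const auto &P : M)
    if (P.second == V)
      Locs.insert(P.first);
  return Locs;
}

// At a control flow join, a location is usable for V only if every
// predecessor's live-out map agrees that it holds V.
std::set<LocIdx> joinLocations(const LocValueMap &PredA,
                               const LocValueMap &PredB, ValueNum V) {
  std::set<LocIdx> A = locationsOf(PredA, V), Out;
  for (LocIdx L : locationsOf(PredB, V))
    if (A.count(L))
      Out.insert(L);
  return Out;
}
```

In the example above, the entry block’s live-out has the value only in the stack slot, while the loop back-edge has it in both $rax and the stack slot; the intersection keeps the stack slot, so the variable location survives in “loop”.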

In terms of performance, the new implementation takes on average just as long
as the current implementation. A full stage2 build of clang may be 0.1% slower,
but it’s in the noise on my machine. All the algorithms are much more localised
than the current implementation (see below), and I reckon there’s still scope
for further optimisation as it’s easier to make decisions such as “Don’t try to
join these variable locations in this block, they’re immediately re-stated by
DBG_VALUEs”.

Implementation

The new implementation strictly does more work than the old one: it tracks every
value in the function, not just those that are associated with a variable
location. It also has to detect places where control flow merges and produces
PHI values, where the old implementation could just check that the live-out
variable location in predecessors agreed. To address this, I’ve separated the
implementation into two smaller problems that can be solved almost
independently and with much greater locality.

Both of the problems below are sort-of-like SSA construction (they have a set
of defs, and a large set of uses), but instead of calculating new PHI
placements, we have to infer where PHIs were placed.

Problem 1: Machine value numbering. Every def in the function is given a
number, which is then condensed into a transfer function for each block,
specifying which registers (or spill locs) contain which values. It’s
genuinely just a map of machine location to value. Using the same sort of
dataflow model that LiveDebugValues currently uses, values are propagated to
compute the live-ins and live-out values of each block, with one significant
modification: we assign a new value number when control flow merges different
values, essentially re-computing PHI nodes. This leads to some ungraceful
code: see “Ugly PHI stuff” below. This problem is kept localised by only
considering the registers and spill slots that are accessed, making the
problem size proportionate to the product of the max working set of the
function and the number of blocks. The number of variables used is irrelevant.
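As a rough standalone sketch of the join in Problem 1 (invented types and an invented value-numbering scheme; the prototype’s real numbering encodes more information), disagreement between predecessors yields a fresh, memoised PHI value for that block and location:

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using LocIdx = int;
using ValueNum = int;
using LocValueMap = std::map<LocIdx, ValueNum>;

// Hypothetical PHI numbering: a merge of different values at (Block, Loc)
// gets a fresh value number, memoised so that re-running the join during
// dataflow iteration is stable.
struct PHITable {
  std::map<std::pair<int, LocIdx>, ValueNum> PHIs;
  ValueNum Next = 1000; // illustrative only
  ValueNum phiFor(int Block, LocIdx Loc) {
    auto I = PHIs.emplace(std::make_pair(Block, Loc), Next);
    if (I.second)
      ++Next;
    return I.first->second;
  }
};

// Live-ins of a block: per location, either the value all predecessors
// agree on, or a PHI value representing the merge.
LocValueMap joinBlock(int Block, const std::vector<LocValueMap> &PredOuts,
                      PHITable &PHIs) {
  LocValueMap LiveIns = PredOuts.front();
  for (auto &P : LiveIns) {
    for (const LocValueMap &Pred : PredOuts) {
      auto It = Pred.find(P.first);
      if (It == Pred.end() || It->second != P.second) {
        P.second = PHIs.phiFor(Block, P.first); // disagreement: PHI value
        break;
      }
    }
  }
  return LiveIns;
}
```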

Problem 2: Variable value assignment. By using the solution for the first
problem, of “what value number does each machine location contain” at each
program point, every DBG_VALUE can be interpreted as assigning a value to a
variable, not a location. This gets turned into a transfer function too, and
we can then perform another dataflow analysis, propagating what value a
variable has in a block, and computing the set of live-in variable values for
each block. A specific machine location can be picked after this problem is
solved, possibly from several options if multiple locations have the same
value. This isn’t a completely clean abstraction: predecessors with different
variable values that are in the same location should be propagated, but the
new implementation has to find and validate a PHI value number that represents
those merged values (see “More ugly PHI stuff” below).
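A sketch of Problem 2’s two halves, with illustrative code rather than the prototype’s: a DBG_VALUE naming a location is reinterpreted as assigning the value that location holds, and a concrete location is picked only after the variable-value dataflow has settled.

```cpp
#include <cassert>
#include <map>

using LocIdx = int;
using ValueNum = int;
using LocValueMap = std::map<LocIdx, ValueNum>;

// A DBG_VALUE naming a location becomes an assignment of the value that
// location holds at that program point.
ValueNum interpretDbgValue(const LocValueMap &AtPoint, LocIdx Loc) {
  return AtPoint.at(Loc);
}

// After the variable-value dataflow settles, pick any machine location
// that holds the variable's value; -1 means the value is unavailable and
// the variable location must be dropped.
LocIdx pickLocation(const LocValueMap &LiveIns, ValueNum V) {
  for (const auto &P : LiveIns)
    if (P.second == V)
      return P.first; // first (lowest-numbered) holder, for determinism
  return -1;
}
```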

I’m keeping problem 2 localised by propagating variable values for one lexical
scope at a time, and only over the blocks that are in that scope. As well as
localising by turning the main problem into a large number of small problems,
this reduces the amount of work too. The current implementation filters out
variable locations that are out of lexical scope every time a block is
processed, which now no longer needs to happen. Variables in the outermost
scopes won’t be re-processed when LiveDebugValues goes around an inner-scope
loop several times; likewise variables in inner scopes won’t get re-analysed
when an outer scope variable needs another pass through the function. This also
means that all lexical scopes that cover only a single block (or none at all)
require no dataflow analysis, which should attenuate the effects of heavy
inlining a bit.

The solution to these two problems (block live-in variable values and block
live-in machine location values) can then be combined in a final step through
the function, producing DBG_VALUEs for live-ins and DBG_VALUEs for any
location transfers within a block.

Testing and validation

The new LiveDebugValues implementation produces an identical AMD64 clang-3.4
binary to the old implementation, based on d481e59863a (Feb 2020). Well,
almost: it proved too difficult to emulate the old version completely.
Instead, with the following changes [5] applied to d481e59863a, the two
implementations produce an identical binary:

  • Applying D67500 [6],
  • Ignoring identity copies,
  • Forcing any newly created DBG_VALUEs to be inserted according to the order
    in which DebugVariables appear in the input function, so that they’re
    consistent between the two implementations, and
  • Disabling all tracking of stack spills.

Ignoring the final point, I think this would demonstrate that the two
implementations are equivalent. Stack spills are tricky: because value
transfers within blocks are left until the final step of the new
implementation, it’s hard to work out in advance when old LiveDebugValues
would have transferred a location. It was easier to just compare binaries
where spills hadn’t been followed.

IMO, the non-spill locations (the vast majority) being identical should be
sufficient to show that the overall algorithm works, and dealing with spills
is a minor extension.

The tests pass, except for the ones covered by the limitations section, and
a few DBG_VALUE-change-order issues.

Limitations and unimplemented features

There are a couple of things that I haven’t gotten around to implementing
yet, for which the tests fail:

Overlapping fragments: this is trivial to do, but it’s at the back of my job queue.

Entry values: much more interesting. As Djordje has pointed out elsewhere,
LiveDebugValues has been assuming that every DBG_VALUE assigns a new value
to a variable [7], which means if a DBG_VALUE re-states a variable location
when it doesn’t have to, we risk dropping an entry value. This will be much
easier in this new model (I haven’t had the time to do it yet), because the
machine value numbering means that each entry value has a specific number,
and we can easily identify when it’s no longer available. This should just
be a matter of the in-block transfer-tracking code identifying that a
variable should have the entry value, observing that the value is not in
any machine location at that point, and issuing an appropriate DBG_VALUE.
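A sketch of how that check might look, under the assumption (invented for illustration) that parameters receive the first few value numbers at function entry:

```cpp
#include <cassert>
#include <map>

using LocIdx = int;
using ValueNum = int;
using LocValueMap = std::map<LocIdx, ValueNum>;

// Entry values fall out of the numbering: suppose parameters get value
// numbers 0..NumParams-1 at function entry. A variable needs an
// entry-value location exactly when its value is one of those entry
// values and no machine location still holds it.
bool needsEntryValueLocation(const LocValueMap &AtPoint, ValueNum VarVal,
                             ValueNum NumParams) {
  if (VarVal >= NumParams)
    return false; // not an entry value at all
  for (const auto &P : AtPoint)
    if (P.second == VarVal)
      return false; // still live somewhere: use that location instead
  return true;
}
```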

Ugly PHI stuff

Above, I said that this problem is a lot closer to SSA construction than other
classic compiler problems. However, it’s actually even weirder: instead of
having a set of uses and defs and computing the locations of PHIs, every
instruction with a DebugLoc in lexical scope is a use, and the PHIs are already
placed, just not recorded. Thus the problems above are more about recovering
the locations of PHIs after they’ve been eliminated.

This adds a lot of complexity to the lattice structure used for each variable
location compared to the current LiveDebugValues: there used to only ever be
one candidate location for each variable at every control flow merge, and over
the course of dataflow analysis this was found to be either true or false.
However, because we’re iteratively discovering new PHI locations as we explore,
the new implementation’s lattice consists of:

  • A single candidate def (\top),
  • A PHI value at every control flow merge on the path between the def
    and the current block, and
  • A PHI value at the current block (\bot).

Each of these potentially needs one iteration of dataflow to see whether the
join should produce that value.
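As a standalone sketch of how a join over this lattice might step downwards (the structure and names are invented for illustration; the real lattice is documented in [8]): the current proposal stands if every predecessor agrees with it, and otherwise the join moves one rung down, from candidate def towards the block’s own PHI.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using ValueNum = int;

// Illustrative lattice for one (block, location): the candidate def sits
// at the top, PHI values at intermediate merge points below it, and a
// PHI at the current block at the bottom. Joining only moves downwards.
struct LocationLattice {
  // Ordered top-to-bottom: {CandidateDef, PathPHIs..., BlockPHI}.
  std::vector<ValueNum> Chain;

  // If every predecessor carries the current proposal, it stands;
  // otherwise step one rung down for the next dataflow iteration.
  ValueNum join(ValueNum Proposed, const std::vector<ValueNum> &PredVals) {
    bool AllAgree = std::all_of(PredVals.begin(), PredVals.end(),
                                [&](ValueNum V) { return V == Proposed; });
    if (AllAgree)
      return Proposed;
    auto It = std::find(Chain.begin(), Chain.end(), Proposed);
    return (It == Chain.end() || It + 1 == Chain.end()) ? Chain.back()
                                                        : *(It + 1);
  }
};
```

Each downward step may require another pass around a loop, which is why all the PHI values being discovered costs extra iterations compared to the old implementation.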

This only truly needs exploring at loop heads, and the maximum number of
iterations required is the loop depth of the function, which is the same as
for the old LiveDebugValues implementation. However, because all the PHI
values have to be discovered, that number of iterations is needed more often
than in the old implementation.

I wrote more about the lattice in the docu-comment in the new implementation
[8]. I can produce a worked example too, but this email has gotten a bit
wordy already.

More ugly PHI stuff

The section above holds true for the variable values analysis too, where
instead of defs we have variable assignments at DBG_VALUEs. Rather than
discovering where PHIs happen and propagating that information, variable
value analysis has to do the same to discover what machine-value PHI would be
needed to merge a variable location from all predecessors, and then examine
whether such a PHI is actually available, and has the right inputs. (This is
annoyingly fiddly).

Summary

This new implementation goes fast enough, is accurate enough, and appears
to be sound. That on its own isn’t a reason to replace the current
implementation; but the instruction referencing work would eventually build
on it, so I’d like to start review now, if that’s acceptable to people.

Seeing as you’ve just read a lot of words, a motivating reminder: this path
can lead, slowly, to there being no debug instructions for variable locations!
And as described in my RFC sitrep-email, it can be more accurate than
location-only tracking too.

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
[1] Branch is here:
https://github.com/jmorse/llvm-project/commits/ldvars-separation , tip
is 11450def5bc592321ece2a64ea2a8cabbc614c39 at time of sending. I
could open a PR containing the changes, but github mucks them up; best
to just look at the bare file [8].
[2] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139469.html
[3] https://github.com/llvm/llvm-project/blob/3626eba11f2367da88ccb0280a34731243278034/llvm/lib/CodeGen/LiveDebugValues.cpp#L9
[4] https://bugs.llvm.org/show_bug.cgi?id=42834
[5] https://github.com/jmorse/llvm-project/commit/720876fec2a5113d52977d00e0e1f7d526b4c1f0
four commits from here
[6] https://reviews.llvm.org/D67500 , I probably should push to land this
[7] It’s not a bad view, and definitely something I’ve been preaching.
[8] https://github.com/jmorse/llvm-project/blob/11450def5bc592321ece2a64ea2a8cabbc614c39/llvm/lib/CodeGen/LiveDebugValues.cpp#L60


Thanks for reading this essay,
Jeremy

vedant

Hi Vedant,

Your earlier RFC sketches a plan for extending LiveDebugValues to handle instruction-referencing DBG_VALUEs. From http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html --

How then do we translate this new kind of machine location into
DWARF/CodeView variable locations, which need to know which register
to look in? The answer is: LiveDebugValues [3]. We already perform a
dataflow analysis in LiveDebugValues of where values "go" after
they're defined: we can use that to take "defining instructions" and
determine register / stack locations. We would need to track values
from the defining instruction up to the DBG_VALUE where that value
becomes a variable location, after which it's no different from the
LiveDebugValues analysis that we perform today.

I had interpreted this as meaning: introduce transfer functions to walk from the referenced instruction until we reach the referencing DBG_VALUE, then create a VarLoc. IIUC, at that point the unmodified LiveDebugValues algorithm could proceed as usual.

I think that was my meaning at the time, and it's still broadly true.
Unfortunately, it turns out that there can be pretty much an arbitrary
computation between the referenced instruction and the referencing
DBG_INSTR_REF. I ran across several scenarios where a referenced
instruction gets CSE'd / PRE'd / hoisted away from the referencing
DBG_INSTR_REF, to the extent that it's on the other side of a loop.
Once it's there, a dataflow algorithm is needed to work out whether
the value is live through the loop, or whether it gets clobbered
somewhere.

That's what the "machine value number" problem is solving. It could
still plug into the VarLoc implementation if that makes an easier
transition, where a DBG_INSTR_REF is decomposed into a DBG_VALUE when
an instruction reference / value number / location are matched up. It
would make debug use-before-defs harder to deal with, though.

(It's worth noting at this point that the machine value number work
is, in a sense, doing LiveDebugVariables' job, by linking up a
register def with a use. Just, LiveDebugVariables has the register
allocator to help it do it).

Reading this RFC, I get the sense that that has several drawbacks: it doesn’t handle situations where the referenced instruction is in a different basic block than its referencing DBG_VALUE (all the ‘ugly phi stuff’ is missing!), and it doesn’t track all the locations for a variable. Is that correct?

This might be my overly-defensive writing style working against me. I
believe those problems are solved, but the solutions are more
complicated than I wanted. For:

it doesn’t handle situations where the referenced instruction is in a different basic block than its referencing DBG_VALUE

This is solved by the machine value number dataflow, but as mentioned it
requires a more complex lattice to infer where PHI values are generated.

and it doesn’t track all the locations for a variable

I believe the implementation correctly picks the _value_ for every
variable in each block, and picking a location is a final
post-processing stage.

One thing I’m not yet clear on: does your prototype essentially implement the minimal extension to LiveDebugValues needed to handle instruction-referencing DBG_VALUEs? If not, should it? Or should we take the opportunity to remove limitations from the current LiveDebugValues algorithm?

I have the code for this hanging around, but not in the tree I linked
to. It establishes a fairly trivial mapping between:
* Instructions that are referred to, and
* The machine value numbers that instruction produces.
I'd suggest that it's not useful without DBG_INSTR_REFs, as we wouldn't
be able to test it; and testing it means bringing in patches to
generate or parse DBG_INSTR_REFs, which I'd rather split up.
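The shape of that mapping might be sketched like so (invented names and a simplified key; the real prototype presumably keys on an instruction number plus operand):

```cpp
#include <cassert>
#include <map>

// Hypothetical sketch: each DBG_INSTR_REF names a defining instruction
// (and operand), collapsed here into a single opaque identifier.
// Resolving the reference is a lookup of the value number that def got.
using InstrID = unsigned;
using ValueNum = int;

std::map<InstrID, ValueNum> DefToValue;

ValueNum resolveInstrRef(InstrID Ref) {
  auto It = DefToValue.find(Ref);
  return It == DefToValue.end() ? -1 : It->second; // -1: def never seen
}
```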

As for how to land this, I propose:

- Move LiveDebugValues.cpp to lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
- Split out common transfer functions and other helpers into lib/CodeGen/LiveDebugValues/Common.{h,cpp}
- Finally, add lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
- Add a cl::opt like -instr-ref-dbg-values to allow selecting a DBG_VALUE flavor

Sounds good to me,

(… trying to add back CC’s that were unintentionally dropped in my earlier mail — apologies.)


I think we were on the same page. I was simply contrasting my earlier understanding of how LiveDebugValues would handle instruction-referencing DBG_VALUEs (with a single, block-local forward scan) to the value numbering approach used in your current prototype.


I see, I think it makes sense to get the value numbering piece in a testable state first, and prove that it’s at least as good as the current algorithm.


I’m looking forward to this!



thanks,
vedant
