Propagation of debug information for variables into basic blocks.

Adrian,

I am currently investigating issues where variables that one would expect to be available in a debugger are not available in code that is compiled at optimisation levels other than -O0.

The main problem appears to be with the LiveDebugValues::join() method because it does not allow variables to be propagated into blocks unless all predecessor blocks have an Outgoing Location for that variable.

As a simple example, consider this C code:

int func2(int);

void func(int a) {
  int b = func2(10);
  for (int i = 1; i < a; i++) {
    func2(i + b);
  }
}

One would reasonably expect that, when stopped within the body of the for loop, you could access the variable b in a debugger (especially as it is actually referenced in the loop).

Unfortunately this is often not the case. I believe that this is due to the requirement stated in the descriptive comment of LiveDebugValues::join(), which says:

“if the same source variable in all the predecessors of @MBB reside in the same location.”

In our simple example we end up with a series of blocks like

BB#0  Initial-block   Predecessors: (none)      Successors: BB#2
BB#1  for-body        Predecessors: BB#2        Successors: BB#2
BB#2  for-condition   Predecessors: BB#0, BB#1  Successors: BB#1, BB#3
BB#3  after-for       Predecessors: BB#2        Successors: (none)

Now b initially has an “Outgoing Location” in BB#0, but it isn’t imported into BB#2 because it does not have an “Outgoing Location” in both of the predecessor blocks BB#0 and BB#1.

So the outcome is that the variable b is not available in the debugging information while in BB#2 (or BB#1).

Now changing the algorithm in LiveDebugValues::join() to include all Outgoing Locations from the predecessor blocks appears to significantly improve the visibility of variables in such cases. However, I am worried that doing this possibly propagates the variables further than intended … or maybe it is the right thing to do.
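(For illustration, here is a minimal stand-alone sketch of the two join behaviours being weighed here; VarLocSet and the function names are invented for the example and are not the actual LiveDebugValues data structures. The current behaviour corresponds to the intersection, the change described above to the union.)

  #include <set>
  #include <string>
  #include <vector>

  // Stand-in for a per-block record of which variables still have a known
  // location at the end of the block (roughly what OutLocs captures).
  using VarLocSet = std::set<std::string>;

  // Current behaviour: a variable survives the join only if *every*
  // predecessor has an outgoing location for it (set intersection).
  VarLocSet joinByIntersection(const std::vector<VarLocSet> &PredOutLocs) {
    if (PredOutLocs.empty())
      return {};
    VarLocSet Result = PredOutLocs.front();
    for (const VarLocSet &Pred : PredOutLocs) {
      VarLocSet Kept;
      for (const std::string &Var : Result)
        if (Pred.count(Var))
          Kept.insert(Var);
      Result = Kept;
    }
    return Result;
  }

  // The change described above: a variable survives the join if *any*
  // predecessor has an outgoing location for it (set union).
  VarLocSet joinByUnion(const std::vector<VarLocSet> &PredOutLocs) {
    VarLocSet Result;
    for (const VarLocSet &Pred : PredOutLocs)
      Result.insert(Pred.begin(), Pred.end());
    return Result;
  }

  // With predecessor sets {b} (from BB#0) and {} (from the not-yet-processed
  // BB#1), the intersection is {} and b is dropped, while the union keeps b.

(The intersection is there to avoid reporting a location that is only valid along some of the paths into a block, which is presumably the over-propagation concern.)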

So if you have any suggestions or alternative approaches to consider then please let me know.

Keith


One would reasonably expect that, when stopped within the body of the for loop, you could access the variable b in a debugger (especially as it is actually referenced in the loop).

Side note:
In optimized code I would expect the loop to be rewritten into something like
  int func2( int);
  void func(int a) {
        int b = func2(10);
        for(int i = b+1; i < a; i++)
                func2(i);
  }
so I would expect the primary reason for b being unavailable in the loop body to be that b is effectively dead and there is no reason to keep it in a register. But that's not what happens in your example.


Conceptually, the LiveDebugValues data flow analysis should be using three-valued logic arranged in a lattice

    ⊥ (uninitialized / don't know)
   / \
true false (is (not) available)

where join(x, ⊥) = x, otherwise it behaves like boolean &.

All debug variable values are initialized to the bottom element first. After processing BB#0 we have var[b, reg23] = true. When we join this with the unknown ⊥ from BB#1, we propagate var[b, reg23] into BB#1. Next time we join at BB#2 we will have consistent information in both predecessors and the algorithm converges. If, for example, BB#1 had conflicting information for b, the next join at BB#2 would delete the information for b and the result would still be correct.
This is guaranteed to terminate because the information at the nodes can only move in one direction in the lattice and can change at most once.
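(A minimal C++ sketch of that three-valued join, with illustrative names rather than anything from the actual pass:)

  // Availability of one (variable, location) fact at a block boundary.
  enum class Avail {
    Unknown, // bottom element: the block has not been processed yet
    True,    // the variable is known to reside in the location
    False    // the variable is known not to reside in the location
  };

  // join(x, bottom) = x; otherwise it behaves like boolean AND.
  Avail join(Avail A, Avail B) {
    if (A == Avail::Unknown)
      return B;
    if (B == Avail::Unknown)
      return A;
    return (A == Avail::True && B == Avail::True) ? Avail::True
                                                  : Avail::False;
  }

  // Example from above: after BB#0, var[b, reg23] is True while BB#1 is still
  // Unknown, so join(True, Unknown) == True and b is propagated into BB#1.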

I haven't thought this through entirely, but it looks like we could implement this by keeping track of which basic blocks we never visited before and special-casing previously unvisited basic blocks in join().

-- adrian

GCC uses union of predecessor outs.

This looks to be trying to convert it to a lattice problem, but the handling of the lattice looks a little odd.

Conceptually, the LiveDebugValues data flow analysis should be using
three-valued logic arranged in a lattice

    ⊥ (uninitialized / don't know)
   / \
true false (is (not) available)

where join(x, ⊥) = x, otherwise it behaves like boolean &.

All debug variable values are initialized to the bottom element first.
After processing BB#0 we have var[b, reg23] = true. When we join this with
the unknown ⊥ from BB#1, we propagate var[b, reg23] into BB#1. Next time we
join at BB#2 we will have consistent information in both predecessors and
the algorithm converges.

FWIW: GCC does this as a union, so you get the maximal info available. If
it's not available along a given path, it's simply not there for that path.
This will discard it if *any* path has missing info (not just inconsistent
info).

I'll skip whether this is or is not the right thing to do :)

If, for example, BB#1 had conflicting information for b, the next join at
BB#2 would delete the information for b and the result would still be
correct.
This is guaranteed to terminate because the information at the nodes can
only move in one direction in the lattice and can change at most once.

I haven't thought this through entirely, but it looks like we could
implement this by keeping track of which basic blocks we never visited
before and special-casing previously unvisited basic blocks in join().

This is because you don't really initialize all the info to bottom for real; it tries to be lazy. Otherwise, all the blocks would have OutLocs of bottom. They are only theoretically initialized, so things get the wrong answer.

For example, this code is not right:

  // For all predecessors of this MBB, find the set of VarLocs that
  // can be joined.
  for (auto p : MBB.predecessors()) {
    auto OL = OutLocs.find(p);
    // Join is null in case of empty OutLocs from any of the pred.
    if (OL == OutLocs.end())
      return false;

This is wrong if the block is unvisited (as you say)


This is actually non-trivial to fix, IMHO.

For example, the following looks kinda right:

for (auto p : MBB.predecessors()) {
    if (p is not visited)
     continue;
    auto OL = OutLocs.find(p);
    // Join is null in case of empty OutLocs from any of the pred.
    if (OL == OutLocs.end())
      return false;

(and will work in a lot of cases)

This can be wrong, however, if you have more than a single backedge in the program (it may be possible to have a block with multiple uninitialized predecessors that requires multiple iterations to resolve).

The problem here is that to get the maximal answer in an intersection
problem, you *must* initialize the sets to contain every value. You can
look at dataflow algorithm papers for formal proofs if you want.

So you *actually* have to treat the default state of every unvisited block
as the equivalent of "containing every value" to get the maximal answer.

How to achieve this for real is ... not obvious at a glance.
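(For reference, the textbook setup for a forward intersection problem achieves it by initializing every non-entry block's set to the universal set up front. A rough stand-alone sketch, where Block, Universe() and transfer() are purely illustrative placeholders:)

  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  struct Block {
    std::vector<Block *> Preds;
  };
  using FactSet = std::set<std::string>;

  FactSet Universe();                             // "contains every value"
  FactSet transfer(const Block &B, const FactSet &In);

  // Every non-entry block starts at the universal set, so the first
  // intersection over a block's predecessors can never lose facts merely
  // because a predecessor has not been processed yet.
  void initializeOutSets(const std::vector<Block *> &Blocks,
                         std::map<const Block *, FactSet> &Out) {
    for (Block *B : Blocks)
      Out[B] = B->Preds.empty() ? transfer(*B, FactSet()) : Universe();
  }

(The lazy scheme discussed in this thread avoids materializing Universe() by skipping unvisited predecessors instead, which, as noted below, is only equivalent if every block has at least one visited predecessor by the time it is joined.)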


I had to construct an example to see the problem, but my example is a bit contrived (it depends on starting in the middle of the graph) so I wonder if there is a better counterexample.

BB0:y      BB1:x,y
  |           |
BB2:⊥      BB3:⊥      BB4:⊥
  |           |          |
BB5:x         |          |
     \        |         /
      \       |        /
         BB6: ?
           |
           +-> back-edge to BB1, BB4

BB5 has a definition for x but neither kills nor defines y. No block ever kills y. Let's assume that, for whatever reason, we started in BB5. The first join() at BB6 will yield the set [x] (BB3 and BB4 are unvisited and thus skipped, just as if they contained every value). This causes BB2 and BB4 to be added to the working set. If we now visit BB4 first, it will see only [x] and it will be impossible for y to be propagated to BB6.

Is there a better example?

-- adrian

So staring a bit at it, i've convinced myself the fix i listed may
actually be correct *as long* as we guarantee the proper iteration order.

The only reason i know about this at all is because it happened many many
moons ago in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27755 :)

For a while, we actually computed the maximal set and used it.

Now, we use implicit ones, which, for an intersection problem, *should be* the equivalent of skipping the predecessor (if the predecessor is maximal, ANDing it with 1's will simply never remove anything).

However, this *only* works if you can guarantee you have visited at least one predecessor/successor (for forward/backward problems) by the time you get to a given block; otherwise, an implicit in/out set will be empty instead of all ones.

This is easy to prove. If you skip all of the predecessors or successors,
and the set was not actually initialized to all 1's, it will contain
absolutely nothing, and this nothing will propagate when it should be 1's :)
If you have at least one predecessor/successor with some value, you can use
that, and you are guaranteed a correct answer.

The code i ended up writing for gcc looks something like this (note, this was for ANTIC, which is a backwards problem, so it's intersection of successors instead of intersection of preds):

SmallVector worklist;

BasicBlock *first = nullptr;
for each successor s {
  if (!first && visited.count(s))
    first = s;
  else if (visited.count(s))
    worklist.push_back(s);
}
assert(first != nullptr && "We should have visited at least one succ");

for each thing on worklist
   intersect with first

If you use preds instead of succs, it should work here, as long as we
continue to use RPO.

I *think* that code i wrote is equivalent to adding the continue where i
have it, and an assert that at least one block was visited during the loop
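(A rough pred-based rendering of that idea, to show what the "skip unvisited predecessors, but require at least one visited one" shape might look like in the join() excerpt quoted earlier. Visited, VarLocSet and intersectInPlace are illustrative stand-ins rather than actual LiveDebugValues members, and the sketch assumes blocks are processed in RPO so that every reachable non-entry block has at least one visited predecessor:)

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include <set>
  #include <string>

  using namespace llvm;

  // Illustrative stand-ins, not the real LiveDebugValues types.
  using VarLocSet = std::set<std::string>;
  DenseMap<const MachineBasicBlock *, VarLocSet> OutLocs;
  SmallPtrSet<const MachineBasicBlock *, 16> Visited;

  // Keep only the elements of A that are also in B.
  void intersectInPlace(VarLocSet &A, const VarLocSet &B) {
    for (auto It = A.begin(); It != A.end();) {
      if (B.count(*It))
        ++It;
      else
        It = A.erase(It);
    }
  }

  // Seed the result from the first visited predecessor and intersect with the
  // rest; unvisited predecessors are treated as "contains everything" and
  // simply skipped.
  bool joinSketch(const MachineBasicBlock &MBB, VarLocSet &InLocs) {
    const VarLocSet *First = nullptr;
    SmallVector<const VarLocSet *, 8> Rest;
    for (const MachineBasicBlock *P : MBB.predecessors()) {
      if (!Visited.count(P))
        continue;
      auto OL = OutLocs.find(P);
      // A visited predecessor with empty OutLocs forces an empty join.
      if (OL == OutLocs.end())
        return false;
      if (!First)
        First = &OL->second;
      else
        Rest.push_back(&OL->second);
    }
    // Under RPO this only happens for the entry block (or unreachable code).
    if (!First)
      return false;

    InLocs = *First;
    for (const VarLocSet *S : Rest)
      intersectInPlace(InLocs, *S);
    return true;
  }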



So staring a bit at it, i've convinced myself the fix i listed may actually be correct *as long* as we guarantee the proper iteration order.

Cool! That explains why I struggled to come up with an example that did not start processing in the middle of the graph :)

This is easy to prove. If you skip all of the predecessors or successors, and the set was not actually initialized to all 1's, it will contain absolutely nothing, and this nothing will propagate when it should be 1's :)
If you have at least one predecessor/successor with some value, you can use that, and you are guaranteed a correct answer.

Exactly.


I *think* that code i wrote is equivalent to adding the continue where i have it, and an assert that at least one block was visited during the loop

That makes sense: either there are no predecessors or at least one of them must be in the visited set.

thanks!
-- adrian


Just a thought: There was a similar issue (http://lists.llvm.org/pipermail/llvm-dev/2016-May/099626.html), and using dominator info to detect loops and propagate information accordingly can fix the issue, but it might not be a clean approach.

Thanks,
Vikram TV