SCCP is not always correct in the presence of undef (+ proposed fix)

Hi.
I'm sending this email to -dev as this may be of interest to many
people, who may have opinions or may want to try the change and report
problems before it goes in.
I've recently been working on a patch to integrate `undef` in the SCCP
solver, in the hope of fixing a tail of latent bugs in SCCP which have
remained uncovered for many years. I think this is a decent time to
propose it, so that it can (hopefully) be integrated in the 4.0 branch
(or backported, FWIW), if it gets reviewed and there are no
substantial problems.

#### Background:
The current lattice for SCCP doesn't know about `undef`, that is, it's
identical to the one described in [1]. There are three lattice states,
"Unknown"/"Constant"/"Overdefined" (the first one is what the paper
calls "Undefined", but it was renamed because "undefined" has a
different meaning in LLVM).

SCCP (or its interprocedural counterpart) currently consists of two phases:
1) The solver, which implements the algorithm described in the paper.
At the end of this phase, if there are no `undef` values in the input,
all the values have supposedly been lowered (in a lattice-theoretical
sense) to either constant or overdefined. In the presence of `undef`,
there may be values which are still in the 'unknown' state.
2) The resolver, which assumes that everything in the 'unknown' state is
undefined, runs a procedure trying to plug in the "correct" value,
and, in case something changed, runs the solver again (sketched below).
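
For reference, the driver currently has roughly this shape (a
simplified sketch of runSCCP in lib/Transforms/Scalar/SCCP.cpp;
`Solver` is the SCCPSolver instance and `F` the function being
processed):

bool ResolvedUndefs = true;
while (ResolvedUndefs) {
  Solver.Solve();                              // phase 1: the solver
  ResolvedUndefs = Solver.ResolvedUndefsIn(F); // phase 2: the resolver; if it
                                               // changed anything, solve again
}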

#### Problem/Motivation for the change:
1) and 2) use different orders of visitation: 1) walks the use-def
chains, while 2) visits the function in basic block order (starting
from the entry block of the function).
The fact that 2) walks over the function instead of the SSA graph
creates some issues, as it cannot distinguish between the following
two cases:

a) It doesn't know how to resolve this value.
b) This value is really undef.

and may plug in wrong/invalid values. For a real example, see the bug
in [3]; for a proof of why it's wrong, see [2] (courtesy of Daniel
Berlin).

#### Proposed fix:
Integrate `undef` in the solver. Undef is a lattice value lower than
`Unknown` but higher than `Overdefined` (at the same lattice level as
the constants). Graphically:

            unknown
          /    |    \
        c1 ... ck   undefined
          \    |    /
          overdefined
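
To make the meet rules concrete, here is a minimal sketch of the
proposed lattice (hypothetical names, not the code in the patch); note
that, as drawn, meeting a constant with `undefined` (or with a
different constant) falls to overdefined:

#include <cstdint>

enum class Kind { Unknown, Undef, Constant, Overdefined };

struct LatticeVal {
  Kind K = Kind::Unknown;
  int64_t C = 0;                      // only meaningful when K == Constant
};

static LatticeVal meet(LatticeVal A, LatticeVal B) {
  if (A.K == Kind::Unknown) return B;          // unknown is the top element
  if (B.K == Kind::Unknown) return A;
  if (A.K == Kind::Overdefined || B.K == Kind::Overdefined)
    return {Kind::Overdefined, 0};             // overdefined is the bottom
  // Both values sit at the middle level (a constant or undef).
  if (A.K == Kind::Constant && B.K == Kind::Constant && A.C == B.C)
    return A;                                  // same constant
  if (A.K == Kind::Undef && B.K == Kind::Undef)
    return A;                                  // undef meet undef = undef
  return {Kind::Overdefined, 0};               // ci meet cj, or c meet undef
}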

#### Benefits:
-> The code now is correct (modulo other bugs =)) in the presence of undefs.
-> This allows us to keep the same lattice height and therefore all
the nice termination/convergence properties/bounds (see the paper for
reference).
-> This allows us to remove a lot of code (pretty much the resolver) and
makes the algorithm easier to understand.
-> SCCP no longer needs to roll its own tiny constant folder,
where we need to add cases every time we realize there's something
missing; instead, it can leverage the full power of ConstantFolding
when something folds to undef.
-> Makes SCCP *slightly* faster.

#### (Potential) drawbacks:
-> This changes the behavior of `phi` handling, i.e. we lose some cases,
as we force a phi to overdefined if not all of its values are constant. I
measured the impact on the llvm-testsuite and on some internal
benchmarks and I wasn't able to measure any substantial difference in
runtime performance. If you care about this case, please speak up =)
(or try the patch and report regressions). There are probably ways to
make this work, but we (see [4], comments 7 and 8) think the gain in
precision is not worth the complexity (and what's here is *much much*
better than what's in the tree, as it at least tries to be correct).

#### Plan for integration
The patch is available at https://reviews.llvm.org/D28177 ("[SCCP] Integrate `undef` in the solver").
Any comments/feedback/testing is definitely welcome. This is a
self-contained change.

#### (Possible) future work
If this goes in I'd like to implement constant propagation for missing
IR constructs (in particular vectors), and make sure SCCP is on par
with the other two constant propagation passes we have in LLVM (and
maybe garbage collect them).
Danny suggested an improvement could be that of propagating facts in
both directions (GCC does that as a separate pass
https://gcc.gnu.org/svn/gcc/trunk/gcc/gimple-ssa-backprop.c). I have
no immediate plans to work on this (I suspect NewGVN and other places
will keep me busy for a while), but I hope to eventually get there.

#### Thanks
Eli Friedman and Daniel Berlin provided very insightful
conversations/suggestions on how to tackle the problems, as well as
early feedback on the change (in addition to reviewing a large part,
if not all, of my previous work on SCCP).

[1] https://www.cs.utexas.edu/users/lin/cs380c/wegman.pdf
[2] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161107/403212.html
[3] https://llvm.org/bugs/show_bug.cgi?id=30448
[4] https://llvm.org/bugs/show_bug.cgi?id=30966#c8

#### Benefits:
-> The code now is correct (modulo other bugs =)) in presence of undefs
-> This allows us to keep the same lattice height and therefore all
the nice termination/convergence properties/bounds (see the paper for
reference).
-> This allows to remove a lot of code (pretty much the resolver) and
makes the algorithm easier to understand

How does this eliminate the need for the resolver? Without the resolver,
what will plug in values for the undef's?

-> SCCP doesn't need anymore to roll its own tiny constant folder
where we need to add cases everytime we realize there's something
missing, instead, it can leverage the full power of ConstantFolding
when something folds to undef.

Can you elaborate on why SCCP currently needs to "roll its own tiny
constant folder"? I wasn't able to understand it from just the context here.

-- Sean Silva

How does this eliminate the need for the resolver? Without the resolver,
what will plug in values for the undef's?

You don't need a separate resolver to plug in values for the undefs.

Even if you wanted to super-optimize them, you wouldn't do it the way the
resolver does it (there are better strategies :P)

Hi David,

Looking at the original bug, it seems like a straightforward
undef-propagation bug to me -- SCCP was folding "or undef, constant"
to "undef", which is wrong. Why is changing that not the fix? That
is, some variant of

diff --git a/lib/Transforms/Scalar/SCCP.cpp b/lib/Transforms/Scalar/SCCP.cpp
index 8a6be97..45f1241 100644
--- a/lib/Transforms/Scalar/SCCP.cpp
+++ b/lib/Transforms/Scalar/SCCP.cpp
@@ -908,8 +908,18 @@ void SCCPSolver::visitBinaryOperator(Instruction &I) {
   }

   // If something is undef, wait for it to resolve.
- if (!V1State.isOverdefined() && !V2State.isOverdefined())
- return;
+ if (!V1State.isOverdefined() && !V2State.isOverdefined()) {
+ Constant *L = UndefValue::get(I.getOperand(0)->getType());
+ if (V1State.isConstant())
+ L = V1State.getConstant();
+ Constant *R = UndefValue::get(I.getOperand(0)->getType());
+ if (V2State.isConstant())
+ R = V2State.getConstant();
+ Constant *C = ConstantExpr::get(I.getOpcode(), L, R);
+ if (isa<UndefValue>(C))
+ return;
+ return markConstant(IV, &I, C);
+ }

   // Otherwise, one of our operands is overdefined. Try to produce something
   // better than overdefined with some tricks.

Also, did you mean to make the lattice as:

digraph G {
  Unknown -> Undef
  Undef -> Constant1
  Undef -> Constant2
  Undef -> Constant3
  Constant1 -> Bottom
  Constant2 -> Bottom
  Constant3-> Bottom
}

? In the lattice you've drawn, Constant MEET Undef will be Bottom,
when it should ideally be Constant.

Secondly, what's the purpose of splitting Unknown and Undef in the new
scheme? Is there a case in your algorithm in which treating an
unknown as an undef will be a problem?

Actually given the above lattice (assuming you're okay with that) we know
treating unknown as undef should not make us any less conservative
(i.e. should be safe) since undef is lower than unknown. IOW why not change
the algorithm to start off each cell as undef and not unknown?

-- Sanjoy

Hi David,

Looking at the original bug, it seems like a straightforward
undef-propagation bug to me – SCCP was folding “or undef, constant”
to “undef”, which is wrong. Why is changing that not the fix? That
is, some variant of

You would still need to fix the iteration order of the resolver, or it will make more wrong decisions.
As Davide discovered, there are bugs open with the same cause.

Also, did you mean to make the lattice as:

digraph G {
Unknown → Undef
Undef → Constant1
Undef → Constant2
Undef → Constant3
Constant1 → Bottom
Constant2 → Bottom
Constant3-> Bottom
}

? In the lattice you’ve drawn, Constant MEET Undef will be Bottom,
when it should ideally be Constant.

Secondly, what’s the purpose of splitting Unknown and Undef in the new
scheme?

You need to distinguish between not visited and visited but undef.

Is there a case in your algorithm in which treating an
unknown as an undef will be a problem?

Actually given the above lattice (assuming you’re okay with that) we know
treating unknown as undef should not make us any less conservative
(i.e. should be safe) since undef is lower than unknown. IOW why not change
the algorithm to start off each cell as undef and not unknown?

Because then you can’t distinguish between not visited and undef.

  Is there a case in your algorithm in which treating an unknown as an undef will be a problem?

Yes, if you try to optimize undefs in any way, no if you move them to overdefined :)

IE given phi [a, undef, undef, undef]

with just as many back edges, you will visit this node 4 times.

If you start out with

phi [a, undef, undef, undef], you may think "I know, i will make these
undef's a".

So you move everything to value

phi [a, a, a, a]

But remember, you really haven't visited the other 4 edges yet, so you
don't know if this is a valid value for undef (because of the rules around
and/or/etc, see http://llvm.org/docs/LangRef.html#undefined-values).

(a, a, a, a of course, is just an example, you also don't know if it's
valid to always return undef, or 0, or 1, or anything at all, really).

It's not until you hit this node for the 4th time, that you can really
safely choose a value.

If you have unknown and undef, it goes from

phi [a, unknown, unknown, unknown] to phi [a, undef, unknown, unknown] to
...

you know, once you have no unknown values, that everything you need to know
to resolve it is known.

(this is also why there is no need for a resolver. There really is no safe
logic in the resolver that can't be pushed into the solver, though it may
make sense to structure it as a resolver if you think it's easier)

Similarly for or %x, undef.
You need to know if it's really undef, not "something i haven't visited
yet".

Hi,

Hi David,

Looking at the original bug, it seems like a straightforward
undef-propagation bug to me -- SCCP was folding "or undef, constant"
to "undef", which is wrong. Why is changing that not the fix? That
is, some variant of

You would still need to fix the iteration order of the resolver, or it will
make more wrong decisions.
As Davide discovered, there are bugs open with the same cause.

Davide -- can you point me to those?

Also, did you mean to make the lattice as:

digraph G {
  Unknown -> Undef
  Undef -> Constant1
  Undef -> Constant2
  Undef -> Constant3
  Constant1 -> Bottom
  Constant2 -> Bottom
  Constant3-> Bottom
}

? In the lattice you've drawn, Constant MEET Undef will be Bottom,
when it should ideally be Constant.

Secondly, what's the purpose of splitting Unknown and Undef in the new
scheme?

You need to distinguish between not visited and visited but undef.

What I'm getting at is, if you're implementing something close to the
paper mentioned, I can't think of a case where you'd care to
distinguish between "not visited" and "visited but undef". That is,
instead of starting from "map each instruction as not visited" we
should be able to (my unproven conjecture) start from "map each
instruction to undef", making the "unknown" element superfluous.

-- Sanjoy

Whoops, hit send earlier.

Most of this gets taken care of by iterating in def-use order, but not for
phis or things dependent on them.

(especially if you want to resolve a phi node from a real value to undef)

>
> You need to distinguish between not visited and visited but undef.

What I'm getting at is, if you're implementing something close to the
paper mentioned, I can't think of a case where you'd care to
distinguish between "not visited" and "visited but undef". That is,
instead of starting from "map each instruction as not visited" we
should be able to (my unproven conjecture) start from "map each
instruction to undef", making the "unknown" element superfluous.

See the email i just sent.

Your scheme would work if you didn't want to optimize undef, because you'd
just make meet (anything, undef) -> overdefined.
It's the "we want to fill in values for undef" that makes it so you need to
know when you can actually do that.
(and as mentioned, you don't need a separate post-resolver to do *what we
do now*, and so far, it has just been a source of bugs :()

Hi Daniel,

  Is there a case in your algorithm in which treating an
unknown as an undef will be a problem?

Yes, if you try to optimize undefs in any way, no if you move them to overdefined :)

IE given phi [a, undef, undef, undef]

with just as many back edges, you will visit this node 4 times.

If you start out with

phi [a, undef, undef, undef], you may think "I know, i will make these
undef's a".

So you move everything to value

phi [a, a, a, a]

But remember, you really haven't visited the other 4 edges yet, so you don't
know if this is a valid value for undef (because of the rules around
and/or/etc, see http://llvm.org/docs/LangRef.html#undefined-values).

But that's the same case as:

  %x = phi [a, <unknown>]

Unless I know for sure that <unknown> is a, the *final result* can't
report the phi as a.

However, I thought the algorithm we're talking about here is
optimistic, and we will initially fold %x to `a`, and then "try to
see" if other edges are viable or not. You're allowed to have incorrect
optimistic results in the midst of the algorithm run (by design).

Otherwise we won't be able to get cases like:

loop:
  %x = phi [ 0 , entry ], [ 1, loop ]
  if (x != 0) br loop else br leave_loop
leave_loop;

  =>

loop:
  br leave_loop
leave_loop;

As far as I can tell, <unknown> has the semantics of "given the viable
set of edges so far, there is no value assigned to this slot by the
program". This has the same rewrite rules as undef, as far as I can
tell.

However, if you have a case like

loop:
  %x = phi [ 0 , entry ], [ t, loop ]
  t = undef | 1
  if (condition) br loop else br leave_loop
leave_loop;

and you end up with x as phi(0, undef) *after* the algorithm has
terminated, then that is a bug in the undef propagation, since "undef | 1" is not undef.

-- Sanjoy

Hi Daniel,

But that's the same case as:

  %x = phi [a, <unknown>]

Unless I know for sure that <unknown> is a, the *final result* can't
report the phi as a.

Right, but we are talking about "when, in the intermediate state, can i
transform an undef to a different value".

Remember you can only go down the lattice. So you can't make undef
constant, and then discover it's wrong, and go back up :)
IE can't change phi undef, undef to constant value 50, discover 50
doesn't work, and say, well actually, let's make it undef again for an
iteration :)

So you need to know the point, *in the intermediate state*, when it is safe
to change the undefs.
That point is when all unknowns disappear because you have completed
visiting the entire CFG.

However, I thought the algorithm we're talking about here is
optimistic, and we will initially fold %x to `a`, and then "try to
see" if other edges are viable or not. You're allowed to have incorrect
optimistic results in the midst of the algorithm run (by design).

Yes, but, again, you still can't shove things back up the lattice :)

Maybe this will help:
In the past, the resolver forced things to be constant. This is shoving
them down the lattice too far. There was, as i said, no way to push them
back up the lattice when they turned out wrong (and this happened often
due to the iteration order differences).
If you want to do the stuff the resolver did, but do it at the right time,
you need the bit to tell you when you *can* shove them down to a constant.

You can store that bit however you want. You can store a bit that all
operands of an operation were visited at least once.
That is equivalent to "all operands != unknown".
But you can't dual-purpose the bit with undef, because undef is an actual
value state. It tells you nothing about visited or unvisited, only the
value.

In particular, this would have to fall *down* to overdefined, not go back
*up* to undef.

Hi Daniel,

Right, but we are talking about "when, in the intermediate state, can i
transform an undef to a different value".

Remember you can only go down the lattice. So you can't make undef
constant, and then discover it's wrong, and go back up :)
IE can't change phi undef, undef to constant value 50, discover 50
doesn't work, and say, well actually, let's make it undef again for an
iteration :)

If the kind of simplification (I think) you've mentioned is allowed,
then I think even Davide's lattice is problematic. Say we have:

loop:
  X = phi(entry: undef, loop: T)
  ...
  T = undef
  if C then br loop else br outside

When we first visit X then we'll compute its state as (Undef MEET
Unknown) = Undef. My guess is that you're implying that the Undef
lattice state (or some other computation hanging off of it) can be
folded immediately to, say, 50, but once we mark the backedge as
executable we'll set it to undef (or mark it as 49)? And both of
these are bad because you're making a non-monotonic transition? Given
this, I'm not sure what prevents the bad transform from happening even
if we have separate Unknown and Undef states.

If the solution is to not fold X into undef (which, by assumption, can
"spontaneously decay" into any value) until you're sure every incoming
value is also undef then I suppose we'll need some special handling
for undef values to prevent breaking cases like (as shown earlier):

loop:
  X = phi(entry: 10, loop: T)
  if X == 10 then br outside else br loop

=>

loop:
  X = phi(entry: 10, loop: T)
  br outside

(i.e. to keep the algorithm properly optimistic)?

And if we do have a way of keeping undefs as undefs (i.e. prevent X
from decaying into 50 when we first visit the first loop) then why
can't we re-use the same mechanism to avoid spuriously decaying "phi
undef undef-but-actually-unknown" to "50"?

Another way of stating my suggestion is that, if you agree that this
is a correct lattice (different from Davide's proposal) and pretend
the "spontaneous undef decay" problem does not exist, then:

digraph G {
  Unknown -> Undef
  Undef -> Constant1
  Undef -> Constant2
  Undef -> Constant3
  Constant1 -> Bottom
  Constant2 -> Bottom
  Constant3-> Bottom
}

then it should be legal / correct to first drop every lattice element
from "Unknown" to "Undef" before running the algorithm. The only
cases where this would give us a conservative result is a place where
we'd have otherwise gotten "Unknown" for a used value; and the
algorithm should never have terminated in such a state.

-- Sanjoy

Hi Daniel,

>> Right, but we are talking about "when, in the intermediate state, can i
>> transform an undef to a different value".
>>
>> Remember you can only go down the lattice. So you can't make undef
>> constant, and then discover it's wrong, and go back up :)
>> IE can't change phi undef, undef to constant value 50, discover 50
>> doesn't work, and say, well actually, let's make it undef again for an
>> iteration :)

If the kind of simplification (I think) you've mentioned is allowed,
then I think even Davide's lattice is problematic. Say we have:

loop:
  X = phi(entry: undef, loop: T)
  ...
  T = undef
  if C then br loop else br outside

When we first visit X then we'll compute its state as (Undef MEET
Unknown) = Undef.

Yes

My guess is that you're implying that the Undef
lattice state (or some other computation hanging off of it) can be
folded immediately to, say, 50, but once we mark the backedge as
executable we'll set it to undef (or mark it as 49)?

Yes

  And both of these are bad because you're making a non-monotonic transition?

Yes

Given
this, I'm not sure what prevents the bad transform from happening even
if we have separate Unknown and Undef states.

You can always get *out* of the bad transform by dropping to overdefined if
you discover it doesn't work.
The question is more one of "what is the optimal time to try to figure out
a constant value that works".
If you drop too early, you have to go overdefined when you may have been
able to figure out a constant to use to replace the undef.

If the solution is to not fold X into undef (which, by assumption, can
"spontaneously decay" into any value) until you're sure every incoming
value is also undef then I suppose we'll need some special handling
for undef values to prevent breaking cases like (as shown earlier):

loop:
  X = phi(entry: 10, loop: T)
  if X == 10 then br outside else br loop

=>

loop:
  X = phi(entry: 10, loop: T)
  br outside

(i.e. to keep the algorithm properly optimistic)?

Right. This is precisely the unknown part, or at least, that was my
thinking.

And if we do have a way of keeping undefs as undefs (i.e. prevent X
from decaying into 50 when we first visit the first loop) then why
can't we re-use the same mechanism to avoid spuriously decaying "phi
undef undef-but-actually-unknown" to "50"?

That mechanism is the "all executable paths leading to our operands
visited" bit, which is .. the unknown vs undef state.

IE we only mark the operand undef when we visit it from a reachable edge.

Another way of stating my suggestion is that, if you agree that this
is a correct lattice (different from Davide's proposal) and pretend
the "spontaneous undef decay" problem does not exist, then:

digraph G {
  Unknown -> Undef
  Undef -> Constant1
  Undef -> Constant2
  Undef -> Constant3
  Constant1 -> Bottom
  Constant2 -> Bottom
  Constant3-> Bottom
}

then it should be legal / correct to first drop every lattice element
from "Unknown" to "Undef" before running the algorithm. The only
cases where this would give us a conservative result is a place where
we'd have otherwise gotten "Unknown" for a used value; and the
algorithm should never have terminated in such a state.

It is definitely *legal* to do this or the algorithm is broken otherwise
(you should just end up in overdefined more).

I'm not sure i believe the last part though. I have to think about it.

FWIW: My initial proposal to all of this was "stop caring about trying to
optimize undef here at all". Remove it from the lattice, and treat undef as
overdefined if we see it. If we want to optimize undef, before SCCP
solving, build SCCs of the parts of the SSA graph that contain undef,
choose correct values for those subgraphs, then mark them constant *before*
the solver sees it".

This is pre-solve instead of post-solve or iterative solve. We already do
this for arguments by solving them to overdefined. IPCP also works much the
same way, by pre-solving arguments to constants.

This is precisely because this stuff is non-trivial to reason about, and
unclear to me that it's really worth it. If *it* is worth it, we can do
better than we do now (and contain the mess *entirely* to the presolver),
and if it's not, we shouldn't have complicated reasoning to try to make it
work well.

Right now we have the worst of all worlds. We have a mess in both places
(the resolver and the solver both step on each other and differ in
behavior), it's hard to reason about (and provably not correct), and it
doesn't even give you the best answers it can *anyway*.

(generally, at the point where we have a bunch of very smart people racking
their heads about whether something is really going to work algorithmically
or not, i take it as a sign that maybe we have the wrong path. Obviously,
there are some really complex problems, but ... this should not be one of
them :P)

Replying to one mail at a time, still in a different timezone =)

-> This allows to remove a lot of code (pretty much the resolver) and
makes the algorithm easier to understand

How does this eliminate the need for the resolver? Without the resolver,
what will plug in values for the undef's?

Right now SCCP is implemented as a visitor for each instruction with
more or less the following structure:

visitInstruction(Instruction &I) {
   // This is not true in general, because you can (sometimes) infer
   // constant values even if one of the operands is overdefined,
   // but just for simplicity:
   if (operands are overdefined)
     markOverdefined(&I);
   [...]
   Constant *C = tryToConstantFold(...);
   if (isa<UndefValue>(C))
     return;
   markConstant(&I, C);
}

(see https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/SCCP.cpp#L797
for a concrete example).

Pretty much, the solver delegates the folding to ConstantFolding.cpp
et similia, but doesn't lower the lattice value to constant if the
result of the folding is an `undef` value. In other words, everything that's
undef/folds to undef won't be lowered in the solver, i.e. it will still
have an `Unknown` lattice state after the solver finishes.

What the resolver does is walk the BBs and try to plug in the
correct values with the information it has.
See a couple of examples in `ResolvedUndefsIn`:

https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/SCCP.cpp#L1382
or https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/SCCP.cpp#L1390

These are cases which the constant folder would get just right, but that we
silently ignore, because if something folds to undef then we don't
assign a constant value. Therefore, `ResolvedUndefsIn` has to perform
extra work to plug in the "correct" values.
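
Under the proposed scheme, the visitor sketched above would instead do
something like this (rough sketch; `markUndef` is a hypothetical helper
that lowers the value to the new `undef` lattice state):

   Constant *C = tryToConstantFold(...);
   if (isa<UndefValue>(C))
     return markUndef(&I);   // hypothetical: lower to the new 'undef' state,
                             // which the solver itself now knows how to handle
   markConstant(&I, C);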

-> SCCP doesn't need anymore to roll its own tiny constant folder
where we need to add cases everytime we realize there's something
missing, instead, it can leverage the full power of ConstantFolding
when something folds to undef.

Can you elaborate on why SCCP currently needs to "roll its own tiny constant
folder"? I wasn't able to understand it from just the context here.

See the comment above. Hope it makes sense. Thanks for pointing it out, and
sorry if it wasn't clear in the original mail (it wasn't obvious to me
in the beginning either).

Thanks for the comments Sanjoy, I'll have to think about it further.
My problem with this is that when you put undef into the equation the
original Zadeck algorithm becomes not entirely obvious. I mean, I
originally thought it was only my problem, but we're here discussing it
(and have discussed it already in the past), so it seems it's not.

FWIW: My initial proposal to all of this was "stop caring about trying to
optimize undef here at all". Remove it from the lattice, and treat undef as
overdefined if we see it. If we want to optimize undef, before SCCP solving,
build SCCs of the parts of the SSA graph that contain undef, choose correct
values for those subgraphs, then mark them constant *before* the solver sees
it".

Although originally I wasn't, I'm increasingly convinced this is the
right path forward (at least in the beginning), i.e. strip undef
handling entirely. I tried to integrate undef in the solver to see how
that worked and it seems the proposed lattice still has some issues.
We have a hard time reasoning about simple things, like what the
lattice structure is and what invariants we should
maintain/enforce at the end of each run (something that's very nice
about the original algorithm is that you know that at the end nothing
should have the value TOP, otherwise you forgot to lower
something/have a bug, but we can't currently assert this because of
`undef`).
I'm also becoming increasingly convinced that this problem is
something we (LLVM) created.
If optimizing `undef` (which is sometimes, if not often, a symptom of
undefined behaviour/bugs in your program) has to come at the expense
of correctness or complicated logic/potentially unsound algorithms, I
don't think we should go that way.
What Danny proposed is, I think, very appealing, but I'm taking a more
extreme position here, saying that we may consider not doing that in
the beginning. Today I ran SCCP over a bunch of codebases (completely
removing the undef optimization) and found that it never actually
resulted in a runtime performance hit. Of course this is not
representative at all, but it's still a data point.
I think we should take this opportunity as a way to see if we can make
things simpler/easier to consider correct instead of adding cases for
problems we discover.

This is pre-solve instead of post-solve or iterative solve. We already do
this for arguments by solving them to overdefined. IPCP also works much the
same way by pre-solving arguments constants

This is precisely because this stuff is non-trivial to reason about, and
unclear to me that it's really worth it. If *it* is worth it, we can do
better than we do now (and contain the mess *entirely* to the presolver),
and if it's not, we shouldn't have complicated reasoning to try to make it
work well.

Right now we have the worst of all worlds. We have a mess in both places
(the resolver and the solver both step on each other and differ in
behavior), it's hard to reason about (though provably not correct), and it
doesn't even give you the best answers it can *anyway*

(generally, at the point where we have a bunch of very smart people racking
their heads about whether something is really going to work algorithmically
or not, i take it as a sign that maybe we have the wrong path. Obviously,
there are some really complex problems, but ... this should not be one of
them :P)

+1 on everything.

Hi Davide,

Although originally I wasn't, I'm increasingly convinced this is the
right path forward (at least in the beginning), i.e. strip undef
handling entirely. I tried to integrate undef in the solver to see how
that worked and it seems the proposed lattice has still some issues.

By "integrate undef" do you mean you implemented the lattice you
mentioned in the original post? If so, can you mention what issues
you ran into?

We have hard time reasoning about simple things like what's the
lattice structure and what should be the invariants we
maintain/enforce at the end of each run (something that's very nice
about the original algorithm is that you know at the end of the
algorithm nothing should have value TOP otherwise you forgot to lower
something/have a bug, but we can't currently assert this because of
`undef`).

That is a good point. Can we get the same kind of checking by keeping
track of if we've visited every non-dead instruction in the function
at least once?

I'm also becoming increasingly convinced that this problem is
something we (LLVM) created.
If optimizing `undef` (which is sometimes if not often symptom of
undefined behaviour/bugs in your program) should come at the expense
of correctness or complicated logic/potentially unsound algorithms I
don't think we should go that way.

I agree that undef has soundness issues, but I don't think those are
relevant here.

And with respect to `undef` and its implications on the correctness of
the source program, there are two factors here:

Using `undef` in a side effect (e.g. printf(undef)) is usually a sign
of a buggy source program, especially if the source language is C or
C++.

The *presence* of `undef` is fine, and does not imply that the source
program is incorrect.

What Danny proposed I think it's very appealing but I'm taking a more
extreme position here saying that we may consider not doing that in
the beginning. Today I ran SCCP over a bunch of codebases (completely
removing the undef optimization) and I found it never actually
resulting in a runtime performance hit. Of course this is not
representative at all, but still a datapoint.

What is the contribution of SCCP itself to the (I presume internal)
benchmarks? If you disable SCCP completely, how much performance do
you lose?

I think we should take this opportunity as a way to see if we can make
things simpler/easier to consider correct instead of adding cases for
problems we discover.

In theory I'm completely fine with not handling undef at all
(i.e. your proposal) as a starting point. If we see a need for it, we
can always add it back later.

In practice, this is the kind of thing that tends to get added back
poorly (because someone's benchmark will be faster by 1% with a "small
fix" to SCCP). Given that we're taking a step back and looking at the
bigger picture now anyway, this could be an opportune moment to fix
the underlying issue with regards to undef.

Having said that, since you're doing the work I'm more than happy to
let you make the call on this one. I have the armchair, but you have
the keyboard. :)

-- Sanjoy

PS:

I still don't have an answer to the very first question I asked:

>> Looking at the original bug, it seems like a straightforward
>> undef-propagation bug to me -- SCCP was folding "or undef, constant"
>> to "undef", which is wrong.  Why is changing that not the fix?  That
>> is, some variant of
>
> You would still need to fix the iteration order of the resolver, or it will
> make more wrong decisions.
> As Davide discovered, there are bugs open with the same cause.

Davide -- can you point me to those?

Hi Davide,

Although originally I wasn't, I'm increasingly convinced this is the
right path forward (at least in the beginning), i.e. strip undef
handling entirely. I tried to integrate undef in the solver to see how
that worked and it seems the proposed lattice has still some issues.

By "integrate undef" do you mean you implemented the lattice you
mentioned in the original post? If so, can you mention what issues
you ran into?

Nothing in the testcases I used, but still you pointed out a case
where the lattice I proposed in the original mail wasn't necessarily
correct.

We have hard time reasoning about simple things like what's the
lattice structure and what should be the invariants we
maintain/enforce at the end of each run (something that's very nice
about the original algorithm is that you know at the end of the
algorithm nothing should have value TOP otherwise you forgot to lower
something/have a bug, but we can't currently assert this because of
`undef`).

That is a good point. Can we get the same kind of checking by keeping
track of if we've visited every non-dead instruction in the function
at least once?

I think that could work.

I'm also becoming increasingly convinced that this problem is
something we (LLVM) created.
If optimizing `undef` (which is sometimes if not often symptom of
undefined behaviour/bugs in your program) should come at the expense
of correctness or complicated logic/potentially unsound algorithms I
don't think we should go that way.

I agree that undef has soundness issues, but I don't think those are
relevant here.

And with respect to `undef` and its implications on the correctness of
the source program, there are two factors here:

Using `undef` in a side effect (e.g. printf(undef)) is usually a sign
of a buggy source program, especially if the source language is C or
C++.

The *presence* of `undef` is fine, and does not imply that the source
program is incorrect.

Thanks for the clarification. That's why I said sometimes, if not often
(not always); anyway, I get your point =)

What Danny proposed I think it's very appealing but I'm taking a more
extreme position here saying that we may consider not doing that in
the beginning. Today I ran SCCP over a bunch of codebases (completely
removing the undef optimization) and I found it never actually
resulting in a runtime performance hit. Of course this is not
representative at all, but still a datapoint.

What is the contribution of SCCP itself to the (I presume internal)
benchmarks? If you disable SCCP completely, how much performance do
you lose?

~1.5% runtime on a game scene (I mean, disabling both SCCP/IPSCCP,
this is LTO, FWIW).
Note: I wish I could share numbers, but I'm not allowed.

I think we should take this opportunity as a way to see if we can make
things simpler/easier to consider correct instead of adding cases for
problems we discover.

In theory I'm completely fine with not handling undef at all
(i.e. your proposal) as a starting point. If we see a need for it, we
can always add it back later.

I have personally no rush to get this code in, so, while we're here,
we can just bite the bullet and fix this entirely.
Side note: I originally wanted to fix this for 4.0 so that I could avoid
maintaining a patch downstream. Turns out that the burden of keeping a
patch downstream is not terribly high, so this can probably wait (and
get more careful thought).
Side note 2: I think that even if we want to disable undef handling,
that needs to be backed by a set of benchmarks showing we
don't lose too much. My original testing was, of course,
non-representative of the entire world (I just reported it as a
data point).

In practice, this is the kind of thing that tends to get added back
poorly (because someone's benchmark will be faster by 1% with a "small
fix" to SCCP). Given that we're taking a step back and looking at the
bigger picture now anyway, this could be an opportune moment to fix
the underlying issue with regards to undef.

Having said that, since you're doing the work I'm more than happy to
let you make the call on this one. I have the armchair, but you have
the keyboard. :)

Your comments are of course very welcome (that goes without saying).
Are you happy with me experimenting with something similar to what
Danny proposed (a pre-solver computing the SCCs of the SSA graph)? At
this point this is my favourite solution, because we can stick with the
default algorithm (which will keep me happier, as somebody else did the
hard work of proving it correct on my behalf).

-- Sanjoy

PS:

I still don't have an answer to the very first question I asked:

>> Looking at the original bug, it seems like a straightforward
>> undef-propagation bug to me -- SCCP was folding "or undef, constant"
>> to "undef", which is wrong.  Why is changing that not the fix?  That
>> is, some variant of
>
> You would still need to fix the iteration order of the resolver, or it will
> make more wrong decisions.
> As Davide discovered, there are bugs open with the same cause.

Davide -- can you point me to those?

Sorry, I missed this one :(
I have another case (not reduced) where this falls apart. I locally
patched it with something very similar to what you proposed in your
original mail, so I'm tempted to claim it's the same issue (or a
slight modification of it).
That said, sure, I think we can probably get the patch you proposed
originally in and call it a night. But I'm still very nervous that
we're not sure of the correctness of the algorithm, and I'm very afraid
it may fall apart again in the future.

Hi Davide,

Your comments are of course very welcome (that goes without saying).
Are you happy with me experimenting with something similar to what
Danny proposed (a pre-solver computing the SCCs on the SSA graph?).

SGTM!

At
this point this is my favourite solution because we can stick with the
default algorithm (which will keep me happier as somebody else did the
hard work of proving correct on my behalf).

Sounds good!

-- Sanjoy

Sending a real (updated) proposal to make sure we're all on the same
page (and the idea makes sense), in case somebody not following the
earlier discussion wants to comment. I haven't checked if LLVM has
functions for solving the individual steps (feel free to suggest).

The algorithm (if I understood the idea correctly) is more or less the
following:
1) Build the def-use chain graph.
2) Compute the set of SCCs of the graph.
3) Compute an RPO of the DAG formed at 2) and visit the nodes (SCCs)
according to that order.
Within each SCC, sort topologically and use RPO within the SCC (as a
worklist) until a fixed point is reached (as we do here
https://github.com/dcci/llvm/blob/master/lib/Transforms/Scalar/ConstantProp.cpp#L78
just with a different visitation order). If something is found to
fold to `undef`, we propagate the information through the DAG. A rough
sketch of this visitation order is below.
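
To make the visitation order concrete, here's a rough, self-contained
sketch (plain C++, no LLVM data structures, made-up names); the real
thing would of course build the graph over Values/Instructions and
reuse the existing lattice transfer functions:

#include <algorithm>
#include <functional>
#include <vector>

using Graph = std::vector<std::vector<int>>; // node -> users (def-use edges)

// Tarjan's algorithm; SCCs are emitted in reverse topological order of the
// condensation (an SCC is emitted only after everything it can reach).
static std::vector<std::vector<int>> findSCCs(const Graph &G) {
  int N = (int)G.size(), Time = 0;
  std::vector<int> Index(N, -1), Low(N, 0), Stack;
  std::vector<bool> OnStack(N, false);
  std::vector<std::vector<int>> SCCs;
  std::function<void(int)> DFS = [&](int V) {
    Index[V] = Low[V] = Time++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : G[V]) {
      if (Index[W] == -1) {
        DFS(W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W])
        Low[V] = std::min(Low[V], Index[W]);
    }
    if (Low[V] == Index[V]) { // V is the root of an SCC: pop it off the stack
      SCCs.emplace_back();
      int W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack[W] = false;
        SCCs.back().push_back(W);
      } while (W != V);
    }
  };
  for (int V = 0; V < N; ++V)
    if (Index[V] == -1)
      DFS(V);
  return SCCs;
}

// Steps 2) and 3): visit the SCCs so that defs come before their users,
// iterating within each (possibly cyclic) SCC until nothing changes.
// TransferFn(V) should try to fold node V given its operands and return
// true if V's state changed.
static void presolve(const Graph &DefUse,
                     const std::function<bool(int)> &TransferFn) {
  std::vector<std::vector<int>> SCCs = findSCCs(DefUse);
  std::reverse(SCCs.begin(), SCCs.end()); // RPO/topological order of the DAG
  for (const std::vector<int> &SCC : SCCs) {
    bool Changed = true;
    while (Changed) {                     // fixed point within the SCC
      Changed = false;
      for (int V : SCC)
        Changed |= TransferFn(V);
    }
  }
}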

I think this should work for SCCP; I'm trying to convince myself how
this generalizes to IPSCCP. After the pre-solver runs there shouldn't
be any `undef` values left to propagate, and we can run the
solver on a lattice which doesn't know (and shouldn't need to know)
about undef.

Any comment/correction is appreciated, that goes without saying =)

P.S. (mainly curiosity): I wasn't able to find anything in the literature
describing an algorithm using this approach for constant propagation
(although I haven't tried really hard). Nielson's book describes the
approach of using SCCs for a more intelligent worklist ordering for
constraint solvers in general
(http://www.imm.dtu.dk/~hrni/PPA/slides6.pdf), and
http://homepages.dcc.ufmg.br/~fernando/publications/papers/CGO13_raphael.pdf
applies this to range analysis (which SCCP can maybe be seen as a
special case/simpler form of).