Dataflow analysis regression in 3.7

Hi all,
I just found an optimization regression regarding simple dataflow/constprop analysis:
https://godbolt.org/g/Uz8P7t

This code


int dataflow(int b) {
int a;

if (b==4)
a = 3*b; // fully optimized when changed to a = 3;
else
a = 5;

if (a == 4)
return 0;
else
return 1;
}

is no longer optimized to just a “return 1”. The regression happened in LLVM 3.6 → 3.7.

Is this a known regression? I couldn’t find it in the bug tracker (but don’t really know which keywords to look for…)

Kind regards,
Johan

This regressed when SimplifyCFG changed the threshold for PHI nodes
folding from 1 to 2.

(see lib/Transforms/Utils/SimplifyCFG.cpp)

+PHINodeFoldingThreshold("phi-node-folding-threshold", cl::Hidden, cl::init(2),
+ cl::desc("Control the amount of phi node folding to perform
(default = 2)"));

This could be re-evaluated, maybe, or some other transformation could
catch this case.
(e.g. a more powerful range solver could realize that a is always in
the range [5;12]).

FWIW, GCC does this.

I don't think this very case has been reported, so feel free to do it.

> Hi all,
> I just found an optimization regression regarding simple
> dataflow/constprop analysis:
> Compiler Explorer
>
> This code
> ```
> int dataflow(int b) {
> int a;
>
> if (b==4)
> a = 3*b; // fully optimized when changed to a = 3;
> else
> a = 5;
>
> if (a == 4)
> return 0;
> else
> return 1;
> }
> ```
> is no longer optimized to just a "return 1". The regression happened in
LLVM
> 3.6 --> 3.7.
>
> Is this a known regression? I couldn't find it in the bug tracker (but
don't
> really know which keywords to look for...)
>
> Kind regards,
> Johan
>

This regressed when SimplifyCFG changed the threshold for PHI nodes
folding from 1 to 2.

(see lib/Transforms/Utils/SimplifyCFG.cpp)

+PHINodeFoldingThreshold("phi-node-folding-threshold", cl::Hidden,
cl::init(2),
+ cl::desc("Control the amount of phi node folding to perform
(default = 2)"));

This could be re-evaluated, maybe, or some other transformation could
catch this case.

I'm sure that adjusting that threshold has many implications, so I don't
want to touch that.

(e.g. a more powerful range solver could realize that a is always in
the range [5;12]).

I seem to have been very "lucky" in finding the non-optimized case. I get
fully optimized output when replacing "3*b" with "2*b" or "4*b" or
"b+b+b+b" or "3*b-b"... But for "3*b", "b+b+b", and "3*b+6" things go wrong.
Is there another optimization pass interfering?

regards,
  Johan

I'm sure that adjusting that threshold has many implications, so I don't
want to touch that.

We shouldn't, unless there's a very good reason for.
Funny enough, this was fixed just yesterday by Chad's commit
https://reviews.llvm.org/D34901 so the function now just gets folded
to `ret i32 1` (as you expected).

(e.g. a more powerful range solver could realize that a is always in
the range [5;12]).

As I pointed out, this could've been caught by a more powerful range
analysis (before SimplifyCFG "destroys" the CFG). There's a balance to
keep between how aggressive SimplifyCFG gets VS what other passes
should/could do.

Without the last commit you have, after SimplifyCFG:

define i32 @dataflow(i32 %b) local_unnamed_addr #0 {
entry:
  %cmp = icmp eq i32 %b, 4
  %mul = mul nsw i32 %b, 3
  %phitmp = icmp eq i32 %mul, 4
  %a.0 = select i1 %cmp, i1 %phitmp, i1 false
  %. = select i1 %a.0, i32 0, i32 1
  ret i32 %.
}

but just before:

define i32 @dataflow(i32 %b) local_unnamed_addr #0 {
entry:
  %cmp = icmp eq i32 %b, 4
  br i1 %cmp, label %if.then, label %if.else

if.then: ; preds = %entry
  %vSSA_sigma = phi i32 [ %b, %entry ]
  %mul = mul nsw i32 3, %vSSA_sigma
  br label %if.end

if.else: ; preds = %entry
  br label %if.end

if.end: ; preds = %if.else, %if.then
  %a.0 = phi i32 [ %mul, %if.then ], [ 5, %if.else ]
  %cmp1 = icmp eq i32 %a.0, 4
  br i1 %cmp1, label %if.then2, label %if.else3

if.then2: ; preds = %if.end
  br label %cleanup

if.else3: ; preds = %if.end
  br label %cleanup

cleanup: ; preds = %if.else3, %if.then2
  %retval.0 = phi i32 [ 0, %if.then2 ], [ 1, %if.else3 ]
  ret i32 %retval.0
}

On the latter, you can find (after transforming to e-SSA) that:

Analysis for function: dataflow
[4, 4] %vSSA_sigma = phi i32 [ %b, %entry ]
[12, 12] %mul = mul nsw i32 3, %vSSA_sigma
[5, 12] %a.0 = phi i32 [ %mul, %if.then ], [ 5, %if.else ]
[0, 1] %retval.0 = phi i32 [ 0, %if.then2 ], [ 1, %if.else3 ]

Note that our VRP algorithm (correlated-propagation) currently doesn't
catch that, but a more powerful analysis does (and it's faster on
large testcases & has an interprocedural version, the code is on
github).
http://homepages.dcc.ufmg.br/~raphael/papers/CGO13.pdf [and the predecessor
paper from Zhendong Su, circa 2004 IIRC, I don't have a link handy sorry :)]

[I just put a 10 lines intraprocedural/interprocedural client on top
of it that prints the found ranges
https://github.com/dcci/range-analysis/commit/f9677dfd032f811ed545977690f8d7aead7cccaa\]

Thanks,

NewGVN should be able to do this at this point, but it's really a VRP
issue.

David/Johan,

I would love to claim victory, but I don't think that D34901 catches this case.

However, I got interested and threw this together quickly: ⚙ D35140 [WIP][InstCombine] See if we can determine the value of a use based on a dominating condition.

This does catch the below case. If people are interested I can add test cases and submit for formal review. FWIW, it does hit about 1/3 of all of the SPEC benchmarks. I haven't done any performance analysis to see if it matters, however.

   Chad

David/Johan,

I would love to claim victory, but I don't think that D34901 catches this
case.

Hi Chad, thanks for taking another look at this.
Maybe I didn't bisect correctly. Apologies. Anyway, more fun for us.

However, I got interested and threw this together quickly:
⚙ D35140 [WIP][InstCombine] See if we can determine the value of a use based on a dominating condition.

This does catch the below case. If people are interested I can add test
cases and submit for formal review. FWIW, it does hit about 1/3 of all of
the SPEC benchmarks. I haven't done any performance analysis to see if it
matters, however.

As I stated earlier, I think this calls for a slightly better VRP
[and/or improvements in VRP] & I'm not entirely sure `InstCombine` is
the best place for this transform.
OTOH, the fact this triggers a lot is a good thing, if your WIP patch
actually results in performance improvements, we should consider a
more general solution to address this kind of issues.

Hi Chad and Davide,
Thanks for working on this.
I don’t know enough about LLVM internals (yet?) to contribute here, but it’s nice to “listen-in” on the discussion and watch the progress :slight_smile:

  • Johan

David/Johan,

I would love to claim victory, but I don't think that D34901 catches this
case.

Hi Chad, thanks for taking another look at this.
Maybe I didn't bisect correctly. Apologies. Anyway, more fun for us.

Yes, indeed. More fun!

However, I got interested and threw this together quickly:
⚙ D35140 [WIP][InstCombine] See if we can determine the value of a use based on a dominating condition.

This does catch the below case. If people are interested I can add test
cases and submit for formal review. FWIW, it does hit about 1/3 of all of
the SPEC benchmarks. I haven't done any performance analysis to see if it
matters, however.

As I stated earlier, I think this calls for a slightly better VRP
[and/or improvements in VRP] & I'm not entirely sure `InstCombine` is
the best place for this transform.

I tend to agree that InstCombine doesn't feel like the right places for this transformation, hence the reason I didn't post for formal review.

OTOH, the fact this triggers a lot is a good thing, if your WIP patch
actually results in performance improvements, we should consider a
more general solution to address this kind of issues.

Matt Simpson and I briefly discussed this transformation. One of his suggestions was to add a pass in the pipeline where the dominator tree was available (note my patch used a poor man's version of domination) and to add range meta-data to values (or replace values if we know the exact value) based on dominating conditions. I thought it was pretty interesting idea, but I'm not very familiar with how range metadata is generated and used.

Matt Simpson and I briefly discussed this transformation. One of his

suggestions was to add a pass in the pipeline where the dominator tree was
available (note my patch used a poor man's version of domination) and to
add range meta-data to values (or replace values if we know the exact
value) based on dominating conditions.

This is a pretty trivial variant of what the predicateinfo utility does :slight_smile:

(it just happens to process branches, assumes, etc. But you could trivially
modify it to change the name anywhere the range is different, to ensure the
invariant that anything with the same ssa name has the same range)

I thought it was pretty interesting idea, but I'm not very familiar with

how range metadata is generated and used.

GCC pretty much does what i said above:
It generates assert_expr's, which rename values, where the ranges change
(this is equivalent to what predicateinfo does), then solves a lattice over
the resulting IR.

We have a related bug open, you opened, it seems :slight_smile:
https://bugs.llvm.org/show_bug.cgi?id=31895

From what I can tell the difference is that gcc solves the problem eagerly (and not lazily). Actually, they catch this case as part of VRP, which is run not very early in the pipeline but still the cfg is in a shape decent enough to get this case right (maybe we could consider reordering passes, instead).

Maybe some of us who are interested in LVI can get together in October? Doesn't seem like broad enough interest for a BoF but perhaps over a beer or during a coffee break?

John

Sounds great to me. Over a beer sounds more appealing as I won’t attend the meeting (but I’ll be in the Bay Area anyway).