BasicAA unable to analyze recursive PHI nodes

Hi all,

I came across the following limitation in our BasicAliasAnalysis. This
happens with the following IR pattern:

  %x = phi [ %incptr, ... ] [ %var, ... ]
  %incptr = getelementptr %x, 1

We will basically always return MayAlias for %x and any other value
because aliasPHI recurses on the first value and gives up.

There are, however, many cases where this is too conservative.

Take the following example:

typedef struct {
  unsigned num_words;
  unsigned word_ofs;
  const unsigned *data;
} section_t;

void test(section_t* restrict section, unsigned* restrict dst)
{
  while (section->data != NULL) {
    const unsigned *src = section->data;
    for (unsigned i=0; i < section->num_words; ++i) {
      dst[section->word_ofs + i] = src[i];
    }

    ++section;
  }
}

There is no need to reload section->word_ofs on every iteration of the
for loop, but due to the limitation described above, we're unable to
tell that section doesn't alias with dst inside the loop.

I have a fix, but would like to discuss it here first.

In aliasPHI, we can detect incoming values that are recursive GEPs with
a constant offset. Instead of treating them as an ordinary incoming
value, I instead create a temporary GEP expression with an undef offset
for every other (non-recursive) incoming value and then let aliasPHI do
its usual analysis for each of those.

In the example above,

  %x = phi [ %incptr, ... ] [ %var, ... ]
  %incptr = getelementptr %x, 1

the incoming values of %x would become, for the purpose of analysis in
aliasPHI:

   getelementptr %var, undef
and
   %var

I believe the undef GEP correctly represents the various offsets by
which %x advances across all iterations.

I'm not seeing any regressions on our test suites with this solution
(and some good performance improvements).

Any thoughts? Can anyone think of a better solution? I'm happy to post
the patch if this is OK.

Tobias

Patch now posted as http://reviews.llvm.org/D10368

Tobias

What is the compile time impact? Basic AA is supposed to be basic, not catch every case

Sorry for missing this the first time around, but I think -scev-aa
does what you want. I've commented on the phabricator review.

I'm not seeing a noticeable impact, apart from the fact that you'll
see different behavior further down the pipeline due to better alias
information (it's quite a common pattern).

The typical case is the one shown below - a PHI node with two
incoming values. The first value (the recursive GEP) goes away. In its
place, a temporary GEP gets created. So the number of incoming values
to analyze is the same. The only additional overhead is the creation
and destruction of the temporary instruction.

Tobias

It probably will, but it isn't on by default (right?) and has quite a
lot more overhead. I think this is a common enough case for it to be
handled by BasicAA.

It probably will, but it isn't on by default (right?)

True, but so what?

and has quite a
lot more overhead.

I didn't see numbers one way or the other. It probably has some
overhead, if it's the right solution, i'm sure it can be ade faster.

I think this is a common enough case for it to be
handled by BasicAA.

Can you provide some evidence?

IE hard numbers of how often it occurs on some set of benchmarks or
something, and what the compile time cost is?

It's really hard to say whether this is the right tradeoff without any data.

Hi Daniel,

I didn't see numbers one way or the other. It probably has some
overhead, if it's the right solution, i'm sure it can be ade faster.

As far as I remember, scev-aa is off because of limitations in the
current pass manager that would mean it would have to be recalculated
far too often.

> I think this is a common enough case for it to be
> handled by BasicAA.
>

Can you provide some evidence?

IE hard numbers of how often it occurs on some set of benchmarks or
something, and what the compile time cost is?

It's really hard to say whether this is the right tradeoff without any data.

I agree. We're seeing about 3% improvement on a number of internal
benchmarks, but I'll give this a try on SPEC etc. and share the results
here (including compile time).

Thanks for your comments!

Tobias

From: "Tobias Edler von Koch" <tobias@codeaurora.org>
To: "Daniel Berlin" <dberlin@dberlin.org>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Thursday, June 11, 2015 10:02:37 AM
Subject: Re: [LLVMdev] BasicAA unable to analyze recursive PHI nodes

Hi Daniel,

>
> I didn't see numbers one way or the other. It probably has some
> overhead, if it's the right solution, i'm sure it can be ade
> faster.

As far as I remember, scev-aa is off because of limitations in the
current pass manager that would mean it would have to be recalculated
far too often.

That's correct. It also may need some caching, or other performance-related work, to be sufficiently efficient to be on by default.

>
> > I think this is a common enough case for it to be
> > handled by BasicAA.
> >
>
> Can you provide some evidence?
>
> IE hard numbers of how often it occurs on some set of benchmarks or
> something, and what the compile time cost is?
>
> It's really hard to say whether this is the right tradeoff without
> any data.

I agree. We're seeing about 3% improvement on a number of internal
benchmarks, but I'll give this a try on SPEC etc. and share the
results
here (including compile time).

Sounds good; please do.

-Hal

Hi Daniel, Hal,

I've evaluated the patch on the SPEC CPU2006 C/C++ benchmarks. Here are
the results.

1. How many aliasPHI queries on recursive PHI nodes are there and what
are their results?

NoAlias 81317/10.6% (before) 118549/18.1% (after)
MayAlias 685637/89.4% (before) 534856/81.4% (after)
PartialAlias 0/ 0.0% (before) 3325/ 0.5% (after)
Total 766954 (before) 656730 (after)

As you can see, the share of NoAlias results for these queries goes up
noticeably. What's also interesting is that there is a 14.4% reduction
in total queries on recursive PHI nodes with the patch.

2. What is the average number of incoming values on recursive PHI nodes?

There were 2 incoming values (one recursive, one 'real' value) in all
but 102 (0.013%) of aliasPHI queries on recursive PHI nodes before the
patch. With the patch, there were 103 queries with more than 2 incoming
values. So two incoming values is indeed the common case.

3. What is the compile time impact of the change?

I ran the SPEC2006 build 5 times, discarding the numbers for the first
build to warm up the fs cache. Without the patch, the average build
time over the remaining four builds is 5023.913s. With the patch, it is
5023.713s. Pretty much unchanged.

4. What is the impact of the change on the execution time of the
benchmarks?

This is with the ref data set, average of three runs.

(Speedup, higher is better)
400.perlbench 0.991404586
401.bzip2 0.999081718
403.gcc 1.006213196
429.mcf 0.990988836
445.gobmk 0.994521527
456.hmmer 0.996110677
458.sjeng 0.993703064
462.libquantum 1.014540594
464.h264ref 0.996489267
471.omnetpp 0.986799815
473.astar 0.987321483
483.xalancbmk 0.985122405
433.milc 1.023695308
444.namd 0.998587195
447.dealII 0.999585507
450.soplex 1.001369759
453.povray 0.978606911
470.lbm 1.018186381
482.sphinx3 0.998224144
Average 0.997923809

Not a huge change here although the numbers tend to be more on the side
of a slowdown. This is interesting - you'd think that better alias
analysis shouldn't have that effect. Any idea what could be causing
this? The numbers are from a real machine though so I'd expect
something like 1-2% noise, so it could be just that.

I hope this answers some of your questions!

Tobias

Not a huge change here although the numbers tend to be more on the side
of a slowdown. This is interesting - you'd think that better alias
analysis shouldn't have that effect. Any idea what could be causing
this?

LLVM's alias analysis is good but not amazing.
If you improve it significantly (as you appear to have :P), you give
the compiler more freedom to do things.
What it chooses to do with that freedom may not always be good :slight_smile:

Basically, with any significant alias analysis change comes a lot of
tuning of heuristics/etc, or looking through regressions, to determine
what's going on now. *Usually* it's "something used to not be able to
do a ton of <whatever>, and now it can. The cost model for <whatever>
needs to be worked on so it doesn't go too crazy".

From: "Daniel Berlin" <dberlin@dberlin.org>
To: "Tobias Edler von Koch" <tobias@codeaurora.org>
Cc: reviews+D10368+public+78f7503e373eac35@reviews.llvm.org, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>,
"Hal Finkel" <hfinkel@anl.gov>
Sent: Saturday, June 20, 2015 11:55:55 AM
Subject: Re: [LLVMdev] BasicAA unable to analyze recursive PHI nodes

> Not a huge change here although the numbers tend to be more on the
> side
> of a slowdown. This is interesting - you'd think that better alias
> analysis shouldn't have that effect. Any idea what could be causing
> this?

LLVM's alias analysis is good but not amazing.
If you improve it significantly (as you appear to have :P), you give
the compiler more freedom to do things.
What it chooses to do with that freedom may not always be good :slight_smile:

Basically, with any significant alias analysis change comes a lot of
tuning of heuristics/etc, or looking through regressions, to
determine
what's going on now. *Usually* it's "something used to not be able
to
do a ton of <whatever>, and now it can. The cost model for
<whatever>
needs to be worked on so it doesn't go too crazy".

I agree. In the mean time, let's get this committed turned off by default. Please update your patch so that the new behavior is controlled by some cl::opt that is off by default. This will give us a better way to do more-extensive benchmarking and help isolate the problems.

-Hal

Hi Hal, Daniel,

I agree. In the mean time, let's get this committed turned off by default. Please update your patch so that the new behavior is controlled by some cl::opt that is off by default. This will give us a better way to do more-extensive benchmarking and help isolate the problems.

I've updated the patch and added the flag. Is this OK to go in now?
http://reviews.llvm.org/D10368

We're seeing some small degradations on Hexagon too (and that's in
simulation, so definitely not noise). At first glance it seems that the
main difference is that some loops are being replaced by memcpy/memset
calls (you'd think that's a good thing...), but I'll investigate this
further.

Tobias

Hi Hal, Daniel,

I agree. In the mean time, let's get this committed turned off by default. Please update your patch so that the new behavior is controlled by some cl::opt that is off by default. This will give us a better way to do more-extensive benchmarking and help isolate the problems.

I've updated the patch and added the flag. Is this OK to go in now?
http://reviews.llvm.org/D10368

LGTM

We're seeing some small degradations on Hexagon too (and that's in
simulation, so definitely not noise). At first glance it seems that the
main difference is that some loops are being replaced by memcpy/memset
calls (you'd think that's a good thing...), but I'll investigate this
further.

I'm not shocked, this is basically what happens with any AA changes :slight_smile:

When I improved GCC's alias analysis by ~30% (in terms of noalias
results returned), it caused about an 5% geomean performance
degradation. :slight_smile: