GVN pass: does global value numbering remove duplicate computations in loops?

Hello,
I was hoping to get some clarification on the aim of the GVN pass. I have been reading a bit about global value numbering, and from what I understand, there are polynomial time algorithms for removing duplicate computations from loops[1].

I have an example program[2] which computes the sum of an array twice, in two separate accumulators.
Here, sum0 and sum1 are both sums of the array A.

void b01_sums(int size, int* A)

{
int sum0 = 0;
int sum1 = 0;
for (int i = 0; i != size; ++i)
{
sum0 = sum0 + A[i];
sum1 = sum1 + A[i];
printf(“sum0: %d\nsum1: %d\n”, sum0, sum1);
}
}

I would have expected global value numbering to see that sum0 and sum1 are initialised and updated with the same value, and so merge them into a single accumulator. However, the assembly output still contains both accumulators:

$ gcc sums.c -c -O3 -save-temps

. . .

A[i] (only once)

movl (%r15), %eax

add sum0 and sum1

addl %eax, %ebx
addl %eax, %r13d

I have tried a few variations on the C code: pulling out the array reads into a single binding makes no difference, while moving the printf outside of the loop allows the loop to be vectorised, but there are still two accumulators.
I am wondering whether I am missing some particular invocation (perhaps to enable fixpoints over loops in the GVN analysis?), or otherwise, if there is some reason that means this sort of analysis is impractical for real programs.

I am using the Apple clang 6, but also tried with llvm opt 3.6:

$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

$ /usr/local/opt/llvm/bin/opt --version
LLVM (http://llvm.org/):
LLVM version 3.6.2
Optimized build.
Built Apr 10 2016 (02:50:09).
Default target: x86_64-apple-darwin13.4.0
Host CPU: core-avx2

Thanks,
Amos Robinson

[1]: Gulwani & Necula: A Polynomial-Time Algorithm for Global Value Numbering http://www.eecs.berkeley.edu/~necula/Papers/gvndet-journal06.pdf

[2]: Example code and assembly output https://gist.github.com/amosr/06938ee5becaf5249de5634499bc1add

Hello,
I was hoping to get some clarification on the aim of the GVN pass. I have
been reading a bit about global value numbering, and from what I
understand, there are polynomial time algorithms for removing duplicate
computations from loops[1].

Yes

The current GVN will not do it.

I have an example program[2] which computes the sum of an array twice, in
two separate accumulators.
Here, sum0 and sum1 are both sums of the array A.

void b01_sums(int size, int* A)
{
  int sum0 = 0;
  int sum1 = 0;
  for (int i = 0; i != size; ++i)
  {
    sum0 = sum0 + A[i];
    sum1 = sum1 + A[i];
    printf("sum0: %d\nsum1: %d\n", sum0, sum1);
  }
}

I would have expected global value numbering to see that sum0 and sum1 are
initialised and updated with the same value, and so merge them into a
single accumulator.

The current GVN uses a very weak algorithm. All phi nodes are given new
values, so in the above it will never discover the in-loop equivalences.
The GVN on the newgvn branch i have will remove these, and is more
complicated
Note that we don't do full-on polynomial time equivalence finding. While
it would be fun to play with such algorithms, they are often not practical.

The one i saw some hope for was

I haven't had a chance to play with it.

From: "Daniel Berlin via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Amos Robinson" <amos.robinson@gmail.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Tuesday, May 3, 2016 7:39:54 PM
Subject: Re: [llvm-dev] GVN pass: does global value numbering remove
duplicate computations in loops?

> Hello,

> I was hoping to get some clarification on the aim of the GVN pass.
> I
> have been reading a bit about global value numbering, and from what
> I understand, there are polynomial time algorithms for removing
> duplicate computations from loops[1].

Yes

The current GVN will not do it.

> I have an example program[2] which computes the sum of an array
> twice, in two separate accumulators.

> Here, sum0 and sum1 are both sums of the array A.

> void b01_sums(int size, int* A)

> {

> int sum0 = 0;

> int sum1 = 0;

> for (int i = 0; i != size; ++i)

> {

> sum0 = sum0 + A[i];

> sum1 = sum1 + A[i];

> printf("sum0: %d\nsum1: %d\n", sum0, sum1);

> }

> }

> I would have expected global value numbering to see that sum0 and
> sum1 are initialised and updated with the same value, and so merge
> them into a single accumulator.

The current GVN uses a very weak algorithm. All phi nodes are given
new values, so in the above it will never discover the in-loop
equivalences.
The GVN on the newgvn branch i have will remove these, and is more
complicated
Note that we don't do full-on polynomial time equivalence finding.
While it would be fun to play with such algorithms, they are often
not practical.

The one i saw some hope for was
An Efficient SSA-Based Algorithm for Complete Global Value Numbering | SpringerLink
I haven't had a chance to play with it.

If I recall correctly, AWZ will get this too (https://courses.cs.washington.edu/courses/cse501/04wi/papers/alpern-popl88.pdf). AWZ is a Hopcroft-partitioning-based algorithm, and Hopcroft partitioning is O(n*log(n)).

-Hal

If I recall correctly, AWZ will get this too (
https://courses.cs.washington.edu/courses/cse501/04wi/papers/alpern-popl88.pdf).
AWZ is a Hopcroft-partitioning-based algorithm, and Hopcroft partitioning
is O(n*log(n)).

Yes, AWZ will get some, and the hash based ones will get some different

ones.

The one i have implemented unifies AWZ and hash based and will also do
predication/value inference.

The GVN on the newgvn branch i have will remove these, and is more complicated
The one i have implemented unifies AWZ and hash based and will also do predication/value inference.

This is exciting news. It sounds like it will find a lot of the interesting cases.

Note that we don’t do full-on polynomial time equivalence finding. While it would be fun to play with such algorithms, they are often not practical.

Yes, fair enough. I thought that might be the case. A lot of the examples in the papers seem a bit contrived anyway, so it would be interesting to know just how many “real” opportunities are being missed.

The one i saw some hope for was http://link.springer.com/chapter/10.1007%2F978-3-540-76637-7_22
I haven’t had a chance to play with it.

This looks very promising, thank you.

As a general rule of thumb, if it only removes scalar code, it’s pretty useless.
If it helps it prove it can remove memory operations, it’s great.

Often, in your example, it’s accesses that are loop invariant but common to other accesses in the loop that are most promising.

IE (taken from a test i wrote for GCC’s VN)

int vnum_test8(int *data)
{
int i;
int stop = data[3];
int m = data[4];
int n = m;
int p;
for (i=0; i<stop; i++) {
int k = data[2];
data[k] = 2;
data[0] = m - n;
k = data[1];
m = m + k;
n = n + k;
p = data[0];
}
return p;
}

We test that we eliminate m-n in favor of 0, replace n with m, and set p to 0

Note that this requires proving a bunch of things about the memory accesses here, and properly value numbering memory accesses (which current GVN will not do).

As a general rule of thumb, if it only removes scalar code, it’s pretty useless.
If it helps it prove it can remove memory operations, it’s great.

Perhaps I am misunderstanding, but I agree my example wasn’t very exciting as-is but I imagine you could replace addition with some reasonably expensive function call and it would be a bit more compelling.
However, I am coming at this from a much higher level of abstraction: I am working on a query compiler that takes a whole bunch of queries and fuses them together into one. Removing duplicates in this case is also much simpler than general GVN, and I’m just trying to get a good story about the relationship.

Often, in your example, it’s accesses that are loop invariant but common to other accesses in the loop that are most promising.

IE (taken from a test i wrote for GCC’s VN)

int vnum_test8(int *data)
{
int i;
int stop = data[3];
int m = data[4];
int n = m;
int p;
for (i=0; i<stop; i++) {
int k = data[2];
data[k] = 2;
data[0] = m - n;
k = data[1];
m = m + k;
n = n + k;
p = data[0];
}
return p;
}

We test that we eliminate m-n in favor of 0, replace n with m, and set p to 0

Note that this requires proving a bunch of things about the memory accesses here, and properly value numbering memory accesses (which current GVN will not do).

Do you mean, for example, proving that accessing data[k] does not segfault here, or aliasing things?

As a general rule of thumb, if it only removes scalar code, it's pretty
useless.
If it helps it prove it can remove memory operations, it's great.

Perhaps I am misunderstanding, but I agree my example wasn't very exciting
as-is but I imagine you could replace addition with some reasonably
expensive function call and it would be a bit more compelling.

What i've given you is a broad generalization that comes down to "on modern
processors you often have to remove 20-100x more arithmetic to match the
improvement from removing a single load or other memory op".

They often can issue multiple arithmetic ops per cycle, and get the results
from all of them in a single cycle.

Are there exceptions to my generalization? Sure. If you call a function 10
billion times, and you remove a single cycle from it, ...

However, I am coming at this from a much higher level of abstraction: I am
working on a query compiler that takes a whole bunch of queries and fuses
them together into one. Removing duplicates in this case is also much
simpler than general GVN, and I'm just trying to get a good story about the
relationship.

GVN would work well for this, but depending on your situation, it may also
be massive overkill :wink:

Often, in your example, it's accesses that are loop invariant but common

to other accesses in the loop that are most promising.

IE (taken from a test i wrote for GCC's VN)
int vnum_test8(int *data)
{
  int i;
  int stop = data[3];
  int m = data[4];
  int n = m;
  int p;
  for (i=0; i<stop; i++) {
    int k = data[2];
    data[k] = 2;
    data[0] = m - n;
    k = data[1];
    m = m + k;
    n = n + k;
    p = data[0];
  }
  return p;
}

We test that we eliminate m-n in favor of 0, replace n with m, and set p
to 0

Note that this requires proving a bunch of things about the memory
accesses here, and properly value numbering memory accesses (which current
GVN will not do).

Do you mean, for example, proving that accessing data[k] does not segfault
here, or aliasing things?

It does not require any aliasing assistance, just symbolically evaluating
where these things access.