NaryReassociate vs Reassociate

Whoops, forgot llvmdev

Hi Daniel,

I presume you mean, instead of assigning function arguments distinct ranks (http://llvm.org/docs/doxygen/html/Reassociate_8cpp_source.html#l00282), we should group function arguments in favor of existing pairings. You are not suggesting discarding the entire ranking system, right?

I’ll look into how that works on my benchmarks. AFAIK, we encountered some cases that seem beyond the fix you suggested. These cases involve constants, and I attached one reduced example in https://llvm.org/bugs/show_bug.cgi?id=22357.

void foo(int a, int b, int c, int *x, int *y) {
  *x = (a + b);
  *y = (a + 2) + b;
}

Reassociate assigns constants a lower rank than variables, which prevents Reassociate from transforming the above example to

*x = a + b;
*y = (a + b) + 2;
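To make the rank effect concrete, here is a toy Python sketch of rank-driven grouping (the rank values and the rebuild rule are simplified for illustration; this is not the actual C++ pass):

```python
# Toy ranks: constants lowest, then arguments in order of appearance.
RANK = {"2": 0, "a": 1, "b": 2}

def rebuild(ops):
    """Sort operands by rank and rebuild a left-leaning add tree,
    pairing from the lowest-ranked end first."""
    ops = sorted(ops, key=lambda v: RANK[v])
    tree = ops[0]
    for op in ops[1:]:
        tree = f"({tree} + {op})"
    return tree

print(rebuild(["a", "b"]))       # (a + b)
print(rebuild(["a", "2", "b"]))  # ((2 + a) + b): "a + b" never appears
```

Because the constant sorts first in the three-operand case, the two expressions never agree on a common `a + b` subexpression.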

Jingyue

> Hi Daniel,

> I presume you mean, instead of assigning function arguments distinct ranks
> (http://llvm.org/docs/doxygen/html/Reassociate_8cpp_source.html#l00282), we
> should group function arguments in favor of existing pairings.

Existing = pairings reassociate already chose before
*not*
existing = pairings that already exist in the source IR

Given that, we should probably group everything in favor of existing
pairings when possible.

> You are not suggesting discarding the entire ranking system, right?

The only three cases that should matter hugely are constants,
arguments, and non-movable instructions.

The rest should hopefully already end up with consistent decisions.
If not, we have larger problems.

> I'll look into how that works on my benchmarks. AFAIK, we encountered some
> cases that seem beyond the fix you suggested. These cases involve constants,
> and I attached one reduced example in
> https://llvm.org/bugs/show_bug.cgi?id=22357.

> void foo(int a, int b, int c, int *x, int *y) {
>   *x = (a + b);
>   *y = (a + 2) + b;
> }
>
> Reassociate assigns constants a lower rank than variables, which prevents
> Reassociate from transforming the above example to
>
> *x = a + b;
> *y = (a + b) + 2;

This is a variant of the problem above, except you are getting ranks
0, 1, 2 vs 1, 2

It thus chooses 0,1 as one tree, and 2 as the other for the a+b+2 case.

If we made it make consistent decisions, you would get the right results.

There is an optimal order to make these decisions in, too. It
probably should be making decisions in order of smallest computation
tree to largest computation tree, because the small trees have the
least freedom. Otherwise, you have the same problem if you write this
as:
*y = (a + 2) + b;
*x = a + b;
instead :)

(i.e., if you don't process smallest to largest, you will choose a+2+b,
and not be able to make a decision consistent with that when you get
to the a+b tree)
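A toy sketch of that smallest-first ordering (a hypothetical helper, not the patch itself): process operand lists from smallest to largest, and remember each pair we commit to so larger trees group it the same way.

```python
def reassociate_all(exprs):
    """Process operand lists smallest-first, remembering each pair we
    commit to so larger expressions group the same pair first.
    A toy model, not the LLVM pass."""
    chosen = []   # pairs committed by earlier (smaller) trees
    results = {}
    for ops in sorted(exprs, key=len):
        work = list(ops)
        for x, y in chosen:
            if x in work and y in work:
                # Reuse the pairing an earlier tree already chose.
                work.remove(x)
                work.remove(y)
                work.insert(0, f"({x} + {y})")
                break
        else:
            if len(work) >= 2:
                # Commit to a new pairing for future trees.
                chosen.append((work[0], work[1]))
                work[:2] = [f"({work[0]} + {work[1]})"]
        tree = work[0]
        for op in work[1:]:
            tree = f"({tree} + {op})"
        results[tuple(ops)] = tree
    return results

out = reassociate_all([("a", "2", "b"), ("a", "b")])
print(out[("a", "b")])       # (a + b)
print(out[("a", "2", "b")])  # ((a + b) + 2)
```

Note that the small tree's choice of `(a + b)` constrains the larger tree, regardless of the order the expressions appear in the input.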

The above change should not be difficult. I'll take a crack at it
today and send you a patch.

> > Hi Daniel,
> >
> > I presume you mean, instead of assigning function arguments distinct ranks
> > (http://llvm.org/docs/doxygen/html/Reassociate_8cpp_source.html#l00282), we
> > should group function arguments in favor of existing pairings.

> Existing = pairings reassociate already chose before
> *not*
> existing = pairings that already exist in the source IR
>
> Given that, we should probably group everything in favor of existing
> pairings when possible.

Makes sense.

> > You are not suggesting discarding the entire ranking system, right?

> The only three cases that should matter hugely are constants,
> arguments, and non-movable instructions.
>
> The rest should hopefully already end up with consistent decisions.
> If not, we have larger problems.

> > I'll look into how that works on my benchmarks. AFAIK, we encountered some
> > cases that seem beyond the fix you suggested. These cases involve constants,
> > and I attached one reduced example in
> > https://llvm.org/bugs/show_bug.cgi?id=22357.
> >
> > void foo(int a, int b, int c, int *x, int *y) {
> >   *x = (a + b);
> >   *y = (a + 2) + b;
> > }
> >
> > Reassociate assigns constants a lower rank than variables, which prevents
> > Reassociate from transforming the above example to
> >
> > *x = a + b;
> > *y = (a + b) + 2;

> This is a variant of the problem above, except you are getting ranks
> 0, 1, 2 vs 1, 2

The key difference is that constants are deliberately ranked lowest, so the
current reassociation algorithm always groups the constant 2 with the other
variables. It looks like your new solution will favor existing pairings
regardless of the ranks, so it should be able to solve the a+b+2 case
nicely.

> It thus chooses 0,1 as one tree, and 2 as the other for the a+b+2 case.

> If we made it make consistent decisions, you would get the right results.

> There is an optimal order to make these decisions in, too. It
> probably should be making decisions in order of smallest computation
> tree to largest computation tree, because the small trees have the
> least freedom. Otherwise, you have the same problem if you write this
> as:
> *y = (a + 2) + b;
> *x = a + b;
> instead :)
>
> (i.e., if you don't process smallest to largest, you will choose a+2+b,
> and not be able to make a decision consistent with that when you get
> to the a+b tree)

> The above change should not be difficult. I'll take a crack at it
> today and send you a patch.

Great! Thank you! Things will be clearer when I see your patch.

So, right now, it rewrites things in an optimally bad order for this to
happen. Because it puts the highest-ranked values first, it is guaranteed
to end up not putting them next to each other.

This is because in the ops == 2 case, it places the first two into the
same expression. In the ops > 2 case, it'll place the first two into
different expressions.

So if you have
a+c
and
a+b+c

You are essentially guaranteed it will never put a and c next to each
other in both cases, unless a and c are the lowest ranked elements.

For the moment, I reversed the sort order, and we reassign ranks so
that things it has processed before should end up next to each
other.

Because the sort is stable, you are only guaranteed they will end up
in the same expression, not necessarily ordered exactly the same way
they were last time.

This is fixable, but probably not worth it.

The patch is not perfect (and it is only lightly tested), but I'd love
to see how it goes.

Some issues:
1. Because I reversed the sort order, it may do worse in other cases.
I can fix the sort-order reversal by changing how we assign ranks slightly:
basically, when we look for ops to pair with, we reassign the ranks to
be greater than the maximum rank for that BB (so it doesn't conflict
with anything).

2. It will only pair things the same way as the immediately preceding time
it paired them. I doubt there is anything much better unless you spend the
time you do in NaryReassociate.
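The rank-reassignment fix sketched in issue 1 above could look roughly like this (toy ranks; `promote_pair` is a made-up helper, not code from the patch):

```python
def promote_pair(ranks, x, y):
    """After pairing x and y, reassign them ranks above the current
    maximum for the block, so a highest-first sort keeps them
    adjacent without reversing the overall sort order."""
    top = max(ranks.values())
    ranks[x], ranks[y] = top + 1, top + 2

ranks = {"2": 0, "a": 1, "b": 2}
promote_pair(ranks, "a", "b")
order = sorted(ranks, key=ranks.get, reverse=True)
print(order)  # ['b', 'a', '2']: a and b now sort next to each other
```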

Honestly, after staring at this long enough, I'm pretty convinced we
should take an approach similar to NaryReassociate's underpinnings in
Reassociate.

In particular, I think we should deliberately unbalance the tree if we
can guarantee it will expose a redundancy (i.e., it will be an expression
we know is computed by one of our dominators).

This logic can be incorporated pretty easily into Reassociate.
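A sketch of that unbalancing idea, assuming we already know which sums are available from dominators (the set and helper below are invented for illustration):

```python
from itertools import combinations

def unbalance_for_reuse(ops, dominating):
    """Deliberately unbalance an add tree when some sub-sum is already
    computed by a dominator: pull that subset out as a single operand
    so the existing value can be reused. A toy sketch."""
    for size in range(len(ops) - 1, 1, -1):  # largest reusable subset first
        for subset in combinations(sorted(ops), size):
            key = " + ".join(subset)
            if key in dominating:
                rest = list(ops)
                for v in subset:
                    rest.remove(v)
                tree = f"({key})"
                for op in rest:
                    tree = f"({tree} + {op})"
                return tree
    return None  # no redundancy to expose

dom = {"a + b"}  # sums assumed available in a dominator
print(unbalance_for_reuse(["a", "b", "c"], dom))  # ((a + b) + c)
```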

reassoc.diff (4.6 KB)

Hi Daniel,

I tried your patch. It indeed solves some (but not all) of the issues NaryReassociate aims to address. One thing NaryReassociate will work on but Reassociate will probably miss is to transform

p = &input[a]
q = &input[a + b]

to

p = &input[a]
q = p + b;

I don’t see an easy way to merge this into Reassociate unless we peek into GEPs. However, that’s not to say I don’t like your approach.
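The GEP example boils down to address arithmetic: `&input[a + b]` equals `&input[a]` plus `b` elements, which is the equivalence NaryReassociate exploits. A small sketch of it (byte offsets with an assumed 4-byte element size):

```python
ELEM_SIZE = 4  # assume 4-byte ints

def gep(base, index):
    """Compute the byte address of base[index], like a GEP does."""
    return base + index * ELEM_SIZE

base, a, b = 0x1000, 5, 7
p = gep(base, a)             # p = &input[a]
q_direct = gep(base, a + b)  # q = &input[a + b]
q_reused = gep(p, b)         # q = p + b (the NaryReassociate rewrite)
print(q_direct == q_reused)  # True
```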

That said, I would definitely like to see your patch in, and thanks for working on it. Some general comments on the implementation:

  1. Instead of hoisting the rank of the last pairing, can we assign them a minimum value (e.g. -1)? This allows us to maintain the original sorting order (highest first) and at the same time group the last-paired values.

  2. LastPairingMap doesn’t reflect dominance. If you have
    a+c
    and
    a+b+c

but a+c doesn’t dominate a+b+c, it might not be beneficial to put a and c next to each other. I suspect a simple way of making LastPairingMap per-BB will work for most cases.

  3. FYI, Ken Kennedy et al. published a paper (http://dl.acm.org/citation.cfm?id=1454120) on a global reassociation algorithm, which shares the same spirit as your approach. Instead of considering only the last pairings, their algorithm collects more history and tends to group the pairs that appear most frequently. It’s much more expensive than your approach, so I don’t recommend using it yet (unless there’s enough need). Just letting you know that some work has explored this area.
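The paper's pair-frequency criterion can be sketched like this (a toy sketch only, not the full global algorithm):

```python
from collections import Counter
from itertools import combinations

def most_frequent_pair(exprs):
    """Count how often each unordered operand pair co-occurs across
    all expressions and return the most common one, the grouping
    criterion the global-reassociation approach favors."""
    counts = Counter()
    for ops in exprs:
        counts.update(combinations(sorted(set(ops)), 2))
    return counts.most_common(1)[0][0]

exprs = [["a", "b"], ["a", "b", "c"], ["a", "b", "d"]]
print(most_frequent_pair(exprs))  # ('a', 'b')
```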

Jingyue

> Hi Daniel,
>
> I tried your patch. It indeed solves some (but not all) of the issues
> NaryReassociate aims to address. One thing NaryReassociate will work on but
> Reassociate will probably miss is to transform
>
> p = &input[a]
> q = &input[a + b]
>
> to
>
> p = &input[a]
> q = p + b;
>
> I don't see an easy way to merge this into Reassociate unless we peek into
> GEPs. However, that's not to say I don't like your approach.
>
> That said, I would definitely like to see your patch in, and thanks for
> working on it. Some general comments on the implementation:
>
> 1. Instead of hoisting the rank of the last pairing, can we assign them a
> minimum value (e.g. -1)? This allows us to maintain the original sorting
> order (highest first) and at the same time group the last-paired values.

We can assign the ranks in reverse order :)

> 2. LastPairingMap doesn't reflect dominance. If you have
> a+c
> and
> a+b+c
> but a+c doesn't dominate a+b+c, it might not be beneficial to put a and c
> next to each other. I suspect a simple way of making LastPairingMap per-BB
> will work for most cases.

Or we could just store DFS pairs from the domtree and include them in
the sort order.
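The "DFS pairs" idea is the standard trick of recording entry/exit DFS numbers per dominator-tree node, so dominance becomes an O(1) interval test. A sketch over a made-up toy tree:

```python
def dfs_number(tree, root):
    """Assign (in, out) DFS numbers over a dominator tree so that
    'A dominates B' becomes the O(1) interval test
    in[A] <= in[B] and out[B] <= out[A]."""
    nums, clock = {}, 0
    def walk(n):
        nonlocal clock
        enter = clock
        clock += 1
        for child in tree.get(n, []):
            walk(child)
        nums[n] = (enter, clock)
        clock += 1
    walk(root)
    return nums

def dominates(nums, a, b):
    return nums[a][0] <= nums[b][0] and nums[b][1] <= nums[a][1]

# Toy dominator tree: entry -> {bb1, bb2}, bb1 -> {bb3}
nums = dfs_number({"entry": ["bb1", "bb2"], "bb1": ["bb3"]}, "entry")
print(dominates(nums, "entry", "bb3"))  # True
print(dominates(nums, "bb2", "bb3"))    # False
```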

> 3. FYI, Ken Kennedy et al. published a paper
> (http://dl.acm.org/citation.cfm?id=1454120) on a global reassociation
> algorithm, which shares the same spirit as your approach. Instead of
> considering only the last pairings, their algorithm collects more history
> and tends to group the pairs that appear most frequently. It's much more
> expensive than your approach, so I don't recommend using it yet (unless
> there's enough need). Just letting you know that some work has explored
> this area.

Interesting.

The approach reassociation uses now is the one Briggs/Cooper came up
with earlier, and has the widely known issues that we've talked about
:)