[polly] scev codegen (first step to remove the dependence on ivcanon pass)

Hi Tobi,

I would like to remove the SCEVRewriter code and replace it with a call to
SCEVAddRec::apply (see attached a patch that adds just this function). More
precisely I want to add another function called apply_map that applies a map
(loop -> expr) on a given scev. This is the apply function on a multi-variate
polynomial.

So here is an overview of how I would like the scev code generator to work on an
example: supposing that we have a Stmt_1 that gets code generated by either
CLooG or ISL-codegen like this:

  Stmt_1(c1, c1+4, c1+c2);

we will construct a map that maps the old iteration domain with 3 dimensions
(there are 3 arguments in Stmt_1 representing the original loop nest in which
Stmt_1 was located, let's call the original loop nest loop_1, loop_2, loop_3) to
the new expressions generated by cloog that are function of the new iteration
domain with 2 dimensions (c1 and c2 are the new induction variables of the code
generated by cloog). So the content of that map is:

loop_1 -> c1
loop_2 -> c1+7
loop_3 -> c1+c2

Given an access function from the original program:
  Scev_1 = {{{0, +, 4}_1, +, 5}_2, +, 6}_3

we will apply the map on it, and we will get a symbolic expression function of
the new induction variables c1, and c2:

Scev_2 = apply (loop_1 -> c1) on Scev_1 = {{4*c1, +, 5}_2, +, 6}_3
Scev_3 = apply (loop_2 -> c1+7) on Scev_2 = {4*c1 + 5*(c1+7), +, 6}_3
Scev_4 = apply (loop_3 -> c1+c2) on Scev_3 = 4*c1 + 5*(c1+7) + 6*(c1+c2)

We will then code generate Scev_4 and we will use this new expression for the
array access in the new loop nest.

Remark that in all this process we have never referred to the original
"canonical induction variable". SCEV actually provides such a canonical form
for the induction variables without having to transform the code.

Sebastian

0001-add-SCEVAddRecExpr-apply.patch (2.57 KB)

Hi Tobi,

I would like to remove the SCEVRewriter code and replace it with a call
to
SCEVAddRec::apply (see attached a patch that adds just this function).
More
precisely I want to add another function called apply_map that applies a
map
(loop -> expr) on a given scev. This is the apply function on a
multi-variate
polynomial.

Hi Sebastian,

thanks for working on removing our dependence on iv canonicalization.

Let me first comment on the approach:

So here is an overview of how I would like the scev code generator to
work on an
example: supposing that we have a Stmt_1 that gets code generated by
either
CLooG or ISL-codegen like this:

  Stmt_1(c1, c1+4, c1+c2);

we will construct a map that maps the old iteration domain with 3
dimensions
(there are 3 arguments in Stmt_1 representing the original loop nest in
which
Stmt_1 was located, let's call the original loop nest loop_1, loop_2,
loop_3) to
the new expressions generated by cloog that are function of the new
iteration
domain with 2 dimensions (c1 and c2 are the new induction variables of
the code
generated by cloog). So the content of that map is:

loop_1 -> c1
loop_2 -> c1+7
loop_3 -> c1+c2

Given an access function from the original program:
  Scev_1 = {{{0, +, 4}_1, +, 5}_2, +, 6}_3

we will apply the map on it, and we will get a symbolic expression
function of
the new induction variables c1, and c2:

Scev_2 = apply (loop_1 -> c1) on Scev_1 = {{4*c1, +, 5}_2, +, 6}_3
Scev_3 = apply (loop_2 -> c1+7) on Scev_2 = {4*c1 + 5*(c1+7), +, 6}_3
Scev_4 = apply (loop_3 -> c1+c2) on Scev_3 = 4*c1 + 5*(c1+7) + 6*(c1+c2)

We will then code generate Scev_4 and we will use this new expression for
the
array access in the new loop nest.

Yes. This is very similar to what I had in mind and what I already
started to implement. There are some slight differences:

You create a map from the old_loop to a symbolic expression. What type
would this symbolic expression have? Would
it be a SCEVExpr? At the moment, we calculate at the beginning of each
polly statement (a basic block) a value for each virtual induction
variable of the original loops. For your example we get something like
this.

new_bb:
     new_iv_1 = add i32 c1, 0
     new_iv_2 = add i32 c1, 7
     new_iv_3 = add i32 c1, c2
    
     ....

I want to highlight here that the values new_iv_1, new_iv_2, new_iv_3
correspond to the number of loop iterations of the original loop. As we
currently require canonical induction variables, they are equivalent to
the value of the old canonical induction variable. However, generating
those values does not imply that we need to have canonical induction
variables in the original code. Even without canonical induction
variables, calculating such values is useful, as this simplifies the map
you describe above. Instead of going from Loop* to some symbolic
expression, you can just pass the corresponding Value* or ScevUnknown*.
In your example this would be

Scev_2 = apply (loop_1 -> new_iv_1) on Scev_1 = {{4*new_iv_1, +, 5}_2,
+, 6}_3
Scev_3 = apply (loop_2 -> new_iv_2) on Scev_2 = {4*new_iv_1 +
5*new_iv_2, +, 6}_3
Scev_4 = apply (loop_3 -> new_iv_3) on Scev_3 = 4*new_iv_1 + 5*new_iv_2
+ 6*new_iv_3

Passing a symbolic expression, as you propose, could allow further
simplifictions, however it also requires s to
translate an isl_ast_expr to some kind of ScevExpr. This is non-trivial
as we would need to teach SCEV about the different
operands isl codegen could produce (floord, ceild, min, max, %, ....).
I would suggest to leave this optimization for later and to first remove
the dependence on the ivdeps pass. We can than evaluate, if this
optimization should be done in SCEV or if we rather rely on instcombine
or something similar to optimize the generated instructions further.

Remark that in all this process we have never referred to the original
"canonical induction variable". SCEV actually provides such a canonical
form
for the induction variables without having to transform the code.

Yes, this remains true with the modification I am proposing.

Now to the removal of the SCEVRewriter. I am not opposed to it, but
wonder what the benefit of removing it would be? Do you think moving
this transformation directly into SCEV is conceptually nicer or is there
some additional benefit. I have mainly two points we should think about
before removing it:

1. What happens to parameters

Besides rewriting the loop ivs, the SCEVRewriter also rewrites
parameters. It takes a map [<SCEV* -> Value*>] and replaces SCEV
expressions which we have recognized during code generation with a
parameter value. This is necessary,
in case we generate OpenMP / OpenCL code and need to pass parameters to
other functions or modules.

2. Speed

The SCEVRewriter only passes once over the SCEVExpr. In your approach,
we would pass three times over the SCEV, no?
It is probably too early to optimize here, but we should keep that in
mind.

In terms of code to be written:

To remove the dependency of the canonical iv from the code generation,
it should be sufficient to pass a map <Loop*, Value*> to the
BlockGenerator and to use it in SCEVRewriter::getNewIV(). For CLooG, you
should be able to initialize it
from the u->substitutions list. For isl, the information should be
readily available in IslNodeBuilder::createUser().

Cheers
Tobi

Tobias Grosser wrote:

You create a map from the old_loop to a symbolic expression. What type would
this symbolic expression have? Would it be a SCEVExpr?

evaluateAtIteration takes a scev, so apply will take a scev, or a map
(loop->scev). You can always build a ScevUnknown from an SSA name and use that
in the apply.

At the moment, we calculate at the beginning of each
polly statement (a basic block) a value for each virtual induction
variable of the original loops. For your example we get something like
this.

new_bb:
     new_iv_1 = add i32 c1, 0
     new_iv_2 = add i32 c1, 7
     new_iv_3 = add i32 c1, c2
    
     ....

I want to highlight here that the values new_iv_1, new_iv_2, new_iv_3
correspond to the number of loop iterations of the original loop. As we
currently require canonical induction variables, they are equivalent to
the value of the old canonical induction variable. However, generating
those values does not imply that we need to have canonical induction
variables in the original code. Even without canonical induction
variables, calculating such values is useful, as this simplifies the map
you describe above. Instead of going from Loop* to some symbolic
expression, you can just pass the corresponding Value* or ScevUnknown*.

Yes, that's equivalent to what I described for the map (loop->expression):
whether you have an expression or just an SSA name that computes that expression
is equivalent.

Passing a symbolic expression, as you propose, could allow further
simplifictions,

I don't see how it could simplify anything: I think both are equivalent.

Now to the removal of the SCEVRewriter. I am not opposed to it, but
wonder what the benefit of removing it would be? Do you think moving
this transformation directly into SCEV is conceptually nicer or is there
some additional benefit.

Other code would be able to call apply: it's a general operation that can be
exposed to other users.

1. What happens to parameters

Besides rewriting the loop ivs, the SCEVRewriter also rewrites parameters. It
takes a map [<SCEV* -> Value*>] and replaces SCEV expressions which we have
recognized during code generation with a parameter value. This is necessary,
in case we generate OpenMP / OpenCL code and need to pass parameters to other
functions or modules.

Maybe OpenMP / OpenCL codegen could separately handle the rewrite of in/out
variables for the outlined functions: the scalar and vector codegen don't need
to rewrite parameters.

What about moving a reduced version rewriteScevParams in the general
implementation of SCEV? That would make it available for other users.

2. Speed

The SCEVRewriter only passes once over the SCEVExpr. In your approach,
we would pass three times over the SCEV, no?

It depends on how you implement it. You can do it in one swipe as well.

Sebastian

Tobias Grosser wrote:

You create a map from the old_loop to a symbolic expression. What type would
this symbolic expression have? Would it be a SCEVExpr?

evaluateAtIteration takes a scev, so apply will take a scev, or a map
(loop->scev). You can always build a ScevUnknown from an SSA name and use that
in the apply.

OK, so we have the same idea of this.

At the moment, we calculate at the beginning of each
polly statement (a basic block) a value for each virtual induction
variable of the original loops. For your example we get something like
this.

new_bb:
      new_iv_1 = add i32 c1, 0
      new_iv_2 = add i32 c1, 7
      new_iv_3 = add i32 c1, c2

      ....

I want to highlight here that the values new_iv_1, new_iv_2, new_iv_3
correspond to the number of loop iterations of the original loop. As we
currently require canonical induction variables, they are equivalent to
the value of the old canonical induction variable. However, generating
those values does not imply that we need to have canonical induction
variables in the original code. Even without canonical induction
variables, calculating such values is useful, as this simplifies the map
you describe above. Instead of going from Loop* to some symbolic
expression, you can just pass the corresponding Value* or ScevUnknown*.

Yes, that's equivalent to what I described for the map (loop->expression):
whether you have an expression or just an SSA name that computes that expression
is equivalent.

>

Passing a symbolic expression, as you propose, could allow further
simplifictions,

I don't see how it could simplify anything: I think both are equivalent.

Passing symbolic expressions, instead of a plain SCEVUnknown could expose simplification opportunities on the SCEV level.

%const = sub i32 10, 5 // -5

Scev_1 = {20, +, 4}_1
Scev_2 = apply (loop_1 -> %const) on Scev_1 = 20 + 4 * %const

If we pass %const as a SCEVUnknown, we can not further simplify the resulting SCEV. However, if we pass it as the constant value -5, we
can do the following simplification:

Scev_2 = apply (loop_1 -> -5) on Scev_1 = 20 + 4*(-5) = 0

I don't think we there is currently a need to take advantage of simplifications at SCEV level. I would rather have this one working first. We can then have investigate the benefits of simplifications at
the SCEV level.

Now to the removal of the SCEVRewriter. I am not opposed to it, but
wonder what the benefit of removing it would be? Do you think moving
this transformation directly into SCEV is conceptually nicer or is there
some additional benefit.

Other code would be able to call apply: it's a general operation that can be
exposed to other users.

That makes sense.

1. What happens to parameters

Besides rewriting the loop ivs, the SCEVRewriter also rewrites parameters. It
takes a map [<SCEV* -> Value*>] and replaces SCEV expressions which we have
recognized during code generation with a parameter value. This is necessary,
in case we generate OpenMP / OpenCL code and need to pass parameters to other
functions or modules.

Maybe OpenMP / OpenCL codegen could separately handle the rewrite of in/out
variables for the outlined functions: the scalar and vector codegen don't need
to rewrite parameters.

What about moving a reduced version rewriteScevParams in the general
implementation of SCEV? That would make it available for other users.

If there is a use for it or if it simplifies some code I am fine with this.

2. Speed

The SCEVRewriter only passes once over the SCEVExpr. In your approach,
we would pass three times over the SCEV, no?

It depends on how you implement it. You can do it in one swipe as well.

OK. Good.

Cheers
Tobi