SCEV expansion sometimes generates redundent expr even if there is an
available expr which can be reused. The redundent exprs can be a lot
different from existing exprs so that existing cleanup passes cannot
remove them.
https://reviews.llvm.org/D12090 and https://reviews.llvm.org/D21313
already relieved the problem somewhat, but it is far from being solved
fundamentally. Recently we found another problematic case described in https://llvm.org/bugs/show_bug.cgi?id=29065. And I believe I can
create more tests revealing similar problems. In PR29065, I explained
why it is difficult to fix the problem by extending D12090 and D21313.
It is possible but still not very easy to fix the problem by enhancing
existing cleanup passes. So I am soliciting ideas here about how to
solve the problem in a more fundamental way.
One thing I am always wondering is GCC also uses SCEV to calculate the
loop iteration number and why it doesn't have such problem. From my
limited knowledge about GCC's SCEV I guess it is because gcc only has
the basic function of SCEV which can represent loop recursive expr
like llvm SCEVAddRecExpr part, SCEV operands are still gimple
variables and it doesn't have such complex transformations on SCEV as
llvm does. Correct me if I say something wrong here. I am also
wondering if llvm doesn't have such complex transformations on SCEV
(including so many simplifications and canonicalizations), what will
be missing? Why IR based canonicalization and simplification not
enough?
I've not seen GCC's SCEV so I cannot make a comparative comment here
(maybe Chris, Andy or Dan can chime in here), but I personally am in
the "make the cleanup passes smarter" camp. We can also try to make
SCEV expansion smarter -- not by putting more things in SCEVExpander
(it is already complex enough!), but by splitting out a dedicated
SCEVSimplifier that you invoke on code generated from SCEVExpander to
strength reduce it. SCEVSimplifier can then internally use routines
in SCEV, so that it is "as smart as" SCEV in most cases.
SCEV is super useful as an analysis without SCEVExpander. The only real issue with SCEV itself is invalidating the expressions.
I’ve always thought SCEVExpander is very dangerous to use directly. Ideally the SCEV expression should always be reformulated based on existing IR values before expanding. It would be nice if that was provided as a layer of functionality on top of SCEVExpander.
Sanjoy and Andy, thanks a lot for your suggestions.
>
> Hi Wei,
>
> I've not seen GCC's SCEV so I cannot make a comparative comment here
> (maybe Chris, Andy or Dan can chime in here), but I personally am in
> the "make the cleanup passes smarter" camp. We can also try to make
> SCEV expansion smarter -- not by putting more things in SCEVExpander
> (it is already complex enough!), but by splitting out a dedicated
> SCEVSimplifier that you invoke on code generated from SCEVExpander to
> strength reduce it. SCEVSimplifier can then internally use routines
> in SCEV, so that it is "as smart as" SCEV in most cases.
>
> — Sanjoy
Combine the efforts of making the cleanup passes smarter and
simplifying the expanded code to make it more IR canonical looks like
a promising way. But I may not understand SCEVSimplifier you suggested
here correctly: The input of SCEVSimplifier is IR or SCEV? The code
generated by SCEVExpander is IR. If the input of SCEVSimplifier is IR,
how does it use SCEV routine internally?
SCEV is super useful as an analysis without SCEVExpander. The only real issue with SCEV itself is invalidating the expressions.
I’ve always thought SCEVExpander is very dangerous to use directly. Ideally the SCEV expression should always be reformulated based on existing IR values before expanding. It would be nice if that was provided as a layer of functionality on top of SCEVExpander.
-Andy
Yes, it makes me think that the original idea in D12090 to try to
reuse the existing value directly may not be the right way to solve
the problem fundamentally. Reformulating SCEV to make it more
consistent with cannonical IR form is more promising. If only the IR
expanded is cannonical compared with existing IR, cleanup pass can
have big chance to catch the reuse.
In D12090, ScalarEvolution::ExprValueMap is used to map a new SCEV to
existing IR values when the SCEV is created by
ScalarEvolution::getSCEV. But a problem is that not all SCEVs are
created via getSCEV. For example, ScalarEvolution::getAddExpr can
create a new SCEV "C" based on two existing SCEVs: "A" and "B".
If A is a SCEVAddExpr itself: "A = A1 + A2", C is formulated as three
operands SCEV: "A1 + A2 + B" without A being an operand. Then it is
impossible to reuse existing IR value recorded for A in ExprValueMap.
Before expansion, it is too late so is difficult to convert C from "A1
+ A2 + B" back to "A + B" again. If we can afford to record more
information during ScalarEvolution::getAddExpr, we can keep the
information that C is "A + B" towards expansion. But I am not sure it
is an acceptable way -- adding more maps and more complexity.
Wei Mi wrote:
> Sanjoy and Andy, thanks a lot for your suggestions.
>
>>
>>>
>>> Hi Wei,
>>>
>>> I've not seen GCC's SCEV so I cannot make a comparative comment here
>>> (maybe Chris, Andy or Dan can chime in here), but I personally am in
>>> the "make the cleanup passes smarter" camp. We can also try to make
>>> SCEV expansion smarter -- not by putting more things in SCEVExpander
>>> (it is already complex enough!), but by splitting out a dedicated
>>> SCEVSimplifier that you invoke on code generated from SCEVExpander to
>>> strength reduce it. SCEVSimplifier can then internally use routines
>>> in SCEV, so that it is "as smart as" SCEV in most cases.
>>>
>>> — Sanjoy
>
> Combine the efforts of making the cleanup passes smarter and
> simplifying the expanded code to make it more IR canonical looks like
> a promising way. But I may not understand SCEVSimplifier you suggested
> here correctly: The input of SCEVSimplifier is IR or SCEV? The code
> generated by SCEVExpander is IR. If the input of SCEVSimplifier is IR,
> how does it use SCEV routine internally?
It accepts a ScalarEvolution instance that it uses internally to check
for equivalences, so the interface is:
class Simplifier {
public:
Simplifier(ScalarEvolution &SE);
Value *simplify(Value *);
};
or something like that. Once we have this separation between
expansion and strength reduction, we can be more aggressive (than we
can reasonably be within SCEVExpander) about teaching Simplifier SCEV
based CSE rules.
In this scheme it would be more complicated to have a "expand, but
only if we can do it cheaply" type schemes.
Sanjoy and Andy, thanks a lot for your suggestions.
Hi Wei,
I've not seen GCC's SCEV so I cannot make a comparative comment here
(maybe Chris, Andy or Dan can chime in here), but I personally am in
the "make the cleanup passes smarter" camp. We can also try to make
SCEV expansion smarter -- not by putting more things in SCEVExpander
(it is already complex enough!), but by splitting out a dedicated
SCEVSimplifier that you invoke on code generated from SCEVExpander to
strength reduce it. SCEVSimplifier can then internally use routines
in SCEV, so that it is "as smart as" SCEV in most cases.
— Sanjoy
Combine the efforts of making the cleanup passes smarter and
simplifying the expanded code to make it more IR canonical looks like
a promising way. But I may not understand SCEVSimplifier you suggested
here correctly: The input of SCEVSimplifier is IR or SCEV? The code
generated by SCEVExpander is IR. If the input of SCEVSimplifier is IR,
how does it use SCEV routine internally?
It accepts a ScalarEvolution instance that it uses internally to check
for equivalences, so the interface is:
class Simplifier {
public:
Simplifier(ScalarEvolution &SE);
Value *simplify(Value *);
};
or something like that. Once we have this separation between
expansion and strength reduction, we can be more aggressive (than we
can reasonably be within SCEVExpander) about teaching Simplifier SCEV
based CSE rules.
To rephrase it and see if I understand it correctly:
Existing Value A;
Expanded Value B (B is identical with A, but in a different form);
So we can convert both A and B to SCEV again and compare their
equivalence and use the result to guide strength reduction. It is like
to enhance existing cleanup pass using the SCEV CSE rules.
Or the idea of simplify is to transform B back to the form close to A
with SCEV knowledge?
You are right, Thanks! I just looked at loop iteration number
computation in GCC. It only uses SCEV to identify simple induction
variables, represent them as affine expr, and then use those affine
exprs and their value ranges in loop to compute loop iteration number,
so it doesn't involve SCEV expansion.
From: "Wei Mi via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Daniel Berlin" <dberlin@dberlin.org>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "David Li" <davidxl@google.com>
Sent: Wednesday, August 24, 2016 5:41:12 PM
Subject: Re: [llvm-dev] Request suggestions about how to remove redundencies caused by SCEV expansion fundementally
>
>
>>
>> SCEV expansion sometimes generates redundent expr even if there is
>> an
>> available expr which can be reused. The redundent exprs can be a
>> lot
>> different from existing exprs so that existing cleanup passes
>> cannot
>> remove them.
>>
>> https://llvm.org/bugs/show_bug.cgi?id=24920
>> 24442 – Another cse pass may be needed after LSR to clean up redundency introduced by SCEV expansion
>>
>> ⚙ D12090 Try to reuse existing value during SCEV expansion and
>> ⚙ D21313 Use ValueOffsetPair to enhance value reuse during SCEV expansion.
>> already relieved the problem somewhat, but it is far from being
>> solved
>> fundamentally. Recently we found another problematic case
>> described in
>> 29065 – [SCEV] scev expansion generates redundent code in memcheck during vectorization. And I believe I can
>> create more tests revealing similar problems. In PR29065, I
>> explained
>> why it is difficult to fix the problem by extending D12090 and
>> D21313.
>> It is possible but still not very easy to fix the problem by
>> enhancing
>> existing cleanup passes. So I am soliciting ideas here about how
>> to
>> solve the problem in a more fundamental way.
>>
>> One thing I am always wondering is GCC also uses SCEV to calculate
>> the
>> loop iteration number and why it doesn't have such problem. From
>> my
>> limited knowledge about GCC's SCEV I guess it is because gcc only
>> has
>> the basic function of SCEV which can represent loop recursive expr
>> like llvm SCEVAddRecExpr part, SCEV operands are still gimple
>> variables and it doesn't have such complex transformations on SCEV
>> as
>> llvm does.
>
>
> This is not really correct
>
> In fact, at one point, GCC had *all of the possible scev*
> (sebastian worked
> a lot on it) ,including mixers, etc.
>
>
> The real thing here is that GCC just doesn't do a lot of scev
> expansion. It
> mainly uses it as an analysis, not as a code generation mechanism.
>
>
You are right, Thanks! I just looked at loop iteration number
computation in GCC. It only uses SCEV to identify simple induction
variables, represent them as affine expr, and then use those affine
exprs and their value ranges in loop to compute loop iteration
number,
so it doesn't involve SCEV expansion.
So how are you thinking about moving forward here?
Shorterm I feel easy to try is to enhance the expansion of
SCEVNAryExpr: As suggested by Andy, for a new SCEV C = A1 + A2 + B
which is generated by getAddExpr(A, B), adding an opaque value node
somewhere to record C equals to opaque_value = Value_A + Value_B. It
is possible that B doesn't have Value_B recorded in ExprValueMap. Need
to find some representation saying opaque_value = Value_A + expand(B).
This will relieve the problem in PR29065. I may extend ExprValueMap
from recording a single value to some form more flexible to represent
various expr.
Longterm, enhancing cleanup, especially reassociate is something worth
doing because it may be generally helpful (not just for scev
expansion). From my current experience, a large part of expansion
redundency uncleaned is caused by some weakness of reassociation,
especially the problem mentioned in http://lists.llvm.org/pipermail/llvm-dev/2015-May/085406.html and
problem related with multiple getelementptr. NaryReassociate solved
some common problems but there are more. The paper mentioned in Pact08
may be worthy to try.
By the way, to answer Andy's question. I did some experiment using the
testcase in PR29065 to see if we regenerate SCEV for existing and
expanded IR, whether it will be easier to check equivalence suppose we
have SCEV cse rules. The reassociated order of generated SCEVs seems
more consistent. For C= A1 + A2 + B above, the SCEV corresponding to
A1 + A2 has the same computation sequence with the SCEV corresponding
to A (A1, A2 and A are all complex SCEV themselves). However, during
expansion, the cost of searching existing IR, convert them to SCEV and
see if they are equivalent with the SCEV to be expanded looks
expensive. And we need to add complex rules to define SCEV equivalence
from CSE perspective. So I think the method provides me one more
alternative, but I am not going to persue it right away.
This is the ideas what I currently have. More suggestions are welcomed.
By the way, to answer Andy's question. I did some experiment using the
testcase in PR29065 to see if we regenerate SCEV for existing and
expanded IR, whether it will be easier to check equivalence suppose we
have SCEV cse rules. The reassociated order of generated SCEVs seems
more consistent. For C= A1 + A2 + B above, the SCEV corresponding to
A1 + A2 has the same computation sequence with the SCEV corresponding
to A (A1, A2 and A are all complex SCEV themselves). However, during
expansion, the cost of searching existing IR, convert them to SCEV and
see if they are equivalent with the SCEV to be expanded looks
expensive.
Is this the cost of processing every instruction, or just things that look
like recurrences?
I ask mainly because i'm curious if the cost decreases if you limit this
part to recurrences, and did something like have VN just take recurrences
(which are cheap to find as SCC's in the SSA graph), if you have >1
recurrence, and see if they are SCEVable.
You'd still need to define equivalence, of course.
And we need to add complex rules to define SCEV equivalence
By the way, to answer Andy's question. I did some experiment using the
testcase in PR29065 to see if we regenerate SCEV for existing and
expanded IR, whether it will be easier to check equivalence suppose we
have SCEV cse rules. The reassociated order of generated SCEVs seems
more consistent. For C= A1 + A2 + B above, the SCEV corresponding to
A1 + A2 has the same computation sequence with the SCEV corresponding
to A (A1, A2 and A are all complex SCEV themselves). However, during
expansion, the cost of searching existing IR, convert them to SCEV and
see if they are equivalent with the SCEV to be expanded looks
expensive.
Is this the cost of processing every instruction, or just things that look
like recurrences?
Both: the cost of creating SCEV for every existing instruction + the
cost of searching recursively the elements of SCEV expanded and
comparing each element with SCEV for existing instruction.
Wei Mi wrote:
> To rephrase it and see if I understand it correctly:
>
> Existing Value A;
>
> Expanded Value B (B is identical with A, but in a different form);
>
> So we can convert both A and B to SCEV again and compare their
> equivalence and use the result to guide strength reduction. It is like
> to enhance existing cleanup pass using the SCEV CSE rules.
Yes. And we don't necessarily have to worry about exact matches
either, we can (for instance) check if getMinusSCEV(getSCEV(A),
getSCEV(B)) is a constant, and use that to cheaply expand B.
> Or the idea of simplify is to transform B back to the form close to A
> with SCEV knowledge?
> By the way, to answer Andy's question. I did some experiment using the
> testcase in PR29065 to see if we regenerate SCEV for existing and
> expanded IR, whether it will be easier to check equivalence suppose we
> have SCEV cse rules. The reassociated order of generated SCEVs seems
> more consistent. For C= A1 + A2 + B above, the SCEV corresponding to
> A1 + A2 has the same computation sequence with the SCEV corresponding
> to A (A1, A2 and A are all complex SCEV themselves). However, during
> expansion, the cost of searching existing IR, convert them to SCEV and
> see if they are equivalent with the SCEV to be expanded looks
> expensive. And we need to add complex rules to define SCEV equivalence
> from CSE perspective. So I think the method provides me one more
> alternative, but I am not going to persue it right away.
This is a _terrible_ idea, but I'm going to put it on the table
nevertheless: to avoid sinking time into creating new SCEV
expressions, add a tryGetSCEV(Value *V) returns a SCEV expression only
if there is already a SCEV cached for that V. This is a bad idea
because it allows the state of SCEV's cache to affect optimization,
but we could do this as a last resort.
A somewhat cleaner idea is to allow _seeding_ a createSCEV call with
an existing SCEV expression, with the intent "I'm only looking for
something close to this SCEV expression S, so don't bother continuing
if what you'll return will be very different from S". Some simple
heuristics for this may not be difficult to implement.
> By the way, to answer Andy's question. I did some experiment using the
> testcase in PR29065 to see if we regenerate SCEV for existing and
> expanded IR, whether it will be easier to check equivalence suppose we
> have SCEV cse rules. The reassociated order of generated SCEVs seems
> more consistent. For C= A1 + A2 + B above, the SCEV corresponding to
> A1 + A2 has the same computation sequence with the SCEV corresponding
> to A (A1, A2 and A are all complex SCEV themselves). However, during
> expansion, the cost of searching existing IR, convert them to SCEV and
> see if they are equivalent with the SCEV to be expanded looks
> expensive. And we need to add complex rules to define SCEV equivalence
> from CSE perspective. So I think the method provides me one more
> alternative, but I am not going to persue it right away.
This is a _terrible_ idea, but I'm going to put it on the table
nevertheless: to avoid sinking time into creating new SCEV
expressions, add a tryGetSCEV(Value *V) returns a SCEV expression only
if there is already a SCEV cached for that V. This is a bad idea
because it allows the state of SCEV's cache to affect optimization,
but we could do this as a last resort.
I'd argue very strongly against this. If it gets to the point this is being seriously discussed, please surface it clearly on llvm-dev.