Working on FP SCEV Analysis

Demikhovsky, Elena wrote:

> Even then, I'd personally want to see further evidence of why the
correct solution is to model the floating point IV in SCEV rather than
find a more powerful way of converting the IV to an integer that models
> the non-integer values taken on by the IV. As an example, if the use
case is the following code with appropriate flags to relax IEEE
semantics so this looks like normal algabra etc:

> for (float f = 0.01f; f < 1.0f; f += 0.01f) ç **A**

...

> I'd rather see us cleverly turn it into:

> float f = 0.01f;

> for (int i = 1; i < 100; i += 1, f += 0.01f) ç **B**

I can later try to enhance IndVarSimplify::handleFloatingPointIV() in
order to convert**A** to **B**.

But **B** is exactly the case I’m starting from. The main IV “i” is
integer. The variable “f” is also considered as IV in this loop.

And this loop is not vectorized because “f” is floating point.

I don’t think that the case **B** is uncommon.

If B is the case we actually care about, I'd say changing SCEV to work with floating points is an overkill. How would you expect an SCEVFAddExpr to help vectorize B, other than tell you what the initial value and the increment is (and these can be found with a simple value analysis)?

If we're interested in handling complex variants of A directly: computing trip counts, proving away predicates etc. without translating the loops to use integer IVs (perhaps because we can't legally do so), then I can see FP-SCEV as a reasonable implementation strategy, but it looks like the general consensus is that such cases are rare and generally not worth optimizing?

-- Sanjoy

If we're interested in handling complex variants of A directly: computing
trip counts, proving away predicates etc. without translating the loops to
use integer IVs (perhaps because we can't legally do so), then I can see
FP-SCEV as a reasonable implementation strategy, but it looks like the
general consensus is that such cases are rare and generally not worth
optimizing?

Just to explicitly not include myself in that consensus: I'm withholding a

real view till I see *some* evidence one way or the other.

Hi,

Daniel Berlin wrote:
>
> If we're interested in handling complex variants of A directly:
> computing trip counts, proving away predicates etc. without
> translating the loops to use integer IVs (perhaps because we can't
> legally do so), then I can see FP-SCEV as a reasonable
> implementation strategy, but it looks like the general consensus is
> that such cases are rare and generally not worth optimizing?
>
> Just to explicitly not include myself in that consensus: I'm withholding
> a real view till I see *some* evidence one way or the other.

Actually, allow me to revise what I said:

"but it looks it isn't clear that such cases generally worth
optimizing?"

"but it looks like the general consensus is
that such cases are rare and generally not worth optimizing?"

In any case, I wasn't trying to have a strong opinion on whether A is
important or not, but was really trying to get at this: my intuition
is that an FP-enabled SCEV will be worth the effort only if we
ultimately want to make LLVM "handle complex variants of A directly".
I wouldn't to make SCEV smarter around FP (which will be a lot of
work, and add complexity through the analysis) and then stop at just
B, since I don't see why we'd need all of the simplification etc. SCEV
typically does if we just want to handle B-like cases. I may be wrong
about the latter part, and if I am, I'd like to know what teaching
SCEV about FP enables you to do in cases like B that are difficult to
do without FP-SCEV.

-- Sanjoy

I’m not sure what terminology to use (help), but we should make a fundamental distinction between these cases (copying from Hideki):

void foo(TTT *a, int N, TTT x, TTT y){
int i;
for (i=0;i<N;i++){
x+=y;
}
}

…where we have a pure reduction.

void foo(TTT *a, int N, TTT x, TTT y){
int i;
for (i=0;i<N;i++){
A[i] = x;
x+=y;
}
}

…where we have a FP induction variable (such that all intermediate values need to be computed).

I pretty much agree with everything Danny, Sanjoy, and Hideki said.

-Andy

Demikhovsky, Elena wrote:

Even then, I’d personally want to see further evidence of why the
correct solution is to model the floating point IV in SCEV rather than
find a more powerful way of converting the IV to an integer that models
the non-integer values taken on by the IV. As an example, if the use
case is the following code with appropriate flags to relax IEEE
semantics so this looks like normal algabra etc:

for (float f = 0.01f; f < 1.0f; f += 0.01f) ç A

I’d rather see us cleverly turn it into:

float f = 0.01f;

for (int i = 1; i < 100; i += 1, f += 0.01f) ç B

I can later try to enhance IndVarSimplify::handleFloatingPointIV() in
order to convertA to B.

But B is exactly the case I’m starting from. The main IV “i” is
integer. The variable “f” is also considered as IV in this loop.

And this loop is not vectorized because “f” is floating point.

I don’t think that the case B is uncommon.

If B is the case we actually care about, I’d say changing SCEV to work with floating points is an overkill. How would you expect an SCEVFAddExpr to help vectorize B, other than tell you what the initial value and the increment is (and these can be found with a simple value analysis)?

One option would be to extend InductionDescriptor::isInductionPHI in the vectorizer to directly analyze the PHIs without SCEV support as Sanjoy suggested. I think that that could be sufficient to handle case B.

Then if we find other pressing cases to handle we can rethink the strategy.

Also the current diagnostics is pretty bad for Hideki’s testcase with TTT as float. This is what we currently report with -Rpass-analysis=loop-vectorize:

/tmp/sss.c:3:6: remark: loop not vectorized: value that could not be

identified as reduction is used outside the loop

[-Rpass-analysis=loop-vectorize]

I have no clue why we say that the value is used outside the loop. I think this should just say that we have a loop-variant value that we couldn’t identify either as an induction or as a reduction.

One option would be to extend InductionDescriptor::isInductionPHI in the vectorizer to directly analyze the PHIs without SCEV support as Sanjoy suggested. I think that that could be sufficient to handle case B.

I implemented this with FP SCEV and the code looks very structured, including SCEVExpander. Extending the existing structures without implementing FP SCEV will be problematic.

And my end goal is to handle case A.

> One option would be to extend InductionDescriptor::isInductionPHI in the vectorizer to directly analyze the PHIs without SCEV support as Sanjoy suggested. I *think* that that could be sufficient to handle case B. <>

I implemented this with FP SCEV and the code looks very structured, including SCEVExpander. Extending the existing structures without implementing FP SCEV will be problematic.
And my end goal is to handle case *A*.

The problem isn't that the code will be bad, but that it might be unused. SCEV is already quite big, and extending it even further for no (stated so far) reason sounds questionable.

Owen mentioned an important use case where float is just used because integers are not supported at all - we definitely should be able to handle such cases, but it looks like promotion to int is the only thing we need for it. I looked into a simple example:
#define T float
void foo(T *buf) {
T i = 0;
#pragma clang loop vectorize(enable)
for (i = 0; i < 1000; i += 1) {
buf[(int)i] = i;
}
}

and we vectorize it now. That said, the generated code for integers is much cleaner, but I don't think and has anything to do with the IV, as it's been successfully promoted to an integer IV.

So, as far as I understand, your intention is to handle loops with non-integer step/start/end value. Why can't we focus on promotable to integer cases, which can and should be handled even now (if not, we'd rather fix it)?

Thanks,
Michael

PS: My old intel email address isn't accessible any more, so you probably should remove it from your contacts.

One option would be to extend InductionDescriptor::isInductionPHI in the vectorizer to directly analyze the PHIs without SCEV support as Sanjoy suggested. I think that that could be sufficient to handle case B.

I implemented this with FP SCEV and the code looks very structured, including SCEVExpander. Extending the existing structures without implementing FP SCEV will be problematic.

And my end goal is to handle case A.

Ok, but there have been numerous requests for an explanation of why this is important, and that hasn’t emerged yet.

We really need to have a clear understanding of the relative importance of solving these problems in order to understand the best approach.

To the best of my experience, handling case B (secondary induction) is must-have, and if I’m not mistaken,

people aren’t opposed to that.

For me, handling case A (primary induction) is “why not?”, but I certainly admit that that can be very naïve

thinking coming from lack of good understanding on SCEV and their proper usages. Now, let’s assume we

can postpone discussion about case A. What is the best approach to handle case B? Let me summarize

the discussion so far. Hope I didn’t miss anything.

Extend SCEV was the initial approach taken by Elena.

Elena thinks this solution ”looks very structured”.

If I’m not mistaken, some people think this is overkill and overly complicates already complicated SCEV.

Anyone care to look at the patch Elena came up with?

IndVarSimplify::handleFloatingPointIV (mentioned by Andy)

This transforms integer-valued FP (primary) IV into integer IV and convert.

Chandler says most of Graphics shading language use case mentioned by Owen

should be handled here.

It certainly has the logic of detecting FP induction, but Andy punted discussions
on non-integer valued IV issues to MichaelZ and Adam.

extend InductionDescriptor::isInductionPHI in the vectorizer to directly analyze the PHIs without SCEV support

If this is the standard way to deal with all secondary inductions, it certainly looks attractive.

Elena, would you try doing this and compare with 1)?

Thanks,

Hideki

Hi Hideki,

I like this summary overall, thanks. More below.

To the best of my experience, handling case B (secondary induction) is must-have, and if I’m not mistaken,
people aren’t opposed to that.

For me, handling case A (primary induction) is “why not?”, but I certainly admit that that can be very naïve
thinking coming from lack of good understanding on SCEV and their proper usages. Now, let’s assume we
can postpone discussion about case A. What is the best approach to handle case B? Let me summarize
the discussion so far. Hope I didn’t miss anything.

1)
Extend SCEV was the initial approach taken by Elena.
Elena thinks this solution ”looks very structured”.
If I’m not mistaken, some people think this is overkill and overly complicates already complicated SCEV.
Anyone care to look at the patch Elena came up with?
2)
IndVarSimplify::handleFloatingPointIV (mentioned by Andy)
This transforms integer-valued FP (primary) IV into integer IV and convert.
Chandler says most of Graphics shading language use case mentioned by Owen
should be handled here.
It certainly has the logic of detecting FP induction, but Andy punted discussions
on non-integer valued IV issues to MichaelZ and Adam.

My understanding is that we only need this for *A* not for *B*.

On the specific issue of non-integer values, there is simply no attempt made in the code to deal with them. That said, I think it should be possible to compute the trip count and then derive an integer induction variable controlling the loop.

3)
extend InductionDescriptor::isInductionPHI in the vectorizer to directly analyze the PHIs without SCEV support
If this is the standard way to deal with all secondary inductions, it certainly looks attractive.
Elena, would you try doing this and compare with 1)?

Just to clarify, the code is currently structured to check if the PHI is an add-recurrence that was detected by SCEV. The idea is to add a fall-back to analyze the PHI directly if its type is a float.

There is already precedence for such things in LV. We support more reductions (including floating-point) than what SCEV can analyze, therefore RecurrenceDescriptor::AddReductionVar needs to analyze PHIs directly.

Hi Hideki,

Thanks for the summary!

To the best of my experience, handling case B (secondary induction) is must-have, and if I’m not mistaken,
people aren’t opposed to that.

For me, handling case A (primary induction) is “why not?”, but I certainly admit that that can be very naïve
thinking coming from lack of good understanding on SCEV and their proper usages. Now, let’s assume we
can postpone discussion about case A. What is the best approach to handle case B? Let me summarize
the discussion so far. Hope I didn’t miss anything.

1)
Extend SCEV was the initial approach taken by Elena.
Elena thinks this solution ”looks very structured”.
If I’m not mistaken, some people think this is overkill and overly complicates already complicated SCEV.
Anyone care to look at the patch Elena came up with?

Sorry if I missed it. Could you please post a link to the patch here? I’ve been mostly speculating on how complicated the SCEV solution would be, if the actual patch happens to be really simple and obvious, we can go this way.

Michael

Michael, she hasn’t posted a patch (or the link) yet, but she wrote “I implemented this with FP SCEV and the code looks very structured, including SCEVExpander”.

So, there is actual code that can be looked at. May have to wait until Sunday morning in Israel.

Thanks,

Hideki

FP induction analysis should be useful for other loop optimizations that care about cross-iteration dependence.

Thus, I think the mechanism is best implemented as an analysis or a utility, as opposed to something confined

inside vectorizer. Any objections in this general direction?

Thanks,

Hideki

Michael, she hasn’t posted a patch (or the link) yet, but she wrote “I implemented this with FP SCEV and the code looks very structured, including SCEVExpander”.
So, there is actual code that can be looked at. May have to wait until Sunday morning in Israel.

Ah ok, no worries then. I just thought I missed it.

Michael

FP induction analysis should be useful for other loop optimizations that care about cross-iteration dependence.
Thus, I think the mechanism is best implemented as an analysis or a utility, as opposed to something confined
inside vectorizer. Any objections in this general direction?

Agreed, we usually put these in Transforms/Utils/LoopUtils.h,cpp. (Obviously the goal here is not to provide an alternative to SCEV but to share simple “pattern-matching” code between the loop optimizations.)

Adding support for FP inductions through isInductionPHI() is certainly
possible, I have a relatively small local patch that does exactly that
for simple fp add-recurrence cases, along with changes to the
vectorizer to make it aware of FP inductions. It won't get give you
the powerful reasoning capabilities of SCEV, but for the B-like cases
it should work.

Amara

I'm going to post the code to review in a few days. I'm polishing the code and adding more test cases.