Floating Point SCEV Analysis

My response to this patch is below, but adding a floating-point feature to SCEV should really be hashed out on the llvm-dev list first. I would like to get the attention of floating-point experts on the design review.

I’d like to see a small design proposal justifying the feature and defending its soundness. My concern is that the approach may not be sound, and providing this core API would encourage LLVM developers to use the feature without thinking.

I suggest starting with SCEV’s most basic functionality and proving the validity of increasingly complex cases. Can you defend SCEV’s ability to remove loops like this?

float fincby(float start, int N) {
  float result = start;
  for (int i = 0; i < N; ++i) {
    result += 1.0f;
  }
  return result;
}

-Andy

http://reviews.llvm.org/D20695

I mentioned this before: adding FP support to SCEV is not necessary
for vectorizing FP recurrences if enough intelligence is added to
isInductionPHI. I have a patch but it's a bit of a hassle for me to
separate it out from some other functionality at the moment. If the
vectorizer is the only place that FP SCEV awareness is needed I also
share the doubts about this approach. I'm happy to stand corrected if
the analytical power of FP-SCEV would give other benefits though.

Amara

I implemented IV simplification with FP SCEV and uploaded a new patch. The loop below is converted to “start + const*N”.

(You can see the diff between patches to see what was added).

For reference, the case with a variable loop count is filed as PR27894:
https://llvm.org/bugs/show_bug.cgi?id=27894

And the case with a constant loop count is filed as PR27899:
https://llvm.org/bugs/show_bug.cgi?id=27899

Hi Elena,

At this point we're really waiting on responses to the following three
questions:

# Why is it worth aggressively optimizing floating point induction variables?

By "floating point induction variable" I'm referring to "for (float i
= 0.0; i < F; i += 1.0)", not "for (int i = 0 to N) x += 5.0". If
you're mainly interested in the latter, IMO the SCEV changes are
overkill -- we should just pattern match floating point reductions
outside of SCEV.

So far the only response I've seen is "ideally we'd like to vectorize
even if the induction variable is in floating point", and while I
agree with the sentiment (obviously, all else being equal, I'd like
LLVM to be as effective as possible), to justify this large a change
we'd need a compelling argument why such cases (loops with floating
point controlling induction variables) are interesting optimization
targets. Are there real world programs that have this kind of
behavior?

I also think you're underestimating how complex the change is. It may
look simple now, but it will add significant cognitive overhead going
forward, not so much because we're adding "one more" type, but
because we're going from "only one type" to "two types".
E.g. getRange() will be invalid over half the SCEV kinds, unless we
add a new ConstantFPRange class.

# How far can we get without teaching SCEV about floating point at all?

If you do have a set of applications that do rely on floating point
induction variables, how far can you push them by transforming the
floating point induction variables to integer induction variables?
This should have significantly less surface area than the FP-SCEV
changes and if it is enough, then we should prefer it over FP-SCEV.

OTOH if you have interesting use-cases that *cannot* be resolved by
the transform above, then that's a good case for moving FP-SCEV
forward.

# What things are you depending on to have the SCEVs accurately
represent FP semantics?

I think you at least need associativity, to be able to use SCEV's
canonicalization of N-ary expressions over the possible ordering of
operands. Given that you want to do trip count computations, you
probably also need "A + X != X" for X != 0.0.

Are fast-math flags enough for this?

(I typed this in before reading
https://llvm.org/bugs/show_bug.cgi?id=27894#c7 , Hal has answered the
question there).

And as Andy said, we'd need to bring a floating point expert into
the review.

-- Sanjoy

From: "Sanjoy Das via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Sanjay Patel" <spatel@rotateright.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Thursday, June 2, 2016 5:45:50 PM
Subject: Re: [llvm-dev] Floating Point SCEV Analysis

> (I typed this in before reading
> https://llvm.org/bugs/show_bug.cgi?id=27894#c7 , Hal has answered the
> question there).

I'd say that I proposed an answer there. ;) -- We should probably move this part of the discussion to llvm-dev/cfe-dev.

-Hal

Hi Elena,

   >
   >At this point we're really waiting on responses to the following three
   >questions:
   >
   ># Why is it worth aggressively optimizing floating point induction
   >variables?
   >
   >By "floating point induction variable" I'm referring to "for (float i = 0.0; i < F;
   >i += 1.0)", not "for (int i = 0 to N) x += 5.0". If you're mainly interested in
   >the latter, IMO the SCEV changes are overkill -- we should just pattern
   >match floating point reductions outside of SCEV.

[Demikhovsky, Elena] A pattern match for "for (int i = 0 to N) x += 5.0" may not be so difficult.
But you can't cover everything with patterns, for example:
for (int i = 0 to N) {
  x += 5.0
  y = (float)i * 0.5 + z
  A[i] = x + y
}
This loop is resolved with FP SCEV. You can say that FP SCEV is too powerful for this small task, but it uses the *existing* algorithm.
I don't need to invent anything for the *recurrent* calculation of (x + y). SCEV also lets us estimate the behavior of an FP value inside the loop - isLoopInvariant(), at least.

Conversion of "for (int i = 0 to N) x += 5.0" to 5.0*N is also very convenient with FP SCEV (assuming we allow this transformation in some mode),
just because we do not need to implement anything new. The engine already exists.

   >Are there real world programs that have this kind of behavior?

[Demikhovsky, Elena] As Hideki said, Fortran applications are FP-IV based. I can't say how easy it is to map an FP IV to an integer one.

   >
   >I also think you're underestimating how complex the change is. It may look
   >simple now, but this will add significant cognitive overhead moving
   >forward, more so not because we're adding "one more" type, but because
   >we're going from "only one type" to "two types".
   >E.g. getRange() will be invalid over half the SCEV kinds, unless we add a
   >new ConstantFPRange class.

[Demikhovsky, Elena] I have a preliminary implementation of ConstantFPRange, but I don't want to upload it now.
The first patch is for the *secondary* FP IV only.

[Demikhovsky, Elena] I have a question - will you allow an implementation of FP SCEV for secondary IVs only, and use it for recurrences like the two examples above?
It can be restricted and not as powerful as the integer version at this stage. I'll limit the transformations that can be done with FP SCEV.
It is still better than pattern matching for vectorizing loops with FP variables.
Do you see a solution other than pattern matching that would work like FP SCEV for FP variables?

Thank you.
- Elena

> Are there real world programs that have this kind of behavior?

The cases that immediately come to mind for me are time step integration
loops like:

T x = ...;
for (float t = 0; t < tmax; t += dt) {
  T dx = dt * fprime(x, t);
  x += dx;
}

But I'm not sure how much we expect to speed things up considering the
loop-carried dependence on `x`.

-- Sean Silva