[ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls

Hi all,

I'm looking into resolving a FIXME in the LoopDataPrefetch (and FalkorMarkStridedAccesses) pass by marking both of these passes as preserving the ScalarEvolution analysis. Unfortunately, when this change is made, LSR will generate different code. One of the root causes seems to be that SCEV will return different nsw/nuw flags for the same Value, depending on what order the SCEVs are computed, due to the fact that the SCEV object unique-ing doesn't take the nsw/nuw flags into account. Since LoopDataPrefetch computes SCEVs in a different order than LSR, the nsw/nuw flags seen by LSR will differ based on whether the SCEVs are preserved from LoopDataPrefetch.

I believe this issue has been discussed before, but I just wanted to check if this is indeed the current expected behavior, and if anyone has any plans/ideas for addressing this issue.

For reference, below is a reduced loop where this problem occurs. The SCEV for %i.07.i will have <nuw> or not depending on whether %idxprom.i was computed before it:

for.body.i:
   %i.07.i = phi i32 [ %inc.i, %for.body.i ], [ 0, %for.body.i.preheader ]
   %idxprom.i = zext i32 %i.07.i to i64
   %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i
   %0 = load i32, i32* %arrayidx.i
   %inc.i = add i32 %i.07.i, 1
   %cmp.i = icmp ult i32 %inc.i, %size
   br i1 %cmp.i, label %for.body.i, label %do_bwe.exit

Hi all,

I'm looking into resolving a FIXME in the LoopDataPrefetch (and FalkorMarkStridedAccesses) pass by marking both of these passes as preserving the ScalarEvolution analysis. Unfortunately, when this change is made, LSR will generate different code. One of the root causes seems to be that SCEV will return different nsw/nuw flags for the same Value, depending on what order the SCEVs are computed, due to the fact that the SCEV object unique-ing doesn't take the nsw/nuw flags into account. Since LoopDataPrefetch computes SCEVs in a different order than LSR, the nsw/nuw flags seen by LSR will differ based on whether the SCEVs are preserved from LoopDataPrefetch.

I believe this issue has been discussed before, but I just wanted to check if this is indeed the current expected behavior, and if anyone has any plans/ideas for addressing this issue.

The general issue that SCEV nsw is weird is known... see, for example 23527 – [ScalarEvolution] createNodeForGEP over-optimistic about preserving nsw.

For reference, below is a reduced loop where this problem occurs. The SCEV for %i.07.i will have <nuw> or not depending on whether %idxprom.i was computed before it:

%idxprom.i, the zext? I'm not sure how you're getting that particular effect. ScalarEvolution::getSCEV for a zext immediately calls getSCEV on its operand.

-Eli

Hi all,

I'm looking into resolving a FIXME in the LoopDataPrefetch (and FalkorMarkStridedAccesses) pass by marking both of these passes as preserving the ScalarEvolution analysis. Unfortunately, when this change is made, LSR will generate different code. One of the root causes seems to be that SCEV will return different nsw/nuw flags for the same Value, depending on what order the SCEVs are computed, due to the fact that the SCEV object unique-ing doesn't take the nsw/nuw flags into account. Since LoopDataPrefetch computes SCEVs in a different order than LSR, the nsw/nuw flags seen by LSR will differ based on whether the SCEVs are preserved from LoopDataPrefetch.

I believe this issue has been discussed before, but I just wanted to check if this is indeed the current expected behavior, and if anyone has any plans/ideas for addressing this issue.

The general issue that SCEV nsw is weird is known... see, for example 23527 – [ScalarEvolution] createNodeForGEP over-optimistic about preserving nsw.

For reference, below is a reduced loop where this problem occurs. The SCEV for %i.07.i will have <nuw> or not depending on whether %idxprom.i was computed before it:

%idxprom.i, the zext? I'm not sure how you're getting that particular effect. ScalarEvolution::getSCEV for a zext immediately calls getSCEV on its operand.

Here is an abridged record of the getSCEV results as seen by each pass with/without preserving SCEVAnalysis.
In the first case, when the SCEV is invalidated, the SCEV for %i.07.i is computed in LSR as {0,+,1}<%for.body.i>. In the second case, the SCEV for %i.07.i is computed in LSR the same way, but because the SCEV for %idxprom.i from FalkorHWPFFix unique's to the same value but has the nuw flag set and is still present in the foldingset, {0,+,1}<nuw><%for.body.i> is returned for %i.07.i in LSR instead. The SCEVs for other values differ too, but I thought I'd start with this one.

********** FalkorHWPFFix **********
Created SCEV for %i.07.i = phi i32 [ %inc.i, %for.inc.i ], [ 0, %for.body.i.preheader ]: {0,+,1}<%for.body.i>
Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: (zext i32 {0,+,1}<%for.body.i> to i64)
Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: ((4 * (zext i32 {0,+,1}<%for.body.i> to i64))<nuw><nsw> + %rot)<nsw>
Created SCEV for %inc.i = add i32 %i.07.i, 1: {1,+,1}<%for.body.i>
Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: {0,+,1}<nuw><%for.body.i>
Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: {%rot,+,4}<nw><%for.body.i>

********** SCEV invalidated **********

********** LSR **********
Created SCEV for %i.07.i = phi i32 [ %inc.i, %for.inc.i ], [ 0, %for.body.i.preheader ]: {0,+,1}<%for.body.i>
Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: (zext i32 {0,+,1}<%for.body.i> to i64)
Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: ((4 * (zext i32 {0,+,1}<%for.body.i> to i64))<nuw><nsw> + %rot)<nsw>
Created SCEV for %inc.i = add i32 %i.07.i, 1: {1,+,1}<%for.body.i>
Created SCEV for %cmp.i = icmp ult i32 %inc.i, %size: %cmp.i

vs.

********** FalkorHWPFFix **********
Created SCEV for %i.07.i = phi i32 [ %inc.i, %for.inc.i ], [ 0, %for.body.i.preheader ]: {0,+,1}<%for.body.i>
Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: (zext i32 {0,+,1}<%for.body.i> to i64)
Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: ((4 * (zext i32 {0,+,1}<%for.body.i> to i64))<nuw><nsw> + %rot)<nsw>
Created SCEV for %inc.i = add i32 %i.07.i, 1: {1,+,1}<%for.body.i>
Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: {0,+,1}<nuw><%for.body.i>
Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: {%rot,+,4}<nw><%for.body.i>

********** SCEV preserved **********

********** LSR **********
Created SCEV for %i.07.i = phi i32 [ %inc.i, %for.inc.i ], [ 0, %for.body.i.preheader ]: {0,+,1}<nuw><%for.body.i>
                                                                                                 ^^^^^
Created SCEV for %inc.i = add i32 %i.07.i, 1: {1,+,1}<nuw><%for.body.i>
Existing SCEV for %idxprom.i = zext i32 %i.07.i to i64: {0,+,1}<nuw><%for.body.i>
Existing SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: {%rot,+,4}<nw><%for.body.i>
Created SCEV for %cmp.i = icmp ult i32 %inc.i, %size: %cmp.i
Created SCEV for %lsr.iv = phi i64 [ 0, %for.body.i.preheader ], [ %lsr.iv.next, %for.inc.i ]: {0,+,1}<nuw><nsw><%for.body.i>

Oh, I see... yes, we do stupid things involving mutating NoWrap flags after a SCEV is created. (grep for setNoWrapFlags in ScalarEvolution.cpp.)

-Eli

Hi,

Oh, I see... yes, we do stupid things involving mutating NoWrap flags after
a SCEV is created. (grep for setNoWrapFlags in ScalarEvolution.cpp.)

That's really a compile time hack -- we defer some expensive tricks to
prove nsw/nuw on an add recurrences to when we've been asked to
sign/zero extend said add recurrence.

I would be okay if you wanted to expose those tricks via a helper in
SCEV that could be called independently of creating zero or sign
extensions. That way SCEV clients could trade-off compile time for
more precise information in a granular way.

-- Sanjoy

Hi,

Oh, I see... yes, we do stupid things involving mutating NoWrap flags after
a SCEV is created. (grep for setNoWrapFlags in ScalarEvolution.cpp.)

That's really a compile time hack -- we defer some expensive tricks to
prove nsw/nuw on an add recurrences to when we've been asked to
sign/zero extend said add recurrence.

If that is the case, preserving SCEV analysis where we didn't before should only
result in later passes seeing more nsw/nuw flags, which should theoretically be
beneficial?

I would be okay if you wanted to expose those tricks via a helper in
SCEV that could be called independently of creating zero or sign
extensions. That way SCEV clients could trade-off compile time for> more precise information in a granular way.

I'm not concerned with getting better SCEV results at the moment, I just want to
make sure that changing these passes to preserve SCEV analysis is behaving as
expected (which it sounds like it is).

Hi Geoff,

Hi,

Oh, I see... yes, we do stupid things involving mutating NoWrap flags
after
a SCEV is created. (grep for setNoWrapFlags in ScalarEvolution.cpp.)

That's really a compile time hack -- we defer some expensive tricks to
prove nsw/nuw on an add recurrences to when we've been asked to
sign/zero extend said add recurrence.

If that is the case, preserving SCEV analysis where we didn't before should
only
result in later passes seeing more nsw/nuw flags, which should theoretically
be
beneficial?

There are downsides to preserving SCEV for "too long" though -- if you
simplified a loop in a way that SCEV can compute its trip count when
it couldn't before (or it could compute a trip count before, but now
it can compute a more precise trip count) then invalidating the cache
can be an improvement. However, in this case judicious use of
forgetLoop() (as opposed to re-constructing the whole of SCEV) will
also do the job.

-- Sanjoy

Hi Sanjoy,

[adding Adam since I believe he added the original FIXME to preserve SCEV
in LoopDataPrefetch]

Hi Geoff,

Hi,

Oh, I see... yes, we do stupid things involving mutating NoWrap flags
after
a SCEV is created. (grep for setNoWrapFlags in ScalarEvolution.cpp.)

That's really a compile time hack -- we defer some expensive tricks to
prove nsw/nuw on an add recurrences to when we've been asked to
sign/zero extend said add recurrence.

If that is the case, preserving SCEV analysis where we didn't before should
only
result in later passes seeing more nsw/nuw flags, which should theoretically
be
beneficial?

There are downsides to preserving SCEV for "too long" though -- if you
simplified a loop in a way that SCEV can compute its trip count when
it couldn't before (or it could compute a trip count before, but now
it can compute a more precise trip count) then invalidating the cache
can be an improvement. However, in this case judicious use of
forgetLoop() (as opposed to re-constructing the whole of SCEV) will
also do the job.

In this particular case, neither of the passes that I'm marking as newly
preserving SCEV modify any loop structure or any instructions that should
impact any of the existing SCEV values. All they do is add prefetch
instructions (and some new instructions to compute the addresses to
prefetch) and annotate some load instructions with metadata. That's why
I was surprised to see changes in generated code from marking them as
preserving SCEV.

I've done some performance testing and didn't see any significant changes
in performance due to these minor SCEV changes, so I think I'm going to
post a patch to enable SCEV preservation in these passes and move on.

Hi Sanjoy,

[adding Adam since I believe he added the original FIXME to preserve SCEV
in LoopDataPrefetch]

For record, that wasn’t me. It was there from the beginning when Hal added the PPC-specific pass.

Adam

Hi Sanjoy,

[adding Adam since I believe he added the original FIXME to preserve SCEV
in LoopDataPrefetch]

For record, that wasn’t me. It was there from the beginning when Hal added the PPC-specific pass.

My mistake. Hal, would you like to chime in on ⚙ D36716 [LoopDataPrefetch][AArch64FalkorHWPFFix] Preserve ScalarEvolution

Hi Sanjoy,

[adding Adam since I believe he added the original FIXME to preserve SCEV
in LoopDataPrefetch]

For record, that wasn’t me. It was there from the beginning when Hal added the PPC-specific pass.

My mistake. Hal, would you like to chime in on ⚙ D36716 [LoopDataPrefetch][AArch64FalkorHWPFFix] Preserve ScalarEvolution '

I don't have anything to add in particular. As I recall, marking this as preserved used to cause crashes later in the pipeline in some cases. Might as well see if it's still true.

Thanks again,
Hal