Late setting of SCEV NoWrap flags does bad with cache

Maxim_Kazantsev · January 25, 2018, 6:03am

Hello Everyone,

I want to raise a discussion about reasonability of late setting of nsw/nuw/nw flags to SCEV AddRecs through setNoWrapFlags method. A discussion about this have already happened in August last year, there was a concern about different no-wrap flags that come from different sequences of SCEV flags invocations. It was mentioned there that late setting of flags is actually a hack to save some compile time.

Recently I’ve stumbled over a different problem related to this late flag setting. The problem is that current SCEV implementation caches a lot of data about SCEVs. In particular, information about ranges and loop backedge taken counts. A side effect of that is that if we had an AddRec like {1,+,1} and cached its range as [MIN_INT, MAX_INT+1), and then later found out that this recurrency provably has , its cached range doesn’t get updated. As well as ranges of all other SCEVs that depend on that. As result, we have a very weird behavior of SCEV that is unable to prove that sext({1,+,1}) is a positive value just because it has outdated cached ranges.

In fact, I don’t see any point in having / for AddRecs if they are not used to prove useful facts about these recs (such as non-negativeness or positiveness). We will only be able to do it if we accidentally call forgetMemoizedResults for all SCEV that depend on that AddRec (which is very unlikely to ever happen in practice). I tried to clean up cache selectively, but in fact it takes full traversal through all cached SCEVs at least to identify the ones that depend on the updated AddRec, and this completely ruins the idea of saving compile time.

You may find the example of such behavior if you re-enable the assertion from patch https://reviews.llvm.org/rL323309 . The test that was added with this patch fails on this assert with exactly this problem.

My question is: do we still want to have this hack with late setting of nsw/nuw given that this flag does not give us any benefits in many real situations? What would you think?

Best regards,

Max

Sanjoy_Das1 · January 26, 2018, 5:04am

Hi Max,

I want to raise a discussion about reasonability of late setting of
nsw/nuw/nw flags to SCEV AddRecs through setNoWrapFlags method. A discussion
about this have already happened in August last year, there was a concern
about different no-wrap flags that come from different sequences of SCEV
flags invocations. It was mentioned there that late setting of flags is
actually a hack to save some compile time.

Recently I've stumbled over a different problem related to this late flag
setting. The problem is that current SCEV implementation caches a lot of
data about SCEVs. In particular, information about ranges and loop backedge
taken counts. A side effect of that is that if we had an AddRec like {1,+,1}
and cached its range as [MIN_INT, MAX_INT+1), and then later found out that
this recurrency provably has <nsw>, its cached range doesn't get updated. As
well as ranges of all other SCEVs that depend on that. As result, we have a
very weird behavior of SCEV that is unable to prove that sext({1,+,1}<nsw>)
is a positive value just because it has outdated cached ranges.

Yes, this problem has come up several times and it is counter-intuitive.

Part of the problem is (I'm guessing, I wasn't there when SCEV was
written) that SCEV is tuned for C/C++ and for C/C++ add recurrences
are almost always nsw on construction so you don't have these, let's
say "phase ordering", issues.

However, for Java, I was under the impression that despite this
caching issue we are fairly effective at proving ranges. Of course
"fairly effective" does not help if the one thing we don't optimize is
90% of an Important Benchmark(TM).

I see a couple of ways out of this:

1. Eagerly compute nsw/nuw on AddRec construction. I'd be happy
with this if you can show that we don't regress compile time. Other
than the test-suite, you can probably ask for interesting compile time
benchmarks on this list.

2. Don't cache ranges so much. Especially for things like
sext({1,+,1}), just recomputing the ranges may not be significantly
worse than the densemap lookup, assuming the backedge taken count is
already computed. Of course for complex SCEVs we should still cache
ranges, but for those perhaps we're okay with "locking in" a more
conservative result. We could even return conservatives result for
recursion deeper than a certain depth, like we do for Values in
ComputeKnownBits. I like this philosophically since it reduces caching
(a lot of the complexity in SCEV is due to caching), but again, you'll
have to show that we don't regress compile time.

3. Precise invalidation via use-lists -- add use lists to SCEV and
use that to precisely kick values out of caches as you infer NSW/NUW
etc. We can optimize the representation of use lists by e.g. avoiding
keeping use lists for SCEVConstants and exploiting the fact that we
don't delete SCEVs so use lists don't need fast deletion. Again,
you'll need to show compile time memory usage does not go up by very
much.

You may find the example of such behavior if you re-enable the assertion
from patch rG0f720e1296ae . The test that was added with

For that case specifically: I don't think such asserts are a good
idea, unless you've already checked SE.isKnownNonNegative(X) (i.e. not
only do you know that X is "supposed to be" non-negative, but you also
know that SCEV can prove it). For instance see
25170 – PowerPC crash in PPCLoopPreIncPrep::runOnLoop() where SCEV can prove "A-B"
is a constant but can't prove that "B-A" is a constant. I think this
behavior is fine btw, SCEV is allowed to be arbitrarily conservative.

My question is: do we still want to have this hack with late setting of
nsw/nuw given that this flag does not give us any benefits in many real
situations? What would you think?

We can't just remove it (because you'll regress the cases where it
does help), but we can replace it with something that is more
effective.

-- Sanjoy

Maxim_Kazantsev · January 26, 2018, 5:55am

Thanks for your insides Sanjoy!

I don't really believe that option 2 may work just because even if we recalculate the range for this add recurrency, there are already its derivatives with cached ranges (the most obvious example is sext and expressions where this sext is involved). We can speculate about what is "simple enough" to not cache the ranges, but I believe that there is always a counter-example to any definition. It's just maybe hard to construct.

Just a bit of speculation about option 3 (not sure how often this case is): we can have dependency between SCEVs which is not represented in use-list. For example:

  if ({100,+,1} > -1) {
    x = a / {1,+,1}
  }

If we prove nsw for the first addrec, it will also imply nsw for the second addrec (because the number of iterations in this loop is provably less than MAX_INT - 99). But there is no clear use-chain dependency between them. So I would rather belive in option 1 as in the most reliable.

I'll try to experiment with options 1 and 3 and see where it gets us.

Best regards,
Max

Sanjoy_Das1 · January 26, 2018, 6:09am

I don't really believe that option 2 may work just because even if we recalculate the range for this add recurrency, there are already its derivatives with cached ranges (the most obvious example is sext and expressions where this sext is involved). We can speculate about what is "simple enough" to not cache the ranges, but I believe that there is always a counter-example to any definition. It's just maybe hard to construct.

Just a bit of speculation about option 3 (not sure how often this case is): we can have dependency between SCEVs which is not represented in use-list. For example:

  if ({100,+,1} > -1) {
    x = a / {1,+,1}
  }

If we prove nsw for the first addrec, it will also imply nsw for the second addrec (because the number of iterations in this loop is provably less than MAX_INT - 99). But there is no clear use-chain dependency between them. So I would rather belive in option 1 as in the most reliable.

That's true, but that's also not the problem that this thread intends to solve.

I think option 1 is fine, but (3) has greater applicability. For
instance, if we had (3) we would be able to get rid of annoying stuff
like https://github.com/llvm-mirror/llvm/blob/807062f0a15470a029046910f4618f9627ca5c03/lib/Analysis/ScalarEvolution.cpp#L3701

-- Sanjoy

Maxim_Kazantsev · January 26, 2018, 6:15am

That's true, but that's also not the problem that this thread intends to solve.

Agreed.

Thanks again!

Topic		Replies	Views
Late setting of SCEV NoWrap flags does bad with cache LLVM Dev List Archives	1	68	January 25, 2018
Improving SCEV's behavior around IR level no-wrap flags LLVM Dev List Archives	13	106	September 24, 2016
sign extensions, SCEVs, and wrap flags LLVM Dev List Archives	7	121	September 20, 2012
[ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls LLVM Dev List Archives	10	91	August 15, 2017
ScalarEvolution invariants around wrapping flags LLVM Dev List Archives	10	96	October 2, 2019

Late setting of SCEV NoWrap flags does bad with cache

Related Topics