Sorry I missed this message initially. Trip count computation
really does not do well with non-unit strides, especially when they
aren't represented be an existing phi. If there was a simple
fix, it would have been fixed long ago.

It would help to add a check for %n > 0 before the loop, but the
bigger problem is that there are a couple different places where we
effectively need know that %n + 4 <= INT64_MAX.

The question is whether we can infer this from the NSW flag. There are
multiple problems with this, and there's no one place we can currently
handle it. Just consider this issue: the original phi is a recurrence
{0,+,1}<L> which cannot wrap. But we have no phi that represents the
scaled form of that recurrence {0,+,4}<L>. So how do we know that
recurrence can never wrap? We have a multiply instruction that doesn't
wrap, but that only tells us that one value at a particular point in
the code doesn't wrap. It doesn't tell us about this ethereal thing we
call a recurrence. We would have to determine that whenever we rely on
our computed expression for the scaled recurrence, we are dominated by
that multiply--not something we can currently represent.

This limitation is fundamental to the design of SCEV. The problem is
not really SCEV though, it's the way LLVM uses it. It's obvious
looking at your test case that the trip count is computed by a
mul<nsw> of {0,+,1}<L><nsw>, so cannot wrap. I could certainly write
an analysis that works directly on your IR to determine that fact. In
general, I think we can do better by redesigning analysis currently
encapsulated within SCEV to be expressed in terms of IR values, using
SCEV as a supplemental analysis rather than a proxy for the IR. This
would decrease the modularity and generality of the code, but would
allow LLVM to properly deal with messy realities of NSW semantics and
control dependence.

We could start by rewriting trip count computation in this
manner. File a bug if you'd like. I can't say I'll be able to
implement it in the near future.

This problem could also be engineered away by relying on profile
data. If we know this is *the* important loop, we can disregard
code bloat and create two versions gated by bounds checks. Then we can
ignore NSW/NUW flags because we have an explicit test in the CFG.