LSR breaks debug info

The Loop Strength Reduction pass appears to break debug information even for
the most basic input. I believe this is a well-known issue already (see
https://bugs.llvm.org/show_bug.cgi?id=38815) but I also believe that it deserves
some extra attention.

Consider the following input compiled with ‘clang -g -O3 foo.c -mllvm -print-after-all’

Hi Markus,

The Loop Strength Reduction pass appears to break debug information even for
the most basic input. I believe this is a well-known issue already (see
https://bugs.llvm.org/show_bug.cgi?id=38815) but I also believe that it deserves
some extra attention.

Indeed, it's a poor performer, and losing loop variables is one of the
most common complaints I've heard,

[...]

One idea for how to address this would be that, since LSR is a SCEV based
optimization, one could perform additional debug salvaging by comparing SCEV
expressions for the new and old PHI-node and then adjusting DIExpressions if
they match with an offset.

Any thoughts on that?

It sounds like a plan -- I'm not especially familiar with SCEV, but if
we can determine equivalence in that manner then that kind of
salvaging would be sound. It feels like it would cover quite a common
case, and would be an "easy" win.

How difficult is it to identify an offset in SCEV? It's probably OK to
just compare expressions, as offsets such as in your example can be
recovered with the existing salvageDebugInfo function. It wouldn't be
worth putting a lot of effort into /interpreting/ SCEV expressions, as
we may as well go all the way and ask SCEV to produce an expression
for each dbg.value, and implement the general solution. That's what
I'd like to happen in the long term, but this sounds like a great
stepping stone along the way.

Hi Jeremy,

It sounds like a plan -- I'm not especially familiar with SCEV, but if we can
determine equivalence in that manner then that kind of salvaging would be
sound. It feels like it would cover quite a common case, and would be an
"easy" win.

I am not very familiar with IR SCEV (or the LSR code), but in our downstream
target we have a pass similar to LSR that operates on MIR and uses a SCEV
implementation for our MIR. There we face almost exactly the same problem:
how to keep the DBG_VALUEs up to date after an induction variable has been
replaced with a newly generated instruction sequence (including a PHI). In
that setting (which is likely much simpler than IR SCEV and LSR) I have
experimented with comparing SCEV expressions and compensating for potential
offsets with DIExpressions, and there it seems like a viable solution.

How difficult is it to identify an offset in SCEV? It's probably OK to just
compare expressions, as offsets such as in your example can be recovered
with the existing salvageDebugInfo function. It wouldn't be worth putting a
lot of effort into /interpreting/ SCEV expressions, as we may as well go all the
way and ask SCEV to produce an expression for each dbg.value, and
implement the general solution. That's what I'd like to happen in the long
term, but this sounds like a great stepping stone along the way.

For the IR I posted earlier, I believe the SCEVs would be:

Before LSR: %p.addr.05 {%p,+,3}
After LSR: %lsr.iv {3+%p,+,3}

So I do not think it would be very difficult to extract these offsets, but then
again I am not really familiar with IR SCEV. I will try to look around a bit in
LSR and see if I can come up with something.
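
To make the offset idea concrete, here is a minimal sketch in plain C++. It
does not use the LLVM API: ToyAddRec, offsetToRecoverOld and diExpressionFor
are hypothetical names made up for illustration, and the recurrence is modelled
as a symbolic base plus a constant start offset and a constant step. It only
shows the arithmetic for the two SCEVs above, under the assumption that the old
and new PHIs are compared at the same loop iteration.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Toy stand-in for an affine add-recurrence {Base + StartOffset,+,Step}.
// Hypothetical type for illustration only; this is not llvm::SCEVAddRecExpr.
struct ToyAddRec {
  std::string Base;    // symbolic part of the start value, e.g. "%p"
  int64_t StartOffset; // constant part of the start value
  int64_t Step;        // constant stride per iteration
};

// If both recurrences have the same symbolic base and the same step, return
// the constant that must be added to the new IV to recover the old one.
// Otherwise return nullopt, meaning this simple scheme does not apply.
// (No overflow handling; this is a sketch.)
std::optional<int64_t> offsetToRecoverOld(const ToyAddRec &Old,
                                          const ToyAddRec &New) {
  if (Old.Base != New.Base || Old.Step != New.Step)
    return std::nullopt;
  return Old.StartOffset - New.StartOffset;
}

// Spell out a DIExpression that applies Delta to the value it is attached to.
std::string diExpressionFor(int64_t Delta) {
  if (Delta == 0)
    return "!DIExpression()";
  if (Delta > 0)
    return "!DIExpression(DW_OP_plus_uconst, " + std::to_string(Delta) +
           ", DW_OP_stack_value)";
  // Negative delta: push the magnitude and subtract it.
  uint64_t Magnitude = 0 - static_cast<uint64_t>(Delta);
  return "!DIExpression(DW_OP_constu, " + std::to_string(Magnitude) +
         ", DW_OP_minus, DW_OP_stack_value)";
}
```

With Old = {"%p", 0, 3} for %p.addr.05 and New = {"%p", 3, 3} for %lsr.iv, the
helper yields an offset of -3, so a dbg.value for the original variable could
refer to %lsr.iv with a DIExpression that subtracts 3. If the steps or bases
differ, the sketch gives up rather than guessing.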

Thanks
Markus