Problem with LSR


To back you up with a repost from another thread: LSR is counter
productive for easy to understand cases on x86. Take this simple
array copy for example where my assessment seems similar yours.

void copy(int* src, int* dst, unsigned int size) {
    for (unsigned int i = 0; i < size; ++i)
        dst[i] = src[i];

The hot spot for i686 contains 6 instructions with LSR:

8b 32 mov (%edx),%esi
89 31 mov %esi,(%ecx)
83 c1 04 add $0x4,%ecx
83 c2 04 add $0x4,%edx
48 dec %eax
75 f3 jne 11 <copy+0x11>

but only 5 instructions *without* LSR.

8b 3c b2 mov (%edx,%esi,4),%edi
89 3c b1 mov %edi,(%ecx,%esi,4)
46 inc %esi
39 c6 cmp %eax,%esi
75 f5 jne 14 <copy+0x14>

My interpretation of this problem was that LSR starts analysis with an
SCEV that counts down toward zero even in an upward counting loop. A
more natural fit would be an initial IV going from 0 to some limit,
but there does not seem to be a way to express this sort of SCEV. The
initial condition puts an artificially high cost on solutions with
upward counting registers since none match the "free" down counting
SCEV. Notice the weird 'dec %eax' showing up as an artifact. The
best solution with base-index scale 4 addressing was considered by
LSR, but incorrectly deemed to be too expensive.