Anyone?

To back you up with a repost from another thread: LSR is counter

productive for easy to understand cases on x86. Take this simple

array copy for example where my assessment seems similar yours.

void copy(int* src, int* dst, unsigned int size) {

for (unsigned int i = 0; i < size; ++i)

dst[i] = src[i];

}

The hot spot for i686 contains 6 instructions with LSR:

8b 32 mov (%edx),%esi

89 31 mov %esi,(%ecx)

83 c1 04 add $0x4,%ecx

83 c2 04 add $0x4,%edx

48 dec %eax

75 f3 jne 11 <copy+0x11>

but only 5 instructions *without* LSR.

8b 3c b2 mov (%edx,%esi,4),%edi

89 3c b1 mov %edi,(%ecx,%esi,4)

46 inc %esi

39 c6 cmp %eax,%esi

75 f5 jne 14 <copy+0x14>

My interpretation of this problem was that LSR starts analysis with an

SCEV that counts down toward zero even in an upward counting loop. A

more natural fit would be an initial IV going from 0 to some limit,

but there does not seem to be a way to express this sort of SCEV. The

initial condition puts an artificially high cost on solutions with

upward counting registers since none match the "free" down counting

SCEV. Notice the weird 'dec %eax' showing up as an artifact. The

best solution with base-index scale 4 addressing was considered by

LSR, but incorrectly deemed to be too expensive.