LLVM misses some cross-MBB and loop optimizations compared to GCC


While testing my new register allocators on some test cases, I've
noticed that LLVM sometimes misses optimization opportunities:

1) LocalSpiller::RewriteMBB does not seem to propagate information
such as the Spills set between MBBs. In many cases where MBB B1 has
only one predecessor MBB B2, B1 could reuse the information about the
physical registers that are in the live-out set of B2. This could
help to eliminate useless reloads from spill slots when the value is
already available in the required physical register. For instance,
in the code below, the marked "movl 12(%esp), %ecx" instruction
could be eliminated.

.LBB2_2: # bb31
        movl 12(%esp), %ecx
        movl 8(%esp), %eax
        cmpl $0, up+28(%eax,%ecx,4)
        je .LBB2_9 # bb569
.LBB2_3: # bb41 ; <--- bb31 is the only predecessor of bb41
        movl 12(%esp), %ecx ; <--- This could be eliminated!!!
        movl 4(%esp), %eax
        cmpl $0, down(%eax,%ecx,4)
        je .LBB2_9 # bb569

It is also worth mentioning that, as far as I can see, reloads from
spill slots are currently not recorded in the Spills set using the
addAvailable method. Wouldn't it make sense to do so?

I have the feeling that these improvements are rather easy to achieve
and would not require too many changes to the LocalSpiller. Probably
we just need to keep the live-out set of an MBB around after
rewriting it, so that a successor with no other predecessor can use
it as the initial value for its Spills set.
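
To make the proposal concrete, here is a minimal sketch of the idea as
a toy model. It is not LocalSpiller code and uses none of LLVM's
classes: Inst, Block, Avail, rewriteBlock and rewriteFunction are all
hypothetical names, and blocks are assumed to be visited in layout
order, so only a predecessor that was already rewritten can seed its
successor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these types exist in LLVM.
enum class Op { Reload, Store, Def };  // reg <- slot, slot <- reg, reg clobbered

struct Inst {
    Op op;
    std::string reg;
    int slot;  // stack slot offset; ignored for Op::Def
};

// Maps a physical register to the spill slot whose value it holds.
using Avail = std::map<std::string, int>;

struct Block {
    std::vector<Inst> insts;
    int singlePred = -1;  // index of the only predecessor, or -1
};

// Rewrites one block starting from the availability state `avail`,
// drops reloads whose value is already in the target register, and
// returns the state at the end of the block (its "live-out" info).
Avail rewriteBlock(Block &b, Avail avail, int &removed) {
    std::vector<Inst> kept;
    for (const Inst &i : b.insts) {
        switch (i.op) {
        case Op::Reload: {
            auto it = avail.find(i.reg);
            if (it != avail.end() && it->second == i.slot) {
                ++removed;  // value is already in the register
                continue;   // skip the instruction entirely
            }
            avail[i.reg] = i.slot;
            break;
        }
        case Op::Store:
            // The slot gets a new value: other registers that held
            // the old slot contents no longer mirror it.
            for (auto it = avail.begin(); it != avail.end();) {
                if (it->second == i.slot) it = avail.erase(it);
                else ++it;
            }
            avail[i.reg] = i.slot;
            break;
        case Op::Def:
            avail.erase(i.reg);
            break;
        }
        kept.push_back(i);
    }
    b.insts = std::move(kept);
    return avail;
}

// Visits blocks in order and seeds a block with its predecessor's
// end state only if that predecessor is unique and already rewritten.
int rewriteFunction(std::vector<Block> &blocks) {
    std::vector<Avail> liveOut(blocks.size());
    int removed = 0;
    for (std::size_t n = 0; n < blocks.size(); ++n) {
        int p = blocks[n].singlePred;
        assert(p < static_cast<int>(blocks.size()));
        Avail in;  // empty: nothing is known at block entry
        if (p >= 0 && static_cast<std::size_t>(p) < n) in = liveOut[p];
        liveOut[n] = rewriteBlock(blocks[n], in, removed);
    }
    return removed;
}
```

Modelling bb31 and bb41 from the listing above as two blocks, with
bb31 as the single predecessor of bb41, this sketch drops the second
reload of slot 12 into %ecx and keeps the reload of slot 4 into %eax.
Note that it also records plain reloads as available values, which is
the addAvailable question raised above.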

Any opinions?

2) Hoisting of sub-expressions out of loops and replacement of array
accesses with pointer-based induction variables is also not optimal
in some situations.
   In the example above, both blocks are executed inside an enclosing
loop, and they keep evaluating e.g. the down(%eax,%ecx,4) expression
on every iteration. GCC, in contrast, hoists this expression out of
the loop and replaces it with a simple pointer, as you can see below:

        movl -32(%ebp), %edx
        movl 28(%edx), %eax
        testl %eax, %eax
        je .L5

        movl -48(%ebp), %eax
        movl (%eax), %edi
        testl %edi, %edi
        je .L5
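
At source level, the difference between the two code shapes can be
pictured roughly as follows. This is only an illustrative sketch and
not the attached test case: the array size, the function names
probeIndexed and probeHoisted, and the loop body are all made up for
the illustration.

```cpp
// Hypothetical stand-in for the global array in the listings.
int down[32];

// Shape of the LLVM output: the address down[a + b] is formed from
// base, index and scale on every iteration.
int probeIndexed(int a, int b, int iters) {
    int hits = 0;
    for (int k = 0; k < iters; ++k) {
        if (down[a + b] == 0) ++hits;
        down[a + b] = k & 1;  // the element changes, its address does not
    }
    return hits;
}

// Shape of the GCC output: the loop-invariant address is computed
// once before the loop and then accessed through a plain pointer.
int probeHoisted(int a, int b, int iters) {
    int *p = &down[a + b];
    int hits = 0;
    for (int k = 0; k < iters; ++k) {
        if (*p == 0) ++hits;
        *p = k & 1;
    }
    return hits;
}
```

Both functions compute the same result. Only the address computation
moves out of the loop, which is what the "movl (%eax), %edi" access in
the GCC code above suggests.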

To make it possible for you to analyze this test case, I attach the
source file, the BC file, and the code produced by LLVM and by
"GCC -O6".


8q_speed.c.s (10.2 KB)

8q_speed.s.gcc (12.2 KB)

8q_speed.c.bc (4.61 KB)

8q_speed.c (595 Bytes)

Thanks. Can you file bugzilla reports? I'll look at the first one soon.



Please check these Bugzilla entries:

http://llvm.org/bugs/show_bug.cgi?id=3495 (LocalSpiller problems)

http://llvm.org/bugs/show_bug.cgi?id=3496 (Loop optimization problems)