some llvm/clang missed optimizations

A few random observations:

1.

Clang could do better with large but boring switches like this:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/E8/E88C5111.shtml

Performance of clang's output will be fine, but this is a major code size loss.
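As a rough illustration only (this is not the code from the linked test case), a "large but boring" switch might simply map a dense range of integers to constants. The function names and values below are hypothetical. The second function spells out the table lookup that a compiler could in principle emit instead of one block per case:

```c
/* Hypothetical example, not taken from the linked test case. */
int days_in_month(int month)
{
    switch (month) {
    case 1:  return 31;
    case 2:  return 28;
    case 3:  return 31;
    case 4:  return 30;
    case 5:  return 31;
    case 6:  return 30;
    case 7:  return 31;
    case 8:  return 31;
    case 9:  return 30;
    case 10: return 31;
    case 11: return 30;
    case 12: return 31;
    default: return -1;
    }
}

/* The same mapping written by hand as a bounds check plus a table load. */
int days_in_month_table(int month)
{
    static const signed char days[12] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    };
    if (month < 1 || month > 12)
        return -1;
    return days[month - 1];
}
```

Both functions return the same value for every input; only the size of the generated code should differ.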

2.

Destruction of stupid loops is incomplete, sometimes due to phase ordering problems:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/FC/FCADC848.shtml

Sometimes not:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/EC/ECC74C0C.shtml

This is both a speed and size issue. Probably this kind of code most often appears in machine-generated C or where loops contain logging code that is conditionally compiled away.
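A minimal sketch of the logging case, assuming a hypothetical `LOG` macro and `ENABLE_LOGGING` switch (neither comes from the linked test cases). With logging disabled the loop body is empty, so the whole loop has no observable effect and could be deleted:

```c
#ifdef ENABLE_LOGGING
#include <stdio.h>
#define LOG(...) fprintf(stderr, __VA_ARGS__)
#else
#define LOG(...) ((void)0)   /* logging compiled away */
#endif

/* Hypothetical function: with ENABLE_LOGGING undefined, the body of the
   loop expands to an empty statement and the loop does nothing. */
void trace_items(const int *items, int count)
{
    for (int i = 0; i < count; i++) {
        LOG("item %d = %d\n", i, items[i]);
    }
}
```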

3.

Repetitive code with lots of bitwise operations is compiled by LLVM into much larger code than the other compilers:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/ED/ED37DAF5.shtml
http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/1F/1F4003C7.shtml

Note that this is straight-line code, so LLVM's output will run 4-5 times longer than everyone else's.

I'll be interested to learn the source of this one.

4.

It seems possible to do a better job recognizing that the current stack frame can be used unmodified by a new call:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/0A/0A6CDE2D.shtml

This is a speed loss as well as a size loss. This pattern seems quite common in real code, due to layered APIs. Of course when IPO is on, most of these calls should be destroyed.
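A sketch of the layered-API pattern, with hypothetical function names that are not from the linked test case. Each wrapper passes its arguments along unchanged and in the same order, so the incoming frame already has the layout the callee expects and the call could become a plain jump:

```c
/* Hypothetical lowest layer: pretends to write `length` bytes. */
int buffer_write(int handle, const char *data, int length)
{
    if (handle < 0 || data == 0 || length < 0)
        return -1;
    return length;
}

/* Hypothetical middle layer: forwards its arguments untouched. */
int stream_write(int handle, const char *data, int length)
{
    return buffer_write(handle, data, length);
}

/* Hypothetical top layer: another pure forwarding wrapper. */
int file_write(int handle, const char *data, int length)
{
    return stream_write(handle, data, length);
}
```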

5.

Sometimes a function modifies globals but even so has no net effect:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/8A/8AB0B238.shtml
http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/14/14157FE8.shtml

Somehow gcc3 sees these but everyone else including gcc4 fails.
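A deliberately simplified sketch of the idea, using a hypothetical global and function (the linked cases presumably hide the cancellation better than this). The function writes to a global but restores the old value before returning, so its body could be reduced to nothing:

```c
/* Hypothetical global state. */
unsigned status_flags;

/* Sets a flag, does no work in between, then restores the saved value.
   The net effect on status_flags is nil. */
void with_busy_flag(void)
{
    unsigned saved = status_flags;
    status_flags |= 0x1u;     /* mark busy */
    /* ...work that was compiled away... */
    status_flags = saved;     /* restore */
}
```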

6.

Here llvm-gcc and gcc, but not clang, exploit undefinedness of integer overflow to eliminate most of the code in a function:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/82/82A5CC31.shtml

Most likely this is not what the authors of the code intended, but the compilers are correct.
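The classic small instance of this effect, shown here as a sketch with hypothetical function names rather than the code from the linked case: signed overflow is undefined in C, so a compiler may assume `x + 1` never wraps and fold the comparison to a constant. A check that does not depend on overflow is shown next to it.

```c
#include <limits.h>

/* Probably meant as an overflow check, but since signed overflow is
   undefined a compiler is allowed to fold this to "return 1". */
int increment_is_larger(int x)
{
    return x + 1 > x;
}

/* A version that asks the same question without overflowing. */
int increment_is_safe(int x)
{
    return x < INT_MAX;
}
```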

7.

Cute elimination of useless varargs code:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/3A/3A235937.shtml
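A guess at the general shape, using a hypothetical function that is not the linked code: a variadic function that sets up its `va_list` but never reads from it. The setup and teardown have no observable effect, so a compiler can drop them along with any register spills done only to support `va_arg`.

```c
#include <stdarg.h>

/* Hypothetical example: the variable arguments are never read, so the
   va_start/va_end pair is dead code. */
int ignore_extra_args(int count, ...)
{
    va_list args;
    va_start(args, count);
    va_end(args);
    return count;
}
```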

John

> 2.
> Sometimes not:
>
> http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/EC/ECC74C0C.shtml

The primary issue here is that scalar evolution doesn't know how to
deal with loops using "sle" (signed less-than-or-equal) for the exit
condition. It shouldn't be too hard to fix now that we have overflow
flags for addition.
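For illustration, a hypothetical loop of that shape (not the linked test case). The `i <= n` test on signed integers is the kind of comparison that appears as `sle` in the IR. If the increment is known not to overflow, the trip count has a simple closed form, which the second function writes out by hand:

```c
/* Hypothetical loop whose exit test is a signed <= comparison. */
unsigned count_inclusive(int n)
{
    unsigned trips = 0;
    for (int i = 0; i <= n; i++)
        trips++;
    return trips;
}

/* The closed form an optimizer could substitute, assuming i++ never
   overflows (that is, n is not INT_MAX). */
unsigned count_inclusive_closed(int n)
{
    return n < 0 ? 0u : (unsigned)n + 1u;
}
```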

> 3.
>
> Repetitive code with lots of bitwise operations is compiled by LLVM into
> much larger code than the other compilers:
>
> http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/ED/ED37DAF5.shtml
> http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/1F/1F4003C7.shtml
>
> Note that this is straight-line code, so LLVM's output will run 4-5
> times longer than everyone else's.
>
> I'll be interested to learn the source of this one.

This looks like a one-off case; instcombine destroys the symmetry of
the code that the test harness duplicated by reducing the masking
constants. Probably too complicated for too little gain to be worth
pursuing.

> 5.
>
> Sometimes a function modifies globals but even so has no net effect:
>
> http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/8A/8AB0B238.shtml
> http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/14/14157FE8.shtml
>
> Somehow gcc3 sees these but everyone else including gcc4 fails.
>
> 6.
>
> Here llvm-gcc and gcc, but not clang, exploit undefinedness of integer
> overflow to eliminate most of the code in a function:
>
> http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/82/82A5CC31.shtml
>
> Most likely this is not what the authors of the code intended, but the
> compilers are correct.

LLVM doesn't handle correlated expressions at the moment.

-Eli

>> Repetitive code with lots of bitwise operations is compiled by LLVM into
>> much larger code than the other compilers:
>>
>> http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/ED/ED37DAF5.shtml
>> http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/1F/1F4003C7.shtml
>>
>> Note that this is straight-line code, so LLVM's output will run 4-5
>> times longer than everyone else's.
>>
>> I'll be interested to learn the source of this one.
>
> This looks like a one-off case; instcombine destroys the symmetry of
> the code that the test harness duplicated by reducing the masking
> constants. Probably too complicated for too little gain to be worth
> pursuing.

There are a bunch of these actually, I can try to make a list...

John

Umm, can you find one that isn't a popcount implementation?
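For reference, a popcount is commonly written as a chain of mask-and-add steps like the sketch below. This is the textbook form with a hypothetical function name, not the code from the linked cases. Note that the mask after `x >> 16` is redundant for a 32-bit value, so an optimizer is free to shrink or drop it; whether that is the constant reduction meant earlier in the thread is an assumption.

```c
#include <stdint.h>

/* Textbook bit-parallel population count for a 32-bit word. */
unsigned popcount32(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1)  & 0x55555555u);  /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2)  & 0x33333333u);  /* 4-bit sums  */
    x = (x & 0x0F0F0F0Fu) + ((x >> 4)  & 0x0F0F0F0Fu);  /* 8-bit sums  */
    x = (x & 0x00FF00FFu) + ((x >> 8)  & 0x00FF00FFu);  /* 16-bit sums */
    x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu);  /* 32-bit sum  */
    return (unsigned)x;
}
```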

-Eli

> Umm, can you find one that isn't a popcount implementation?

Ok.

MMX psadbw instruction:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/CE/CE3DA132.shtml

Position of first set bit:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/1F/1F4003C7.shtml

Log2 floor:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/83/837A80E9.shtml

Pixel format conversion:

http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/EC/EC3353C5.shtml

John

Okay... it definitely appears that the issue is being introduced by
your test harness duplicating expressions.

-Eli

> Okay... it definitely appears that the issue is being introduced by
> your test harness duplicating expressions.

Hmm, our copy propagator has probably gone awry. I'll ask my CIL hacker to look into it...

John