Benchmarking for automatic parallelization project

I think auto-parallelization (for C and many other languages) is quite
practical, but perhaps not in a way that's easily benchmarked. I don't
believe it's effective for "dusty decks", especially with C or C++. On
the other hand, I think it can be an excellent tool to help write new
code.

When I write code for the Cray XMT, I very much rely on their C
compiler to parallelize loops, recognize and rewrite reductions and
recurrences, insert synchronization to enable additional parallelism,
etc. I also rely on feedback from the compiler (in the form of an
annotated listing) to show me when the compiler has done what I
wanted, or been somehow flummoxed. That is, when I write code for
this parallel machine, I have in mind which loops should be
parallelized and I code them in such a way that I hope the compiler
will notice. Then I compile and check. If the compiler failed, I
rewrite and try again, sometimes falling back on pragmas when
necessary (e.g., to assert that I'm confident that a loop can be
safely run in parallel).
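
A minimal sketch of that workflow in C, under stated assumptions: the
function names `dot` and `scatter` are hypothetical, and the pragma
spelling is an assumption modeled on the Cray XMT's `mta` directive
family, so it should be checked against the compiler manual. Other C
compilers ignore the unknown pragma and run the loops serially.

```c
#include <stddef.h>

/* A reduction written as plainly as possible, in the hope that a
   parallelizing compiler recognizes the sum and rewrites it as a
   parallel reduction. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* An indirect store. The compiler cannot prove that idx[] holds no
   repeated entries, so it must assume the iterations may conflict.
   If the programmer knows idx[] is a permutation, a pragma can assert
   that the loop is safe to run in parallel. */
void scatter(double *out, const double *in, const size_t *idx, size_t n)
{
/* Assumed spelling of the XMT directive; ignored by other compilers. */
#pragma mta assert parallel
    for (size_t i = 0; i < n; i++)
        out[idx[i]] = in[i];
}
```

The first loop is the kind one would expect the compiler to handle
unaided; the second is the kind where the annotated listing would
likely show a refusal until the assertion is added. The assertion is a
promise from the programmer: if idx[] does contain repeats, the
parallel result is undefined.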