My comment was mostly in response to the Intel proposal, which effectively translates OpenMP pragmas directly into llvm intrinsics + metadata. I can't imagine a way to make this work *correctly* without massive changes to the optimizer.
There are four ways to make this work correctly:
1) Ignore OpenMP-related intrinsics and the associated metadata. Least
effort, least benefit (no OpenMP support). Yet OpenMP programs are
still compiled correctly, as if no pragmas were present -- including
*exactly the same* number of routines and the same call graph (thanks
to no procedurization in the front-end). The OpenMP specification
allows such compilation. This might be the choice for targets that
don't support the OpenMP runtime library.
2) Perform procedurization (including all runtime calls -- no
intrinsics are left after this step) at the very start of the LLVM
optimizer. No changes to the optimizations, but no opportunity to
optimize parallel code either. This is as cheap and easy as OpenMP
support can get. This might be a good choice for an initial
implementation.
3) Do some carefully chosen optimizations before procedurization. Do
the heavy lifting (like loop restructuring optimizations) after
procedurization. Some effort, a lot of benefit. This is essentially
what is described in [Tian05] (referenced in our proposal).
4) Make all optimizations thread-aware. The best approach in theory,
but no existing compiler goes that far.
Our proposal makes all these choices possible. One can implement 1) in
half an hour, yet keep the door open for a better solution.
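
To make the difference between 1) and 2) concrete, here is a rough
sketch in C of what "procedurization" (outlining the parallel region
into its own routine and replacing it with a runtime call) amounts to.
Everything here is illustrative: scale_ctx, scale_outlined and
toy_fork_call are made-up names, and toy_fork_call is a serial stand-in,
not the entry point of any real OpenMP runtime. The static chunking is
also just one possible schedule, chosen for simplicity.

```c
#include <stddef.h>

/* Option 1: a compiler that ignores the pragma treats this as a plain
   serial loop. One routine, no extra calls in the call graph. */
void scale_serial(double *a, size_t n, double k) {
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        a[i] *= k;
}

/* Option 2: the parallel region is outlined into a separate routine.
   Shared variables are passed through a context struct (hypothetical). */
typedef struct { double *a; size_t n; double k; } scale_ctx;

static void scale_outlined(int tid, int nthreads, void *arg) {
    scale_ctx *c = arg;
    /* Static schedule: each thread gets one contiguous chunk. */
    size_t chunk = (c->n + (size_t)nthreads - 1) / (size_t)nthreads;
    size_t lo = (size_t)tid * chunk;
    size_t hi = (lo + chunk < c->n) ? lo + chunk : c->n;
    for (size_t i = lo; i < hi; ++i)
        c->a[i] *= c->k;
}

/* Toy stand-in for the runtime's "fork" call (an assumption, not a real
   API). It runs each thread's share one after another, serially. */
typedef void (*outlined_fn)(int tid, int nthreads, void *arg);

static void toy_fork_call(outlined_fn fn, int nthreads, void *arg) {
    for (int t = 0; t < nthreads; ++t)
        fn(t, nthreads, arg);
}

/* What the original routine looks like after procedurization: the loop
   is gone, replaced by a runtime call that receives the outlined body. */
void scale_procedurized(double *a, size_t n, double k) {
    scale_ctx ctx = { a, n, k };
    toy_fork_call(scale_outlined, 4, &ctx);
}
```

Both routines compute the same result. The point is that after
procedurization the loop body lives in a different function behind an
opaque runtime call, which is why optimizations that run afterwards see
ordinary code but have little left to optimize across the region
boundary.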