[RFC] Abstract Parallel IR Optimizations

This is an RFC to add analyses and transformation passes into LLVM to
optimize programs based on an abstract notion of a parallel region.

  == this is _not_ a proposal to add a new encoding of parallelism ==

We currently perform poorly when it comes to optimizations for parallel
codes. In fact, parallelizing your loops might actually prevent various
optimizations that would have been applied otherwise. One solution to
this problem is to teach the compiler about the semantics of the used
parallel representation. While this sounds tedious at first, it turns
out that we can perform key optimizations with reasonable implementation
effort (and thereby also reasonable maintenance costs). However, we have
various parallel representations that are already in use (KMPC,
GOMP, CILK runtime, ...) or proposed (Tapir, IntelPIR, ...).

Our proposal seeks to introduce parallelism specific optimizations for
multiple representations while minimizing the implementation overhead.
This is done through an abstract notion of a parallel region which hides
the actual representation from the analysis and optimization passes. In
the schemata below, our current five optimizations (described in detail
here [0]) are shown on the left, the abstract parallel IR interface is
is in the middle, and the representation specific implementations is on
the right.

         Optimization (A)nalysis/(T)ransformation Impl.