Hi, Erik,
That's great!
Gor, Marshall, and I discussed this after some past committee meeting. We
wanted to architect the implementation so that we could provide different
underlying concurrency mechanisms, including:
a. A self-contained thread-pool-based implementation using a
work-stealing scheme (a rough sketch of this option follows the list).
b. An implementation that wraps Grand Central Dispatch (for Mac and
any other platforms providing libdispatch).
c. An implementation that uses OpenMP.
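To make the split concrete, here is a rough, hypothetical sketch of the kind
of backend hook all three options could plug into. The names (__pstl_backend,
parallel_for) and the naive per-call thread spawning are purely illustrative
and are not libc++'s actual internals; option (a) would back this with a
persistent work-stealing pool, while options (b) and (c) would forward the
same call to libdispatch or OpenMP.

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

namespace __pstl_backend {

// Illustrative backend hook: run f(begin, end) over [0, n), split across
// hardware threads. A real option-(a) backend would hand the chunks to a
// persistent work-stealing pool rather than spawning threads per call.
template <class Func>
void parallel_for(std::size_t n, Func f) {
  unsigned hw = std::max(1u, std::thread::hardware_concurrency());
  std::size_t chunk = (n + hw - 1) / hw;
  std::vector<std::thread> workers;
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back([=] { f(begin, end); });
  }
  for (auto &t : workers)
    t.join();
}

} // namespace __pstl_backend

A par-policy algorithm overload would then just forward its iteration space
to a hook like this, regardless of which backend is compiled in underneath.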
Sorry to butt in, but I'm kinda curious how these will be substantially
different under the hood.
No need to be sorry; this is a good question. I think that there are a few
high-level goals here:
1. Provide a solution that works for everybody.
2. Take advantage of compiler technology as appropriate.
3. Provide useful interoperability; in practice, don't oversubscribe the
system.
The motivation for providing an implementation based on a libc++ thread
pool is to satisfy (1). Your suggestion of using our OpenMP runtime's
low-level API directly is a good one. Personally, I really like this idea.
It does imply, however, that organizations that distribute libc++ will also
end up distributing libomp. If libomp has matured (in the open-source
sense) to the point where this is a suitable solution, then we should do
this. As I recall, however, several organizations that ship
Clang/LLVM/libc++-based toolchains still don't ship libomp, and I don't know
how generally comfortable people will be with this dependency.
If "people" aren't comfortable with llvm-openmp then kick it out as a
project. I use it and I know other projects that use it just fine. I can
maybe claim the title of OpenMP hater and yet I don't know any legitimate
reason against having this as a dependency. It's a portable parallel
runtime that exposes an API and works.. I hope someone does speak up about
specific concerns if they exist.
That having been said, to point (2), using the OpenMP compiler directives
is superior to calling the low-level API directly. OpenMP directives do
translate into API calls, as you point out, but they also provide
optimization hints to the compiler (e.g., about the lack of loop-carried
dependencies). Over the next couple of years, I expect to see a lot more
compiler-optimization capability around OpenMP (and perhaps other
parallelism) directives (parallel-region fusion, etc.). OpenMP also
provides a standard way to access many of the relevant vectorization hints,
and taking advantage of this is useful when compiling with Clang as well as
with other compilers.
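To illustrate, and only as a hedged sketch (the function name below is made
up), the directive would live inside the library implementation, where it
both spawns the parallel work and tells the compiler there are no
loop-carried dependencies so the loop can also be vectorized; a version that
only called the runtime API would lose that second piece of information:

#include <cstddef>

// Illustrative only: an OpenMP-backed unary transform loop as it might appear
// inside the library. "parallel for" produces the runtime calls; "simd" adds
// the vectorization hint that a bare runtime-API call could not convey.
template <class T, class UnaryOp>
void __transform_par_unseq(const T* in, T* out, std::size_t n, UnaryOp op) {
#pragma omp parallel for simd
  for (std::size_t i = 0; i < n; ++i)
    out[i] = op(in[i]);
}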
If projects can't even ship the llvm-openmp runtime, then I have a very
strong concern about bootstrap dependencies that may start relying on
external tools.
Further, I'm not sure I understand your point here. The directives wouldn't
be in the end-user code, but on the STL implementation side. Wouldn't that
implementation detail be fixed, with an abstract layer exposed to the end
user? It almost sounds like you're expressing the benefits of OMP here and
not the parallel STL side. (Hmm... in the distance I hear: "premature
optimization is the root of all evil.")
Once LLVM OpenMP can properly handle nested parallelism and a few more
advanced things, all this might be fun. (We can go down a big list if anyone
wants to digress.)
Regarding why you'd use GCD on Mac, and similarly why it is important for
many users to use OpenMP underneath: to the extent possible, we want to use
the same underlying thread pool as everything else in the application. This
is to avoid over-subscription and other issues associated with conflicting
threading runtimes. If parts of the application are already using GCD, then
we probably want to use it too (or at least not compete with it). Otherwise,
OpenMP's runtime is probably better.
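For comparison, and again only as a sketch with made-up surrounding names, a
libdispatch-backed version of the earlier hypothetical parallel_for hook
would hand the iterations to the process-wide GCD pool instead of creating
its own threads, which is exactly the "share the application's existing
pool" argument; dispatch_get_global_queue and dispatch_apply_f are the real
libdispatch calls here:

#include <cstddef>
#include <dispatch/dispatch.h>

namespace __pstl_backend {

// Illustrative libdispatch-backed variant of the earlier hook: iterations are
// scheduled on the shared global concurrent queue, so the library reuses the
// worker pool GCD already maintains for the rest of the process. For brevity
// this version invokes f once per index instead of once per chunk.
template <class Func>
void parallel_for(std::size_t n, Func f) {
  dispatch_queue_t queue =
      dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
  dispatch_apply_f(n, queue, &f, [](void* ctx, std::size_t i) {
    (*static_cast<Func*>(ctx))(i);
  });
}

} // namespace __pstl_backend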
Again, this detail isn't visible to the end user? We pick an implementation
that makes sense. If other applications use GCD and we use OpenMP, and
multiple thread-heavy applications are running, over-subscription would be
a kernel issue and not a userland one. I don't see how you can always avoid
that situation, and creating two implementations to try to avoid it kinda
seems funny. btw, GCD is a marketing term; libdispatch is really what I'm
talking about here. It's been quite a while since I worked hands-on with it,
but I wonder how much its API overlaps with the similar interfaces in
llvm-openmp. If the interfaces are similar and the "cost" in terms of
complexity is low, who cares, but I don't remember that being the case.
(Side note: I worked on an older version of libdispatch and ported it to
Solaris. I also played around and benchmarked OMP tasks lowered directly
down to libdispatch calls across multiple platforms. At the time, our
runtime always beat it in performance. Maybe newer versions of libdispatch
are better.)
I'm not trying to be combative, but your points just don't make sense...
(I take the blame and must be missing something.)