At Intel, we have developed an implementation of C++17 execution policies
for algorithms (often referred to as Parallel STL). We hope to contribute it
to libc++/LLVM, so we would like to ask the community for comments.
The code is already published on GitHub (oneapi-src/oneDPL, the oneAPI DPC++ Library (oneDPL): https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html).
It supports the C++17 standard execution policies (seq, par, par_unseq) as well as
the experimental unsequenced policy (unseq) for SIMD execution. At the moment,
about half of the C++17 standard algorithms that must support execution policies
are implemented; a few more will be ready soon, and the work continues.
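As an illustration of what the policies look like from the user's side, here is a minimal sketch that uses only the standard C++17 interface. The helper name sum_of_squares is made up for this example and is not part of our implementation; it assumes a standard library that ships the <execution> header.

```cpp
#include <execution>   // std::execution::seq, par, par_unseq
#include <functional>  // std::plus
#include <numeric>     // std::transform_reduce
#include <utility>     // std::forward
#include <vector>

// Hypothetical helper: sums the squares of the elements of v.
// The execution policy is forwarded unchanged to the standard algorithm,
// so the caller decides how the work may be executed.
template <class Policy>
long long sum_of_squares(Policy&& policy, const std::vector<int>& v) {
    return std::transform_reduce(
        std::forward<Policy>(policy),
        v.begin(), v.end(),
        0LL,                                            // initial value
        std::plus<>{},                                  // reduction step
        [](int x) { return 1LL * x * x; });             // per-element transform
}
```

Calling sum_of_squares(std::execution::seq, v) runs sequentially; passing std::execution::par or std::execution::par_unseq instead permits parallel execution without any other change to the call. Depending on the standard library, the parallel policies may need an additional threading library at link time.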
The tests that we use are also available on GitHub; needless to say, we will
contribute those as well.
The implementation is not specific to Intel’s hardware. For thread-level parallelism
it uses TBB* (https://www.threadingbuildingblocks.org/) but abstracts it with
an internal API which can be implemented on top of other threading/parallel solutions –
so it is for the community to decide which ones to use. For SIMD parallelism
(unseq, par_unseq) we use #pragma omp simd directives, which are vendor-neutral and
do not require any OpenMP runtime support.
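To make the SIMD point concrete, here is a small sketch of the kind of loop annotation meant here. It is an illustration only, not code from the implementation, and the function name saxpy_simd is hypothetical. With GCC or Clang the directive can be enabled with -fopenmp-simd, which does not link an OpenMP runtime; a compiler that does not recognize the pragma simply compiles an ordinary loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: computes y[i] = a * x[i] + y[i] for every element.
// The pragma tells the compiler that the iterations are independent
// and may be executed with SIMD instructions.
void saxpy_simd(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xs = x.data();
    float* ys = y.data();

    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        ys[i] = a * xs[i] + ys[i];
    }
}
```

The result is the same whether or not the directive is honored; only the generated instructions differ.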
The current implementation meets the spirit but not always the letter of
the standard, because it has to be separate from but also coexist with
implementations of standard C++ libraries. While preparing the contribution,
we will address inconsistencies, adjust the code to meet community standards,
and better integrate it into the standard library code.
We are also proposing that our implementation be included in libstdc++/GCC.
Compatibility between the implementations seems useful, as it can potentially
reduce the amount of work for everyone. We hope to keep the code mostly identical,
and would like to know whether you think that expectation is too optimistic.
Obviously we plan to use appropriate open source licenses to meet the different
projects' requirements.
We expect to keep developing the code and will take responsibility for
maintaining it (with community contributions, of course). If there are other
community efforts to implement parallel algorithms, we are willing to collaborate.
We look forward to your feedback, both for the overall idea and – if supported –
for the next steps we should take.
- Alexey Kukanov
* Note that TBB itself is highly portable (it has been ported by the community to Power and ARM
architectures) and permissively licensed, so it could be the base for the threading
infrastructure. But the Parallel STL implementation itself does not require TBB.