[RFC] libcxx <experimental/simd> implementation

std::experimental::simd is a set of portable, zero-overhead C++ types for explicitly data-parallel programming. It has shipped in GCC's libstdc++ since GCC 11.
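For readers who have not used these types, here is a minimal sketch of what user code looks like. It is not taken from the patch: the function name `saxpy` and the loop structure are illustrative assumptions, and only names from <experimental/simd> are used (`native_simd`, `element_aligned`, `copy_to`). It should build against any standard library that ships the header, for example libstdc++ from GCC 11 with C++17 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical helper: computes y[i] = a * x[i] + y[i] for all i.
// The main loop handles native_simd<float>::size() elements per iteration;
// the scalar loop handles whatever is left over.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    using V = stdx::native_simd<float>;
    assert(x.size() == y.size());
    const std::size_t n = x.size();

    std::size_t i = 0;
    for (; i + V::size() <= n; i += V::size()) {
        V vx(&x[i], stdx::element_aligned);        // load one chunk of x
        V vy(&y[i], stdx::element_aligned);        // load one chunk of y
        vy = V(a) * vx + vy;                       // V(a) broadcasts a to every lane
        vy.copy_to(&y[i], stdx::element_aligned);  // store the result back
    }
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];                    // scalar tail
    }
}
```

Calling `saxpy(2.0f, x, y)` on two vectors of equal length updates `y` in place. The same source works whether the native vector width is 4, 8, or 16 floats, which is the portability the types are meant to provide.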

The patch set for the first experimental/simd implementation has been shelved for several years. We (PLCT Lab) have updated and revised that implementation, building on the previous work and following the latest specification document. We restructured the original code and moved most of the independent internal interfaces into the directory libcxx/include/__simd/. We also restructured the test code and added a test framework to improve test coverage.

We have submitted the patch for review: https://reviews.llvm.org/D139421

Current status

At present, we have implemented all external user interfaces except the math library (Section 9.7.7), and we have added tests for the corresponding external interfaces.

We build and test with a bootstrapping build. All tests pass on x86_64, from SSE-only up to AVX512. Four tests still fail on ARM64. Testing on other target platforms has not been completed.

Existing issues and Future work


  • About compatibility:
    • The current implementation uses some features of LLVM 14 or later, so it may not be compatible with older compilers and currently requires a bootstrapping build. Modifications may be required for compatibility.
    • The current implementation uses some features of C++20 or later, so it is not compatible with earlier C++ versions. Modifications may be required for compatibility.
  • Refine the implementation of ABI tags to support more target platforms.
  • Optimize the implementation of simd operations on specific target platforms (may need to use intrinsics).
  • Optimizing target-specific specializations may require extending the implementation of the fixed_size ABI.
  • Implement an internal bitwise mask representation for AVX512.
  • Math library support.
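As background for the ABI tag and mask items above, the following sketch shows how those concepts surface in the public interface. It does not show the internal implementation discussed in this RFC. The helper `sum_positive` and the three constant names are illustrative assumptions; the library names (`simd_abi::scalar`, `fixed_size_simd`, `native_simd`, `where`, `reduce`) come from <experimental/simd>.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// The ABI tag determines how many lanes a simd object has.
constexpr std::size_t scalar_lanes =
    stdx::simd<float, stdx::simd_abi::scalar>::size();  // always one lane
constexpr std::size_t fixed_lanes =
    stdx::fixed_size_simd<float, 8>::size();            // exactly as requested
constexpr std::size_t native_lanes =
    stdx::native_simd<float>::size();                   // depends on the target

// Hypothetical helper: sums only the non-negative lanes of v.
// The comparison yields a simd_mask, and where() uses that mask to
// overwrite the selected lanes before the horizontal reduction.
float sum_positive(stdx::fixed_size_simd<float, 8> v) {
    using V = stdx::fixed_size_simd<float, 8>;
    stdx::where(v < V(0.0f), v) = 0.0f;  // zero out negative lanes
    return stdx::reduce(v);              // add all lanes together
}
```

How the mask returned by `v < V(0.0f)` is stored (one full-width element per lane, or one bit per lane as AVX512 mask registers allow) is an implementation detail hidden behind `simd_mask`, which is why the internal representation can be changed later without affecting user code.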


  • Add negative tests for SFINAE checks.
  • Run tests in more target platform environments (AArch64, PowerPC, etc.).
  • Optimize the test structure to increase code reuse and reduce redundancy.
  • Because the tests use a high-coverage framework, they take a long time to run. They can be simplified and optimized if necessary.

Documentation and References