Contributing MDSpan implementation

Hi,

We are interested in contributing our mdspan implementation (or a revision thereof) to libcxx. It currently lives at GitHub - kokkos/mdspan: Reference implementation of mdspan targeting C++23. This is the primary reference implementation we wrote for the C++23 mdspan proposal, P0009 (MDSPAN). We are also developing the follow-up parts there, such as mdarray and submdspan.

A bit of background on “us”: I work at Sandia National Laboratories and, together with Damien Lebrun-Grandie from Oak Ridge National Laboratory, lead the Kokkos performance portability programming model project. We also have mdspan contributors from NVIDIA, such as Mark Hoemmen. NVIDIA has been shipping this implementation with nvc++ since version 22.7 and is working on adding it to libcudacxx (Adding `mdspan` reference implementation by youyu3 · Pull Request #299 · NVIDIA/libcudacxx · GitHub).

If you are interested in principle, there are a few things we would like to discuss regarding scope, since our implementation goes a bit beyond a pure standard implementation:

  1. We support usage in CUDA and HIP code, i.e. we add function attribute markup such as `__host__ __device__` via macros where needed.
  2. We support standards prior to C++23 (specifically C++14 through C++20). That comes with a few trade-offs in implementation complexity and standards compliance. Details can be found here: GitHub - kokkos/mdspan: Reference implementation of mdspan targeting C++23
  3. As stated above, we also implement some follow-up proposals. Many of them are getting fairly close to their final form, but they will only be considered for C++26 because we ran out of time in the committee.

We would need to discuss which of these aspects to carry over into libcxx and to which degree.

Since this is a fairly complex piece of work, we would also propose to do the inclusion into libcxx stepwise, i.e. we start with extents, add the layouts, then the accessor, and finally mdspan itself. This should hopefully produce reviewable chunks.
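
To make the proposed split concrete, here is a minimal sketch of how the four pieces build on each other. It is not taken from the reference implementation or from libc++. Everything in the hypothetical `sketch` namespace is a simplified stand-in, assuming only dynamic extents, a row-major layout, and a plain pointer accessor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the four pieces named above.
// None of these names come from libc++ or the reference implementation.
namespace sketch {

// Step 1: extents, here a purely dynamic shape of rank Rank.
template <std::size_t Rank>
struct extents {
  std::array<std::size_t, Rank> sizes;
  std::size_t extent(std::size_t r) const { return sizes[r]; }
};

// Step 2: a layout mapping from a multidimensional index to a linear offset.
// This one is row-major: the rightmost index is contiguous.
template <std::size_t Rank>
struct layout_right_mapping {
  extents<Rank> ext;
  std::size_t operator()(const std::array<std::size_t, Rank>& idx) const {
    std::size_t offset = 0;
    for (std::size_t r = 0; r < Rank; ++r)
      offset = offset * ext.extent(r) + idx[r];
    return offset;
  }
};

// Step 3: an accessor that turns (pointer, offset) into a reference.
template <class T>
struct default_accessor {
  T& access(T* p, std::size_t i) const { return p[i]; }
};

// Step 4: the view itself, which only combines the three pieces above.
template <class T, std::size_t Rank>
struct mdspan_view {
  T* data;
  layout_right_mapping<Rank> mapping;
  default_accessor<T> accessor;
  T& operator()(const std::array<std::size_t, Rank>& idx) const {
    return accessor.access(data, mapping(idx));
  }
};

}  // namespace sketch

// Reads element (1, 2) of a 2x3 row-major view over six ints.
inline int demo_lookup() {
  int storage[6] = {0, 1, 2, 3, 4, 5};
  sketch::extents<2> shape{{2, 3}};
  sketch::layout_right_mapping<2> mapping{shape};
  sketch::mdspan_view<int, 2> view{storage, mapping, {}};
  return view({1, 2});  // linear offset 1 * 3 + 2 = 5
}
```

Because each layer depends only on the ones before it, each can be reviewed and tested on its own, which is the point of contributing them in that order.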

Best regards
Christian Trott

A couple more comments: we are currently working on switching our license wholesale to Apache 2.0 with LLVM exceptions, which should hopefully solve that issue. On our side we also have a business need for our current implementation to support a large range of compilers and platforms, including GCC, Clang, Intel, Cray, NVCC, NVC++, HIPCC, and MSVC. In other words, the bones of this implementation are fairly robust with respect to usability from various toolchains.

Thanks for proposing to contribute an implementation!

I’m assuming you propose to essentially contribute an implementation based on yours? I.e. your project and the libc++ code would live side-by-side and not one as a downstream user of the other.

Regarding the questions:

  1. We don’t support CUDA or HIP, so these macros would most likely just rot and should therefore be removed from the libc++ code.
  2. Generally we don’t back-port features to earlier versions of the standard to avoid portability traps, so any extensions should be removed.
  3. These should definitely not live in std, but maybe we could put them into std::experimental. AFAIK we haven’t done such a thing before though, so maybe it’s not a good idea, I don’t know. It would definitely be appreciated to get an implementation when these features get standardized though.

Adding the implementation in multiple steps would definitely be a good idea. That makes it a lot easier to review. You also get a gauge on what we expect/require in terms of implementation style and tests.

To get started you should probably read through Contributing to libc++ — libc++ 16.0.0git documentation. If you have any questions about contributing the best way is to ask in the #libcxx channel on Discord.

Regarding the CUDA and HIP stuff: libcxx is used downstream by, for example, AMD's and Intel's toolchains as well as Clang's, all of which are capable of targeting GPUs. Even vanilla Clang works as a CUDA compiler.

For a long time we have struggled somewhat to use even simple libcxx facilities such as pair and complex in GPU code, because their functions carry no such markup. These macros can be defined without any configuration step (i.e. we can detect during compilation whether the CUDA/HIP language extensions are enabled). As a consequence, it wouldn’t be a significant maintenance burden to have the macros in there, and it would mean we could use that version from all the different Clang-based compilers down the road. Generally speaking, we would be interested in being able to use more of libcxx directly in GPU code instead of maintaining our own versions of all these features, where the only difference is the macros added to function declarations. But we will of course defer to what the community wants.
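
As a rough sketch of what such markup looks like: the macro and type names below (`EXAMPLE_HOST_DEVICE`, `example_pair`, `example_sum`) are hypothetical and are not what libcxx or our implementation use. The sketch assumes the compiler defines `__CUDACC__` or `__HIPCC__` when compiling CUDA or HIP code.

```cpp
#include <cassert>

// Hypothetical macro name, for illustration only.
// Assumption: __CUDACC__ is defined during CUDA compilation and
// __HIPCC__ during HIP compilation, so no configure step is needed.
#if defined(__CUDACC__) || defined(__HIPCC__)
#  define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#  define EXAMPLE_HOST_DEVICE  // expands to nothing in plain C++
#endif

// A pair-like type whose member functions are callable from both
// host and device code when compiled as CUDA or HIP.
template <class T1, class T2>
struct example_pair {
  T1 first;
  T2 second;

  EXAMPLE_HOST_DEVICE constexpr const T1& get_first() const { return first; }
  EXAMPLE_HOST_DEVICE constexpr const T2& get_second() const { return second; }
};

// A free function with the same markup.
EXAMPLE_HOST_DEVICE constexpr int example_sum(const example_pair<int, int>& p) {
  return p.get_first() + p.get_second();
}
```

In a regular C++ build the macro expands to nothing, so the declarations are unchanged apart from the extra token in the source.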

For the backport: understood.

Regarding the other features: in principle I do like the idea of std::experimental, in particular for features which are basically baked but not yet voted into the C++ standard draft. But it’s also not a big deal to maintain them outside of libcxx until they are.

Cheers
Christian

Oh, I should have made clear: yes, we propose side-by-side, not a dependency.

If we want to add support for running libc++ on GPUs this should be an extra effort and not part of implementing MDSPAN. Adding support for this would probably be quite a bit of effort. As a minimum someone would have to provide hardware to test that configuration and time to help regular libc++ developers to fix any problems. I imagine this stuff would result in quite a bit of extra effort (I don’t have any experience with that, so maybe not). Whether we want to support this upstream also depends heavily on how much complexity results from that.

Basically, if you’d like libc++ to support being run on GPUs you’d have to write an RFC where you show how complex it would be to support that configuration and do the work to actually implement it.

Fair enough. I will probably defer that for now, but I will consider writing that RFC at least for the parts of libc++ which are header-only, inlineable code without dynamic memory allocations, i.e. the pieces where it’s literally just function markup without any other implications for the underlying implementation (math functions, numeric traits, tuple, array, etc.). But I guess we should start by building some credibility through providing a good pure standard C++ mdspan implementation first :-D.

Thanks for volunteering to contribute MDSpan!

I agree with what @philnik said.

Regarding back-porting, we’ve done more of that in the past, but it causes extra maintenance and sometimes makes it harder to address follow-up papers. These new papers might require language features not available in C++14. So I too prefer not to backport it. When users want to use it in earlier language versions they can use your reference implementation.

For GPU support, I think it would be nice to have it. But I would prefer to have that in a separate project. To properly support GPUs we need to have CI support for them and we need to see how libc++ developers can maintain it.

We’re more active on Discord so it would be great if you can join there.

Just some comments, since Christian mentioned me :-)

I’d first like to credit my NVIDIA colleague Yu You for his contributions both to the libcu++ implementation of mdspan and to the reference implementation.

Our plan is to make direct contributions to the reference mdspan implementation, and to the libcu++ implementation. We’re not planning to make direct contributions to libc++'s mdspan implementation at this time. That being said, we’ll be happy to discuss and review libc++'s mdspan implementation.

Thanks!
mfh