RFC: Increasing the GCC and Clang requirements to support C++17 in LLVM

The policy does not require that the library and compiler versions be uniform. It also does not require that library header and runtime versions be uniform.

The policy requires a list of desired new features to be identified. It is again noted that the list of features mentioned as being desired on this thread does not include library features.

The policy also requires that downsides on important platforms be detailed. These downsides may depend indirectly on the list of features, via their effect on the necessary minimum library versions.

The policy does allow the downsides to be greater than strictly necessary. Is option 1 effectively deciding that we’re fine with that for the current RFC?

Identifying the new features we won’t be able to use is easier if we start with a list of new features that we would like to use.

There are two approaches to this:

  1. New headers, old library: Deployment to backlevel Ubuntu might be possible with something like RH DTS (devtoolset); it is likely possible to use DTS on a Red Hat distro to build and then deploy to Ubuntu. Header-only features of the newer libstdc++ would be available.
  2. Old headers, configure a newer compiler (e.g., Clang) to target old libstdc++: Even header-only features of the newer libstdc++ would not be available.
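To make approach 2 concrete, here is a sketch of what the compiler invocation might look like, using Clang's `--gcc-toolchain` and `-stdlib` flags; the GCC installation path is illustrative, not a fixed location:

```shell
# Approach 2 sketch: build with a recent Clang, but compile and link
# against an older system libstdc++ (headers and runtime).
# Only language features gated on the compiler become available;
# newer libstdc++ library features (even header-only ones) do not.
clang++ -std=c++17 \
    --gcc-toolchain=/usr \
    -stdlib=libstdc++ \
    -c example.cpp
```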

I don’t read it the same way: the 3 years is a strict minimum and nothing more. This is why the policy requires us to “Detail upsides of the version increase” as well as “downsides on important platforms”.
I agree we shouldn’t hold back when installing a more recent compiler to build the project is possible, but the deployment requirements are much, much harder to work around.

Do we though? While I like some of the bells and whistles from C++17, the status quo is still an option in the absence of any pressing motivation. What makes it more pressing now than a year ago? Why change it if it works?

The “downsides on important platforms” are also very different when it is only “you need to install a precompiled clang/gcc to be able to build the project” rather than “if you use LLVM, your product won’t be able to ship on Ubuntu 16.04 anymore”.

The precedent of the move to C++11 is also interesting: it didn’t happen in a single step; we started by allowing individual C++11 features to be used, based on each feature’s support on the platforms we cared about.

That goes to what @reinterpretcast mentions here: we could very well bump the compiler requirements while restricting the allowed features to language features only.
We could set up the pre-merge bot to stay on Ubuntu 16.04 with a recent Clang compiler, building against the system libstdc++.
This may be an appealing tradeoff also because, with the language features available, we can almost always implement any missing library feature ourselves in ADT / Support!

Afaik, most cross-linux Python binary packages are built with manylinux2014 or older (Centos/RHEL 7, which is eol in 2024). On that platform, the toolset is backported in such a way that we get modern compilers (I think we build with GCC 10 there) and standard libraries updated in such a way that built binaries remain compatible with the original. We build our non-Python binaries for deployment with these docker images because of the broad binary compatibility.

Notably, the next manylinux branch (manylinux_2_24) is Debian 9 based. Some discussion of compiler backports here: GCC version in manylinux_2_24 · Issue #1012 · pypa/manylinux · GitHub

Since we have ~2 years left on the manylinux2014 vintage, I’m hoping that someone manages to replicate/support a mechanism similar to what RedHat did for toolset/stdlib backports on the Centos/RHEL lines (and just makes that part of the Docker images we all use as a base for such things). It has been really nice to have suitably old LTS distros, patched with modern toolsets and still able to produce binary compatible artifacts – although, admittedly, it is based on dark arts.

(here is a deep link to the explanation of how RH backports deal with libstdc++: GCC version in manylinux_2_24 · Issue #1012 · pypa/manylinux · GitHub)

My use-case for updating to C++17 is that I have a downstream project that builds LLVM from source, but also wants to include a separate downstream library for use within LLVM. The library would like to use C++17 features (std::filesystem being one among several), but cannot, because it is held back by the need to remain buildable within the LLVM project too. We could prebuild the library, but that doesn’t help if the code in question is in header files.

I accept that this is a downstream issue, but it’s worth pointing out that there’s pressure for updating from downstream projects, just as there is pressure against it.

@jh7370 Downstream, you should be able to build LLVM/Clang with -std=c++17 without issue; the code is already compatible (in fact, some members of the community already do that). This thread is about making -std=c++17 the default and required. Your use case shouldn’t be blocked on that.
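For reference, a sketch of such a downstream configuration. `CMAKE_CXX_STANDARD` is a standard CMake variable (this is the knob the in-tree CMake consults); the generator and paths are illustrative:

```shell
# Sketch: configure an out-of-tree LLVM build in C++17 mode today,
# without waiting for the upstream default to change.
cmake -G Ninja \
    -DCMAKE_CXX_STANDARD=17 \
    -DCMAKE_BUILD_TYPE=Release \
    ../llvm
ninja
```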

See: llvm-project/CMakeLists.txt at efece08ae27d8ace684c5119d6696b1a18b2150f · llvm/llvm-project · GitHub

Afaik, most cross-linux Python binary packages are built with manylinux2014 or older (Centos/RHEL 7, which is eol in 2024). On that platform, the toolset is backported in such a way that we get modern compilers (I think we build with GCC 10 there) and standard libraries updated in such a way that built binaries remain compatible with the original. We build our non-Python binaries for deployment with these docker images because of the broad binary compatibility.

Yes, very good point – we should add “python manylinux” to the compatibility spreadsheet. That is a really important “distro” which isn’t being explicitly tracked at the moment.

Fortunately, both manylinux2010 and manylinux2014 appear to be sufficient for this proposed change.

Notably, the next manylinux branch (manylinux_2_24) is Debian 9 based. Some discussion of compiler backports here: GCC version in manylinux_2_24 · Issue #1012 · pypa/manylinux · GitHub

I don’t think that having a nominally newer manylinux with an older compiler is going to be very popular anywhere in the community. So I don’t think llvm should worry about this – the python folks will need to figure out a solution to this before dropping the manylinux2010/2014 images (if they ever drop things in the first place?).

I don’t have the reference handy, but I recall reading on an issue recently that for the eol base images, they are presently not removing them but will stop updating them when the machinery breaks. Can’t remember where I read that, though. My impression is that dealing with eol on such things is a happy outcome of having had some success, and they are working through it in real-time.

Can you clarify the situation with respect to libstdc++? The manylinux case is exactly what caused TensorFlow a big issue last time: having a toolchain to build the project wasn’t the difficult part, back-deployment was.

manylinux2010 is RHEL6-based, and has RedHat’s devtoolset-8 installed (that is, a copy of GCC 8.X and its corresponding libstdc++, patched so that it provides all the expected new functionality, yet is deployable against a normal RHEL6 system). manylinux2014 is similar, but RHEL7, with devtoolset-10 (GCC 10). Neither of these should pose problems w.r.t. the updated requirements here.

IIRC, tensorflow’s big problems a few years ago were when it was trying to support manylinux1 – which only has GCC/libstdc++ 4.8 because manylinux2010 didn’t exist yet (it wasn’t added until 2018.)

The issue with Tensorflow was that Linux has a build-time (and boot-time) option which can be used to break the x86-64 userspace ABI. manylinux2010 still uses the old userspace ABI, but a key Tensorflow contributor flipped that kernel switch and as a result could no longer run the manylinux2010 build image to build Tensorflow. I spent quite some time looking at ways to make manylinux2010 use the new userspace ABI, but there is no simple way to detect whether a kernel supports the old ABI (!), and the rather complex changes required would have been too disruptive for the underlying Red Hat Enterprise Linux 6 distribution running on top of the Red Hat-released distribution kernel.

The issue with Tensorflow was that Linux has a build-time (and boot-time) option which can be used to break the x86-64 userspace ABI.

Are you referring to https://www.python.org/dev/peps/pep-0571/#compatibility-with-kernels-that-lack-vsyscall (which is solved), or something else?

Yes, that’s what I meant.

Hi everyone!

Sorry for being absent for a while - but I have been starting a new job and it has taken some time to get up to speed.

I wanted to summarize where we are today:

  • Quite a few people support moving to C++17
  • (primarily) people from the TensorFlow community have voiced some opposition due to how complicated the 11->14 transition was for their end users. This was related to how mixing older and newer C++ standard libraries can be tricky and disruptive.
  • The features I listed above that would have a positive impact were mostly language features, not library features, which led to a discussion about separating the requirements of compiler and library.
  • There has been a long discussion about what problems Python’s manylinux setup does or does not pose for this change. I can’t say that I was fully able to follow the discussion with its many back-and-forths. I think we need a summary of what would or could be a blocker here.

Options for moving forward (again):

  • Status Quo - don’t change anything and re-evaluate later down the line.
  • Make a more complicated decision where compiler and runtime library versions are not linked. We would have to maintain some kind of build machine for this to be verifiable and not be broken by contributors. We would also have to document somewhere which library features can’t be used.
  • Go ahead with the toolchain requirements and let downstream deal with the fallout.

Feedback wanted:

  • I would like to see other people in the community get involved and let us know what features of C++17 they would like to be able to use. So far the list has been compiled only by me, without much input. If there is very little interest in moving forward with newer features and compilers for LLVM, I think we can go ahead and punt this until later.
  • I would like to hear from the Tensorflow community (and others worried about the runtime bits of this) about what metric to use to decide when it’s time, if we go down the route of not updating the toolchain requirements at this point. Otherwise I think we’ll just end up in the same discussion again. Are there tests that could be run to expose possible errors beforehand?

Thanks again for everyone chiming in here! I’ll try to keep the discussion going a little more active in the coming weeks.

Thanks,
Tobias

I’m in favor of moving to C++17, ideally without limitations, but if I had to pick and choose, I’d take structured bindings and CTAD.

I’d like access to the C++17 library features, like std::optional, for which LLVM already provides equivalents. If I’ve got a downstream library that is built from source with LLVM as a client, I want to avoid mixing C++14 and C++17 within my build to avoid confusion, but the library also doesn’t have access to the LLVM equivalents (perhaps because it’s used outside of LLVM too).

I’d also like std::shared_mutex and type_traits improvements which didn’t appear until C++17.

There are probably others, but I can’t think of them off the top of my head.

LLVM builds well in C++17 mode (there is a CMake option for it); this is what we do at Google, by the way.

To add to the niceties from C++17: fold expressions are quite nice! There is also if constexpr, which can simplify template metaprogramming and things usually done with SFINAE.

Unfortunately, I doubt I’ll be able to persuade the wider team to switch from the upstream default, so this doesn’t help me really.

What about <memory_resource> in C++17?

I think that structured bindings and init statements in if/switch would be huge readability benefits to the codebase. Thanks @tobiashieta for pushing this.

I think we all want C++17 – it would be useful if the people raising caveats evaluated the fallout this would concretely cause for their situation, to ground those caveats so we can move forward.

AFAICT, the concerns from tensorflow regarding manylinux were resolved. It appears as if they should be good to go for C++17, from the information I can see. Absent concrete evidence (e.g. from testing) that the info was incorrect, I think we can consider that discussion complete.

Were there other concerns raised?
