RFC: Increasing the GCC and Clang requirements to support C++17 in LLVM

Hi,

I think it’s time to start thinking about updating the requirements for Clang and GCC to support C++17 in our LLVM code.

We have a process for this, as explained in the Developer Policy.

It lays out the following guideline for updating the toolchain requirements:

It is a general goal to support LLVM and GCC versions from the last 3 years at a minimum. This time-based guideline is not strict: we may support much older compilers, or decide to support fewer versions.

And the process described is:

  • An RFC is sent to the llvm-dev mailing list
    • Detail upsides of the version increase (e.g. which newer C++ language or library features LLVM should use; avoid miscompiles in particular compiler versions, etc).
    • Detail downsides on important platforms (e.g. Ubuntu LTS status).
  • Once the RFC reaches consensus, update the CMake toolchain version checks as well as the getting started guide. This provides a softer transition path for developers compiling LLVM, because the error can be turned into a warning using a CMake flag. This is an important step: LLVM still doesn’t have code which requires the new toolchains, but it soon will. If you compile LLVM but don’t read the mailing list, we should tell you!
  • Ensure that at least one LLVM release has had this soft-error. Not all developers compile LLVM top-of-tree. These release-bound developers should also be told about upcoming changes.
  • Turn the soft-error into a hard-error after said LLVM release has branched.
  • Update the coding standards to allow the new features we’ve explicitly approved in the RFC.
  • Start using the new features in LLVM’s codebase.

This post is the first step of that.

I propose we update the requirements as follows:

  • GCC to 7.1
  • Clang to 5.0
  • MSVC to 2019 16.0
  • Xcode / Apple Clang to 9.3

For MSVC, no change is needed since this requirement is already in place, as described in this commit.

GCC 7.1 was released in May 2017, Clang 5.0 was released in September 2017, MSVC 2019 16.0 was released in April 2019, and Xcode 9.3 was released in 2018. All of these fall within the 3-year support window mentioned in the developer policy.

This would allow us to use C++17 features like:

  • Structured Bindings
  • Compile-time if constexpr
  • constexpr lambda
  • Init-statements in if and switch
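
To make these concrete, here is a minimal sketch of each feature (the function names are illustrative, not code from the tree):

```cpp
// Quick sketch of the listed C++17 features; compiles with GCC 7.1+ /
// Clang 5.0+ in C++17 mode. All names here are hypothetical examples.
#include <map>
#include <string>
#include <type_traits>

// Init-statement in `if`: the iterator is scoped to the branch that uses it.
int lookupOrZero(const std::map<std::string, int> &Counts,
                 const std::string &Key) {
  if (auto It = Counts.find(Key); It != Counts.end())
    return It->second;
  return 0;
}

// Structured bindings when iterating a map: no more It->first / It->second.
int total(const std::map<std::string, int> &Counts) {
  int Sum = 0;
  for (const auto &[Name, Count] : Counts) {
    (void)Name; // unused here, but both members get readable names
    Sum += Count;
  }
  return Sum;
}

// `if constexpr`: discard a branch at compile time instead of tag dispatch.
template <typename T> long asLong(const T &Value) {
  if constexpr (std::is_integral_v<T>)
    return static_cast<long>(Value);
  else
    return static_cast<long>(Value.size()); // e.g. any string-like type
}

// constexpr lambdas: lambdas usable in constant expressions.
constexpr auto Square = [](int N) { return N * N; };
static_assert(Square(4) == 16);
```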

These are just some of the features, of course, but I feel like C++17 is a much bigger step than C++14 was. There’s a lot of good stuff in there.

I have prepared a spreadsheet in Google Sheets with the Linux distribution compiler versions: LLVM Toolchain Support Matrix - Google Sheets

This sheet is editable, and I hope we can use it in the future so this information doesn’t have to be collected over and over again. Feel free to update it and add more distributions or operating systems.

If there are no concerns I will submit the soft-error patch to Phabricator next week.

Thanks,
Tobias


Sounds reasonable to me.

I for one welcome this update (and consider it long overdue, personally), but cards on the table: I’m a Windows developer, who only really uses Linux to make sure my changes don’t break that side of things.

This doesn’t impact me, but is there a way for people using those impacted distributions to migrate to a more recent gcc/clang version (without needing to build it from source)?

There are always binary distributions of gcc and clang that people can download to build LLVM.

I don’t expect this to be a huge problem, though. The Linux distributions with too-old compilers are themselves very old by now.

I guess the one that stands out is RHEL 7. Do you expect this to be a problem @tstellar?

I think this makes sense, as we’re already using some C++17 features like init-statements in if, right? I wonder: once we officially have C++17 support, should we use [[nodiscard]] directly instead of the LLVM_NODISCARD macro?
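
For context, the macro in question is defined in llvm/include/llvm/Support/Compiler.h; below is a paraphrased sketch of the pattern (not a verbatim copy of the header) next to the direct spelling it could become:

```cpp
// Paraphrased sketch of the LLVM_NODISCARD pattern (not an exact copy of
// llvm/Support/Compiler.h): the macro guards the attribute behind a
// feature test because [[nodiscard]] is only guaranteed from C++17 on.
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(nodiscard)
#define LLVM_NODISCARD [[nodiscard]]
#endif
#endif
#ifndef LLVM_NODISCARD
#define LLVM_NODISCARD
#endif

// Today, with pre-C++17 hosts supported (mayFail is a hypothetical example):
LLVM_NODISCARD bool mayFail();

// With C++17 as the floor, the attribute can be spelled directly and the
// macro eventually retired:
[[nodiscard]] bool alsoMayFail();
```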

I guess one thing we should discuss is whether the soft error should be part of LLVM 14 or 15. If 15, then the first release we can use C++17 in is LLVM 16.

Given the recurring code patterns in clang, this alone would probably simplify quite a bit of code.

This sounds like a very reasonable proposal to me 🙂


RHEL/CentOS 7 already requires devtoolset with a newer gcc installed to build LLVM. I suppose everyone already upgraded to the most recent version available during the previous tool version bump, although there is always the possibility that someone is stuck on GCC 6 for whatever reason.

I wanted to ask whether any buildbots would need to be updated, but then remembered that I had to fix/raise build issues on gcc 6 several times in the past, so I guess there were no buildbots testing the minimum supported version anyway (it’d be good to get that right this time around).


As @danilaml said, it shouldn’t be an issue since newer versions of both clang and gcc are available for users to install.


Making the soft error part of LLVM 14 leaves too much uncertainty in the time window for people building trunk to get their builds updated.

… because LLVM 14 has already branched. It seems to me that asking for the soft error in LLVM 14 is really asking to retroactively decide that trunk should hard-error immediately.

Yeah, this is a good point. Let’s try to get the soft error into trunk, as long as no one has a big problem with the whole RFC.

I’m really looking forward to being able to use C++17 features, and to dropping support for gcc 5!

That said, we should still do some due diligence on the choice of which versions to require.
There is a reason the guideline is written in a very soft way, with the intention of supporting at least 3 years.

There are always binary distributions of gcc and clang that people can download to build LLVM.

I think that is an overly narrow consideration.
While you may be thinking of the workflow of an end user building clang/LLVM on their own system to test it, LLVM is also embedded in various downstream products: the users’ toolchain here isn’t just an individual on their local machine using LLVM directly, but all the users and packagers of the other projects that embed LLVM.
This goes beyond the build environment; it also gets into the deployment target!
You can’t just update the compiler on the latest Ubuntu, build, and deploy to the previous Ubuntu: it may not have a suitable libstdc++ (see below for some painful memories from the last LLVM update…).

That means we need to consider that LLVM (less so clang/lld/…) “as a library” is a fairly foundational component for many other projects, and updating our requirements may have significant transitive impact.

Speaking for one I’m familiar with: we embed LLVM inside TensorFlow (and XLA) and we track HEAD very closely (never more than a few days behind). We embed LLVM as part of the libraries that ship in a pip package, and I assume we’re not the only vendor in a similar situation. So any toolchain requirement propagates to every platform we want to support, to the requirements of the pip environment, etc.
An upgrade there isn’t just a matter of trivially updating the toolchain for LLVM in isolation.

The last upgrade moved from C++11 to C++14 and gcc 4.8 to 5.2, and it ended up being incredibly disruptive in practice. It turned out that we couldn’t build our pip package in a way that would still be installable on Ubuntu 14.04 (at the time still widely used by data scientists and universities) because of incompatibilities in libstdc++ (related to the ABI change around std::string, if I remember correctly…).
RedHat, at the same time, had some convoluted tooling and a linker script that allowed building packages on a recent CentOS and deploying them to a previous one.

Anyway: I’m very supportive of moving forward (we need to keep the ability to move forward!), but I still wanted to bring a dose of pragmatism as well and surface the “foundational” aspect of “LLVM as a library” out there.

I have buildbots on gcc 5.4, but I’ll happily take them down: gcc 5 has been painful roughly once a week on average (including one new ICE just today!).

Back to my point above: while we can always install a new compiler and build on a more recent distribution, this may bring deployment limitations as well, which are more impactful.
So the list of distributions this impacts (I haven’t looked at the spreadsheet carefully yet) is really important.


Hi @mehdi_amini - I think you raise some valid points, and if I gave the impression that I don’t think this needs careful consideration, that was not my intention. I started by filling in some more distributions in the spreadsheet to make sure we had a bit more coverage.

So let’s carefully review the GCC requirement again:

We currently require GCC 5.1 to build LLVM. My suggestion was 7.x because it’s more C++17-complete than 6.x. If we go to 7.x, the following distributions/versions will be left in the dust:

  • Ubuntu 16.04
  • Debian 9 “Stretch”

If we go to 6.x instead of 7.x, only Ubuntu 16.04 will be left behind. While Debian 9 is not EOL until this summer, it will be EOL by the time our change goes into a release. And I don’t believe Debian’s LTS flavor is that widely used (citation needed).

Where does that leave us? I still think 7.1 is the right new requirement for GCC, because:

  • Any change will make LLVM unbuildable on Ubuntu 16.04, a distribution that no longer gets any updates except for critical security issues.
  • Going to 6.x instead would only spare one other big-name distribution, and one that will be EOL soon.
  • 7.x has much better support for C++17.

While I am not trying to make light of the users who will have to change their compilers because they need LLVM or something that depends on LLVM, I think we are currently VERY conservative with GCC 5.x, and even 7.x is not a very high bar: almost all distributions that are still supported ship 7.x or newer.

The truth is that updating the requirements will always be painful for certain users (I have myself been on that side and hated my life), but I don’t think it’s fair to the LLVM project to hold it back because of these users and these older, unsupported distributions.

So my suggestion stands - I still think GCC 7.x and Clang 5.x are reasonable requirements.

Please let me know if you think I am missing something or can provide any more information or research to make sure we don’t make a rash decision.

Thanks,
Tobias


You’ve answered based on the requirement to build, but something I raised that is still not very clear to me is the deployment requirement.
I don’t know if the C++11->C++14 switch was an exceptionally hard one, but I’m not as worried about the ability to build on Ubuntu 16.04 as I am about shipping a package there. So: if I manually build and install gcc 7.x (or clang!) on Ubuntu 16.04 so that I can build there, can I ship the resulting package to a system like pip (or another) where other users on Ubuntu 16.04 will be able to use it?
What new constraints does this put on the system libstdc++ in terms of support?

I see. I don’t know the answer to that question yet. But I can probably try to figure it out with Docker.

Just so I understand what we are trying to answer here:

Take Ubuntu 16.04, which won’t have a new enough gcc. Build or download a newer gcc or clang. Use that newer compiler to build a program with the C++17 flag. Take this program, put it on a clean 16.04 Docker image, and make it run.

Does this sound correct?

Also, while I am happy to investigate this (I have a bit of free time this weekend), our current policy doesn’t take this into account at all. I think it warrants some discussion whether this should be something we consider when adjusting the toolchain requirements; if so, we need to formulate that and add it to the policy.

So I tested this using Docker and some scripts to build gcc. My Dockerfile and configuration can be found here: tru/llvm-gcc-tests (github.com)

But in general what I did was what’s outlined above: build a newer gcc on Ubuntu 16.04, use it to compile a small C++17 test program, copy the resulting binary to a clean 16.04 image, and run it there.

It works without a problem.

I would suspect that it would fail if I used std::filesystem or similar library features that require the runtime libstdc++ to match. But this is expected, and it can be worked around with -static-libstdc++ or by shipping the newer libstdc++ alongside the binary.
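
To illustrate the split, here is a hypothetical probe along the lines of what the repo above does (the g++-7 install name and the exact flags are assumptions about the local setup):

```cpp
// probe.cpp -- hypothetical check for the "new compiler, old runtime" case.
// Built on Ubuntu 16.04 with the newly built compiler, e.g.:
//   g++-7 -std=c++17 probe.cpp -o probe
// then copied to and run on a clean 16.04 image.
#include <cstdio>
#include <map>

int main() {
  // Structured bindings and init-statements are compiler-only features:
  // they emit no references to new libstdc++.so symbols, so the binary
  // loads fine against the old system runtime.
  std::map<int, int> M{{1, 2}, {3, 4}};
  for (const auto &[K, V] : M)
    std::printf("%d -> %d\n", K, V);

  // By contrast, filesystem support (std::experimental::filesystem in
  // GCC 7, needing -lstdc++fs) ties the binary to newer library bits --
  // the case where -static-libstdc++ or shipping the runtime comes in.
  return 0;
}
```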

Anyway - this took me less than an hour and I think we can publish the method I used somewhere if people wonder how they can build/run LLVM when we have moved on.

Feel free to inspect my repo above and see if I missed something or did something wrong that could affect the outcome. If you want to build it locally, make sure you edit gcc-build-vars.sh to match your CPU core count.

That sounds about right (for the specific case of Ubuntu 16.04). If using GCC as the compiler and it picks up libstdc++ headers from your newer GCC install, then I believe this is an unsupported use case (even if it happens to work).

For Clang, it will probably pick up the system libstdc++, and that would be fine (but the libstdc++ would not be as new as the proposal might be assuming).

I think, especially for the case of Clang, the question is whether there is a separate minimum system libstdc++ (or, for some other deployment platform, libc++) level that should be identified as part of the proposal (i.e., don’t just talk about compiler versions).

Right now, the list of features in the RFC thread does not include any library features, so no motivation has been presented to justify moving the minimum system libstdc++ or libc++ level up.

That’s exactly it! The question is about the deployment dependency on an older libstdc++; this is exactly what the pain point was last time.

The problem is not that simple: the program can’t just be int main() {}; it’s a question of each individual library feature you intend to use that isn’t header-only, as you mentioned with the filesystem header. -static-libstdc++ is just a workaround, and it is overly limited for any use where LLVM is integrated into another library project (for example: we ship a pip package that includes LLVM in a .so with a C++ API; it’s not clear to me how to deploy that with a statically linked libstdc++ with respect to other packages using it).

So I have been thinking about how we move forward from here and I see a few different alternatives:

  1. We push ahead and don’t take the libstdc++ compatibility into consideration at all. We have several things in the policy that advocate for this stance: the distribution we are talking about is 6 years old, much older than the roughly 3 years stated in the policy, and this seems mostly to be a downstream-LLVM kind of problem, where the policy is “try not to screw them over, but don’t hold back because of them”. This approach doesn’t require any changes to the policy as it stands.
  2. Hold back this bump in requirements for another year, and go ahead at that point because we think it fits better timeline-wise. We could even start warning now and say that support for older compilers and runtimes will be removed in 2023. This means updating the policy to allow for a one-year window of warnings instead of one release (3-4 months).
  3. Bump the build requirements as discussed above, but institute a new rule that we can’t use C++17 features that are not available in older versions of libstdc++ (see the sketch after this list). This means we can build with newer headers and use many of the features we are looking forward to, including all the compiler-only ones, but we hold back on breaking binary compatibility for now (maybe a year or two?). This can be tricky: I’m not sure how strongly libstdc++ guarantees this compatibility, and we need to make sure we identify the new features we won’t be able to use. We would probably also need a buildbot that builds in this configuration (new headers, old library) to make sure we don’t break the rule. The policy should be updated to take this into account.
  4. Stay on GCC 5 forever (j/k no one is advocating for this).
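
To make option 3 a bit more concrete, here is a hypothetical sketch of the kind of rule it implies; which features land on which side is an assumption here and would need a real audit:

```cpp
// Hypothetical illustration of an option-3 rule ("new headers, old
// runtime library"); the allowed/forbidden split below is an assumption.
#include <optional>
#include <type_traits>

// Allowed: compiler-only C++17; no new runtime symbols are required.
template <typename T> long widen(T Value) {
  if constexpr (std::is_integral_v<T>)
    return static_cast<long>(Value);
  return 0;
}

// Probably allowed: header-only library additions such as std::optional
// (assuming the audit confirms no new runtime dependency).
std::optional<int> maybeAnswer(bool Ok) {
  if (Ok)
    return 42;
  return std::nullopt;
}

// Probably forbidden: features needing symbols from a newer runtime
// libstdc++.so on the deployment system (e.g. std::filesystem), which is
// exactly what the proposed "new headers, old library" buildbot would catch.
```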

My personal read, looking at the current feedback in this thread, is that we are leaning towards option 1 here.

I also believe downstream vendors (like TensorFlow) can work with soft errors and pinned versions of LLVM for a short period in order to push their users and communities to adjust. I totally understand the pain and complications that will bring, but at the same time we need to be able to move forward, and I think we can all agree that we are not doing this willy-nilly but have considered all the alternatives.

That said - I really would like to hear other voices here, and especially what people think about option 3: is that a practical way forward? I like that option much better than any of the others. Also, am I missing an alternative? Is there some better middle ground we can walk here?
