With C++17 merged to main the process caused a bit of problems with bots and peoples local development environment. From those problems there was a long and interesting discussion in Phabricator here: D130689 [LLVM] Update C++ standard to 17
This section doesn’t really address what happens after we have bumped the toolchain requirement and doesn’t address updating the C++ standard at all. It’s obvious we can do better here - so I suggest we add new section to address C++ standard upgrade:
When the toolchain requirement have become a hard requirement in main we can now suggest to bump the C++ standard to the latest supported by the new toolchain requirement. In order to land this we should follow the procedure below.
Suggest the change as a RFC on discourse.
– This RFC should encourage bot owners and other vendors to test this new configuration by running a build with the cmake argument -DCMAKE_CXX_STANDARD=<newstandard> and have them report issues.
Once the RFC reaches consensus post a diff to Phabricator that changes the C++ standard in the CMake files. Here is an example. Link this diff from the discourse post.
When the diff is accepted by a wide range of contributors the post on discourse should be updated and suggest that it will be merged the following weekend. We want to try to merge this on a weekend to minimize the potential disruptions. This post also needs to point out that the change can be reverted so no new features should be used until further notice.
Keep an eye on the bots when the change is rolling out. If there seems to be many bots failing the change should be reverted and fixes should be solicitated.
After a week from the merge it’s now safe to use new features and the discourse post should be updated to reflect this.
This is my first draft based on the feedback in Phabricator - please discuss if this seems like a reasonable plan or if we should tweak it any way.
Thank you for putting this together! I definitely agree with the direction you’re heading with this.
I think we might want a more firm statement regarding the build lab after the change has been posted. There are some platforms for which we have very little testing unfortunately, so we can run into a situation where most of the build lab is green but it turns out that we’ve lost all testing for something that matters. As a concrete example, we don’t have very many Windows MSVC bots, so it’s actually not that hard to overlook when all three of them are red but everything else is green. That said, I’m not certain how strong of a statement we want to make here.
Regarding your first point, to encourage folks to preflight the new language standard, I was thinking we could introduce a CMake warning (for developers, similar to the soft toolchain requirements) that says “hey we’re moving to this new standard, please try it with DCMAKE_CXX_STANDARD and if you do this warning this will go away”. Hopefully getting rid of the warning is enough motivation for folks to give it a try.
Another thought I had is we should probably make it an explicit task that someone follow up with the owners of failing bots after the change has been pushed when we discover bots aren’t meeting the new build requirements. For example, these bots all seem to be down because of the switch (there may be others):
Yeah this is probably a good idea. I did this when we introduced the soft error for toolchain requirements. I wish we had a better way to surface things to bot owners in general - it’s a bit time-consuming and daunting to manually email the bot owners and to figure out who would be responsible for that. I do worry a bit that these procedures becomes very overwhelming to the point where a single contributor will not be willing to start working on it.
FWIW, I share that concern but not very strongly. I don’t want to see us add needless process that is just busywork for people. But a lot of these suggestions are along the lines of ensuring we maintain a high quality product without being unduly disruptive to contributors and downstreams. I think that the amount of effort involved in that sort of work is what it is. We should not skimp on quality or allow disruption just to make it easy to change the default language mode – we only do that once every half decade or so, so the process being heavy isn’t a particularly worrying concern, but needless overhead is (if that makes sense).
Just speaking for Linaro, we manage the two Windows on Arm bots you linked and we were aware of the switch from the updates on the original RFC. A message to us would be welcome but not necessary, we should be reacting as it happens (though I admit we have been slow in this case).
Well that logic seems to apply in the other direction equally well, doesn’t it?
“We only do that once every half decade or so, so the process incurring disruption for bot owners and downstream who aren’t following closely enough Discourse isn’t a particularly worrying concern”.
I guess I view it differently. We have quite a few bots, but we don’t have a lot of redundancy in our bots. So when we lose “enough” of one particular configuration, it’s easy for us to basically have lost almost all testing for that situation. If that happens for a day or two, it’s not a big deal (that happens as a natural course of us working on things, though we try to avoid it), but if it goes on for a significant amount of time the risk of serious disruption continues to increase. Sometimes that disruption may really only impact a downstream, but other times that disruption might be “there’s a new crash that only happens on Windows and there’s 1000 patches that could be the culprit, so who is responsible for fixing this”. Once a bot is red, any subsequent unrelated failures are very easy to miss.
Also, some bot owners are more likely to be in IT functions (managing servers, doing upgrades, etc) rather than development functions (following our Discourse, looking at bot status, etc). So I think a downstream not paying attention is a bit different than a bot owner not paying attention.
I used to have a bot on gcc 5.2 but we moved to gcc 8 right now, we could downgrade to gcc 7.1 if needed. However none of this is a guarantee about our support: for example we hit many bugs in gcc 7 when the gcc 5 bot was perfectly happy.
To me the minimum supported version is mostly useful as the line at which you don’t revert or accept patches to workaround compiler bugs anymore.
As a result of this, I was thinking it could possibly be useful to have a small email list of bot maintainers, both public and private. That way changes like this could be proactively sent out to people who would be dealing with the fallout and help to get issues flushed out beforehand.