The buildbot upgrade is entering the phase where the results become visible.
No change is required at this time on any of the builders. Bot owners can upgrade buildbot on their build machines later, at their convenience, as this is not on the critical path.
We are going to upgrade the staging bot first. Then, once that is stable and all detected issues are resolved, we will upgrade the production bot.
I will need some help with testing, and will be asking to move some of the builders temporarily to the staging. LLVM buildbot is a piece of critical infrastructure, so the more eyes and hands we have making sure it works properly, the better.
I’ll be posting updates and ETAs for particular changes in this thread.
Please feel free to ask if you have any questions or concerns.
Starting tomorrow we will be upgrading the staging and production LLVM build bots.
To make the transition smooth, we will not accept any changes to zorg from tomorrow at 11:00 AM PDT until the production bot is up and running the new version. Please feel free to talk to me if you have a special situation and absolutely have to make changes. Thanks for your patience and understanding.
If you are a bot owner, you do not need to do anything at this point, unless I ask you to help.
Here is the plan.
Tomorrow, on September 30th, 2020, at 11:00 AM PDT we will upgrade the staging build bot to buildbot v2.8.4.
If you are an owner of one of the staged builders, you do not need to do anything. The staging will go down for a short period of time, and then a new version will come up and will accept connections from your bots.
We will be watching the staging and will be improving zorg for the next week or so. This means that staging will be restarted more often than before, and restarts will be done without further notice.
Once the staging is reliable you may want to move your bot from production to staging to make sure it will work as expected after the upgrade. I will send a note when this can be done.
After staging is good and we have about a week of running history, we will upgrade the production bot. I will send a separate announcement for this closer to the date.
Once the production is up and running, I will need your feedback about blame e-mail delivery, IRC reporting issues, and anything else you spot that looks wrong with the new bot. I hope the transition will go smoothly, and we will handle any issues quickly if they come up.
After production is good and we have about a week of running history, I’ll ask the bot owners to upgrade buildbots on their side. Please do not upgrade your buildbots unless I ask you to. We are trying to limit the number of moving parts at this stage.
Thanks for your support and help. And please feel free to ask if you have questions.
The staging buildbot has been up and running for 6 days now, and looks good.
Tomorrow at 12:00 PM PDT we will switch the production buildbot to the new version.
If you are a bot owner, you do not need to do anything at this point, unless I ask you to help.
The buildbot will go down for a short period of time, and then a new version will come up and will accept connections from your bots.
Please note that the new version has a slightly different URL structure. You will need to update your bookmarks or scripts if you have stored direct URLs to pages inside the buildbot.
We will be watching the production and staging bots and will be improving zorg for the next week or so.
I will need your feedback about blame e-mail delivery, IRC reporting issues, and anything else you spot that looks wrong with the new bot. I hope the transition will go smoothly, and we will handle any issues quickly if they come up.
After production is good and we have about a week of running history, I’ll ask the bot owners to upgrade buildbots on their side. Please do not upgrade your buildbots unless I ask you to. We are trying to limit the number of moving parts at this stage. We will start accepting changes to zorg at this point. Please feel free to talk to me if you have a special situation and absolutely have to make changes.
Thanks for your support and help. And please feel free to ask if you have questions.
AnnotatedCommand has a severe design conflict with the new buildbot.
We have changed it to be safe and still do something useful, but it will need more love and care.
Please let me know if you have some spare time to work on porting AnnotatedCommand.
Our Flang-aarch64 buildbots just won't connect to the main Buildbot master anymore. I switched them to the staging buildbot master instead and it seems fine for now. Is there anything that we can/should tweak at our end?
This error is fine. The buildbot has tested the worker version. 0.8.x apparently does not have that method.
The error gets handled gracefully on the server side. At least, that is how it looks so far.
AnnotatedCommand has a severe design conflict with the new buildbot.
We have changed it to be safe and still do something useful, but it will need more love and care.
Please let me know if you have some spare time to work on porting AnnotatedCommand.
It’s unlikely in the near future.
What is missing exactly?
That’s unfortunate, it would’ve been good to know that earlier. I and another team member have spent a fair amount of time porting things to use more AnnotatedCommand steps, because it gives us the flexibility to test steps locally and make changes to the steps without restarting the buildbot master. IMO that is the Right Way to define steps: a script that you can run locally on a machine that satisfies the OS and dep requirements of the script.
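As a rough illustration of that workflow, here is a minimal sketch of a step script that runs the same way on a developer machine and under an AnnotatedCommand step. It assumes the Chromium-style annotation markers `@@@BUILD_STEP <name>@@@` and `@@@STEP_FAILURE@@@`; the helper names `run_step` and `main` and the two example steps are hypothetical and are not taken from zorg.

```python
import subprocess
import sys


def run_step(name, command, out=sys.stdout):
    """Announce a named step, run its command, and flag a failure.

    Returns True if the command exited with status 0.
    """
    # Marker that opens a new named step in the annotated log.
    print(f"@@@BUILD_STEP {name}@@@", file=out, flush=True)
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # Forward the command's combined output so it appears under the step.
    out.write(result.stdout)
    if result.returncode != 0:
        # Marker that flags the current step as failed.
        print("@@@STEP_FAILURE@@@", file=out, flush=True)
    return result.returncode == 0


def main(steps, out=sys.stdout):
    """Run every step in order; return 0 only if all of them passed."""
    ok = True
    for name, command in steps:
        ok = run_step(name, command, out) and ok
    return 0 if ok else 1


# Hypothetical steps; a real script would call cmake, ninja, and so on.
EXAMPLE_STEPS = [
    ("configure", [sys.executable, "-c", "print('configuring')"]),
    ("build", [sys.executable, "-c", "print('building')"]),
]

if __name__ == "__main__":
    status = main(EXAMPLE_STEPS)
    if status != 0:
        sys.exit(status)
```

Because the script only prints markers and runs commands, the step list can be changed and re-run locally without touching the buildbot master configuration.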
I am restarting the two bots that I am responsible for, and may need some help debugging further issues soon. I’ll let you know.
We have a better version of AnnotatedCommand on the staging. It should be a functional equivalent of the old one.
We need to stress test it well before moving to the production build bot.
For that we need all the sanitizer bots, plus any other bots which use AnnotatedCommand directly or indirectly, moved temporarily to the staging.
Looks like staging AnnotatedCommand fixed step statuses, so we can see which one is green.
Please let me know when to switch bots back from the staging.
Let’s have them there for at least 24 hours, shall we?
Could you move sanitizer-buildbot1, sanitizer-buildbot3, sanitizer-buildbot7 as well, please?
AnnotatedCommand on the staging has been tested functionally and is good. My only concern at this point is how it will handle a heavy load, so the more bots we have on the staging, the better.
If somebody else could move their AnnotatedCommand bots to the staging area, that would be much appreciated.