Upcoming upgrade of LLVM buildbot

Hello everyone,

The buildbot upgrade is entering the phase where the results become visible.

No change is required at this time on any of the builders. Bot owners can upgrade buildbot on their build machines later, at their convenience, as this is not on the critical path.

We are going to upgrade the staging bot first. Then, once that is stable and all detected issues are resolved, we will upgrade the production bot.

I will need some help with testing, and will be asking to move some of the builders temporarily to staging. The LLVM buildbot is a piece of critical infrastructure, so the more eyes and hands we have making sure it works properly, the better.

I’ll be posting updates and ETAs for particular changes in this thread.

Please feel free to ask if you have any questions or concerns.

Thanks

Galina

Hello everyone,

Starting tomorrow, we will be upgrading the staging and production LLVM buildbots.

To make the transition smooth, we will not accept any changes to zorg from tomorrow at 11:00 AM PDT until the production bot is up and running the new version. Please feel free to talk to me if you have a special situation and absolutely have to make changes. Thanks for your patience and understanding.

If you are a bot owner, you do not need to do anything at this point unless I ask you to help.

Here is the plan.

  • Tomorrow, on September 30th, 2020, at 11:00 AM PDT we will upgrade the staging build bot to buildbot v2.8.4.

If you are an owner of one of the staged builders, you do not need to do anything. Staging will go down for a short period of time, and then the new version will come up and accept connections from your bots.

We will be watching staging and improving zorg for the next week or so. This means that staging will be restarted more often than before, and restarts will happen without further notice.

  • Once staging is reliable, you may want to move your bot from production to staging to make sure it will work as expected after the upgrade. I will send a note when this can be done.

  • After staging is good and we have about a week of running history, we will upgrade the production bot. I will send a separate announcement for this closer to the date.

Once production is up and running, I will need your feedback about blame e-mail delivery, IRC reporting issues, and anything else you spot that is wrong with the new bot. I hope the transition will go smoothly, and we will quickly handle any issues that come up.

  • After production is good and we have about a week of running history, I’ll ask the bot owners to upgrade buildbots on their side. Please do not upgrade your buildbots unless I ask you to. We are trying to limit the number of moving parts at this stage.

Thanks for your support and help. And please feel free to ask if you have questions.

Galina

Hello everyone,

The staging buildbot has been up and running for 6 days now, and it looks good.

Tomorrow at 12:00 PM PDT we will switch the production buildbot to the new version.

If you are a bot owner, you do not need to do anything at this point unless I ask you to help.
The buildbot will go down for a short period of time, and then the new version will come up and accept connections from your bots.

Please note that the new version has a slightly different URL structure. You will need to update your bookmarks or scripts if you have stored direct URLs to pages inside the buildbot.
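For scripts that store direct links, the change can be sketched roughly as follows. This is only a sketch: it assumes old links had the form `http://host:port/builders/<name>` and that the new web UI serves the same path behind a `#/` fragment, as the `#/builders/...` links later in this thread suggest. The helper name `to_new_url` is hypothetical, and some pages may not map one to one.

```python
from urllib.parse import urlsplit, urlunsplit


def to_new_url(old_url: str) -> str:
    """Move the path of an old-style link into a '#/' fragment (assumed mapping)."""
    parts = urlsplit(old_url)
    if parts.fragment.startswith("/"):
        return old_url  # already looks like a new-style link
    fragment = "/" + parts.path.lstrip("/")
    if parts.query:
        # Keep the query string, but carry it inside the fragment.
        fragment += "?" + parts.query
    return urlunsplit((parts.scheme, parts.netloc, "/", "", fragment))


print(to_new_url("http://lab.llvm.org:8011/builders/sanitizer-windows"))
# → http://lab.llvm.org:8011/#/builders/sanitizer-windows
```

Links that cannot be mapped this way are probably best re-created by hand from the new UI.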

We will be watching the production and staging bots and improving zorg for the next week or so.

I will need your feedback about blame e-mail delivery, IRC reporting issues, and anything else you spot that is wrong with the new bot. I hope the transition will go smoothly, and we will quickly handle any issues that come up.

After production is good and we have about a week of running history, I’ll ask the bot owners to upgrade buildbots on their side. Please do not upgrade your buildbots unless I ask you to. We are trying to limit the number of moving parts at this stage. We will start accepting changes to zorg at this point. Please feel free to talk to me if you have a special situation and absolutely have to make changes.

Thanks for your support and help. And please feel free to ask if you have questions.

Galina

It looks like all sanitizer builders are still offline: http://lab.llvm.org:8011/#/builders

They are online now - http://lab.llvm.org:8011/#/waterfall?tags=sanitizer

AnnotatedCommand has a severe design conflict with the new buildbot.
We have changed it to be safe and still do something useful, but it will need more love and care.

Please let me know if you have some spare time to work on porting AnnotatedCommand.

Thanks

Galina

Our Flang-aarch64 buildbots just won't connect to the main Buildbot master anymore. I switched them to the staging buildbot master instead and it seems fine for now. Is there anything that we can/should tweak at our end?

http://lab.llvm.org:8014/#/waterfall?tags=flang

-Andrzej

Hello bot owners,

I see that a lot of builders went down and are not reconnecting to the production buildbot.

Could you check your bots to make sure they are up and running, please?

Please report any connectivity issues, with quotes from the logs, directly to me.
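One way to collect such log quotes is sketched below. It assumes the worker writes its log to `twistd.log` in the worker's base directory, and the keyword list is only a guess at what connection trouble looks like in that log. `find_connection_errors` is a hypothetical helper, not part of buildbot.

```python
from __future__ import annotations

from pathlib import Path

# Guessed markers of connection trouble; adjust to what your log actually shows.
KEYWORDS = ("error", "refused", "lost", "timeout", "traceback")


def find_connection_errors(log_text: str, context: int = 1) -> list[str]:
    """Return each line that mentions a keyword, plus `context` lines after it."""
    lines = log_text.splitlines()
    quotes = []
    for i, line in enumerate(lines):
        if any(word in line.lower() for word in KEYWORDS):
            quotes.append("\n".join(lines[i : i + 1 + context]))
    return quotes


if __name__ == "__main__":
    log_path = Path("twistd.log")  # assumed location: the worker's base directory
    if log_path.exists():
        for quote in find_connection_errors(log_path.read_text(errors="replace")):
            print(quote, end="\n\n")
```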

Thanks

Galina

Hey Andrzej,

What are you seeing in your buildbot logs? Is it this error?
`twisted.spread.flavors.NoSuchMethod: No such method: remote_getWorkerInfo`

If so, you might want to try updating your buildbot worker.
I updated llvmlibc's to 2.8.4 and that seemed to solve the connection
problem: https://github.com/llvm/llvm-project/commit/f60686f35cc89504f3411f49cf16a651a74be6eb

Best,
Paula Askar

Hi Paula,

This error is fine. The buildbot was checking the worker version, and 0.8.x apparently does not have that method.
The error gets handled gracefully on the server side. At least, it seems so thus far.

That should not prevent your bot from connecting.
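The pattern described here, probing for a newer method and falling back when it is missing, can be sketched in plain Python. This is an illustration only, not buildbot's actual code: `NoSuchMethod`, `OldWorker`, `NewWorker`, and `probe_worker` are stand-ins defined in the example, and only the method name `remote_getWorkerInfo` comes from the log quoted above.

```python
class NoSuchMethod(Exception):
    """Stand-in for the error raised when a remote method does not exist."""


class OldWorker:
    """Pretend 0.8.x worker: it knows nothing about remote_getWorkerInfo."""

    def call_remote(self, name):
        raise NoSuchMethod(f"No such method: remote_{name}")


class NewWorker:
    """Pretend newer worker that answers the info request."""

    def call_remote(self, name):
        if name == "getWorkerInfo":
            return {"version": "2.8.4"}
        raise NoSuchMethod(f"No such method: remote_{name}")


def probe_worker(worker):
    """Ask for worker info; treat a missing method as 'old worker', not a failure."""
    try:
        return worker.call_remote("getWorkerInfo")
    except NoSuchMethod as err:
        # The error is logged but does not stop the connection.
        print(f"logged and ignored: {err}")
        return {"version": "unknown (old worker)"}
```

In this sketch the error still shows up in the log, which would match what Paula saw, but the connection proceeds either way.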

Thanks

Galina

I switched one of our workers to the main Buildbot and everything seems fine #fingers-crossed. I guess that was a temporary glitch?

We haven't updated our local Buildbot installations - still on 0.8.5. Should we update?

-Andrzej

Hello Andrzej,

Please do not update your bots yet. I will explicitly ask later.

Feel free to move other reliably green bots of yours to the production buildbot.

Thanks

Galina

Thanks, I see them.

> They are online now - http://lab.llvm.org:8011/#/waterfall?tags=sanitizer
>
> AnnotatedCommand has a severe design conflict with the new buildbot.
> We have changed it to be safe and still do something useful, but it will need more love and care.
>
> Please let me know if you have some spare time to work on porting AnnotatedCommand.

It’s unlikely in the near future.
What is missing exactly?

That’s unfortunate; it would’ve been good to know that earlier. Another team member and I have spent a fair amount of time porting things to use more AnnotatedCommand steps, because it gives us the flexibility to test steps locally and make changes to the steps without restarting the buildbot master. IMO that is the Right Way to define steps: a script that you can run locally on a machine that satisfies the OS and dependency requirements of the script.
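A minimal sketch of what such a locally runnable step script might look like is shown below. It assumes that AnnotatedCommand recognizes Chromium-style annotations such as `@@@BUILD_STEP <name>@@@` and `@@@STEP_FAILURE@@@` printed to stdout. The `step` and `run` helpers and the commands themselves are hypothetical placeholders, not code from zorg.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def step(name):
    """Announce a step; mark it failed if the body raises (assumed annotation syntax)."""
    print(f"@@@BUILD_STEP {name}@@@", flush=True)
    try:
        yield
    except Exception as err:
        # Swallow the error so that later steps still run.
        print(f"step '{name}' failed: {err}", flush=True)
        print("@@@STEP_FAILURE@@@", flush=True)


def run(cmd):
    """Run one command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def main():
    # Placeholder commands; a real script would configure, build, and test here.
    with step("show-python-version"):
        run([sys.executable, "--version"])
    with step("always-fails"):
        run([sys.executable, "-c", "import sys; sys.exit(1)"])


if __name__ == "__main__":
    main()
```

Because the script only prints to stdout and runs ordinary commands, it can be run from a shell on any machine that has the required tools, with no buildbot master involved.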

I am restarting the two bots that I am responsible for, and may need some help debugging further issues soon. I’ll let you know.

We have a better version of AnnotatedCommand on staging. It should be functionally equivalent to the old one.
We need to stress test it well before moving it to the production buildbot.

For that, we need all sanitizer bots, plus any other bots that use AnnotatedCommand directly or indirectly, moved temporarily to staging.

Please let me know when that could be arranged.

Thanks

Galina

Switched all but the PPC bots; I don’t have access to them. But they run the same script as sanitizer-x86_64-linux.
http://lab.llvm.org:8014/#/waterfall?tags=sanitizer

It looks like the staging AnnotatedCommand fixed the step statuses, so we can see which ones are green.
Please let me know when to switch bots back from the staging.

Thank you!

Thanks, Vitaly!

Let’s have them there for at least 24 hours, shall we?

Could you move sanitizer-buildbot1, sanitizer-buildbot3, sanitizer-buildbot7 as well, please?

AnnotatedCommand on staging has been tested functionally and is good. My only concern at this point is how it will handle a heavy load, so the more bots we have on staging, the better.

If somebody else could move their AnnotatedCommand bots to the staging area, that would be much appreciated.

Thanks

Galina

> Thanks, Vitaly!
>
> Let’s have them there for at least 24 hours, shall we?

We can do that.

> Could you move sanitizer-buildbot1, sanitizer-buildbot3, sanitizer-buildbot7 as well, please?

Done.

FWIW, I don’t see any issues with my two bots that use buildbot annotated commands:
http://lab.llvm.org:8011/#/builders/sanitizer-windows

http://lab.llvm.org:8011/#/builders/clang-x64-windows-msvc

The individual steps don’t highlight as green or red, but that’s OK for now.

I have now moved the libc bots to staging.

Thanks,
Siva Chandra