LLVM Lab is down due to a power outage

Hello everyone.

Due to a severe winter storm, power in the Berkeley area has been down since last night.
PG&E hasn't given us an ETA yet for when they will restore power.

I’ll let everyone know when the lab is back on.

Thank you for your patience and understanding.

Thanks for letting us know!

In the meantime, there’s no testing happening while the lab is down but we’re still committing significant amounts of work. I think we should lock the repo so that no further commits can go in until the lab is back up and running, otherwise we’re going to be in a situation where every builder is broken and the blame list is effectively useless due to it being “everyone”.

(It would be especially nice if we could automate this so that in the future, if the lab goes down, the repo is automatically locked and a notice goes out.)


I’m not so convinced that freezing the main branch is going to be helpful, but I won’t object. However, I think if we want to do this, we should pick a time and say: If we still don’t have an ETA by X date/time, we will freeze the branch.

Maybe the cutoff time could be 3 PM Pacific today? @gkistanova How easy is it for you to get updates from PG&E?

Galina is subscribed to PG&E notifications and can call them if needed.
She mentioned that PG&E has just estimated the restoration time as 8:00 PM PDT tonight.

I think as a matter of policy, we should freeze commits to the repo when there is ~zero testing happening for an extended period of time. Landing things blindly is obviously risky, but beyond that, the longer the build farm is down, the harder it is to determine what changes broke any given bot. In turn, this means individual bots will be red for longer, which in turn means subsequent breakages are easy to miss (because the bot doesn’t send emails when it goes from red for one reason to red for another reason). It’s going to be disruptive either way, but holding off on commits is more of an annoyance than a risk.

Whether it makes sense to freeze the repo now is less clear to me (it’s been 15+ hours since it went down, so plenty of time for damage to already be done).

The power has been restored.

Buildbot is back online.
Hopefully it will work through the accumulated backlog soon.

Please let me know if you see issues.


Thank you for the update! The only builder I see for Clang that’s not come back as green after the bots went down is:

and I’ve already identified the problematic commit for it. So I think on the Clang side of things, we weathered the downtime quite well.
