LLVM lab could be unavailable today starting at 1:00 pm PDT

Hello everyone,

LLVM lab could be unavailable today starting at 1:00 pm PDT.

Thank you for understanding.

Thanks

Galina

Hello Galina,
It appears that none of our ppc64le bots are able to connect to lab.llvm.org?tags=ppc.
Take this bot, for example: clang-ppc64le-linux-lnt #27311. On the worker host I see these errors repeated over and over:

2022-06-28 09:27:58-0400 [-] recording hostname in twistd.hostname
2022-06-28 09:27:58-0400 [-] Starting factory <buildbot_worker.pb.BotFactory object at 0x7fffb9a354e0>
2022-06-28 09:28:28-0400 [-] Scheduling retry 1 to connect <twisted.internet.endpoints.TCP4ClientEndpoint object at 0x7fffb9a6bda0> in 1.6407759183131552 seconds.
2022-06-28 09:28:28-0400 [-] Stopping factory <buildbot_worker.pb.BotFactory object at 0x7fffb9a354e0>
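
For anyone wanting to check how often a worker is retrying, here is a minimal sketch that pulls the retry number and delay out of log lines shaped like the ones above. The function name `retry_attempts` and the regular expression are my own assumptions based only on this excerpt, not part of buildbot-worker, and other log formats may not match.

```python
import re

# Assumed pattern, derived only from the "Scheduling retry" line quoted above.
RETRY = re.compile(
    r"Scheduling retry (?P<n>\d+) to connect .* in (?P<delay>[\d.]+) seconds"
)

def retry_attempts(lines):
    """Return (retry_number, delay_in_seconds) for each retry line found."""
    attempts = []
    for line in lines:
        match = RETRY.search(line)
        if match:
            attempts.append((int(match.group("n")), float(match.group("delay"))))
    return attempts

# Sample input copied from the worker log excerpt in this message.
sample = [
    "2022-06-28 09:27:58-0400 [-] Starting factory <buildbot_worker.pb.BotFactory object at 0x7fffb9a354e0>",
    "2022-06-28 09:28:28-0400 [-] Scheduling retry 1 to connect <twisted.internet.endpoints.TCP4ClientEndpoint object at 0x7fffb9a6bda0> in 1.6407759183131552 seconds.",
    "2022-06-28 09:28:28-0400 [-] Stopping factory <buildbot_worker.pb.BotFactory object at 0x7fffb9a354e0>",
]

for number, delay in retry_attempts(sample):
    print(f"retry {number} scheduled after {delay:.2f}s")
```

A steadily growing list of retries with no successful connection in between would suggest the worker cannot reach the master at all, as opposed to connecting and then being dropped.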

I am able to change the port to staging and connect without a problem (lab staging ppc64le), and all of the build tools on our end seem fine: the big-endian bots run the same tools at the same versions and are unaffected.

Do you know what the problem is?
I am wondering if this is a problem on the master's end and is related to this maintenance window, since the clang-ppc64le-linux-multistage #20835 bot received a Master Shutdown command right before the problem occurred for all the ppc64le bots.

Thanks,
Kamau

Hello Kamau,

The problem is with your ‘ppc64le-flang-mlir-rhel-test’ worker. It spammed the buildbot with connection attempts using a wrong name and credentials, which triggered a block on the IP address it was connecting from. You have multiple workers behind that NAT address, don’t you? The block effectively banned all of them from the production buildbot.

Here is what I see in the buildbot logs:

...
2022-06-27 20:14:25-0700 [Broker,36,129.41.86.5] invalid login from unknown user 'ppc64le-flang-mlir-rhel-test'
2022-06-27 20:14:28-0700 [Broker,56,129.41.86.5] invalid login from unknown user 'ppc64le-flang-mlir-rhel-test'
2022-06-27 20:14:30-0700 [Broker,70,129.41.86.5] invalid login from unknown user 'ppc64le-flang-mlir-rhel-test'
2022-06-27 20:14:33-0700 [Broker,78,129.41.86.5] invalid login from unknown user 'ppc64le-flang-mlir-rhel-test'
...
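
As a rough illustration of how such entries can be tallied, here is a minimal sketch that counts invalid logins per IP address and worker name. The function name `count_invalid_logins` and the regular expression are assumptions derived only from the lines quoted above; this is not the mechanism the lab actually uses to block addresses.

```python
import re
from collections import Counter

# Assumed pattern, based only on the "invalid login" lines quoted above.
INVALID_LOGIN = re.compile(
    r"\[Broker,\d+,(?P<ip>[\d.]+)\] invalid login from unknown user '(?P<user>[^']+)'"
)

def count_invalid_logins(lines):
    """Count invalid login attempts, keyed by (ip, worker_name)."""
    counts = Counter()
    for line in lines:
        match = INVALID_LOGIN.search(line)
        if match:
            counts[(match.group("ip"), match.group("user"))] += 1
    return counts

# Sample input copied from the buildbot log excerpt in this message.
sample = [
    "2022-06-27 20:14:25-0700 [Broker,36,129.41.86.5] invalid login from unknown user 'ppc64le-flang-mlir-rhel-test'",
    "2022-06-27 20:14:28-0700 [Broker,56,129.41.86.5] invalid login from unknown user 'ppc64le-flang-mlir-rhel-test'",
    "2022-06-27 20:14:30-0700 [Broker,70,129.41.86.5] invalid login from unknown user 'ppc64le-flang-mlir-rhel-test'",
    "2022-06-27 20:14:33-0700 [Broker,78,129.41.86.5] invalid login from unknown user 'ppc64le-flang-mlir-rhel-test'",
]

for (ip, user), attempts in count_invalid_logins(sample).items():
    print(f"{ip} {user}: {attempts} invalid logins")
```

Grouping by IP address as well as worker name makes it easy to see when one misconfigured worker is responsible for every failed attempt from a shared NAT address.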

I can remove that IP from the blacklist, but it will be blocked again unless you either fix that worker's name and credentials or stop the worker. Please let me know when you are ready.

Thanks

Galina

Thank you very much for investigating. I am terribly sorry about this. I should have turned off the ‘ppc64le-flang-mlir-rhel-test’ worker in anticipation of the ppc64le-mlir-rhel-test and ppc64le-flang-rhel-test workers coming online in staging.

Yes, I do have multiple workers behind that NAT address.

I have stopped and removed the ‘ppc64le-flang-mlir-rhel-test’ worker:

[buildbots@pk ppc64le-flang-mlir-rhel-test]$ buildbot-worker stop .
worker process 13713 is dead
[buildbots@pk ~]$ rm -rf ppc64le-flang-mlir-rhel-test
[buildbots@pk ~]$ crontab -e

Please let me know when it's safe to move the lab staging ppc64le bots back to production.
I would like to move only:
clang-ppc64le-linux-lnt
clang-ppc64le-linux-multistage
clang-ppc64le-rhel
ppc64le-lld-multistage-test
sanitizer-ppc64le-linux
ppc64le-flang-rhel-clang
and intend to keep ppc64le-mlir-rhel-clang in staging while I try to figure out why the build command no longer works.

Thanks, Kamau!

I cleared that IP address from the blacklist. Please give it a try whenever you are ready.

Awesome, I have successfully brought them from staging back to production, thank you!