AArch64 buildbots and PR33972

Hi all,

It turns out we lost coverage of the release configuration on the
AArch64 buildbots for a while. This is because the machine that we
were running the clang-cmake-aarch64-full bot on became unstable
(random shutdowns after a day or two of building).

Embarrassingly enough, we didn't notice this until now because of a
bug in our internal bot monitoring page which listed the
clang-cmake-aarch64-lld bot as running compiler-rt too. This is not
actually the case.

We're working on bringing the full bot back on a different machine,
but that might take a couple of days. In the meantime, I've turned the
old machine [2] back on until we get the release out. I'm going to keep
an eye on it and restart it as soon as I can if it shuts down again.
That way we should be able to test whatever fix or workaround we come
up with for PR33972 [1]. I don't think people will get spurious emails
when the machine turns off, but if they do, let me know and I can move
it to the silent master.

Sorry about the inconvenience.

Cheers,
Diana

[1] https://bugs.llvm.org/show_bug.cgi?id=33972
[2] http://lab.llvm.org:8011/builders/clang-cmake-aarch64-full

I'd hazard a guess that the random shutdowns started around the time
the OOM tests were introduced, so the cause is likely the tests rather
than the machine itself.

We had similar problems when running glibc memory tests, so I believe
bringing the bot back up is the only sane way to make sure.

Trying to allocate 30TB is not a good kind of test to run across the
variety of hardware and operating systems we have as buildbots. I think
we need to be a bit more practical and try to create environments
where the memory is limited well below the actual hardware (different
OSs have different ways to do that) and then try to allocate a small
amount of memory. It would be better to have it working on at least
one arch/OS than to go the full monty on all OSs and architectures.
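
As a rough illustration of that idea, here is a minimal sketch, assuming
Linux and Python 3. The sizes and the names run_capped_allocation and
CHILD_SCRIPT are made up for the example and are not taken from the
actual compiler-rt tests. It caps a child process's address space with
RLIMIT_AS and then requests a modest amount of memory above the cap:

```python
import subprocess
import sys

# Hypothetical sizes, chosen only for illustration.
LIMIT_BYTES = 1 << 30      # cap the child's address space at 1 GiB
REQUEST_BYTES = 2 << 30    # then ask for 2 GiB, twice the cap

# Script run in a child process, so the limit never affects the caller.
CHILD_SCRIPT = """
import resource, sys
limit, request = int(sys.argv[1]), int(sys.argv[2])
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
# Only change the soft limit; never try to exceed an existing hard limit.
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
try:
    buf = bytearray(request)
except MemoryError:
    print("allocation refused")
else:
    print("allocation succeeded")
"""

def run_capped_allocation(limit=LIMIT_BYTES, request=REQUEST_BYTES):
    """Run the allocation in a capped child and report what happened."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, str(limit), str(request)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(run_capped_allocation())
```

Other OSs would need their own way of setting the cap; RLIMIT_AS is
only one option, and it is not enforced the same way everywhere.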

--renato

I’d like to point out that the test does not allocate 30TB. It allocates 1TB; the rest, ~20TB, is reserved (but not actually used) for ASan shadow memory, which should not be a problem by itself.

The test on your bot failed because it tried to reserve 27TB of memory, which is more than the limit set by ulimit earlier in this test. I do not immediately see why it wants to reserve that much shadow for the AArch64 config, so before fixing anything, I’d like to know what’s going on. How and where can I access your bot configuration?

Ah, right, you mentioned that on the bug, sorry for missing it. Yes, this test should be disabled on 48-bit VMA AArch64 for the same reason it is disabled on s390, and then it is probably simpler to keep it disabled on AArch64 altogether. r311674 does just that (thank you!).

Great, glad we got that sorted out. Thanks!