Unstable XRay test on ARM

Hi Dean/Serge,

I just spotted this on our bots:

First failure, unrelated commit:
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/3190

'XRay-Unit :: unit/XRayFDRLoggingTest/FDRLoggingTest.Simple' FAILED
llvm/projects/compiler-rt/lib/xray/tests/unit/fdr_logging_test.cc:55: Failure
      Expected: FDRLogging_init(kBufferSize, kBufferMax, &Options,
sizeof(FDRLoggingOptions))
      Which is: 0
To be equal to: XRayLogInitStatus::XRAY_LOG_INITIALIZED
      Which is: 2
[ FAILED ] FDRLoggingTest.Simple (0 ms)
...
1 FAILED TEST
==11476==XRay instrumentation map missing. Not initializing XRay.

Then a similar, but not identical error:
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/3191

'XRay-Unit :: unit/XRayFDRLoggingTest/FDRLoggingTest.Simple' FAILED
llvm/projects/compiler-rt/lib/xray/tests/unit/fdr_logging_test.cc:58: Failure
      Expected: FDRLogging_finalize()
      Which is: 2
To be equal to: XRayLogInitStatus::XRAY_LOG_FINALIZED
      Which is: 4
[ FAILED ] FDRLoggingTest.Simple (0 ms)
...
1 FAILED TEST
==11476==XRay instrumentation map missing. Not initializing XRay.

Note 0->2 on the first, 2->4 on the second.

And then, for no reason, it's green again:
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/3192

Looks like an unstable test to me. :slight_smile:

Can you guys have a look?

Thanks!
--renato

Hi Renato, Dean, Serge,

Just looked into the code and wanted to share some thoughts.

This might be a compare_exchange_weak spurious failure. ARM is a weakly ordered CPU, but I am not sure whether spurious failures are really possible in a single threaded app. On the other hand, there is no other way for FDRLogging_init to fail in such a way (return XRAY_LOG_UNINITIALIZED instead of XRAY_LOG_INITIALIZED) without any extra output. This is also true for the 2nd test failure in FDRLogging_finalize, which uses a weak exchange too.

Probably, the weak exchange needs to be either replaced with a strong one or looped with more detailed CurrentStatus checks.

Oleg

+Dean Michael Berris

Hi,

Indeed, it seems that exactly this happens: “The weak forms (1-2) of the functions are allowed to fail spuriously, that is, act as if *this != expected even if they are equal.” (from http://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange ). Because the value returned indicates that they are indeed equal. The reference doesn’t say it shouldn’t happen in single-threaded scenario, so I don’t see a reason to investigate why this happens except curiosity… maybe the mutex is not ready, or CPU cache problem, or it translates into a single CPU instruction that fails for hardware reasons, who knows.

Dean, is there any reason not to change this to compare_exchange_strong() ? As I understand, polling in a loop is not an option because it’s not clear for how long to iterate.

Cheers,
Serge

Yeah, I was being optimistic about this one. It should be changed to strong if he weak version doesn’t quite do it.

I agree.

Do you guys think we can get that backported in 4.0?

cheers,
--renato

I don’t object to backporting the fix. But I think the FDR itself is not in 4.0 yet. Let Dean suggest how risky FDR may be: I see substantial code changes, but for the old code they seem to be mostly moving.

Dean, if you’re on vacation, I can submit the compare_exchange_strong() fix.

Cheers,
Serge

Oh, so, don't worry. I just thought that this had gone in and the fix
was necessary. If not, than let's keep it in trunk.

cheers,
--renato

FYI, we got it again...

http://lab.llvm.org:8011/builders/clang-cmake-aarch64-42vma/builds/3875

So it wasn't a one of.

cheers,
--renato

And again...

http://lab.llvm.org:8011/builders/clang-cmake-aarch64-42vma/builds/3900

Serge, if you know the fix, can you please apply?

--renato

Yes, I will, just a little later today.

I’ve submitted the fix for review: https://reviews.llvm.org/D29286

Thanks!

I'm not sure how I can review that, other than it makes sense based on
the error message and your conversation with Dean. :slight_smile:

I'm happy for this to go through now and later be reviewed by Dean
(and reverted if necessary).

cheers,
--renato