Debugging Atom builds?

Hi LLVM Dev,

One of my recent commits has broken the build on the Ubuntu Atom D2700 bot.

The Buildbot has detected a new failure on builder clang-atom-d2700-ubuntu-rel while building llvm.
Full details are available at:
http://lab.llvm.org:8011/builders/clang-atom-d2700-ubuntu-rel/builds/7852

Buildbot URL: http://lab.llvm.org:8011/

Buildslave for this Build: atom1-buildbot

Build Reason: scheduler
Build Source Stamp: [branch trunk] 178634
Blamelist: timurrrr

BUILD FAILED: failed check-all

Can you suggest a way to debug this without access to an Atom D2700?

I ran "make check-all" on my Ubuntu box before committing and yet the
Ubuntu Atom bot went red.
Is there any way we could write test in a cross-platform way? (i.e. if
a test passes in one setup, it will pass with any other setup)

Thanks,
Timur

Hi,

As someone working on ARM, I'm interested in this sort of issue, for which
there isn't yet a complete solution. Some advice below:

> Can you suggest a way to debug this without access to an Atom D2700?

The first thing to do is not to assume it's necessarily machine specific: go
to the failed build, look at the "configure" step's stdio output, and see
what the arguments are. It may be worth configuring with those arguments
(when they aren't completely unsupported on your architecture) and seeing
whether you get the same failure. I see that the args include
"--enable-optimized --enable-assertions": is that what you were testing? If
that still doesn't work, you can look at the stdio file for the actual tests
and search for "FAILED" to at least see what happened in detail (it looks like
a different instruction was emitted).
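
For example, something along these lines should be close to what the bot does
(the source path and -j value are illustrative; the configure flags are the
ones from the bot's log):

  $ /path/to/llvm/configure --enable-optimized --enable-assertions
  $ make -j4
  $ make check-all 2>&1 | tee check-all.log
  $ grep "FAILED" check-all.log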

I ran "make check-all" on my Ubuntu box before committing and yet the
Ubuntu Atom bot went red.

Is there any way we could write test in a cross-platform way? (i.e. if
a test passes in one setup, it will pass with any other setup)

As above, it's not necessarily just a "platform" thing; configure options
can also affect processing enough to cause issues, and those you can at least
try on your own system. As for fully cross-platform tests, that doesn't seem
possible: roughly half the time the breakage is down to trivial differences
that only crop up when you try to write a test, and the other half the
differences reflect a genuine platform difference that would also affect real
code. This clearly isn't ideal, but I guess we'd rather hear about platforms
where stuff doesn't work earlier rather than later.

Cheers,
Dave

> Hi,
>
> As someone working on ARM, I'm interested in this sort of issue, for which
> there isn't yet a complete solution. Some advice below:

> Can you suggest a way to debug this without access to an Atom D2700?

> The first thing to do is not to assume it's necessarily machine specific: go
> to the failed build, look at the "configure" step's stdio output, and see
> what the arguments are. It may be worth configuring with those arguments
> (when they aren't completely unsupported on your architecture) and seeing
> whether you get the same failure. I see that the args include
> "--enable-optimized --enable-assertions": is that what you were testing?

Yes =\

> If that still doesn't work, you can look at the stdio file for the actual tests
> and search for "FAILED" to at least see what happened in detail (it looks like
> a different instruction was emitted).

Sure, I just wanted to know how I can reproduce this locally to write better checks.

Presumably I can pass some special flag to llc to either
a) reproduce the Atom behavior locally, or
b) explicitly tell llc to use a specific target, making the
output Atom-independent.
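
Something along these lines, perhaps (untested; foo.ll is a stand-in and the
exact CPU/triple spellings are from memory):

  # Reproduce the Atom scheduling locally on any x86 host:
  $ llc -mcpu=atom -mtriple=x86_64-unknown-linux-gnu foo.ll -o -

  # Or pin the test itself to one configuration in its RUN line:
  ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=atom | FileCheck %s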

I ran "make check-all" on my Ubuntu box before committing and yet the
Ubuntu Atom bot went red.
> Is there any way we could write test in a cross-platform way? (i.e. if
> a test passes in one setup, it will pass with any other setup)

> As above, it's not necessarily just a "platform" thing; configure options
> can also affect processing enough to cause issues, and those you can at least
> try on your own system. As for fully cross-platform tests, that doesn't seem
> possible: roughly half the time the breakage is down to trivial differences
> that only crop up when you try to write a test, and the other half the
> differences reflect a genuine platform difference that would also affect real
> code. This clearly isn't ideal, but I guess we'd rather hear about platforms
> where stuff doesn't work earlier rather than later.

I totally agree we should know about problems earlier rather than later.

[I'm relatively new to LLVM development, so I might be wrong below.]
I'd usually expect tests to work exactly the same way on all platforms.
If there are things that must be handled differently on different
platforms, I'd expect those to have a separate test for each platform,
with explicit flags for each platform.
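
Something like this, I imagine (the triples and check prefixes are just
illustrative, and the CHECK bodies are placeholders):

  ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu | FileCheck %s -check-prefix=X86
  ; RUN: llc < %s -mtriple=armv7-none-linux-gnueabi | FileCheck %s -check-prefix=ARM
  ; X86: <expected x86 sequence>
  ; ARM: <expected ARM sequence>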

You're absolutely right, but the tests don't necessarily follow the
accepted model.

The main causes of build failure, in my experience, are, in order of
probability:

* Bad tests, ones that check for ordering in non-deterministic output or for
machine-specific behaviour (library version, etc.); see the example below
this list
* Target-specific behaviour in target-agnostic tests (as you say, it can be
easily fixed)
* autoconf vs. CMake issues
* A bug in LLVM/Clang
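
For the ordering case, a test can be made order-independent (assuming a
FileCheck new enough to have CHECK-DAG; @global_a and @global_b are made-up
names), roughly like this:

  ; Fragile: assumes the two globals are always emitted in this order
  ; CHECK: @global_a
  ; CHECK: @global_b

  ; More robust: accepts either order
  ; CHECK-DAG: @global_a
  ; CHECK-DAG: @global_b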

What I do is contact the person responsible for the bot and ask them to give
me more info on the failure, so I can start debugging, then go back and forth
until the bug is fixed. The owner is listed on the bot's page:

http://lab.llvm.org:8011/buildslaves/atom1-buildbot

cheers,
--renato

> I'd usually expect tests to work exactly the same way on all platforms.
> If there are things that must be handled differently on different
> platforms, I'd expect those to have a separate test for each platform,
> with explicit flags for each platform.

> You're absolutely right, but the tests don't necessarily follow the accepted
> model.
>
> The main causes of build failure, in my experience, are, in order of
> probability:
>
> * Bad tests, ones that check for ordering in non-deterministic output or for
> machine-specific behaviour (library version, etc.)
> * Target-specific behaviour in target-agnostic tests (as you say, it can be
> easily fixed)
> * autoconf vs. CMake issues
> * A bug in LLVM/Clang

> What I do is contact the person responsible for the bot and ask them to give
> me more info on the failure, so I can start debugging, then go back and forth
> until the bug is fixed. The owner is listed on the bot's page:
>
> http://lab.llvm.org:8011/buildslaves/atom1-buildbot

That's a great suggestion, thank you!

> If that still doesn't work, you can look at the stdio file for the actual tests
> and search for "FAILED" to at least see what happened in detail (it looks like
> a different instruction was emitted).

> Sure, I just wanted to know how I can reproduce this locally to write better checks.
>
> Presumably I can pass some special flag to llc to either
> a) reproduce the Atom behavior locally, or
> b) explicitly tell llc to use a specific target, making the
> output Atom-independent.

Yes, that's a good way to analyse things; the tricky bit is that it's not always
obvious which explicit triple corresponds to the "native" target (particularly
with the clang driver, which is clever but inscrutable).
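
One way to find out what your host's default triple actually is (a sketch; the
exact commands are from memory):

  $ llc --version               # prints "Default target: ..." and "Host CPU: ..."
  $ llvm-config --host-target
  $ clang -v                    # the "Target:" line shows what the driver picked

You can then put that triple explicitly on the test's RUN line so it behaves
the same everywhere.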

I ran "make check-all" on my Ubuntu box before committing and yet the
Ubuntu Atom bot went red.
> Is there any way we could write test in a cross-platform way? (i.e. if
> a test passes in one setup, it will pass with any other setup)

> As above, it's not necessarily just a "platform" thing; configure options
> can also affect processing enough to cause issues, and those you can at least
> try on your own system. As for fully cross-platform tests, that doesn't seem
> possible: roughly half the time the breakage is down to trivial differences
> that only crop up when you try to write a test, and the other half the
> differences reflect a genuine platform difference that would also affect real
> code. This clearly isn't ideal, but I guess we'd rather hear about platforms
> where stuff doesn't work earlier rather than later.

> I totally agree we should know about problems earlier rather than later.
>
> [I'm relatively new to LLVM development, so I might be wrong below.]
> I'd usually expect tests to work exactly the same way on all platforms.
> If there are things that must be handled differently on different
> platforms, I'd expect those to have a separate test for each platform,
> with explicit flags for each platform.

Sorry, I was a bit unclear. What I meant was: this is fundamentally a compiler
project, and CPUs have many subtle differences. I spent a while last year
helping fix issues in the ARM regressions and found that maybe 50-75 per cent
were "cosmetic differences": an implementation-defined type (e.g. char) had
been given a concrete expectation based on what it was on the implementer's
system, which didn't hold on systems using a different definition. These
arguably only cause problems when you're trying to write a platform-independent
test case.

The remaining issues were cases where the test was either detecting a genuine
failure on the other platform (e.g. a transformation wasn't correctly plumbed
in to fire on all platforms), or there were subtle differences (e.g.
valid-but-different ABIs that weren't being correctly accounted for) that would
also hit "real code". The problem is that it's not as simple as "this is
generic" vs "this is totally platform specific": a "generally generic"
transformation often has some tangential platform-specific behaviour associated
with it.

There's been a certain amount of debate about the correct settings for tests.
While it's generally OK to put a concrete triple on a test, I'm a bit
uncomfortable with doing that too widely, as it leads to developers deciding
"I'll write code that works on my platform and ensure we only test there; every
other platform is not within my purview", which works against the whole idea of
LLVM as a cross-platform IR compiler. (I'm describing what it's desirable to
avoid; clearly you're doing the responsible thing and looking at the regression
that has arisen on another platform, which is great.)
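
A concrete example of the "cosmetic" kind (the file name is made up, the output
is abridged and the exact value names will differ, but the behaviour is the
usual one: plain char is signed on x86 Linux and unsigned under the ARM AAPCS):

  // sign.c
  int f(char c) { return c; }

  $ clang -S -emit-llvm -target x86_64-unknown-linux-gnu sign.c -o -
    %conv = sext i8 %1 to i32     ; plain char is signed on this target

  $ clang -S -emit-llvm -target armv7-none-linux-gnueabi sign.c -o -
    %conv = zext i8 %1 to i32     ; plain char is unsigned on this target

A test that CHECKs for the sext without pinning a triple passes on the author's
x86 box and goes red on an ARM bot, even though both outputs are correct.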

Cheers,
Dave