Test Suite - Livermore Loops

David,

I did some more work on the Livermore Loops and found that the issue is the difference in parameters between a single-step and a multi-step compilation.

Compiling with a plain “clang kernel06.c” works fine, but when you run all the steps separately (clang -emit-llvm + llvm-as + opt + llc etc.), the default options of each tool, and how they interact, expose a bug in the generated code.
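As a rough illustration of the two flows, here is a minimal Python sketch that only builds the command lines and does not run them. The function names, the intermediate file names and the -O3 / opt flags are assumptions made for the example; they are not the options the test-suite Makefiles or LNT actually pass.

```python
from pathlib import Path


def single_step(source, opt_level="-O3"):
    """One driver invocation: clang selects and orders the passes itself."""
    exe = Path(source).stem
    return [["clang", opt_level, source, "-o", exe]]


def multi_step(source, opt_flags=("-O3",)):
    """Separate tools: each step applies its own defaults.

    opt_flags is a placeholder for whatever the build system passes to opt.
    """
    stem = Path(source).stem
    return [
        ["clang", "-S", "-emit-llvm", source, "-o", f"{stem}.ll"],   # C -> textual IR
        ["llvm-as", f"{stem}.ll", "-o", f"{stem}.bc"],               # IR -> bitcode
        ["opt", *opt_flags, f"{stem}.bc", "-o", f"{stem}.opt.bc"],   # optimise bitcode
        ["llc", f"{stem}.opt.bc", "-o", f"{stem}.s"],                # bitcode -> assembly
        ["clang", f"{stem}.s", "-o", stem],                          # assemble and link
    ]


if __name__ == "__main__":
    for name, steps in (("single-step", single_step("kernel06.c")),
                        ("multi-step", multi_step("kernel06.c"))):
        print(f"# {name}")
        for cmd in steps:
            print(" ".join(cmd))
```

The point of the sketch is only that the second flow has four extra places where a default can differ from what the driver would have chosen.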

This difference is due to the fact that I’m running the test-suite using LNT, while the build bots are running it using Make directly. I’d expect them both to be the same, but apparently they’re quite different in what kind of parameters they use, passes they test and results they get.

I think there are two courses of action here:

  1. Identify the issue, isolate the case and create a bug to resolve later.
  2. Make sure LNT does exactly what the build bots are doing

I’m working on item 1 right now, not sure how item 2 can be solved…

Of course, the fact that it’s not the same flow meant we caught a bug in LLVM, but the difference is bound to create more confusion and broken commits, which is worse in the long run.

Also, if we’re not running LNT as often as buildbots, the benefit of having them different is sporadic at best.

When I set up some tests to run on ARM, I ran both the direct and the multi-step flow to make sure they generated the same code, and in many cases I found that the order in which the passes were executed was breaking some tests.

We managed to get the EDG bridge to set it up in the same way as the multi-pass would, so we would get similar results, but it doesn’t seem to be the case with clang.
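For anyone wanting to repeat that kind of check, a minimal sketch of comparing the assembly from the two flows could look like the following. It is an illustration only: the helper names are made up, and dropping `.file` / `.ident` lines is an assumption about which differences are uninteresting.

```python
import difflib


def normalise(asm):
    """Collapse whitespace and drop lines that differ only by input file name."""
    lines = []
    for line in asm.splitlines():
        line = " ".join(line.split())
        if not line or line.startswith((".file", ".ident")):
            continue
        lines.append(line)
    return lines


def asm_diff(direct, multi):
    """Return a unified diff of the two listings; an empty list means same code."""
    return list(difflib.unified_diff(normalise(direct), normalise(multi),
                                     fromfile="direct", tofile="multi-step",
                                     lineterm=""))
```

An empty result from `asm_diff` means the two flows produced the same instructions after normalisation; anything else shows where they diverge.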

cheers,
--renato

FYI, attached is a way to reproduce the error without the test-suite paraphernalia.

cheers,
--renato

livermore-llvm-bug.zip (5.79 KB)

Forgot to mention: commenting out the line that runs opt with link-time opts solves the problem! ;)

+Daniel & Michael who work on the LNT infrastructure & might have some
thoughts on the differences & their merits & motivations.

> David,
>
> I did some more work on the Livermore Loops and I found out that the issue
> is the difference in the parameters between a single step and a multi step
> compilation.

Thanks for the investigation.

> When you compile "clang kernel06.c" it works fine, but when you get all
> steps (clang -emit-llvm + llvm-as + opt + llc etc), the default options of
> each and how they interact are showing a bug in the code generated.

Sounds quite plausible.

> This difference is due to the fact that I'm running the test-suite using
> LNT, while the build bots are running it using Make directly. I'd expect
> them both to be the same, but apparently they're quite different in what
> kind of parameters they use, passes they test and results they get.
>
> I think there are two courses of action here:
>
> 1. Identify the issue, isolate the case and create a bug to resolve later.
> 2. Make sure LNT does exactly what the build bots are doing

Part of the issue here is whether or not the Make-based execution is
still maintained/valued. I'm getting the impression that the LNT
execution may be already, or be becoming, the standard way to run the
test suite even when not gathering perf statistics. Michael/Daniel -
is that the case?

If so, should we rip out the direct Make execution, or do something to
otherwise warn/disable it?

> I'm working on item 1 right now, not sure how item 2 can be solved...
>
> Of course, the fact that it's not the same flow meant we caught a bug in
> LLVM, but that's bound to create more confusion and broken commits, which is
> worse in the long run.

Yeah, unless there's some strong/specific motivation for this I'd be
in favor of removing the difference (or removing the Make-based
execution entirely)

> Also, if we're not running LNT as often as buildbots, the benefit of having
> them different is sporadic at best.

We're running both pretty regularly, I think. If anything, I suspect
we might be running LNT on more configurations than the Make-based
execution (except that on some LNT runners we're multisampling, so
it's slower).

> Part of the issue here is whether or not the Make-based execution is
> still maintained/valued. I'm getting the impression that the LNT
> execution may be already, or be becoming, the standard way to run the
> test suite even when not gathering perf statistics. Michael/Daniel -
> is that the case?

The main issue here is that Clang does not seem to enable link-time
optimizations by default, while the Make-based run invokes them explicitly.
So it is possible to achieve the same effect (i.e. cover LTO) by turning LTO
on for some runs (for all types of tests on all hardware configurations).

> If so, should we rip out the direct Make execution, or do something to
> otherwise warn/disable it?

I'd strongly recommend that we use only one test style (LNT) everywhere,
and that we should test LTO more effectively.

cheers,
--renato

> > Part of the issue here is whether or not the Make-based execution is
> > still maintained/valued. I'm getting the impression that the LNT
> > execution may be already, or be becoming, the standard way to run the
> > test suite even when not gathering perf statistics. Michael/Daniel -
> > is that the case?
>
> The main issue here is that Clang does not seem to enable link-time
> optimizations by default, while the Make-based run invokes them explicitly.
> So it is possible to achieve the same effect (i.e. cover LTO) by turning LTO
> on for some runs (for all types of tests on all hardware configurations).
>
> > If so, should we rip out the direct Make execution, or do something to
> > otherwise warn/disable it?
>
> I'd strongly recommend that we use only one test style (LNT) everywhere,

Sure, I understand that's your preference (& mine). I was mostly
directing that question at Daniel & Michael to ensure they were on the
same page.

My only hesitation here is that using LNT as the authoritative runner
does have a little more setup overhead for people wishing to run the
suite (they need to install some extra stuff).

> and that we should test LTO more effectively.

Certainly - I expect Bill (Wendling) is working on that sooner or
later, as he seems to be making LTO a priority.

- David

> My only hesitation here is that using LNT as the authoritative runner
> does have a little more setup overhead for people wishing to run the
> suite (they need to install some extra stuff).

Yes, there is an issue on shared machines, regarding installing new
software (mainly the virtualenv, since lnt itself is local to the sandbox).

However, to be fair, I found it way simpler to run the LNT tests than the
buildbots. The documentation was much clearer and the process sleeker.

cheers,
--renato

All of our internal testers use LNT. LNT behind the scenes just calls the Makefiles appropriately. So it would be impossible to get rid of the makefile execution without gutting LNT as well = p.

I will let Daniel comment on the rest of it.

Hi Michael,

The idea was never to gut the makefiles or LNT, but to let all buildbots
use LNT to call the makefiles, and not call them directly, as it happens
with some of them.

I think it's just a matter of "standardizing" who calls the makefiles, not
changing any significant behaviour in the test-suite.

cheers,
--renato

And/or standardizing the way the Makefiles are invoked so that it's
the same as the way LNT invokes them.

- David

Fair enough - you could write up a patch for the zorg repository to do this.

Wouldn't requiring every buildbot to use LNT achieve the same thing?

--renato

That's how you achieve this goal. What a buildbot does is governed by
the configuration in the zorg repository (that's where we keep the
buildbot configuration code that is sync'd up to the lab.llvm.org
buildmaster).

Hi David,

I had a go at Zorg after the website was back online.

As far as I can tell, buildbot/osuosl/master/config/builders.py is the
script that has to be changed, using LNTBuilder rather than ClangBuilder.

I can see that there are only a few LNTBuilders in use, and I don't want
to break all buildbots, so I'm planning on changing just gcc12 and
gcc20 (with Duncan's permission) so that they stop failing on the Livermore
Loops.

Is there some PyDoc / Doxygen documentation on the Zorg classes? I'll be
digging it manually in the interim and will send you guys a patch to
convert those two buildbots to LNT.

Question: Is there any way of changing them locally BEFORE committing the
change to Zorg? Or is changing Zorg the only way to test?

cheers,
--renato

Hi Renato,

    That's how you achieve this goal. What a buildbot does is governed by
    the configuration in the zorg repository (that's where we keep the
    buildbot configuration code that is sync'd up to the lab.llvm.org
    buildmaster).

> Hi David,
>
> I had a go at Zorg after the website was back online.
>
> As far as I can tell, buildbot/osuosl/master/config/builders.py is the script
> that has to be changed, using LNTBuilder rather than ClangBuilder.
>
> I can see that there are only a few LNTBuilders in use, and I don't want to
> break all buildbots, so I'm planning on changing just gcc12 and gcc20 (with
> Duncan's permission) so that they stop failing on the Livermore Loops.

Sorry, what change do you plan to make? Did you work out what the bug is? My
basic worry is that it sounds like you are trying to hide the underlying issue
rather than fixing it, please correct me if I'm wrong.

> Is there some PyDoc / Doxygen documentation on the Zorg classes? I'll be digging
> it manually in the interim and will send you guys a patch to convert those two
> buildbots to LNT.
>
> Question: Is there any way of changing them locally BEFORE committing the change
> to Zorg? Or is changing Zorg the only way to test?

To test you need to set up your own build master. It's not that hard.

Ciao, Duncan.

> sorry, what change do you plan to make? Did you work out what the bug is?
> My basic worry is that it sounds like you are trying to hide the underlying
> issue rather than fixing it, please correct me if I'm wrong.

Hi Duncan,

There are two issues:

1. The LTO bug we found by running Livermore Loops on test-suite. I'm still
trying to isolate this and will report as soon as I get a smaller test
case. (I sent a tarball earlier to the list on how to reproduce it). Bugs
like these will not be caught any more with the standard LNT tests, true,
but there's also point 2 below...

2. Buildbots with multiple test builders are confusing and generate too much
noise. LNT is not testing LTO at the moment, but David said there is someone
working on it right now. So, the way to go would be to have LNT on all
buildbots (in the long run) testing with and without LTO (and possibly
other variations), so we can have a coherent story and an easy way to
reproduce errors locally.

> To test you need to set up your own build master. It's not that hard.

That's a good point. I'll do that.

cheers,
--renato

We are testing LTO internally and have not run into this issue IIRC. But on the other hand, we are doing a straight compilation (i.e. not doing it in parts as you said you were). Even so, you are right, we should have public LNT LTO testers.

My opinion is that tests should be intentional. If you spot a difference
between two calls, you either start testing both explicitly or ignore one
of them. Relying on side-effects for testing, in the majority of cases,
increases the perceived importance of small matters and takes time away from
fixing real bugs.

If LTO is important (I think it is), then we should have explicit LTO
tests. If testing the order of passes is important, we should consistently
test it on all important configurations we have (ex. using bugpoint).
Keeping an old testing style *just* to have the side-effect of testing LTO
leads to confusion and noise which is worse in the end.

Random tests are one way of covering a huge space of configurations in a
fair way. The hard part is telling apart what to ignore (i.e. what will never
happen in the real world), the real bugs that need fixing, and the real bugs
that have very little importance. But all that should only be pursued once
all the other proper tests are set up and giving meaningful results.

cheers,
--renato

To weigh in here...

+Daniel & Michael who work on the LNT infrastructure & might have some
thoughts on the differences & their merits & motivations.

> David,
>
> I did some more work on the Livermore Loops and I found out that the
> issue is the difference in the parameters between a single step and a
> multi step compilation.

Thanks for the investigation.

> When you compile "clang kernel06.c" it works fine, but when you get all
> steps (clang -emit-llvm + llvm-as + opt + llc etc), the default options
> of each and how they interact are showing a bug in the code generated.

Sounds quite plausible.

> This difference is due to the fact that I'm running the test-suite using
> LNT, while the build bots are running it using Make directly. I'd expect
> them both to be the same, but apparently they're quite different in what
> kind of parameters they use, passes they test and results they get.
>
> I think there are two courses of action here:
>
> 1. Identify the issue, isolate the case and create a bug to resolve later.
> 2. Make sure LNT does exactly what the build bots are doing

Part of the issue here is whether or not the Make-based execution is
still maintained/valued. I'm getting the impression that the LNT
execution may be already, or be becoming, the standard way to run the
test suite even when not gathering perf statistics. Michael/Daniel -
is that the case?

Well, the distinction isn't really between LNT and non-LNT, it's between the
TEST=nightly and TEST=simple styles supported by the Makefiles. LNT uses the
TEST=simple style and that is all I care to support.

Fair enough, though that's sort of what I was getting at in a way:
whatever way LNT is driving the test-suite is, essentially, the only
supported way. Sure we can have non-LNT bots (not ideal, perhaps -
still another path to maintain/possibly diverge by accident) but they
certainly shouldn't be using anything other than the way LNT uses the
test-suite (ie: TEST=simple).

Can we kill TEST=nightly, then, since it's just an
untested/unsupported trap? Or do you know of users that have a need
for this?

- David

Historically, the old way of testing (TEST=nightly) used the various LLVM
tools to effect a compilation because there weren't compilers that worked.
However, this is a bad way to "test" the product that most users actually
care about, which is the compiler.

With TEST=simple, all the compilation is done using the compiler just as an
end user would. If you want LTO, the right way to get it is to use the
compiler's support for LTO. This is how we test LTO internally. I've never
tried to get LTO working on Linux, but it should be possible using the gold
plugin and passing the right compiler options.
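To make the "use the compiler's support for LTO" route concrete, here is a minimal Python sketch that builds a single driver command with or without LTO. It is illustrative only: the function name and the -O3 default are assumptions, and `-flto` / `-fuse-ld=gold` are the flag spellings from current Clang rather than anything specified in this thread or used by the test-suite.

```python
def driver_command(source, output, lto=False, use_gold=False, opt_level="-O3"):
    """Build one clang invocation; LTO is requested through the driver itself."""
    cmd = ["clang", opt_level]
    if lto:
        cmd.append("-flto")               # ask the driver for link-time optimization
        if use_gold:
            cmd.append("-fuse-ld=gold")   # assumption: gold with the LLVM plugin is installed
    cmd += [source, "-o", output]
    return cmd


if __name__ == "__main__":
    print(" ".join(driver_command("kernel06.c", "kernel06")))
    print(" ".join(driver_command("kernel06.c", "kernel06", lto=True, use_gold=True)))
```

Both variants go through the same driver, so the only difference between an LTO and a non-LTO run is the flag set, not the tool sequence.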

If so, should we rip out the direct Make execution, or do something to
otherwise warn/disable it?

Per my other thread polling users of the test-suite, there are still people
who use the Makefiles to do more custom things. I personally would love to
deprecate them completely, but they do support some useful workflows.

My ideal would be:
1. Migrate LNT to drive the test-suite using a more sane mechanism (not a
glob of Makefiles). I would like the "more sane mechanism" to be lit-based.
2. Maybe do some work to make using lit to drive the test-suite more
convenient and hopefully support some of the useful workflows the Makefiles
support with less of the crap.
3. Deprecate the Makefiles, or at least let them die through lack of
maintenance.

Does that answer the parts you wanted my input on?

More or less, I suppose I wouldn't mind an opinion on the "should we
kill off/migrate bots from test-suite invocation to LNT?" issue too.
(my assumption is that your answer to that is "yes", but just want to
be clear)

- David