Using the test suite to benchmark patches

Hi,

just a quick email. I've been working on a patch to simplifycfg last week and
want to test its performance. I've run the test-suite successfully, both with
the patched and unpatched versions. However, I could find no easy way to
compare the two results. I see that the web pages of the nightly tester provide
nice results (changes compared to the day before, together with percentages,
colors, etc.). Something like that should be supported for two local test
runs as well, but I couldn't find out how.

I did a bit of hacking on the HTMLColDiff.pl script that I found lying around;
the (rough) patch is attached. Is this script a quick hack that got
forgotten, or is it still used by people for a purpose that I don't see right
now?

Any thoughts or suggestions on how to do this testing in a structured manner?
In particular, it would be useful to have some means of running a test a few
times and taking mean values for comparison, or something along those lines...
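
To make it concrete, here is a very rough sketch of the kind of thing I mean
(the benchmark command and its "time: ..." output line are made-up
placeholders, nothing from the test-suite):

  import re
  import subprocess

  # Rough sketch only: "./run_benchmark" and its "time: <seconds>" output line
  # are placeholders, not anything provided by the test-suite.
  def run_once(cmd):
      out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
      return float(re.search(r"time:\s*([\d.]+)", out).group(1))

  def mean_time(cmd, runs=5):
      times = [run_once(cmd) for _ in range(runs)]
      return sum(times) / len(times)

  print(mean_time(["./run_benchmark"]))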

Gr.

Matthijs

coldiff.diff (2.96 KB)

> just a quick email. I've been working on a patch to simplifycfg last week and
> want to test its performance. I've run the test-suite successfully, both with
> the patched and unpatched versions. However, I could find no easy way to
> compare the two results. I see that the web pages of the nightly tester provide
> nice results (changes compared to the day before, together with percentages,
> colors, etc.). Something like that should be supported for two local test
> runs as well, but I couldn't find out how.

Currently, the nightly tester scripts only compare the current day to the previous one and do not have the ability to compare two test runs that are more than one day/run apart. There is a GSoC student who will be working to improve this and add this feature.

However, the nightly tester scripts are PHP-based and require the results to be in a database. You could just send them to the LLVM server to do your comparisons if you wanted. Another option is to do a local setup, but this is more work and really only necessary if you have results that you don't want out in public.

> I did a bit of hacking on the HTMLColDiff.pl script that I found lying around;
> the (rough) patch is attached. Is this script a quick hack that got
> forgotten, or is it still used by people for a purpose that I don't see right
> now?

I do not know if people are using this frequently. If it doesn't work, I am sure the answer is no. :) It was probably used before the nightly tester was around.

> Any thoughts or suggestions on how to do this testing in a structured manner?
> In particular, it would be useful to have some means of running a test a few
> times and taking mean values for comparison, or something along those lines...

I would look at the TEST.*.Makefile and TEST.*.report files for a way to do your multiple-run testing. You still have the problem of comparing two different sets of results, though, and that would require a new script.
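
As a very rough sketch, such a script might look something like this (assuming
each run's numbers have first been dumped into a plain "program,seconds" text
file; the actual report files are formatted differently):

  import csv
  import sys

  # Rough sketch: compare two "program,seconds" dumps and print the relative
  # change per program. The input format here is an assumption, not the real
  # report format.
  def load(path):
      with open(path) as f:
          return {row[0]: float(row[1]) for row in csv.reader(f) if len(row) >= 2}

  old, new = load(sys.argv[1]), load(sys.argv[2])
  for prog in sorted(old.keys() & new.keys()):
      if old[prog] > 0:
          change = (new[prog] - old[prog]) / old[prog] * 100.0
          print(f"{prog}: {old[prog]:.2f} -> {new[prog]:.2f} ({change:+.1f}%)")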

Of course, someone else may have a better suggestion.

-Tanya

I think that if what you're doing is sound, you get the results you want (say, on compiling something like gcc with it), and others review the basic idea (hi Evan or Chris) and like it, then just checking it in and watching the performance numbers for the next day seems reasonable to me. You can always revert the patch if there are unexpected downsides that make it not worthwhile.

If you can run something larger like SPEC, that'd also help.

It depends on the scope of the change. If it is a relatively minor change, getting the code approved, testing it for correctness, and adding a regression test is sufficient. If it is major (adding a new pass, significantly changing pass ordering, etc.), then the bar is much higher.

We don't have a great way of diffing performance runs, other than the nightly tester. Devang has an experimental "opt-beta" mode that can be used for experimenting with optimization passes, and we have "llc-beta" which is great for measuring the impact of codegen changes.

The usual approach is to decide that the patch is good, check it in, then watch for unexpected fallout on the nightly testers.

-Chris

Btw, this is all spelled out in the Developer Policy, which I like to mention in case people are not aware of it:
http://llvm.org/docs/DeveloperPolicy.html#quality

-Tanya

Matthijs,

> We don't have a great way of diffing performance runs, other than the
> nightly tester. Devang has an experimental "opt-beta" mode that can be
> used for experimenting with optimization passes, and we have "llc-beta"
> which is great for measuring the impact of codegen changes.

opt-beta allows you to compare "opt -std-compile-opts" vs. "opt <your sequence of optimization passes>".
For your use, you can add a local command-line option to trigger your simplifycfg patch and then try
  ENABLE_OPTBETA=1 OPTBETAOPTIONS="-your-command-line-flag -std-compile-opts"

Let me know if you try this and run into issues.

Hi,

I've polished my changes to HTMLColDiff.pl a bit and it should now be a fairly
useful tool for finding performance changes. It's not the most robust tool,
but it should be able to compare different test runs and output something
useful.

> opt-beta allows you to compare "opt -std-compile-opts" vs. "opt <your
> sequence of optimization passes>".
> For your use, you can add a local command-line option to trigger your
> simplifycfg patch and then try
>   ENABLE_OPTBETA=1 OPTBETAOPTIONS="-your-command-line-flag -std-compile-opts"

This also sounds like an (even more useful) option, in particular for quickly
comparing a single program or change. I will try this next.
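
For a single program, I imagine something as simple as the following rough
sketch (the two executable names are placeholders for the same test program
built without and with my patch):

  import subprocess
  import time

  # Rough sketch: "./prog.baseline" and "./prog.patched" are placeholder names,
  # not files produced by the test-suite.
  def best_time(exe, runs=3):
      times = []
      for _ in range(runs):
          start = time.time()
          subprocess.run([exe], check=True, stdout=subprocess.DEVNULL)
          times.append(time.time() - start)
      return min(times)

  base = best_time("./prog.baseline")
  new = best_time("./prog.patched")
  print(f"baseline {base:.3f}s, patched {new:.3f}s ({(new - base) / base:+.1%})")

Taking the minimum over a few runs should filter out most of the noise.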

Thanks for all the pointers!

Gr.

Matthijs

Hi Devang,

I've tried the OPTBETA approach, and it now runs successfully both with and
without my patch. I've found two problems, however:
1. The output of the nightly report does not include the figures from
    opt-beta by default. I've modified the TEST.nightly.report script to add
    two columns (OPT-BETA and LLC/OPT-BETA), but committing this change would
    mean that most users would end up with a lot of useless columns (though
    they already have that for the LLC-BETA stuff).

    Is there a way to access the ENABLE_* variables from the makefile in the
    report script? If so, you could show only the columns that will contain
    actual output, which would make the report a lot easier to read as well
    (a rough sketch of what I mean follows below the list).
2. The opt-beta output is always processed by llc, so you can only compare
    llc results with and without a change. It seems that for my change, the
    largest degradations are in the cbe test, not llc. I don't think there is
    an easy way to solve this, though.
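
Roughly the kind of thing I mean for point 1, sketched here in Python rather
than the actual Perl report script, and assuming the makefile would export the
variable to the environment (I don't know whether it currently does):

  import os

  # Rough sketch: only emit the extra columns when the corresponding run was
  # enabled. Whether ENABLE_OPTBETA actually reaches the environment of the
  # report script is an open question.
  columns = ["Program", "LLC"]
  if os.environ.get("ENABLE_OPTBETA"):
      columns += ["OPT-BETA", "LLC/OPT-BETA"]
  print("\t".join(columns))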

Gr.

Matthijs

> Hi Devang,
>
> I've tried the OPTBETA approach, and it now runs successfully both with and
> without my patch. I've found two problems, however:
> 1. The output of the nightly report does not include the figures from
>    opt-beta by default. I've modified the TEST.nightly.report script to add
>    two columns (OPT-BETA and LLC/OPT-BETA), but committing this change would
>    mean that most users would end up with a lot of useless columns (though
>    they already have that for the LLC-BETA stuff).

Oh, I forgot to mention this. Such a change may also disrupt the nightly tester database.

>    Is there a way to access the ENABLE_* variables from the makefile in the
>    report script? If so, you could show only the columns that will contain
>    actual output, which would make the report a lot easier to read as well.

I'm not sure.

> 2. The opt-beta output is always processed by llc, so you can only compare
>    llc results with and without a change. It seems that for my change, the
>    largest degradations are in the cbe test, not llc. I don't think there is
>    an easy way to solve this, though.

Yes, the opt-beta output is always processed by llc. If you want to use the CBE, then most likely you'll need another version of opt-beta, or yet another variable to select the CBE when opt-beta is used.