>> As I see it, there are regular commits that introduce performance and
>> code size regressions. There doesn't seem to be any formal testing in
>> place. Not for X86, not for ARM. Hunting down regressions like
>> enable-iv-rewrite=false, which added 130 bytes to a piece of code that
>> can be at most 8KB in total, is painful and slow. From my point of
>> view, the only way to ensure that the compiler does a good job is
>> providing a test infrastructure to monitor this. This is not about
>> forcing pre-commit tests, it is about ensuring that the testing is
>> done at all in a timely manner.
> In a world of multiple developers with conflicting priorities, this
> simply isn't realistic. I know that those 130 bytes are very important
> to those concerned with the NetBSD bootloader, but the patch that added
> them was worth significant performance improvements on important
> benchmarks (see Jack Howarth's posting for 9/6/11, for instance), which
> lots of other developers consider an obviously good tradeoff.
Don't get me wrong: my problem is not with the patch itself. LLVM at the
moment is relatively bad at creating compact code on x86. I'm not sure
what the status is on ARM for that, but there are use cases where it
matters a lot. Boot loaders are one of them. So disabling some
optimisations when using -Os or -Oz is fine.
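To make the size concern concrete, here is a minimal sketch (not part of
the original mail) of comparing the .text size of one translation unit
across optimisation levels. The source file name and the use of clang and
llvm-size are assumptions for illustration only:

    # Hypothetical sketch: compare .text size of one translation unit at
    # -O2, -Os and -Oz.  Assumes clang and llvm-size are on PATH; "loader.c"
    # stands in for the real size-constrained source file.
    import subprocess

    def text_size(opt_level):
        obj = "loader" + opt_level + ".o"
        subprocess.run(["clang", opt_level, "-c", "loader.c", "-o", obj],
                       check=True)
        out = subprocess.run(["llvm-size", "-A", obj], capture_output=True,
                             text=True, check=True).stdout
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] == ".text":
                return int(fields[1])
        raise RuntimeError("no .text section found")

    for level in ("-O2", "-Os", "-Oz"):
        print(level, text_size(level), "bytes of .text")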
The bigger issue is that accepting a size/performance trade-off here,
another one there, and yet another one in that corner adds up. It can
get to the point where each trade-off by itself is fine, but the total
result overflows the CPU instruction cache and completely kills
performance. More importantly, at some point this will happen with
completely harmless-looking changes.
A policy of "never regress anything" is not tenable, because ANY change
in code generation has the possibility to regress something. We end up
in a world where either we never make any forward progress, or where
developers hoard up trivial improvements they can use to "negate" the
regressions caused by real development work. Neither of these is a
desirable direction.
This is not what I was asking for. For GCC there are not only build bots
and functional regression tests, but also regular runs of benchmarks
like SPEC. Consider it a call for the community to identify useful
real-world test cases to measure the following (a rough sketch of such a
measurement run follows the list):
(1) Changes in the performance of compiled code, both with and without
LTO.
(2) Changes in the size of compiled code, both with and without
explicitly optimising for it.
(3) Changes in compilation time.
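As a minimal sketch of what one such measurement run could look like for
points (2) and (3): record compile time and object size per optimisation
level, so that successive compiler builds can be compared over time. The
file names, the CSV output and the choice of clang are assumptions for
illustration, not existing LLVM infrastructure:

    # Hypothetical sketch: for a fixed source file, record compile time and
    # object size at several optimisation levels.  Appending one row per
    # compiler build to metrics.csv gives a history that can be scanned for
    # regressions.  File names and the use of clang are assumptions.
    import csv, os, subprocess, time

    SOURCE = "benchmark.c"        # placeholder for a real-world test case

    def measure(opt_level):
        start = time.monotonic()
        subprocess.run(["clang", opt_level, "-c", SOURCE, "-o", "out.o"],
                       check=True)
        return time.monotonic() - start, os.path.getsize("out.o")

    compiler = subprocess.run(["clang", "--version"], capture_output=True,
                              text=True, check=True).stdout.splitlines()[0]

    with open("metrics.csv", "a", newline="") as f:
        writer = csv.writer(f)
        for level in ("-O2", "-Os", "-Oz"):
            seconds, size_bytes = measure(level)
            writer.writerow([compiler, level, "%.3f" % seconds, size_bytes])

Covering point (1) would additionally require running the produced
binaries under controlled conditions, which is where the reproducibility
problem mentioned further down comes in.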
I know that for many bigger changes at least (1) and (3) are often
checked. This is about doing general testing over a long period of time.
When a regression in one of the metrics occurs, it can be evaluated. But
that's a separate discussion, e.g. whether to disable an optimisation
for -Os/-Oz or move it to a higher optimiser level.
> The existing modus operandi on X86 and other targets has been that
> there is a core of functionality (what is represented by the LLVM
> regression tests and test-suite) that all developers implicitly agree
> to avoid regressing on a set of "blessed" configurations. We are
> deliberately cautious in expanding the range of functionality that
> cannot be regressed, or in widening the set of configurations (beyond
> those easily accessible to all developers) on which those regressions
> must not occur. This allows us to improve quality over time without
> preventing forward progress.
As I see it, the current regression test suite is aimed at preventing
miscompilation. It's not that useful for handling the other cases above.
Of course, checking for compile-time or runtime regressions is a lot
harder to do, as it requires a reproducible environment. So my request
can't replace the existing tests, and it isn't meant to.
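As a small illustration of the reproducibility point (again an
assumption-laden sketch, not existing tooling): recording the compiler
build and host details next to every measurement at least makes a
regression attributable to a specific toolchain and machine:

    # Hypothetical sketch: store enough environment information with each
    # measurement run that a later regression can be tied to a specific
    # compiler build and host.  Field names are invented for illustration.
    import json, platform, subprocess

    def environment_record():
        clang_version = subprocess.run(
            ["clang", "--version"], capture_output=True, text=True,
            check=True).stdout.splitlines()[0]
        return {
            "compiler": clang_version,
            "machine": platform.machine(),
            "system": platform.platform(),
        }

    print(json.dumps(environment_record(), indent=2))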
I hope I made myself a bit clearer.
Joerg