-fplugin-arg-dragonegg-enable-gcc-optzns impact

Now that dragonegg is robust in its default usage and the dragonegg svn
is moderately stable with -fplugin-arg-dragonegg-enable-gcc-optzns, it is
possible to gauge the impact of that feature. Comparing clang 2.9, FSF gcc 4.5.3svn,
FSF gcc 4.6.0 and dragonegg svn (with FSF gcc 4.5.3svn) on the himenoBMTxpa benchmark,
the improvement in code performance from -fplugin-arg-dragonegg-enable-gcc-optzns is
clear on x86_64-apple-darwin10 for -fomit-frame-pointer -O3 -ffast-math -funroll-loops...

compiler              MFLOPS
clang 2.9             230.529385
gcc 4.5.3             246.064891
de-gcc 4.5.3          204.845864
de-gcc 4.5.3 optzns   259.672855
gcc 4.6.0             260.344783

Pretty impressive.
          Jack

Hi Jack,

> Now that dragonegg is robust in its default usage and the dragonegg svn
> is moderately stable with -fplugin-arg-dragonegg-enable-gcc-optzns, it is
> possible to gauge the impact of that feature. Comparing clang 2.9, FSF gcc 4.5.3svn,
> FSF gcc 4.6.0 and dragonegg svn (with FSF gcc 4.5.3svn) on the himenoBMTxpa benchmark,
> the improvement in code performance from -fplugin-arg-dragonegg-enable-gcc-optzns is
> clear on x86_64-apple-darwin10 for -fomit-frame-pointer -O3 -ffast-math -funroll-loops...
>
> compiler              MFLOPS
> clang 2.9             230.529385
> gcc 4.5.3             246.064891
> de-gcc 4.5.3          204.845864
> de-gcc 4.5.3 optzns   259.672855
> gcc 4.6.0             260.344783
>
> Pretty impressive.

Interesting results. It needs some analysis to work out where the extra juice
is coming from, though, in particular to distinguish between effects coming from
LLVM's IR-level optimizations and those coming from its code generators.

Ciao, Duncan.

Duncan,
   This gets even more interesting.

compiler     flags                                                              MFLOPS
de-gcc45     -fomit-frame-pointer -O3                                           205.738266
de-gcc45     -fomit-frame-pointer -O3 -fplugin-arg-dragonegg-enable-gcc-optzns  267.066124
de-gcc45     -fomit-frame-pointer -O2                                           206.015974
de-gcc45     -fomit-frame-pointer -O2 -fplugin-arg-dragonegg-enable-gcc-optzns  276.676232
gcc-fsf-4.5  -fomit-frame-pointer -O2                                           239.868551
gcc-fsf-4.6  -fomit-frame-pointer -O2                                           248.147753
llvm-clang   -fomit-frame-pointer -O2                                           226.756189

So the enhancement from -fplugin-arg-dragonegg-enable-gcc-optzns doesn't appear to be
due to optimizations added between -O2 and -O3. It is also interesting that -O2 outperforms
-O3 with -fplugin-arg-dragonegg-enable-gcc-optzns. I assume that the LLVM and FSF gcc
optimizations must be at cross purposes somewhere there.
              Jack
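
As a rough cross-check on the figures above, the relative gains can be computed directly. This is only a minimal sketch: the MFLOPS values are copied from the table, while the dictionary layout and the helper name `speedup` are illustrative choices, not part of any tool.

```python
# MFLOPS figures copied from the table above (himenoBMTxpa, x86_64-apple-darwin10).
# Keys are (optimization level, whether -fplugin-arg-dragonegg-enable-gcc-optzns was used).
DE_GCC45 = {
    ("-O3", False): 205.738266,
    ("-O3", True): 267.066124,
    ("-O2", False): 206.015974,
    ("-O2", True): 276.676232,
}


def speedup(level):
    """Ratio of dragonegg MFLOPS with GCC optimizations enabled to without, at one -O level."""
    return DE_GCC45[(level, True)] / DE_GCC45[(level, False)]


for level in ("-O2", "-O3"):
    print(f"{level}: enable-gcc-optzns gives {speedup(level):.3f}x")

# How much -O2 leads -O3 once GCC optimizations are enabled.
o2_over_o3 = DE_GCC45[("-O2", True)] / DE_GCC45[("-O3", True)]
print(f"-O2 vs -O3 with enable-gcc-optzns: {o2_over_o3:.3f}x")
```

The gain works out to roughly 30% at -O3 and 34% at -O2, and the optzns build at -O2 is a few percent ahead of the one at -O3. These are single measurements, so small differences should be read with caution.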

Hi Jack,

> This gets even more interesting.
>
> compiler     flags                                                              MFLOPS
> de-gcc45     -fomit-frame-pointer -O3                                           205.738266
> de-gcc45     -fomit-frame-pointer -O3 -fplugin-arg-dragonegg-enable-gcc-optzns  267.066124
> de-gcc45     -fomit-frame-pointer -O2                                           206.015974
> de-gcc45     -fomit-frame-pointer -O2 -fplugin-arg-dragonegg-enable-gcc-optzns  276.676232
> gcc-fsf-4.5  -fomit-frame-pointer -O2                                           239.868551
> gcc-fsf-4.6  -fomit-frame-pointer -O2                                           248.147753
> llvm-clang   -fomit-frame-pointer -O2                                           226.756189
>
> So the enhancement from -fplugin-arg-dragonegg-enable-gcc-optzns doesn't appear to be
> due to optimizations added between -O2 and -O3. It is also interesting that -O2 outperforms
> -O3 with -fplugin-arg-dragonegg-enable-gcc-optzns. I assume that the LLVM and FSF gcc
> optimizations must be at cross purposes somewhere there.

It would be nice to have a way of specifying different -O levels for the GCC and
the LLVM parts. Actually you can get this (awkwardly) by using (say) -O3 on the
gcc command line but disabling LLVM optimizations using
-fplugin-arg-dragonegg-disable-llvm-optzns; using
-fplugin-arg-dragonegg-emit-ir (and -S) to output bitcode; running "opt -O1" (or
whatever LLVM optimization level you want) on the bitcode; then using llc to
codegen it.
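
The staged approach described above can be sketched as a small Python helper that only builds the three command lines and prints them (it does not run anything). Several things here are assumptions for illustration: the helper name `staged_commands`, the driver name `gcc-4.5`, the plugin path `dragonegg.so`, the source file `himeno.c`, and the intermediate file names. Passing -fplugin-arg-dragonegg-enable-gcc-optzns in the first step is also an assumption, inferred from the context of this thread.

```python
import os
import shlex


def staged_commands(source, gcc_level="-O3", llvm_level="-O1",
                    gcc="gcc-4.5", plugin="dragonegg.so"):
    """Build (but do not run) the gcc -> opt -> llc command lines.

    All file names, the gcc driver name and the plugin path are hypothetical.
    """
    stem = os.path.splitext(source)[0]
    ir_file = stem + ".ll"       # LLVM IR emitted by dragonegg
    opt_file = stem + ".opt.bc"  # IR after LLVM's own optimizations
    asm_file = stem + ".s"       # final assembly from llc
    return [
        # Step 1: GCC optimizes at gcc_level, LLVM optimizations are disabled,
        # and the plugin writes LLVM IR instead of target assembly.
        [gcc, "-fplugin=" + plugin, gcc_level,
         "-fplugin-arg-dragonegg-enable-gcc-optzns",  # assumed, see lead-in
         "-fplugin-arg-dragonegg-disable-llvm-optzns",
         "-fplugin-arg-dragonegg-emit-ir", "-S", source, "-o", ir_file],
        # Step 2: run LLVM's optimizer at the separately chosen level.
        ["opt", llvm_level, ir_file, "-o", opt_file],
        # Step 3: generate assembly with llc.
        ["llc", opt_file, "-o", asm_file],
    ]


for cmd in staged_commands("himeno.c"):
    print(shlex.join(cmd))
```

Changing `gcc_level` and `llvm_level` independently is the point of the exercise. The resulting assembly file would still need to be assembled and linked as usual.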

Ciao, Duncan.