gcc vs clang on pp vs variadic template tuple benchmark

Is anyone working on speeding up the compile time for variadic
templates? Apparently, because of the slow compile times for variadic
templates, variadic templates are not used by several boost libraries:

  fusion:
    http://www.boost.org/doc/libs/1_51_0/libs/fusion/doc/html/index.html
  mpl:
    http://www.boost.org/doc/libs/1_51_0/libs/mpl/doc/index.html
  proto:
    http://www.boost.org/doc/libs/1_51_0/doc/html/proto.html

These libraries still use the preprocessor to generate code
instead of using variadic templates.

The purpose of the code here:

http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/tuple.benchmark.cpp

was to provide some insight into why variadic templates are slower to
compile than preprocessor-generated code. It uses the slim library,
which can be downloaded using:

  git clone http://ch.ristopher.com/r/slim

(Unfortunately, some of the slim code had to be slightly modified to
enable compilation by clang. The modifications are in the sandbox:

http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/slim/include/slim/support/deduce.hpp

http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/slim/include/slim/support/internal/base.hpp
)

The slim library has two methods for implementing random-access tuples.
When:

  #define TUPLE_TEST_IMPL TUPLE_TEST_VERTICAL

in tuple.benchmark.cpp, the preprocessing method is used.
OTOH, when:

  #define TUPLE_TEST_IMPL TUPLE_TEST_HORIZONTAL

only variadic templates are used.
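
To make the distinction concrete, here is a minimal sketch of the two
styles. It is NOT slim's code: the names pp_tuple3, leaf, vt_tuple and
get_at are made up for illustration, the fixed-arity class is written
by hand where a library would generate it with macros, and
std::index_sequence assumes C++14 (in C++11 the index pack has to be
hand-rolled).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

namespace sketch {

// "Preprocessor" style: one class per arity. A real library would generate
// these with macros up to some fixed limit; arity 3 is written out by hand.
template <typename T0, typename T1, typename T2>
struct pp_tuple3 {
    T0 m0;
    T1 m1;
    T2 m2;
};

// "Variadic" style: each element lives in a base class tagged by its index.
template <std::size_t I, typename T>
struct leaf {
    T value;
};

template <typename Indices, typename... Ts>
struct vt_tuple_impl;

// All leaves are direct bases, produced by a single pack expansion.
template <std::size_t... Is, typename... Ts>
struct vt_tuple_impl<std::index_sequence<Is...>, Ts...> : leaf<Is, Ts>... {
    explicit vt_tuple_impl(Ts... args) : leaf<Is, Ts>{args}... {}
};

template <typename... Ts>
struct vt_tuple : vt_tuple_impl<std::index_sequence_for<Ts...>, Ts...> {
    using impl = vt_tuple_impl<std::index_sequence_for<Ts...>, Ts...>;
    using impl::impl;  // inherit the element-wise constructor
};

// Random access: T is deduced from the unique leaf<I, T> base of the
// argument, so no recursion over the element list is written here.
template <std::size_t I, typename T>
T& get_at(leaf<I, T>& l) { return l.value; }

// Sums the three ints held by each representation.
inline int demo_sum() {
    pp_tuple3<int, int, int> a{1, 2, 3};
    vt_tuple<int, int, int> b(1, 2, 3);
    return (a.m0 + a.m1 + a.m2) +
           (get_at<0>(b) + get_at<1>(b) + get_at<2>(b));
}

}  // namespace sketch
```

In the variadic version the compiler has to expand the packs and
resolve the base-class deduction in get_at for every access, which is
the kind of work the benchmark is trying to measure.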

In the same directory as tuple.benchmark.cpp, there are also a
Makefile and two .txt files. The two .txt files contain compile
timings for gcc and clang on the benchmark with various selections
of macros.

The interesting thing about the benchmark .txt files is that, for gcc,
when LAST_LESS is small, the compile times for TUPLE_TEST_VERTICAL and
TUPLE_TEST_HORIZONTAL are comparable; however, when LAST_LESS
approaches TUPLE_SIZE and TUPLE_SIZE gets a little large (say around
16), the compile times for TUPLE_TEST_HORIZONTAL increase
dramatically.

In contrast, the compile times for clang show no such dramatic
increase. This suggests that *maybe* clang would lessen the
temptation to use the preprocessor for the aforementioned boost
libraries.

Does anyone have any idea why there is such a difference between gcc
and clang?

-regards,
Larry

[snip]

(Unfortunately, some of the slim code had to be slightly modified to
enable compilation by clang. The modifications are in the sandbox:

http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/slim/include/slim/support/deduce.hpp

http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/slim/include/slim/support/internal/base.hpp
)

Christopher Schmidt has recently updated slim so that the above
changes are no longer needed to enable clang to compile slim's
vector.
[snip]