I work on a C++ project for which compilation time is a major
concern. One of my colleagues was able to substantially shorten the
time Clang took to compile our project by manually outlining
independently-typed code from large template functions.

This makes intuitive sense to me, because when instantiating a
template function, Clang traverses the body of the function. The
longer the function body, the more nodes in the AST Clang has to
traverse, and the more time it takes. Programmers can read the
function and see that some statements in the function body remain the
same no matter what types the function is instantiated with. By
extracting these statements into a separate, non-template function,
programmers can reduce the number of nodes Clang must traverse for
each instantiation.
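
A rough cost model makes this intuition concrete. It is a
simplification with assumed notation, not a measurement: n is the
number of instantiations, N_i the number of AST nodes in statements
that do not depend on the template parameters, N_d the number in
statements that do, and every node is assumed to cost about the same
to process.

```latex
\begin{aligned}
C_{\text{template}} &\approx n\,(N_i + N_d) \\
C_{\text{outlined}} &\approx N_i + n\,N_d \\
C_{\text{template}} - C_{\text{outlined}} &\approx (n - 1)\,N_i
\end{aligned}
```

The model ignores the one-time cost of parsing the template
definition, which both versions pay, but it suggests why the savings
grow with both the size of the type-independent code and the number
of instantiations.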

I created a contrived example that demonstrates how splitting up a
long template function can improve compile time. (Beware: the files
are large, because I needed something that would take Clang a hefty
amount of time to process.)

In the example above, 'example.cpp' defines a template function
'foo<T, U, V, W, X, Y, Z>', whose body is ~46k LoC. It then
instantiates 'foo' 10 times, with 10 different combinations of
template type parameters. Compiling it with
'clang -c -O1 example.cpp -Xclang -disable-llvm-passes -Xclang -emit-llvm'
takes ~35 seconds in total, and each additional instantiation of 'foo'
adds another ~3 seconds.
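
As a drastically scaled-down sketch, the shape of 'example.cpp' looks
roughly like the following. The actual files are not reproduced here:
the signature 'int foo(int)', the arithmetic standing in for the ~46k
LoC, and the three type combinations are hypothetical placeholders.
Explicit instantiation definitions force each specialization to be
instantiated, as the 10 instantiations in 'example.cpp' do.

```cpp
#include <cassert>

// Hypothetical, drastically shrunk stand-in for 'example.cpp'. The real
// template body is ~46k LoC; a few statements represent it here.
template <typename T, typename U, typename V, typename W, typename X,
          typename Y, typename Z>
int foo(int seed) {
  // None of the statements before the return mention T..Z, but they are
  // part of the template body, so every instantiation of foo has to
  // process them again.
  assert(seed >= 0 && seed < 1000);  // keeps the arithmetic below in range
  int acc = seed;
  acc = acc * 31 + 7;
  acc = acc % 1000;
  acc = acc * 17 + 3;
  acc = acc % 1000;
  // Only this last statement depends on the template type parameters.
  return acc + static_cast<int>(sizeof(T) + sizeof(U) + sizeof(V) +
                                sizeof(W) + sizeof(X) + sizeof(Y) +
                                sizeof(Z));
}

// Explicit instantiation definitions: one specialization per combination
// of template arguments.
template int foo<char, char, char, char, char, char, char>(int);
template int foo<int, int, int, int, int, int, int>(int);
template int foo<char, int, double, char, int, double, char>(int);
```

Because it is tiny, this sketch will not reproduce the timings; it
only shows the structure.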

Only the last statement in 'foo' depends on the template type
parameters of 'foo'. 'example-outlined.cpp' moves the ~46k LoC of
independently-typed statements out of 'foo' and into a function named
'foo_prologue_outlined', and has 'foo' call 'foo_prologue_outlined'.
'foo_prologue_outlined' is not a template function. The program
behaves identically, but the total compile time drops to just ~5
seconds (~85% less time), and additional instantiations of 'foo' in
'example-outlined.cpp' cost almost no additional compile time.
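
The outlined structure, in the same hypothetical miniature form, might
look like this. The name 'foo_prologue_outlined' comes from
'example-outlined.cpp'; its signature and body here are assumptions.
The type-independent statements are unchanged; they have only moved
into an ordinary function, so each specialization of 'foo' computes
the same result while its template body shrinks to a call plus the one
dependent statement.

```cpp
#include <cassert>

// Hypothetical counterpart of 'example-outlined.cpp'. The name comes from
// the example; the signature is assumed. As an ordinary (non-template)
// function, its body is processed once, not once per instantiation.
int foo_prologue_outlined(int seed) {
  assert(seed >= 0 && seed < 1000);  // keeps the arithmetic below in range
  int acc = seed;
  acc = acc * 31 + 7;
  acc = acc % 1000;
  acc = acc * 17 + 3;
  acc = acc % 1000;
  return acc;
}

template <typename T, typename U, typename V, typename W, typename X,
          typename Y, typename Z>
int foo(int seed) {
  // The template body now holds only the call plus the type-dependent
  // statement, so each instantiation has far less to transform.
  int acc = foo_prologue_outlined(seed);
  return acc + static_cast<int>(sizeof(T) + sizeof(U) + sizeof(V) +
                                sizeof(W) + sizeof(X) + sizeof(Y) +
                                sizeof(Z));
}

template int foo<char, char, char, char, char, char, char>(int);
template int foo<int, int, int, int, int, int, int>(int);
template int foo<char, int, double, char, int, double, char>(int);
```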

Although the functions in our project are not as long, some of them
take significantly longer than 35 seconds to compile. By outlining
independently-typed statements, we've been able to reduce the compile
time of some functions from 300s to 200s (a one-third reduction). So,
my colleagues and I are looking for other functions we can manually
outline to reduce the amount of time Clang takes to compile our
project. To this end, it would be handy if Clang could tell us, for
example, “hey, I just instantiated 'bar<int, float, double>', but X%
of the statements in that function did not require transformation,”
where 'X%' is some threshold that could be set in the compiler
invocation.
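
To make "X% of the statements did not require transformation"
concrete, here is a toy model. It is not Clang's TreeTransform API:
'ToyStmt' and 'CountingTransform' are hypothetical, and the rule that
a node must be rebuilt exactly when it, or one of its children, names
a template parameter is a simplifying assumption. Nodes that need no
change are reused as-is, and the transform counts both kinds.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy statement node; NOT Clang's Stmt class.
struct ToyStmt {
  bool mentionsTemplateParam;  // does this node itself name T, U, ...?
  std::vector<std::shared_ptr<const ToyStmt>> children;
};

using StmtPtr = std::shared_ptr<const ToyStmt>;

// Hypothetical toy transform (NOT Clang's TreeTransform). A node is rebuilt
// only if it names a template parameter or one of its children was rebuilt;
// otherwise the original node is returned as-is.
struct CountingTransform {
  int visited = 0;  // nodes looked at during "instantiation"
  int rebuilt = 0;  // nodes that actually had to be transformed

  StmtPtr transform(const StmtPtr &s) {
    ++visited;
    bool changed = s->mentionsTemplateParam;
    std::vector<StmtPtr> newChildren;
    newChildren.reserve(s->children.size());
    for (const StmtPtr &child : s->children) {
      StmtPtr t = transform(child);
      changed = changed || t != child;
      newChildren.push_back(std::move(t));
    }
    if (!changed)
      return s;  // reused unchanged: "did not require transformation"
    ++rebuilt;
    return std::make_shared<ToyStmt>(
        ToyStmt{s->mentionsTemplateParam, std::move(newChildren)});
  }

  // Percentage of visited nodes that were reused unchanged.
  double percentUnchanged() const {
    if (visited == 0)
      return 0.0;
    return 100.0 * (visited - rebuilt) / visited;
  }
};
```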

For now I'm thinking the option to set this warning threshold could be
called '-Wwasteful-template-threshold=', but I'm aware that sounds
awkward, and I'd love suggestions for a better name.
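
Given a count of the statements visited while instantiating a
specialization and how many of them needed no transformation, the
threshold check and warning text could be as simple as this sketch.
'wastefulTemplateWarning' is a hypothetical helper, and
'thresholdPercent' stands in for the value of the proposed
'-Wwasteful-template-threshold=' option, which does not exist in
Clang today.

```cpp
#include <string>

// Hypothetical helper. thresholdPercent stands in for the value of the
// proposed '-Wwasteful-template-threshold=' option (not a real Clang flag).
// Returns the warning text, or an empty string if no warning is due.
std::string wastefulTemplateWarning(const std::string &specialization,
                                    int statementsVisited,
                                    int statementsUnchanged,
                                    int thresholdPercent) {
  if (statementsVisited <= 0)
    return "";  // nothing was instantiated, nothing to report
  // Integer percentage, rounded down.
  int percent = statementsUnchanged * 100 / statementsVisited;
  if (percent < thresholdPercent)
    return "";  // below the threshold: stay quiet
  return "instantiated '" + specialization + "', but " +
         std::to_string(percent) +
         "% of its statements did not require transformation";
}
```

Comparing with '>=' means a threshold of 100 fires only for
instantiations in which nothing needed transforming, and rounding the
percentage down means the message never overstates the waste.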

I think implementing this feature is possible by adding some state to
TreeTransform, or to the Clang template instantiators that derive from
that class. But before I send a patch to do so, I'm curious whether
anyone has attempted such a thing before, or has thoughts or comments
on this feature. I'd prefer not to spend time implementing this
diagnostic in Clang if it's predestined to be rejected in code review,
so please let me know what you think!

- Brian Gesiak