Messing around with clang -cc1 -print-stats, I think I can show that every class template instantiation costs at least about 1 KiB of AST memory. To me, that seems like a lot of data just to capture temporary type traits that are often used solely for template metaprogramming. I figured that sharing this number might make the cost concrete, and might motivate talented C++ library developers to think more about optimizing their instantiation counts.
This is what I hacked together to try to measure the AST memory used per class template instantiation:
$ cat t.cpp
template <int N>
struct ClassTemplate {
  static const int sdm = ClassTemplate<N-1>::sdm + 1;
};
template <>
struct ClassTemplate<0> {
  static const int sdm = 0;
};
int main() {
  return ClassTemplate<MULTIPLIER * 100>::sdm;
}
$ for i in $(seq 1 10) ; do clang -cc1 -DMULTIPLIER=$i t.cpp -print-stats |& \
grep Bytes\ used | cut -d' ' -f 3 | awk '{ sum += $0 } END { print sum }' ; done
386824
500824
614824
728824
842824
956824
1070824
1184824
1298824
1412824
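I’m not a good enough shell programmer to do the subtraction here, but I put the numbers in a spreadsheet and did the obvious difference calculations: each step adds 114000 bytes, and dividing by 100 (the number of extra instantiations per step) gives 1140 bytes per instantiation. The counts of each record type go up in very obviously linear ways. Here is the full output from one run (the 1001 ClassTemplateSpecialization decls suggest it is the MULTIPLIER=10 run):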
*** AST Context Stats:
2080 types total.
1 ConstantArray types, 48 each (48 bytes)
63 Builtin types, 32 each (2016 bytes)
1 FunctionProto types, 48 each (48 bytes)
1 InjectedClassName types, 48 each (48 bytes)
5 Pointer types, 48 each (240 bytes)
1003 Record types, 32 each (32096 bytes)
1006 TemplateSpecialization types, 48 each (48288 bytes)
Total bytes = 82784
0/1001 implicit default constructors created
0/1001 implicit copy constructors created
0/1001 implicit move constructors created
0/1001 implicit copy assignment operators created
0/1001 implicit move assignment operators created
0/1001 implicit destructors created
Number of memory regions: 207
Bytes used: 1147908
Bytes allocated: 1171456
Bytes wasted: 23548 (includes alignment, etc)
*** Decl Stats:
3025 decls total.
1 ExternCContext decls, 72 each (72 bytes)
1 Function decls, 168 each (168 bytes)
1002 Var decls, 104 each (104208 bytes)
1 NonTypeTemplateParm decls, 88 each (88 bytes)
8 Field decls, 80 each (640 bytes)
1005 CXXRecord decls, 144 each (144720 bytes)
1001 ClassTemplateSpecialization decls, 184 each (184184 bytes)
5 Typedef decls, 88 each (440 bytes)
1 ClassTemplate decls, 88 each (88 bytes)
Total bytes = 434608
*** Stmt/Expr Stats:
9021 stmts/exprs total.
1000 SubstNonTypeTemplateParmExpr, 40 each (40000 bytes)
1 UnresolvedLookupExpr, 64 each (64 bytes)
1001 MaterializeTemporaryExpr, 24 each (24024 bytes)
1006 IntegerLiteral, 32 each (32192 bytes)
1002 ConstantExpr, 24 each (24048 bytes)
1 DependentScopeDeclRefExpr, 56 each (56 bytes)
2004 DeclRefExpr, 32 each (64128 bytes)
1001 ImplicitCastExpr, 24 each (24024 bytes)
2003 BinaryOperator, 32 each (64096 bytes)
1 ReturnStmt, 16 each (16 bytes)
1 CompoundStmt, 16 each (16 bytes)
Total bytes = 272664
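For anyone who wants to redo the arithmetic without a spreadsheet, here is a minimal sketch of the difference calculation, plus a rough per-kind breakdown read off the stats dump above. The helper names (`step_deltas`, `NodeKind`, `bytes_per_instantiation`) are made up for illustration, and the per-instantiation counts are my reading of the dump (each count divided by roughly 1000), not something clang reports directly.

```cpp
#include <cstddef>
#include <vector>

// Summed "Bytes used" totals printed by the shell loop, MULTIPLIER = 1..10.
const std::vector<long> kTotals = {386824, 500824, 614824, 728824, 842824,
                                   956824, 1070824, 1184824, 1298824, 1412824};

// Successive differences between neighbouring totals.
std::vector<long> step_deltas(const std::vector<long>& totals) {
    std::vector<long> deltas;
    for (std::size_t i = 1; i < totals.size(); ++i)
        deltas.push_back(totals[i] - totals[i - 1]);
    return deltas;
}

// One AST node kind: how many appear per instantiation (assumed from the
// dump: count / ~1000) and the size clang reports for each node.
struct NodeKind {
    const char* name;
    int per_instantiation;
    int bytes_each;
};

const std::vector<NodeKind> kTypes = {
    {"Record", 1, 32}, {"TemplateSpecialization", 1, 48}};
const std::vector<NodeKind> kDecls = {
    {"Var", 1, 104}, {"CXXRecord", 1, 144}, {"ClassTemplateSpecialization", 1, 184}};
const std::vector<NodeKind> kStmts = {
    {"SubstNonTypeTemplateParmExpr", 1, 40}, {"MaterializeTemporaryExpr", 1, 24},
    {"IntegerLiteral", 1, 32},               {"ConstantExpr", 1, 24},
    {"DeclRefExpr", 2, 32},                  {"ImplicitCastExpr", 1, 24},
    {"BinaryOperator", 2, 32}};

// Bytes contributed per instantiation by one group of node kinds.
long bytes_per_instantiation(const std::vector<NodeKind>& kinds) {
    long total = 0;
    for (const NodeKind& k : kinds)
        total += static_cast<long>(k.per_instantiation) * k.bytes_each;
    return total;
}
```

Every entry of `step_deltas(kTotals)` is 114000, so dividing by the 100 instantiations added per step gives 1140 bytes. The itemized node kinds come to 80 (types) + 432 (decls) + 272 (stmts) = 784 bytes per instantiation; the remaining 356 or so bytes are allocations that this output does not break down by node kind.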
Digging in a bit, Decl bytes to Stmt bytes come out at roughly 3-to-2 (434608 vs. 272664), which suggests that the template body matters a lot, and mine is pretty simple. It seems like one could use a similar methodology to stamp out unique_ptrs and count the per-instantiation memory overhead, and I'd expect it to be quite high.
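A minimal sketch of what that unique_ptr experiment might look like, under the same MULTIPLIER scheme as t.cpp. `Tag`, `Stamp`, and `stamp_all` are hypothetical names, and I have not measured this, so treat it as a starting point rather than a result. Each step also instantiates `Tag<N>`, `Stamp<N>`, and whatever helpers the standard library pulls in (for example the default deleter), so the per-step delta is an upper bound on the cost of `std::unique_ptr` itself.

```cpp
#include <cstddef>
#include <memory>

#ifndef MULTIPLIER
#define MULTIPLIER 1  // normally supplied with -DMULTIPLIER=<n>
#endif

// A distinct pointee type per N, so every unique_ptr below is a new
// specialization rather than a reuse of an earlier one.
template <int N>
struct Tag {
    int value = N;
};

// Stamp<N>::count() creates one std::unique_ptr<Tag<N>> and recurses to N-1.
template <int N>
struct Stamp {
    static std::size_t count() {
        std::unique_ptr<Tag<N>> p(new Tag<N>);
        return Stamp<N - 1>::count() + (p ? 1 : 0);
    }
};

// Base case ends the recursion.
template <>
struct Stamp<0> {
    static std::size_t count() { return 0; }
};

// Forces MULTIPLIER * 100 distinct unique_ptr specializations.
std::size_t stamp_all() {
    return Stamp<MULTIPLIER * 100>::count();
}
```

Note that -cc1 bypasses the driver, so the standard library include paths have to be passed by hand; `clang -###` prints the cc1 command line the driver would have used, which is one way to find them.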
I’m not sure what to do with this information, other than to reflect on the fact that the compiler currently doesn’t give the developer any feedback about the costs they are paying to compile their program. The state of the art is basically -ftime-trace, which in turn mostly tells people that the compiler spends a lot of time instantiating base or standard library templates. I’ve seen some limited success with this, but only C++ experts seem to find the data actionable. I wonder if we could build a trace flame graph weighted by memory allocation, given that the cost of RAM relative to compute is increasing.
Maybe what this all points to is that header-only library evolution is a dead end when it comes to compile time, and that if you want fast compiles, the old ways (forward declarations, pimpl patterns, and other forms of implementation hiding) are still relevant, even in a modular world.
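For readers who haven't run into the pattern, here is a minimal pimpl sketch, assuming a made-up `Widget` class. The two halves would normally live in separate files; they are shown in one block so it compiles as-is. The point is that clients of the header never see `std::vector<int>` being instantiated, only the translation unit that defines `Impl` does.

```cpp
#include <memory>
#include <vector>  // in a real split, only widget.cpp would include this

// ---- widget.h: what clients include ----
class Widget {
public:
    Widget();
    ~Widget();  // declared here, defined where Impl is a complete type
    void add(int value);
    int total() const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // clients see one pointer, not the members
};

// ---- widget.cpp: the only place the implementation types are needed ----
struct Widget::Impl {
    std::vector<int> values;
};

Widget::Widget() : impl_(new Impl) {}
Widget::~Widget() = default;

void Widget::add(int value) { impl_->values.push_back(value); }

int Widget::total() const {
    int sum = 0;
    for (int v : impl_->values) sum += v;
    return sum;
}
```

The header still pays for one `std::unique_ptr` specialization, but that cost is fixed and does not grow with the templates used in the implementation.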