some notes about templates

A while ago, Richard Smith and I dug into a couple of bugs in the general vicinity of templates, and came up with some ideas for changes to clang we’d like to see. As it stands, neither of us is presently working on any of this, but we thought we should write them up in a place where they’re publicly readable.

ActiveTemplateInstantiations shouldn’t be a SmallVector in Sema; they should be allocated by a BumpPtrAllocator in ASTContext. Then we can put ActiveTemplateInstantiations in the AST, which will require changing it a bit (no pointers to sema:: objects), but we also want to add a template depth, a correct point-of-instantiation SourceLocation, and missing instantiation records (such as instantiation due to overload resolution). An awesome outcome of this will be that we’ll be able to produce template stack traces even when we aren’t actively doing template instantiation. That would in turn allow us to do other things like flatten instantiation instead of making it recursive.

Notice that I mentioned correct points of instantiation? It turns out we get them wrong in many cases, but nothing notices. We started to notice when we added a new warning that tries to (efficiently!) enforce the rule that a template must mean the same thing (i.e., name lookup would return the same result) at each possible point of instantiation. If we store the earliest possible point of instantiation, then instantiate as much as we can as late as we can, we can check that all the declarations found through name lookup come before the PoI. The attached patch implements this, but unfortunately it isn’t useful right now because the PoI is so often wrong.

There is also a cluster of small issues with templates declared inside functions. We can remove PendingLocalImplicitInstantiations because we needn’t defer instantiations inside a function. When the function is itself a template, we need to make sure that we recursively instantiate everything lexically inside it (excluding nested templates, which can themselves be instantiated when needed); currently our visitor gets confused by this and misses some cases.

Nick

warn-point-of-instantiation-differences.diff (2.94 KB)

[snip]

> An awesome outcome of this will be that we'll be able to
> produce template stack traces even when we aren't actively doing
> template instantiation. That would in turn allow us to do other things
> like flatten instantiation instead of making it recursive.

Would this flattening speed up compile times? Long compile times
are one strong reason why boost preprocessor is used to implement tuples:

http://article.gmane.org/gmane.comp.lib.boost.devel/235386

-regards,
Larry

Not really, maybe only a tiny bit. What it means is that the depth of
templates we can instantiate wouldn't be limited by our stack space, but by
heap space.

I would be very surprised if clang has bad compile-time performance
building tuples. Without thinking about it too hard, this sounds like
something we should already be good at.

Nick

Hi Nick.

So you would also be surprised if Eric's benchmark mentioned in the
above boost.devel mailing-list post showed little compile-time speed gain
using the preprocessor? (Note: as mentioned in the post, Eric
just used gcc, not clang.)

-regards,
Larry

[snip]
> I would be very surprised if clang has bad compile-time performance
> building tuples. Without thinking about it too hard, this sounds like
> something we should already be good at.
>
> Nick

clang3.3 does much better than gcc4.8, at least according to the
attached timings.

The test driver is also attached.

The .hpp file was generated by a Python script during each run to produce
the tuple_t. It had the form:

  typedef lib_container::vector
  < int_value< 0 >
  , int_value< 1 >
  , int_value< 2 >
  ...
  , int_value< 498 >
  , int_value< 499 >
  >
  tuple_t;

The slim library was cloned from:

http://ch.ristopher.com/r/slim

sometime around June 2012.

HTH.

-regards,
Larry


tuple.benchmark.simple.clangxx_rel.txt (673 Bytes)

tuple.benchmark.simple.gcc4_8_rel.txt (706 Bytes)

tuple.benchmark.simple.cpp (1.41 KB)

[snip]

> clang3.3 does much better than gcc4.8, at least according to the
> attached timings.
>
> The test driver is also attached.

[snip]
OOPS. The makefile might also help a bit in clarifying the
timings. It's attached. It was invoked with:

make -f tuple.benchmark.simple.mk timings

-Larry

tuple.benchmark.simple.mk (1.17 KB)