[Implementation Details] Why does ASTContext::getTrivialTypeSourceInfo always allocate?

See https://github.com/llvm/llvm-project/blob/cd8f3e75813995c1d2da35370ffcf5af3aff9c2f/clang/lib/AST/ASTContext.cpp#L2975-L2993

It will always allocate. And ASTContext::getTrivialTypeSourceInfo is called relatively frequently and the sourcelocation is empty by default. llvm-project/ASTContext.h at cd8f3e75813995c1d2da35370ffcf5af3aff9c2f · llvm/llvm-project · GitHub

It smells bad. I feel it may be better to cache it for (QualType, SourceLocation) pair. For example, insert the following pseudo code in the front of the function:

if (TypeSourceInfoCache.find([QualType, SourceLocation]) != TypeSourceInfoCache.end())
    return TypeSourceInfoCache[QualType, SourceLocation];

But I am not sure if there is any special reason for this since it is too trivial…

1 Like

Paging @rjmccall in case he has more historical knowledge here.

I thought there were situations where the TypeLoc wrapped by the trivial TypeSourceInfo * will be updated in place, but when I looked around the code base I didn’t spot anything doing that (doesn’t mean I didn’t miss the code though).

I think using a cache (if possible) might be an improvement, but have you profiled how much overhead the extra allocations are causing?

It seems to me that getTrivialTypeSourceInfo is a sort of workaround that we should just get rid of. Would explain this problem, if that was the intention but we never got to it. Might not be feasible to do in the short term anyway.

1 Like

It’s been a long time, but I think I was thinking about that function mostly as a workaround for some places that weren’t preserving source locations properly. We did a lot less AST synthesis back then, and a lot of it was for some very secondary clients like the ObjC rewriter.

I remember thinking even at the time, though, that TypeSourceInfo probably wasn’t the right representation — that we should really be storing TypeLocs directly in the AST. Storing TypeLocs would be a coup because the storage of the source type would be independent of the storage of the source location data, which means:

  • we’d be able to use a static constant zero buffer for a lot of synthesized location data with no source locations,
  • even when we do have some source locations in synthesized AST, we could potentially unique a lot of the buffers they need with some simple heuristics, and
  • template substitution would often be able to re-use the location data buffer from the source type.

I don’t think we ever modify location buffers after filling them during type creation/substitution.

Anyway, doing a little bit of caching to reuse location buffers wouldn’t hurt in the very short term, but if you’re interested in the big win here, I recommend just killing off TSI completely in favor of storing TypeLocs.

3 Likes

Thanks for the explanation. I got it now. Since it looks like it doesn’t hurt too much now (I found this by reading the codes), we can and should pursue the long term beneficial. I’ve filed an issue for this: [AST] Killing off TSI completely in favor of storing TypeLocs. · Issue #57555 · llvm/llvm-project · GitHub.

1 Like

Another (too-obvious?) benefit of removing TSI is it would make the AST significantly easier to understand.

Talking to a handful of developers who’ve worked for multiple years consuming the clang AST, none of us could give a good explanation of the conceptual difference between TypeLoc and TypeSourceInfo, and how to choose between them in some case. It took us a while to come to the conclusion that it mostly didn’t matter!

1 Like