Splitting up Type.h: Good idea, bad idea?

Hello Clang folks,

I was using -ftime-trace to see where the compiler spends time parsing clang’s own headers, and it pointed me to Type.h. Many AST headers need QualType to be complete, but they do not need the full type class hierarchy. To improve compile time, I have attempted to create a new header, QualType.h, that defines only the QualType wrapper class. I have started to transition popular clang headers (Decl.h, Expr.h, ASTContext.h, etc) to just use QualType.h. I got pretty far with this, and pushed the results to github:
https://github.com/llvm/llvm-project/compare/master…rnk:split-qual-type-prune

You can see it is pretty involved / invasive. Many inline methods had to be sunk to cpp files, and this may affect performance. So far, I have nothing to show for it in build time gains, because there are still too many Type.h includes.

At this point I am wondering if this is worth doing at all. I figured I would take a break and ask for some opinions. If people generally feel this is worth pursuing, I’ll break that branch up and upload it to phab in incremental pieces.

The next most popular header seems to be CanonicalType.h, and it appears to be pretty strongly coupled to Type.h. It essentially reiterates all of the derived classes of Type, so I’m not sure how to break this link yet.

Reid

Hello Clang folks,

Why is this on llvm-dev instead of cfe-dev?

I was using -ftime-trace to see where the compiler spends time parsing
clang's own headers, and it pointed me to Type.h. Many AST headers need
QualType to be complete, but they do not need the full type class
hierarchy. To improve compile time, I have attempted to create a new
header, QualType.h, that defines only the QualType wrapper class. I have
started to transition popular clang headers (Decl.h, Expr.h, ASTContext.h,
etc) to just use QualType.h. I got pretty far with this, and pushed the
results to github:
https://github.com/llvm/llvm-project/compare/master...rnk:split-qual-type-prune

You can see it is pretty involved / invasive. Many inline methods had to be
sunk to cpp files, and this may affect performance. So far, I have nothing
to show for it in build time gains, because there are still too many Type.h
includes.

How many translation units actually don’t ultimately need the type
definitions? Because this achieves nothing if almost every translation
unit ends up including Type.h anyway.

John.

+cfe-dev.

Splitting this header up makes sense to me. Types.h was not intended to be the 7000 line monster it is now :-).

Splitting QualType out to its own header makes a lot of sense to me, but would it make sense to further split it up somehow? For example, one could split the C-like types, from the C+±like types, from the extensions. I’m not sure if that would be useful though.

-Chris

+cfe-dev.

Sorry about it going to llvm-dev. I typed cfe-dev into the To: box and got llvm-dev.

Splitting this header up makes sense to me. Types.h was not intended to be the 7000 line monster it is now :-).

Splitting QualType out to its own header makes a lot of sense to me, but would it make sense to further split it up somehow? For example, one could split the C-like types, from the C+±like types, from the extensions. I’m not sure if that would be useful though.

I think that’s a good idea. I think there are some key types that everyone uses (ReferenceType, PointerType, FunctionType). The Decl and Stmt class hierarchy currently often use these types from inline methods.

My guess would be that 90% or more of the translation units that depend
on the AST end up doing some type-specific analysis and so will need the
full Type.h anyway. Now, if you break Type.h up into the C base, then
sure, most of those translation units are just checking for RecordType or
PointerType.

John.

If you intend to improve compile time, wouldn't it be better to focus on
already existing ways to do that? Specifically with -DLLVM_ENABLE_MODULES=On
I'd expect Type.h to not matter for compile times at all.

It’s true that modules are generally defined at a larger granularity, and pruning includes this way will not affect modular build time. However, I am generally skeptical that using modules is a scalable way to speed up local builds.

Have you tried building with modules enabled? (as Lubos mentioned, it’s already supported by the LLVM build). What kind of scaling factors do you expect to be problematic in LLVM’s build?

Modules are great, but splitting up headers files is still a good idea for maintenance and readability of the codebase…

-Chris