Debug Info type optimization based on explicit template instantiation decls/defs

So, a while back I started working on a debug info type optimization, similar to the vtable optimization that avoids emitting type information for dynamic classes except in the TUs that emit the vtable (which drastically reduces debug info size). I started on it because I’d mistaken GCC’s vtable optimization for this other one.

The basic idea is that any program that contains an explicit template instantiation declaration and ODR-uses that type must also contain an explicit template instantiation definition somewhere. We could leverage this to avoid emitting the full debug info for that type in the TUs that contain the explicit declaration, and forcibly emit the type in the TU that contains the explicit definition.
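To make the decl/def pairing concrete, here is a minimal sketch. The names (box, read_value) and the file split are hypothetical and only for illustration; the three parts are combined into one file so it compiles on its own, whereas in a real project they would be separate files.

```cpp
// --- box.h (hypothetical header) ---
template <typename T>
struct box {
    T value;
    constexpr T get() const { return value; }
};

// Explicit instantiation declaration: a promise that some TU in the
// program provides the explicit instantiation definition of box<int>.
extern template struct box<int>;

// --- box.cpp (hypothetical "home" TU) ---
// Explicit instantiation definition. Under the proposed optimization this
// would be the one place where the full debug info for box<int> is emitted.
template struct box<int>;

// --- user.cpp (hypothetical client TU) ---
// ODR-uses box<int>. Under the proposal this TU would only need a
// declaration of the type in its debug info.
constexpr int read_value(const box<int>& b) { return b.get(); }
```

The language rule is only that the definition must exist somewhere in the program; where the debug info ends up is the part under discussion here.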

Easy.

Except there’s a wrinkle. We need to interoperate with GCC’s debug info (for things like the standard library). This means we can’t make stronger assumptions about where the debug info will be emitted than GCC will provide.

For example:

template<typename T> struct foo { int i; };
template struct foo<int>;

This doesn’t cause GCC to emit debug info for foo<int>. But if you add a “void mem() { }” to ‘foo’, then it does (since the definition of the foo<int>::mem function has to be emitted, along with the debug info for that and thus the debug info for its enclosing class).
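The two cases side by side, as a sketch. foo_with_mem is a hypothetical name introduced here so that both variants fit in one file; the comments restate the behavior described above rather than anything verified independently.

```cpp
// Data member only: as described above, explicitly instantiating this
// does not make GCC emit debug info for foo<int>.
template <typename T>
struct foo {
    int i;
};
template struct foo<int>;

// Hypothetical variant with one inline member function.
template <typename T>
struct foo_with_mem {
    int i;
    void mem() {}
};
// The explicit instantiation definition also instantiates mem(), so its
// code is emitted even though nothing calls it. That function's debug info
// refers to the enclosing class, which pulls in the class description.
template struct foo_with_mem<int>;

// The member function changes what gets emitted, not the object layout.
static_assert(sizeof(foo<int>) == sizeof(foo_with_mem<int>),
              "a non-virtual member function does not change the layout");
```

One way to look at the difference is to compile with "g++ -g -c" and inspect the object file with "readelf --debug-dump=info"; the exact output will depend on the compiler version.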

So, what I’d like to do is come up with a conservative test for “will this explicit template instantiation actually cause GCC to emit the debug info for the type”. One such “sufficient but not necessarily necessary” condition would be “are there any member function definitions prior to the explicit instantiation decl”. But I’m having trouble writing that test - my cursory attempt to walk the methods of the CXXRecordDecl and test that they were defined failed.

I’m guessing this failed because Clang isn’t instantiating the definitions of the member functions because it doesn’t need to (though it is allowed to, well, for the inline ones - there could be out of line definitions I could leverage here too)?

Is there a good/better/existing way I could write this test?

[if/when we implement this I’d still like to more aggressively emit the definition in the presence of even trivial explicit instantiation definitions where GCC currently doesn’t - so that maybe in the future we can remove this extra check and just rely on the decl and def to do this more aggressively]

I don't know if it fits your use case, but at the cost of flag proliferation - what about adding a flag to control this behavior?

1) How often is the system STL compiled with debugging?
2) Do users really mix gcc+clang in the wild that often? I can see this being the case for system level things, but for pure userland stuff I suspect they may build the whole stack with clang. (When/if they are using clang)

I'm biased and I doubt my vote counts for much, but I'd go for full optimizations without penalty either by default with a flag to turn it off or a flag which enables it. If I'm reading the above correctly - it would avoid any need to do complex "tests".

I'd agree. On FreeBSD and OS X, the system compiler is clang and so there's little danger of mixing clang and gcc-compiled code, so we'd rather not pay the penalty, unless the user knows that they are linking against a gcc-compiled library (we're down to around 0.3% of ports not compiling with clang on FreeBSD though, so that's increasingly rare...).

David

I don't know if it fits your use case, but at the cost of flag
proliferation - what about adding a flag to control this behavior?

Possibly/probably - though I'd still be inclined to default to the
conservative emission for now for a few reasons.

We've already got a flag for "assume I'm only building this object file
with debug info", which would disable this optimization entirely. But we
don't have one for "assume I'm building other object files with a different
(possibly older) compiler for debug info", which is what this would amount
to. That would be tricky, since it's sort of a versioned flag: should we
ever come up with other optimizations like this, they'd each have to go
under separate or versioned flags.

1) How often is the system STL compiled with debugging?

Essentially "all the time", and this is necessary today due to the vtable
optimization I alluded to earlier. If you don't have a debug build of the
STL you won't be able to debug certain programs, because they rely on the
debug info being in the STL, not in the program that uses it. This affects
any STL type with a vtable that's only emitted in the library and not in
clients. std::fstream is a good example: it has a virtual base and an
explicit instantiation declaration in the header (this is where I
misunderstood/got the idea for the optimization I'm now discussing) and an
explicit instantiation definition in the STL. If you only compile your
program (and not the STL) with debug info, you'll only see a declaration
for std::fstream, never a definition.
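A reduced sketch of that std::fstream situation, assuming a hypothetical stream_like template standing in for the real library type. It is not how any particular STL is written; it only shows the shape: a dynamic class template, an explicit instantiation declaration that clients see, and an explicit instantiation definition in the library.

```cpp
#include <type_traits>

// --- stream_like.h (hypothetical header, seen by every client) ---
template <typename CharT>
struct stream_like {
    virtual ~stream_like() {}
    virtual int kind() const { return 1; }
};

// Explicit instantiation declaration: clients do not instantiate
// stream_like<char> themselves, so they do not emit its vtable.
extern template struct stream_like<char>;

// --- stream_like.cpp (hypothetical library TU) ---
// Explicit instantiation definition: the vtable is emitted here, so with
// the vtable optimization the full type description lives only in this
// TU's debug info.
template struct stream_like<char>;

// The type is dynamic (has a vtable), which is what makes the vtable
// optimization apply to it.
static_assert(std::is_polymorphic<stream_like<char>>::value,
              "stream_like<char> has virtual functions");
```

If only the client is built with debug info, the debugger finds a declaration of stream_like<char> and nothing more, which matches the std::fstream case above.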

2) Do users really mix gcc+clang in the wild that often? I can see this
being the case for system level things, but for pure userland stuff I
suspect they may build the whole stack with clang. (When/if they are using
clang)

Again, essentially "all the time" since users pick up debug builds of
libraries from their OS/distribution.

Also realize that this wouldn't just apply to GCC but to any version of
Clang prior to the one in which I add this functionality. It's not just GCC
that emits no debug info for foo<int> in the above example - Clang doesn't
either. Not until we improve it to do so. Then we'll still have version
problems until all OSs are building their packages with a version of Clang
that has this feature on.

I'm biased and I doubt my vote counts for much, but I'd go for full
optimizations without penalty either by default with a flag to turn it off
or a flag which enables it. If I'm reading the above correctly - it would
avoid any need to do complex "tests".

Possibly, though I think this might rule it out for a lot of users. It
would only be usable on new platforms where you didn't have many users
building code for older platforms.

- David