Catching Temporarily Used Types in Debug Info

dblaikie · March 3, 2014, 4:41pm

We have a few bugs like ( http://llvm.org/bugs/show_bug.cgi?id=19005 )
in debug info that all stem from the same basic problem:

A type that's not referenced by another debug entity (such as a
variable, parameter, function, etc) is not emitted in the debug info.

GCC, while not wholly consistent in addressing this, does get
it 'more' right. I'd like Clang to be better at this as well, even if
we're not perfect.

The basic premise to implement this perfectly would be: If we're
emitting code for a function (or global variable initializer, etc) and
within that function a certain type is required to be complete, emit
the type (and include it in the "retained types list").

Does anyone have nice ideas on how we could realistically implement
that test (yeah, I'm mostly looking at Richard on this) or a rough
approximation that might get the 90% case?

And as a bonus it'd probably be good if we didn't do this for cases
where we already successfully emit the type (eg: if it's required to
be complete because there's a variable of that type we'd rather not
add that type to the retained types list needlessly... but there are
some wrinkles there)

Some examples:

Neither GCC nor Clang emits 'foo' here:

struct foo {
  foo();
  int i;
};

void func(int);

void func() {
  func(foo().i + 1);
}

int main() {
}

but GCC does emit it if the foo ctor called is not the default (
declare 'foo(int)' and use 'foo(3).i' instead of 'foo().i').

And the previously linked bug amounts to "func(((foo*)v)->i + 1);"
(for some "void* v") - in which GCC does emit 'foo' and Clang does
not.
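
A compact way to compare the three shapes mentioned so far is to put them in one file. This is only a sketch, not taken from the bug report: `sink` stands in for the thread's `func(int)`, the `use_*` names are made up, and the constructors are defined here so the file links by itself. Defining them in the same translation unit likely gives the compiler its own reason to describe `foo`, so reproducing the observations above would mean moving those definitions into a separate file.

```cpp
#include <cassert>

// 'foo' is only ever used as a temporary or through a cast: no variable,
// parameter, or member of type 'foo' appears in the functions below.
struct foo {
  foo() : i(1) {}                // default ctor (assumed body, for linking)
  explicit foo(int x) : i(x) {}  // the 'foo(int)' variant from the thread
  int i;
};

// Hypothetical stand-in for the thread's func(int).
int sink(int x) { return x; }

// Shape 1: temporary built with the default ctor.
int use_default_ctor() { return sink(foo().i + 1); }

// Shape 2: temporary built with a non-default ctor.
int use_int_ctor() { return sink(foo(3).i + 1); }

// Shape 3: the form the linked bug reduces to, a cast from void*.
int use_cast(void *v) {
  assert(v != nullptr);
  return sink(((foo *)v)->i + 1);
}
```

Compiling this with `-g` and dumping the DWARF for each shape in isolation is one way to check which of them cause a `DW_TAG_structure_type` for `foo` to appear with a given compiler version.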

So, yes - nothing terribly consistent. It would be nice to do better
(better than we are, maybe even better than GCC, or at least more
consistent).

adrian.prantl · March 3, 2014, 6:37pm

Hi David!

Slightly tangential, but this also reminds me that I eventually was going to send out a proposal for debug info for Modules/PCH. Having access to the serialized AST in its entirety would eliminate problems like this nicely. DWARF is extended by an external AST type DIE that is merely a USR-based index into the corresponding module file. Everything that is not explicitly referenced in the DWARF can still be found in the module/PCH and read via libclang...

> We have a few bugs like ( 19005 – DW_TAG_typedef and DW_TAG_structure_type missing )
> in debug info that all stem from the same basic problem:
>
> A type that's not referenced by another debug entity (such as a
> variable, parameter, function, etc) is not emitted in the debug info.
>
> GCC, while not wholly consistent in addressing this, does get
> it 'more' right. I'd like Clang to be better at this as well, even if
> we're not perfect.
>
> The basic premise to implement this perfectly would be: If we're
> emitting code for a function (or global variable initializer, etc) and
> within that function a certain type is required to be complete, emit
> the type (and include it in the "retained types list").
>
> Does anyone have nice ideas on how we could realistically implement
> that test (yeah, I'm mostly looking at Richard on this) or a rough
> approximation that might get the 90% case?

The idea would be for the frontend to register a record type as debug-info-retained whenever, e.g., it is calculating the record's memory layout (or similar)?

I guess this is really a question for the frontend people.

-- adrian

echristo · March 3, 2014, 6:41pm

> Hi David!
>
> Slightly tangential, but this also reminds me that I eventually was going to send out a proposal for debug info for Modules/PCH. Having access to the serialized AST in its entirety would eliminate problems like this nicely. DWARF is extended by an external AST type DIE that is merely a USR-based index into the corresponding module file. Everything that is not explicitly referenced in the DWARF can still be found in the module/PCH and read via libclang...

In general this is an insufficient solution to the problem of modules
debugging. It rev-locks the compiler and the debugger and isn't
suitable for long-term archival purposes. It's a "handy if there,
possibly", but unlikely to be more than an occasionally useful
extension.

-eric

dblaikie · March 3, 2014, 7:17pm

Hence this thread

What you described is essentially "required complete type" which is a thing
we already know - but we don't really want to go around emitting every type
that is required to be complete. Consider this header:

struct foo {
  ...
};
inline void func() {
  foo f;
  ...
}

Now any translation unit that includes that header, even if it doesn't use
'foo' at all and never calls 'func', would emit foo (since foo is required
to be complete due to the 'f' variable in the 'func' function). Now
consider that Sema.h looks exactly like this.

An example use where we wouldn't want to emit 'foo's definition:

bar.h:
#include "foo.h"
class bar {
  foo *f;
public:
  bar();
  int compute_thing_with_foo();
};

The out of line member function and ctor use the full definition of 'foo',
but 'bar' does not, neither do clients of 'bar' need to. It'd be nice if we
didn't emit the full definition of 'foo' here.

Richard and I talked through some of this a bit and considered two solutions:

1) essentially what we had before I made the change to power
-flimit-debug-info/-fstandalone-debug (prior to the vtable optimization
going in there too) by "requiresCompleteType": callback during Clang's
IRGen for various AST constructs that we know require types to be complete,
and emit debug info for those types. This is a bit painful since we
essentially ad-hoc implement complete type detection again... - but I might
be OK with this going into it knowing how/why we have to do this and taking
a somewhat more systematic approach.

2) Have a callback at every point a type was required to be complete, even
if it was already complete (currently we just have a callback on the
instant the type is first required to be complete) and if necessary,
record the context in which that type was required to be complete (eg: as a
member of another type, as a use in a function, etc). Then if/when we emit
that contextual decl, check if any types were associated with it and emit
those - recursively expanding (since the associated type might itself be
the contextual decl for some other type).

I'm erring towards the latter, though we know in both cases there might be
gotchas for all manner of things that we need to blacklist/whitelist to get
to a good place...
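
To make option 2 concrete, here is a toy model of the bookkeeping it describes. Nothing in it is Clang API: declarations and types are plain strings, and `CompleteTypeTracker`, `requireComplete`, `emitDecl`, and `retainedTypes` are names invented for this sketch. It only shows the "record the context, then expand recursively when the context is emitted" idea.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: decls and types are strings, not Clang AST nodes.
class CompleteTypeTracker {
public:
  // Called at every point a type is required to be complete, recording the
  // contextual decl (function or class) in which the requirement arose.
  void requireComplete(const std::string &context, const std::string &type) {
    assert(!context.empty() && !type.empty());
    requiredIn_[context].insert(type);
  }

  // Called if/when a contextual decl is actually emitted. Emits every type
  // associated with it, then treats each of those types as a context in
  // turn, since it may itself have required other types to be complete.
  void emitDecl(const std::string &decl) {
    auto it = requiredIn_.find(decl);
    if (it == requiredIn_.end())
      return;
    for (const std::string &type : it->second) {
      if (!emitted_.insert(type).second)
        continue;  // already emitted; this also stops cycles
      retained_.push_back(type);
      emitDecl(type);
    }
  }

  // Types that ended up on the (modelled) retained types list, in order.
  const std::vector<std::string> &retainedTypes() const { return retained_; }

private:
  std::map<std::string, std::set<std::string>> requiredIn_;
  std::set<std::string> emitted_;
  std::vector<std::string> retained_;
};
```

With this model, a type recorded against an inline function that is never emitted stays off the retained list, which is the property the header example above is after.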

- David

adrian.prantl · March 3, 2014, 7:24pm

> In general this is an insufficient solution to the problem of modules
> debugging. It rev-locks the compiler and the debugger and isn't
> suitable for long-term archival purposes. It's a "handy if there,
> possibly", but unlikely to be more than an occasionally useful
> extension.

Right, there are actually more nuances to the module debugging story.
What I described above was the “development” use case.
I just looked at my notes from when we last discussed this, and I believe we concluded that for the long-term archival scenario, clang should emit full DWARF for an entire module when the module is created. On Darwin, for instance, the DWARF for the module would then be archived by dsymutil.
A consumer would first attempt to load types directly from the module; if that fails (e.g., because of a version mismatch), it can reconstruct the type from DWARF.
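
The consumer-side fallback could look roughly like the following. This is a sketch under heavy assumptions: `Loader`, `Origin`, `Lookup`, and `resolveType` are invented for illustration, and the two loaders are placeholders for whatever a debugger would really use to read a module file or DWARF.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Where a type description ended up coming from.
enum class Origin { Module, Dwarf, NotFound };

struct Lookup {
  Origin origin;
  std::string description;
};

// Hypothetical loader: returns true and fills 'out' on success.
using Loader = std::function<bool(const std::string &name, std::string &out)>;

// Try the module first; fall back to DWARF if the module cannot be used
// (for example because of a version mismatch).
Lookup resolveType(const std::string &name, const Loader &fromModule,
                   const Loader &fromDwarf) {
  assert(fromModule && fromDwarf);
  std::string description;
  if (fromModule(name, description))
    return {Origin::Module, description};
  description.clear();  // discard anything a failed module load left behind
  if (fromDwarf(name, description))
    return {Origin::Dwarf, description};
  return {Origin::NotFound, ""};
}
```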

-- adrian

echristo · March 3, 2014, 7:26pm

> Right, there are actually more nuances to the module debugging story.
> What I described above was the "development" use case.
> I just looked at my notes from when we last discussed this, and I believe we concluded that for the long-term archival scenario, clang should emit full DWARF for an entire module when the module is created. On Darwin, for instance, the DWARF for the module would then be archived by dsymutil.
> A consumer would first attempt to load types directly from the module; if that fails (e.g., because of a version mismatch), it can reconstruct the type from DWARF.

Yep. I think we'll need to extend a few things but hopefully not too much.

-eric

rnk · March 3, 2014, 8:09pm

I'm surprised this isn't easy to do in CodeGenTypes. I don't think Sema
has any knowledge of which decls are going to be emitted at CodeGen time,
so the only way to really get this right would be to mirror the CodeGen
logic of marking things for deferred emission.

zygoloid · March 3, 2014, 9:07pm

> 2) Have a callback at every point a type was required to be complete, even
> if it was already complete (currently we just have a callback on the
> instant the type is first required to be complete) and if necessary,
> record the context in which that type was required to be complete (eg: as a
> member of another type, as a use in a function, etc). Then if/when we emit
> that contextual decl, check if any types were associated with it and emit
> those - recursively expanding (since the associated type might itself be
> the contextual decl for some other type).

As a nuance here, we only need the extra cost here if we're in a context
that we don't know for sure that we're going to emit (inside an inline
function definition or a class definition). Inside a normal function
definition, we can just directly register the type to have debug info
emitted, if we've not already done so.
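
A minimal sketch of that split, again as a toy model rather than anything from Clang: `ContextKind`, `DebugTypeState`, `alwaysEmitted`, and `noteCompleteType` are hypothetical names, and the assumption that an ordinary function definition is always emitted is taken from the paragraph above, not checked against CodeGen.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// The kind of declaration in which a type was required to be complete.
enum class ContextKind { OrdinaryFunction, InlineFunction, ClassDefinition };

struct DebugTypeState {
  std::set<std::string> emitNow;                          // registered directly
  std::map<std::string, std::set<std::string>> deferred;  // context -> types
};

// Assumption for this sketch: only ordinary function definitions are known
// for sure to be emitted.
bool alwaysEmitted(ContextKind kind) {
  return kind == ContextKind::OrdinaryFunction;
}

void noteCompleteType(DebugTypeState &state, ContextKind kind,
                      const std::string &context, const std::string &type) {
  assert(!context.empty() && !type.empty());
  if (alwaysEmitted(kind)) {
    // No bookkeeping needed; inserting twice is harmless.
    state.emitNow.insert(type);
  } else {
    // Pay the extra cost only where emission of the context is uncertain.
    state.deferred[context].insert(type);
  }
}
```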
