[RFC][PATCH] Keep un-canonicalized template types in the debug information

>From: David Blaikie [mailto:dblaikie@gmail.com]
>> From: David Blaikie [mailto:dblaikie@gmail.com]
>> > > From: David Blaikie [mailto:dblaikie@gmail.com]
>> > > > > > >
>> > > > > > > From the debugger's standpoint, the functional concern is
that if you do
>> > > > > > something more real, like:
>> > > > > > >
>> > > > > > > typedef int A;
>> > > > > > > template <typename T>
>> > > > > > > struct S
>> > > > > > > {
>> > > > > > > T my_t;
>> > > > > > > };
>> > > > > > >
>> > > > > > > I want to make sure that the type of my_t is given as "A"
not as "int".
>> > > > > > The reason for that is that it is not uncommon to have data
formatters
>> > > > > > that trigger off the typedef name. This happens when you use
some common
>> > > > > > underlying type like "int" but the value has some special
meaning when it
>> > > > > > is formally an "A", and you want to use the data formatters
to give it an
>> > > > > > appropriate presentation. Since the data formatters work by
matching type
>> > > > > > name, starting from the most specific on down, it is
important that the
>> > > > > > typedef name be preserved.
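[For illustration, a formatter keyed off the typedef name might be
registered like this in lldb - a hypothetical summary string, with "A"
being the typedef from the example above:

(lldb) type summary add --summary-string "an A: ${var}" A

If the member's type were emitted as plain "int", a summary registered
for "A" would never fire.]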
>> > > > > > >
>> > > > > > > However, it would be really odd to see:
>> > > > > > >
>> > > > > > > (lldb) expr -T -- my_s
>> > > > > > > (S<int>) $1 = {
>> > > > > > > (A) my_t = 5
>> > > > > > > }
>> > > > > > >
>> > > > > > > instead of:
>> > > > > > >
>> > > > > > > (lldb) expr -T -- my_s
>> > > > > > > (S<A>) $1 = {
>> > > > > > > (A) my_t = 5
>> > > > > > > }
>> > > > > > >
>> > > > > > > so I am in favor of presenting the template parameter type
with the most
>> > > > > > specific name it was given in the overall template type name.
>> > > > > >
>> > > > > > OK, we get this wrong today. I’ll try to look into it.
>> > > > > >
>> > > > > > What’s your take on the debug info representation for the
templated class
>> > > > > > type? The tentative patch introduces a typedef that declares
S<A> as a
>> > > > > > typedef for S<int>. The typedef doesn’t exist in the code,
thus I find it
>> > > > > > a bit of a lie to the debugger. I was more in favour of
something like :
>> > > > > >
>> > > > > > DW_TAG_variable
>> > > > > > DW_AT_type: -> DW_TAG_structure_type
>> > > > > > DW_AT_name: S<A>
>> > > > > > DW_AT_specification: -> DW_TAG_structure_type
>> > > > > > DW_AT_name: S<int>
>> > > > > >
>> > > > > > This way the canonical type is kept in the debug information,
and the
>> > > > > > declaration type is a real class type aliasing the canonical
type. But I’m
>> > > > > > not sure debuggers can digest this kind of aliasing.
>> > > > > >
>> > > > > > Fred
>> > > > >
>> > > > > Why introduce the extra typedef? S<A> should have a template
parameter
>> > > > > entry pointing to A which points to int. The info should all
be there
>> > > > > without any extra stuff. Or if you think something is missing,
please
>> > > > > provide a more complete example.
>> > > > My immediate concern here would be either loss of information or
bloat
>> > > > when using that with type units (either bloat because each
instantiation
>> > > > with differently spelled (but identical) parameters is treated as
a separate
>> > > > type - or loss when the types are considered the same and all but
one are
>> > > > dropped at link time)
>> > > You'll need to unpack that more because I'm not following the
concern.
>> > > If the typedefs are spelled differently, don't they count as
different types?
>> > > DWARF wants to describe the program as-written, and there's no
S<int> written
>> > > in the program.
>> > >
>> > > Maybe not in this TU, but possibly in another TU? Or by the user.
>> > >
>> > > void func(S<int>);
>> > > ...
>> > > typedef int A;
>> > > S<A> s;
>> > > func(s); // calls the same function
>> > >
>> > > The user probably wants to be able to call void func with S<int> or
S<A>
>> > Sure.
>> >
>> > > (and, actually, in theory, with S<B> where B is another typedef of
int, but
>> > > that'll /really/ require DWARF consumer support and/or new DWARF
wording).
>> >
>> > Not DWARF wording. DWARF doesn't say when you can and can't call
something;
>> > that's a debugger feature and therefore a debugger decision.
>> >
>> What I mean is we'd need some new DWARF to help explain which types are
>> equivalent (or the debugger would have to do a lot of spelunking to try
>> to find structurally equivalent types - "S<B>" and "S<A>", go look
through
>> their DW_TAG_template_type_params, see if they are typedefs to the same
>> underlying type, etc... )
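[A sketch of that spelunking, assuming the straightforward encoding: to
decide that S<A> and S<B> name the same type, a consumer would have to
walk both parameter lists and chase each typedef chain down to "int":

DW_TAG_structure_type
  DW_AT_name "S<A>"
  DW_TAG_template_type_parameter
    DW_AT_type -> DW_TAG_typedef "A" -> int

DW_TAG_structure_type
  DW_AT_name "S<B>"
  DW_TAG_template_type_parameter
    DW_AT_type -> DW_TAG_typedef "B" -> int

Only after resolving both chains could it conclude these describe one
C++ type.]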
>> >
>> >
>> > > We can't emit these as completely independent types - it would be
verbose
>> > > (every instantiation with different typedefs would be a whole
separate type
>> > > in the DWARF, not deduplicated by type units, etc) and wrong
>> >
>> > Yes, "typedef int A;" creates a synonym/alias not a new type, so S<A>
and S<int>
>> > describe the same type from the C++ perspective, so you don't want
two complete
>> > descriptions with different names, because that really would be
describing them
>> > as separate types. What wrinkles my brow is having S<int> be the
"real"
>> > description even though it isn't instantiated that way in the
program. I wonder
>> > if it should be marked artificial... but if you do instantiate S<int>
in another
>> > TU then you don't want that. Huh. It also seems weird to have this:
>> > DW_TAG_typedef
>> > DW_AT_name "S<A>"
>> > DW_AT_type -> S<int>
>> > but I seem to be coming around to thinking that's the most viable way
to have
>> > a single actual instantiated type, and still have the correct names
of things
>*mostly* correct; this still loses "A" as the type of the data member.
>
>For the DW_TAG_template_type_parameter, you mean? No, it wouldn't.
>
> (as a side note, if you do actually have a data member (or any other
mention) of
>the template parameter type, neither Clang nor GCC really get that
'right' -
>"template<typename T> struct foo { T t; }; foo<int> f;" - in both Clang
and GCC,
>the type of the 't' member of foo<int> is a direct reference to the "int"
DIE, not
>to the DW_TAG_template_type_parameter for "T" -> int)

Huh. And DWARF doesn't say you should point to the
template_type_parameter...
I thought it did, but no. Okay, so nothing is lost, but it feels desirable
to me that uses of the template parameter should cite it in the DWARF as
well.
But I guess we can leave that part of the debate for another time.
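[For reference, the difference under discussion, sketched - the first form
is roughly what Clang and GCC emit today; the second is the hypothetical
member-cites-parameter form:

DW_TAG_structure_type
  DW_AT_name "foo<int>"
  DW_TAG_template_type_parameter   // "T" -> int
  DW_TAG_member
    DW_AT_name "t"
    DW_AT_type -> int              // direct reference to the int DIE

versus

  DW_TAG_member
    DW_AT_name "t"
    DW_AT_type -> (the DW_TAG_template_type_parameter for "T")]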

>
>Crud.
>But I haven't come up with a way to get that back without basically
instantiating
>S<A> and S<int> separately.
>
>> >
>> Yep - it's the only way I can think of giving this information in a way
that's
>> likely to work with existing consumers. It would probably be harmless
to add
>> DW_AT_artificial to the DW_TAG_typedef, if that's any help to any debug
info
>> consumer.
>
>Hmmm no, S<A> is not the artificial name;
>
>It's not the artificial name, but it is an artificial typedef.

If the source only says S<A>, then the entire S<int> description is
artificial,
because *that's not what the user wrote*. So both the typedef and the
class type
are artificial. Gah. Let's forget artificial here.
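[For concreteness, the flagged typedef that was just ruled out would have
looked something like:

DW_TAG_typedef
  DW_AT_name "S<A>"
  DW_AT_type -> S<int>
  DW_AT_artificial]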

>
>some debuggers treat DW_AT_artificial
>as meaning "don't show this to the user."
>
>In some sense that's what I want - we never wrote the typedef in the
source
>so I wouldn't want to see it rendered in the "list of typedefs" (or even
>probably in the list of types, maybe).
>
>But S<A> is the name we *do* want to
>show to the user.
>
>Maybe. Sometimes. But there could be many such aliases for the type. (&
many
>more that were never written in the source code, but are still valid in
the
>source language (every other typedef of int, every other way to name the
int
>type (decltype, etc)))

But you *lose* cases where the typedef is the *same* *everywhere*. And in
many cases that typedef is a valuable thing, not the trivial rename we've
been bandying about. This is a more real example:

typedef int int4 __attribute__((ext_vector_type(4)));
template<typename T> struct TypeTraits {};
template<>
struct TypeTraits<int4> {
  static unsigned MysteryNumber;
};
unsigned TypeTraits<int4>::MysteryNumber = 3U;

Displaying "TypeTraits<int __attribute__((ext_vector_type(4)))>" is much
worse than "TypeTraits<int4>" (and not just because it's shorter).
More to the point, having the debugger *complain* when the user says
something like "ptype TypeTraits<int4>" is a problem.

Reducing debug-info size is a worthy goal, but don't degrade the debugging
experience to get there.

I'm not sure which part of what I've said seemed like a suggestion to
degrade the debugging experience to minimize debug info size (the
proposition that we should use a typedef or other alias on top of the
canonical type?). It wouldn't cause "ptype TypeTraits<int4>" to complain -
indeed for GDB, ptyping a typedef gives /exactly/ the same output as if you
ptype the underlying type - it doesn't even mention that there's a typedef
involved:

typedef foo<int> fooA;

(keyboard shortcuts are hard - accidentally sent before I finished)
(gdb) ptype fooA
type = struct foo<int> [with T = int] {
    <no data fields>
}

But in any case, I think what I'm saying boils down to:

Short of changing debug info consumers, I think the only thing we can do is
DW_TAG_typedef. That'll work for existing consumers.

Anything else will likely need new DWARF wording, or at least an agreement
between a variety of debug info consumers and producers that some new
cliche/use of existing DWARF be used to describe these situations.

I could be wrong - if someone wants to try prototyping the
DW_TAG_structure_type proposal Fred had and see if existing debuggers work
with that, sure.

I'm not opposed to someone coming up with a standardizable more descriptive
form than DW_TAG_typedef, but that conversation probably needs to happen
with the DWARF Committee more than the LLVM community.

- David

David,

Sorry, thought you were protesting the typedef idea as interfering with
deduplication or type-unit commonality.

So to recap, if we have source like this:

typedef int A;
template<typename T> struct S { T member; };
S<A> s_a;

then we'll get

DW_TAG_typedef
  DW_AT_name "A"
  DW_AT_type -> int

DW_TAG_structure_type
  DW_AT_name "S<A>"
  DW_TAG_member
    DW_AT_name "member"
    DW_AT_type -> int // or the typedef for "A" ?
  DW_TAG_template_type_parameter
    DW_AT_name "T"
    DW_AT_type -> (the typedef for "A")

Are you suggesting putting the rest of S<int> here too? Or how would S<A>
refer to S<int> for the rest of the implementation?

DW_TAG_variable
  DW_AT_name "s_a"
  DW_AT_type -> (the above structure_type)

Ah, no - just a typedef of the template:

1: DW_TAG_structure_type // the debug info we already produce today (S<int>)
  ...

2: DW_TAG_typedef
       DW_AT_name "S<A>"
       DW_AT_type (1)

And honestly, the variable would still be of type (1).
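[So, hypothetically:

DW_TAG_variable
  DW_AT_name "s_a"
  DW_AT_type -> (1)   // the S<int> structure type, not the (2) typedef]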

Duplicating the entire type for each way of naming the same type is, I'm
fairly sure, not going to work for debuggers today. If someone wants to
propose a way of encoding this that will need new code/support from
debuggers, etc, then I feel the right venue to discuss that is the DWARF
committee - because you'll need buy-in from producers and consumers.
Without having that discussion, I believe just providing a typedef of the
template specialization is probably a benefit to users.

If we want to talk about a 'right' representation of this for DWARF that
would necessitate more substantial changes to both DWARF producers and
consumers... I think it'll be a bit more involved than even what you're
proposing. If we're going to deal with that, it'd be good to figure out how
to deal with all possible names for the type, even the ones the user hasn't
written (eg: typedef int A; typedef int B; and make sure that the debugger
can handle S<int>, S<A> and S<B> in their code, even though the user only
wrote one of those in the source).
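[A minimal example of that, with names invented here:

typedef int A;
typedef int B;
template<typename T> struct S {};
S<A> s; // the only spelling in the source, hence the only name in the DWARF

In C++, S<A>, S<B> and S<int> all name exactly the same type, and a user
could reasonably type any of the three into the debugger.]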

The (limited) feedback I've had from the committee is along these lines.

If the program uses the type name "S<A>" for something, the DWARF should
fully describe the type named "S<A>" because that's the name
as-in-the-source-program. If you use both S<A> and S<int> in the program
in different places, then you need to describe both in the DWARF. There is
sadly no standard way to associate the two as aliases. Yes in C++ they are
the same; in standard DWARF they are not.

Yeah, I'm not sure I agree with this. I've seen the thread and I'm not sure
I like the logic.

The typedef S<A> => S<int> hack might work [if the debugger can tolerate
that]. It is obviously not a real typedef. You could mark it artificial as
an indication that something funny is going on (artificial typedefs being
highly atypical).

The DW_AT_specification hack is just wrong, because neither S<A> nor
S<int> is completing the other.

I need to step back from the typedef hack. I believe our debugger throws
away the <brackets> on the theory that it can reconstruct them from
template-parameter children; that is, the <bracket> part of the name is
redundant. The typedef hack does not provide those children, and the
<brackets> are not redundant, so this is likely to be a problem for us.
Feh. I'd forgotten about that detail when I started liking the typedef
hack. Yes, this means I don't have a suggestion, apart from emitting
things redundantly as needed to preserve as-in-the-source-program.

Here's a bizarre data point. Going back to at least 3.2, Clang has
emitted S<int> instead of S<A>. But with my vector example, it used to use
the typedef name up through 3.4. That changed in 3.5, where the type name
'int4' has entirely disappeared from the DWARF. Clearly that's a bug; the
type name needs to be in there somewhere.

One more thing:

it'd be good to figure out how to deal with all possible names for the
type, even the ones the user hasn't written (eg: typedef int A; typedef int
B; and make sure that the debugger can handle S<int>, S<A> and S<B> in
their code, even though the user only wrote one of those in the source).

The answer to this "how to deal" question is with debugger smarts, not
more complicated DWARF. DWARF is about the program as-written and
as-compiled, not about
anything-the-user-might-conceivably-try-to-write-in-the-debugger. Handling
this in DWARF is a combinatorial nightmare, for completely speculative
purposes. Not gonna happen.

I think it comes down to how the information is planning to be used. A
consumer with the dwarf information today could, in fact, get to the S<int>
type from a user who types S<A> pretty easily right? Now if you'd like a
way to print out the textual representation of every type as it was used in
the program that's likely to be less possible without some serious
duplication of dwarf. You could use an unnamed type for the base and then
use DW_AT_specification with just a bare DW_AT_name to avoid some of the
unpleasantness of the specification hack (see the sketch after this
paragraph), but then you come to the problem of template arguments etc.
It's fairly crazy to consider, but a user could
quite easily write:

new std::vector<int, allocator>()

with some allocator that was never used in the program with vector and
expect the code to be generated at run time and the rest of the type to be
found.
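[The unnamed-base variant, sketched - speculative, and not something
existing consumers are known to digest:

1: DW_TAG_structure_type      // no DW_AT_name: the shared definition
     DW_TAG_member
       DW_AT_name "member"
       ...

2: DW_TAG_structure_type
     DW_AT_name "S<A>"
     DW_AT_specification -> (1)

3: DW_TAG_structure_type
     DW_AT_name "S<int>"
     DW_AT_specification -> (1)]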

Anyhow, I think the best bet is for the most general type to be left in the
debug information and then the typedefs etc to be their own DIEs. Unless we
have some use that we're not talking about here?

-eric

I think it comes down to how the information is planning to be used. A
consumer with the dwarf information today could, in fact, get to the S<int>
type from a user who types S<A> pretty easily right?

If the typedef actually appears in the DWARF, the consumer could figure
out what the user meant by typing S<A>, yes.

And if the user typedef S<decltype(std::result_of<

  In my experiments the typedef is not always present, which leaves the
user up a creek with no paddle.

Sure, we could do better to emit types when mentioned in non-odr-use
contexts. This general area of improvement extends beyond this particular
use case. Generally the way both Clang and GCC work is by emitting debug
info for types from the point of reference (so we need to emit debug info
for a variable, thus we emit debug info for the type) - so types used in
non-odr-use contexts tend to get lost very easily. Emitting them all is
impossible, from a debug info size perspective (this is something like
GCC's -gfull, I assume/believe, which is not the default for good reason)
so we have to be a bit selective (and, essentially ad-hoc) about how we do
this. There are some principled ways I've thought about doing this but
they're non-trivial and just not a high priority for me right now.
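[A hypothetical instance of the kind of loss being described, where the
typedef is only ever mentioned as a template argument:

typedef int A;
template<typename T> struct S {};
S<A> s; // debug info is emitted for 's', and hence for its type; but once
        // the argument is canonicalized to "int", nothing in the DWARF
        // references "A", so the DW_TAG_typedef may never be emitted at all]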

  How the debugger presents the types of things is also a consideration,
however. This is more evident with a less trivial example, such as the
vector typedef I described previously. It is clearly a step backward in
the end-user debugging experience if people are used to seeing

    S<int4>

which the debugger has been displaying all along, but suddenly they start
seeing instead

    S<int __attribute__((ext_vector_type(4)))>

which is what has started happening.

Indeed, a tradeoff - trading off surprisingly verbose names for the ability
to print things like std::string using GDB's pretty printers. Without the
change, you'd get:

$1 = Python Exception <class 'gdb.error'> There is no member named
_M_dataplus.:

Which isn't really acceptable. Even if we were to consider this a GDB bug,
again - until it's fixed in GDB I think this is essentially too egregious
not to workaround in Clang. And even if we were to fix it in GDB, I'd still
be really concerned about the debug info size implications (especially for
users of type units who expect to be able to reduce debug info size by
removing duplicate types safely).

Especially if 'int4' no longer appears as a typedef at all, this is Just
Wrong.

Wolfgang did some bisection and traced this change to r205447, and the
intent of that change was centered on default template arguments. This
de-referencing of typedefs appears to have been an *unintended side effect*
of that patch.

Not entirely, as mentioned in the commit:

"That didn't seem to be a useful distinction, and omitting the template
arguments was destructive to debuggers being able to associate the two
types across translation units or across compilers (GCC, reasonably,
never omitted the arguments)."

I'm fairly sure that using the typedef in the names would break debuggers
in the same way (the debugger will fail to realize these are the same type,
thus not allowing calls to a function taking one argument with the other
type, etc).
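[The sort of breakage meant here, sketched across two TUs for a
name-matching consumer:

// TU 1: emitted, under that scheme, as taking an "S<A>"
typedef int A;
void func(S<A>);

// TU 2: emitted as "S<int>"
S<int> s;

A debugger that matches purely on the name strings has no way to see that
"S<A>" and "S<int>" are the same type, so "call func(s)" fails.]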

  I want my typedef'd template parameters back please…

If you've a scheme that enables the above with existing debuggers, we
should totally consider it. I don't know of any such scheme short of
providing wrapper typedefs ("S<A>" -> "S<int>"), as proposed in the
original email (this doesn't solve the arbitrary case, though - of what to
do about S<B> where B is also a typedef of int). Are there other schemes
that would likely work with existing DWARF consumers?

Are there modifications to existing debug info consumers you think we
should encourage/push them to make? To split the name apart, lookup the
names (should they then be fully qualified, so they can be correctly
resolved? What if they're written in other forms of the source language
(such as decltype, etc - at which point the problem becomes parsing
arbitrary C++ code in any debug info consumer wanting to get reasonable
behavior)?) and then substitute - and do the same on any expression the
user uses (so they can use other typedefs/decltype/ways of writing the type
name)? That presents two challenges: 1) expecting all debug info consumers
to do this work. 2) the debug information size penalty - I'd check (2)
first, as I suspect it's just not a reasonable cost to pay, let alone the
work implied by (1).

Beyond that, I assume the discussion needs to go to the DWARF committee
(actually I think even the above expectation that debug info consumers have
to do name (and C++) parsing to behave reasonably probably should have
buy-in at the committee level).

The (limited) feedback I've had from the committee is along these lines.

If the program uses the type name "S<A>" for something, the DWARF should
fully describe the type named "S<A>" because that's the name
as-in-the-source-program. If you use both S<A> and S<int> in the program
in different places, then you need to describe both in the DWARF.

To the best of my knowledge, having spent the better part of 6 months
studying the size of LLVM's debug info, this is simply not a workable
solution. I doubt any DWARF producer of C++ would make this the default
behavior.

There is sadly no standard way to associate the two as aliases. Yes in C++
they are the same; in standard DWARF they are not.

The typedef S<A> => S<int> hack might work [if the debugger can tolerate

that]. It is obviously not a real typedef. You could mark it artificial as
an indication that something funny is going on (artificial typedefs being
highly atypical).

The DW_AT_specification hack is just wrong, because neither S<A> nor
S<int> is completing the other.

I need to step back from the typedef hack. I believe our debugger throws
away the <brackets> on the theory that it can reconstruct them from
template-parameter children; that is, the <bracket> part of the name is
redundant. The typedef hack does not provide those children, and the
<brackets> are not redundant, so this is likely to be a problem for us.
Feh. I'd forgotten about that detail when I started liking the typedef
hack. Yes, this means I don't have a suggestion, apart from emitting
things redundantly as needed to preserve as-in-the-source-program.

Here's a bizarre data point. Going back to at least 3.2, Clang has
emitted S<int> instead of S<A>. But with my vector example, it used to use
the typedef name up through 3.4. That changed in 3.5, where the type name
'int4' has entirely disappeared from the DWARF.

Yep, I think maybe I fixed that somewhere along the way - debug size used
to be a real problem for LLVM. I think I fixed that before I was looking at
size though, just because it broke GDB test cases - the names were
different so basic debugger expressions didn't work between translation
units, IIRC. I'd have to go back & check what the particular
failure/bug/motivation was, but I think it was a name mismatch between
libstdc++ (built with GCC) debug info and clang debug info since we weren't
using the canonical name.

[Yeah, went and checked - the issue was that clang would produce a
declaration of "basic_string<char>" which, since it had a distinct name
from the debug info for the definition (compiled into libstdc++, built by
GCC) for "basic_string<char, traits and allocator goop>" the debugger
didn't identify these as being the same type and thus printing an
expression of the declared type couldn't find the guts of how basic_string
works and the pretty printer would fail]
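[Roughly, the mismatch, with the libstdc++ name spelled out for
illustration:

DW_TAG_structure_type           // from the Clang-built TU
  DW_AT_name "basic_string<char>"
  DW_AT_declaration

DW_TAG_structure_type           // from libstdc++, built by GCC
  DW_AT_name "basic_string<char, std::char_traits<char>, std::allocator<char> >"
  ...full definition...

Since the names differ, the consumer never pairs the declaration with the
definition, and the pretty printer can't find the string's guts.]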

Clearly that's a bug; the type name needs to be in there somewhere.

One more thing:

it'd be good to figure out how to deal with all possible names for the
type, even the ones the user hasn't written (eg: typedef int A; typedef int
B; and make sure that the debugger can handle S<int>, S<A> and S<B> in
their code, even though the user only wrote one of those in the source).

The answer to this "how to deal" question is with debugger smarts, not
more complicated DWARF. DWARF is about the program as-written and
as-compiled, not about
anything-the-user-might-conceivably-try-to-write-in-the-debugger. Handling
this in DWARF is a combinatorial nightmare, for completely speculative
purposes. Not gonna happen.

The question is whether the debugger needs more information from the DWARF
to do its job. (& whether the compiler/linker/etc will have trouble due to
excessively large debug info... and whether clang debug info interoperates
with GCC debug info (at least for libstdc++, generally compiled with GCC,
this is extra important))

Robinson, Paul wrote:

I think it comes down to how the information is planning to be used. A
consumer with the dwarf information today could, in fact, get to the
S<int> type from a user who types S<A> pretty easily right?

If the typedef actually appears in the DWARF, the consumer could figure
out what the user meant by typing S<A>, yes. In my experiments the
typedef is not always present, which leaves the user up a creek with no
paddle.

How the debugger presents the types of things is also a consideration,
however. This is more evident with a less trivial example, such as the
vector typedef I described previously. It is clearly a step backward in
the end-user debugging experience if people are used to seeing

S<int4>

which the debugger has been displaying all along, but suddenly they
start seeing instead

S<int __attribute__((ext_vector_type(4)))>

which is what has started happening. Especially if 'int4' no longer
appears as a typedef at all, this is Just Wrong.

In clang, ConvertTypeToDiagnosticString deals with vectors specially. The
rationale, I think, is to prevent the compiler from showing the internal
implementation detail of how float4 and friends are defined. I think that
this is the wrong approach and would have preferred a second attribute.
Does attribute nodebug on a typedef have any meaning yet? Could we
repurpose it to mean that you shouldn't look through this typedef for
compiler diagnostics nor debug info? In any case, our behaviour on
diagnostics and debug info should probably match here.

One of the issues is that there's only /so/ different we can be from GCC
here before types/declarations/definitions won't match up in GDB. I believe
GCC has some smarts to tolerate differences like S<0> versus S<0u> or
S<'\0'> I think... at least some of those, but I don't know how it'll go
with:

  S<__attribute__((__vector_size__(4 * sizeof(int)))) int>

vs.

  S<__vector(4) int>

(using a GCC-compatible syntax, vector_size(sizeof(int) * 4) rather than
the ext_vector_type which isn't supported by GCC)

Huh... apparently GDB ignores the entire adornment and allows
func(S<__vector(4) int>) to be called with a variable of type
S<__attribute__((__vector_size__(5 * sizeof(int)))) int> even... not sure
what to make of any of that.

> One of the issues is that there's only /so/ different we can be from
> GCC here before types/declarations/definitions won't match up in GDB.

I think this might get to the nub of it: I agree that GCC/GDB matters, I disagree that GCC/GDB is what matters. GCC/GDB compatibility may be an important use-case but it is not the Reference Implementation of DWARF, and in particular GCC/GDB compatibility is completely irrelevant to my environment. My environment is 100% Clang, and we care more about what the DWARF spec says than we do about whatever GCC/GDB might choose to do for one reason or another. So, if GCC/GDB compatibility means diverging so noticeably from what the spec says (i.e., that the name is as it is in the source program) maybe this is a point worth identifying as one where a divergence occurs, and make the choice target-dependent.

In a way it feels somewhat analogous to choices in supporting extensions/dialects of C++. For practical purposes it’s very worthwhile to the community to support things that GCC supports, but that doesn’t mean that GCC defines the standard. In the case at hand, Clang has strayed from the letter of the DWARF spec, and we’d really like to see a way back toward it.

We’re entirely willing to do work toward getting things realigned (admittedly I personally have been mostly MIA for the past year, but I am seeing an occasional photon from down the far end of my current tunnel) given that the primary contributor and code owner are willing to go along with it.

Thanks,

–paulr

> One of the issues is that there's only /so/ different we can be from

> GCC here before types/declarations/definitions won't match up in GDB.

I think this might get to the nub of it: I agree that GCC/GDB matters, I
disagree that GCC/GDB is what matters. GCC/GDB compatibility may be an
important use-case but it is not the Reference Implementation of DWARF, and
in particular GCC/GDB compatibility is completely irrelevant to my
environment. My environment is 100% Clang,

What do you use as a debugger? (or other DWARF consumers that might care
about whether two bits of DWARF describe the same type in the same sense
that the C++ language defines)

and we care more about what the DWARF spec says than we do about whatever
GCC/GDB might choose to do for one reason or another. So, if GCC/GDB
compatibility means diverging so noticeably from what the spec says (i.e.,
that the name is as it is in the source program) maybe this is a point
worth identifying as one where a divergence occurs, and make the choice
target-dependent.

Certainly it's possible to make this target-dependent - see DWARF2 support
for Darwin, etc.

In a way it feels somewhat analogous to choices in supporting
extensions/dialects of C++. For practical purposes it's very worthwhile to
the community to support things that GCC supports, but that doesn't mean
that GCC defines the standard. In the case at hand, Clang has strayed from
the letter of the DWARF spec, and we'd really like to see a way back toward
it.

The DWARF spec doesn't really describe the world of templates in a complete
and useful manner. I think it's problematic to try to wedge the wording
into saying "DWARF says this is the one way to encode this info" - DWARF
makes some general suggestions about how certain constructs could be
mapped, but until there's a document like the C++ ABI that says "this is
the required lowering from C++ to DWARF" (and there's buy-in to conform to
this from both DWARF producers and consumers) a lot of this is going to
come down to "what do consumers and producers agree to".

Not that Dave needs me to echo/upvote his comments, but this.

-eric

What do you use as a debugger?

Sony has a proprietary debugger that plugs into Visual Studio and knows how to manage processes on the game consoles. It’s not the only non-gdb DWARF-speaking debugger I’ve ever used, although the other one I’m aware of is basically defunct now.

Certainly it’s possible to make this target-dependent - see DWARF2 support for Darwin, etc.

Right, I didn't think that would be much of a problem given the existing practices. :)

The DWARF spec doesn’t really describe the world of templates in a complete and useful manner

Yes, well, hmmm. We tried to improve that some in DWARF 5 (describes non-type non-scalar parameters [this essentially codifies what gcc does, btw], insists that parameter DIEs follow the source order, has a flag for defaulted parameters…) But one thing about DWARF is that it invariably says (and always has) that the names of things are as in the source program, and that clearly stopped being true (or at least, was noticeably less true) for these typedef’d template parameters in Clang 3.5. Which is where this entire thread came from (& I wonder whether we scared off the OP).

a lot of this is going to come down to “what do consumers and producers agree to”.

Heh. ALL of it comes down to that, but it’s nice if the spec is somehow relevant to that agreement! If you think there’s something crucial that’s still missing from DWARF 5 (Eric has a copy), I can buy you a round at the next social and we can chat.

–paulr

I'm still here and not too scared :) I started this thread to see if I could gather consensus about the idea and maybe the implementation. I think the idea of emitting more source-accurate debug information is well received and I'll certainly pursue that. Regarding the implementation, we need to be pragmatic. We are not starting from scratch, and I'd be of the opinion that specs actually matter less than what the widely available consumers do (I hope I'm not scaring you off now!). So we need to find a way to provide this new information without disrupting the existing user base - which I consider to be lldb and gdb.

I will revisit the patch when I have some time. The typedef approach (with a potential DW_AT_artificial) seems to be the only workable solution in the short term. Debates on where to use the typedef and where to use the implementation type can happen later on. I'm sure even getting the simple additional typedef into the description without breaking anything in existing test suites will prove complicated enough.

Other feedback I gathered from the thread is that we should go a bit further and propagate the typedef into the class definition also. This makes sense to me, but is more involved, as it hits the type duplication issues you have been debating with David. As far as I can see, nothing has been proposed to address this and I can't see a way myself. It would seem strange to have:

typedef int A;
template<typename T> struct S { T member; };
S<A> s;

generate

(0) DW_TAG_typedef
      DW_AT_name: "S<A>"
      DW_AT_type: (2)

(1) DW_TAG_typedef
      DW_AT_name: "A"
      DW_AT_type: -> int

(2) DW_TAG_structure_type
      DW_AT_name: "S<int>"
      DW_TAG_member:
        DW_AT_name: "member"
        DW_AT_type: (1)

In this case (and if S<A> is the only 'name' used in the source) it would
be more logical to have the "S<A>" and "S<int>" exchanged in the
pseudo-dwarf above.
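[That is, with the names swapped - whether the "S<int>" alias should be
emitted at all being a separate question:

(0) DW_TAG_typedef
      DW_AT_name: "S<int>"
      DW_AT_type: (2)

(1) DW_TAG_typedef
      DW_AT_name: "A"
      DW_AT_type: -> int

(2) DW_TAG_structure_type
      DW_AT_name: "S<A>"
      DW_TAG_member:
        DW_AT_name: "member"
        DW_AT_type: (1)]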

Anyway, thanks for all the inputs!
Fred