[RFC][PATCH] Keep un-canonicalized template types in the debug information

Well, [+lldb-dev], could this confuse debuggers? :slight_smile:

– adrian

>
> I couldn’t even find a subject expressing exactly what this patch is about… First of all, it’s meant to start a discussion, and I’m not proposing it for inclusion right now.
>
> The issue I’m trying to address is that template types are always canonicalized when emitted in the debug information (this is the desugar() call in UnwrapTypeForDebugInformation).
>
> This means that if the developer writes:
>
> typedef int A;
> template <typename T>
> struct S {};
>
> S<A> var;
>
> The variable var will have type S<int> and not S<A>. In this simple example, it’s not that much of an issue, but for heavily templated code, the full expansion might be really different from the original declaration.
>
> The attached patch makes us emit an intermediate typedef for the variable’s type:
>
> 0x0000002a: DW_TAG_variable [2]
>               DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000032] = "var")
>               DW_AT_type [DW_FORM_ref4]       (cu + 0x0040 => {0x00000040})
>               DW_AT_external [DW_FORM_flag]   (0x01)
>               DW_AT_decl_file [DW_FORM_data1] (0x01)
>               DW_AT_decl_line [DW_FORM_data1] (8)
>               DW_AT_location [DW_FORM_block1] (<0x09> 03 70 6c 00 00 00 00 00 00 )
>
> 0x00000040: DW_TAG_typedef [3]
>               DW_AT_type [DW_FORM_ref4]       (cu + 0x004b => {0x0000004b})
>               DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000035] = "S<A>")
>               DW_AT_decl_file [DW_FORM_data1] (0x01)
>               DW_AT_decl_line [DW_FORM_data1] (6)
>
> 0x0000004b: DW_TAG_structure_type [4] *
>               DW_AT_name [DW_FORM_strp]       ( .debug_str[0x0000003e] = "S<int>")
>               DW_AT_byte_size [DW_FORM_data1] (0x01)
>               DW_AT_decl_file [DW_FORM_data1] (0x01)
>               DW_AT_decl_line [DW_FORM_data1] (6)
>
>
> Which basically is what I want, although I don’t like that it introduces a typedef where there is none in the code. I’d prefer that to be:
>
> DW_TAG_variable
>   DW_AT_type: -> DW_TAG_structure_type
>                    DW_AT_name: S<A>
>                    DW_AT_specification: -> DW_TAG_structure_type
>                                              DW_AT_name: S<int>
> …
>
> The patch also has the nice property of omitting the defaulted template arguments in the first level typedef. For example you get vector<A> instead of vector<int, std::__1::allocator<int> >.

If you specify `vector<int>` in C++ do you get that instead of
`vector<int, std::__1::allocator<int>>`?
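(For concreteness, a minimal sketch of the two spellings in question; whether v1 below also gets the short name, or keeps the fully expanded one, is exactly the open point:)

  #include <vector>

  typedef int A;

  std::vector<int> v1; // written with the canonical argument
  std::vector<A>   v2; // written through the typedef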

Yeah, I mentioned this as possibly causing problems with debuggers or other consumers, but I don't have any proof past "ooooo scary!".

> Well, [+lldb-dev], could this confuse debuggers? :slight_smile:

*bugs random gdb developer as well*

-eric

This means you will see "S<A>" as the type for your variables in the debugger when you view variables or children of structs/unions/classes. I think this is not what the user would want to see. I would rather see "S<int>" as the type for my variable than see "S<A>".

> This means you will see "S<A>" as the type for your variables in the debugger when you view variables or children of structs/unions/classes. I think this is not what the user would want to see. I would rather see "S<int>" as the type for my variable than see "S<A>".

Clang generally hasn't taken this direction (granted, using the debugger might have different priorities from using the compiler):

type.cpp:5:6: note: candidate function not viable: no known conversion
from 'foo<bar>' to 'int' for 1st argument
void func(int) {
     ^

(for example, where 'bar' is 'typedef int bar')
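A hypothetical reproducer for that diagnostic (the names are chosen to match the note above):

  typedef int bar;
  template <typename T> struct foo {};

  void func(int) {}

  int main() {
    foo<bar> f;
    func(f); // error: no matching function; the note names 'foo<bar>', not 'foo<int>'
  }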

> This means you will see "S<A>" as the type for your variables in the debugger when you view variables or children of structs/unions/classes. I think this is not what the user would want to see. I would rather see "S<int>" as the type for my variable than see "S<A>".

I find it more accurate for the debugger to report what was actually written in the code. Moreover, when a typedef is used, it’s usually to make things more readable, not to hide information, so the abbreviated name should usually be just as informative while being more compact. The debugger needs a way to describe the real type behind the abbreviated name, though; we must not have less information than we have today.

Another point: this allows the debugger to know what S<A> actually is. Without it, the debugger only knows the canonical type. This means that currently you can’t copy/paste a piece of code that references these template names and have it parse correctly. I /think/ that having this information in the debug info will allow more of this to work.

But we can agree to disagree :slight_smile: It would be great to have more people chime in and give their opinion.

Fred
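(To illustrate the copy/paste point: a hypothetical lldb session against today’s debug info, where only the canonical S<int> is described; the exact error text is illustrative:)

  (lldb) expr sizeof(S<A>)
  error: use of undeclared identifier 'A'
  (lldb) expr sizeof(S<int>)
  (unsigned long) $0 = 1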

I'm in favour.

From the debugger's standpoint, the functional concern is that if you do something more real, like:

typedef int A;
template <typename T>
struct S
{
  T my_t;
};

I want to make sure that the type of my_t is given as "A" not as "int". The reason for that is that it is not uncommon to have data formatters that trigger off the typedef name. This happens when you use some common underlying type like "int" but the value has some special meaning when it is formally an "A", and you want to use the data formatters to give it an appropriate presentation. Since the data formatters work by matching type name, starting from the most specific on down, it is important that the typedef name be preserved.

However, it would be really odd to see:

(lldb) expr -T -- my_s
(S<int>) $1 = {
  (A) my_t = 5
}

instead of:

(lldb) expr -T -- my_s
(S<A>) $1 = {
  (A) my_t = 5
}

so I am in favor of presenting the template parameter type with the most specific name it was given in the overall template type name.

Jim
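(For context, a data formatter keyed off a typedef name looks roughly like this — a sketch using lldb's summary-string syntax, with a made-up summary text:)

  (lldb) type summary add --summary-string "special A: ${var}" A

Such a formatter matches by type name, so it only fires if the debug info reports my_t's type as "A"; if the DWARF only says "int", the match never happens.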

> From the debugger's standpoint, the functional concern is that if you do something more real, like:
>
> typedef int A;
> template <typename T>
> struct S
> {
>   T my_t;
> };
>
> I want to make sure that the type of my_t is given as "A" not as "int". The reason for that is that it is not uncommon to have data formatters that trigger off the typedef name. This happens when you use some common underlying type like "int" but the value has some special meaning when it is formally an "A", and you want to use the data formatters to give it an appropriate presentation. Since the data formatters work by matching type name, starting from the most specific on down, it is important that the typedef name be preserved.
>
> However, it would be really odd to see:
>
> (lldb) expr -T -- my_s
> (S<int>) $1 = {
>   (A) my_t = 5
> }
>
> instead of:
>
> (lldb) expr -T -- my_s
> (S<A>) $1 = {
>   (A) my_t = 5
> }
>
> so I am in favor of presenting the template parameter type with the most specific name it was given in the overall template type name.

OK, we get this wrong today. I’ll try to look into it.

What’s your take on the debug info representation for the templated class type? The tentative patch introduces a typedef that declares S<A> as a typedef for S<int>. The typedef doesn’t exist in the code, thus I find it a bit of a lie to the debugger. I was more in favour of something like:

DW_TAG_variable
DW_AT_type: -> DW_TAG_structure_type
               DW_AT_name: S<A>
               DW_AT_specification: -> DW_TAG_structure_type
                                         DW_AT_name: S<int>

This way the canonical type is kept in the debug information, and the declaration type is a real class type aliasing the canonical type. But I’m not sure debuggers can digest this kind of aliasing.

Fred

Sorry for being late to the party. We've had complaints about seeing things like S<int> when users would rather see S<A>, so I'm extremely happy to see this direction.
Thanks,
--paulr

> > From the debugger's standpoint, the functional concern is that if you do something more real, like:
> >
> > typedef int A;
> > template <typename T>
> > struct S
> > {
> >   T my_t;
> > };
> >
> > I want to make sure that the type of my_t is given as "A" not as "int". The reason for that is that it is not uncommon to have data formatters that trigger off the typedef name. This happens when you use some common underlying type like "int" but the value has some special meaning when it is formally an "A", and you want to use the data formatters to give it an appropriate presentation. Since the data formatters work by matching type name, starting from the most specific on down, it is important that the typedef name be preserved.
> >
> > However, it would be really odd to see:
> >
> > (lldb) expr -T -- my_s
> > (S<int>) $1 = {
> >   (A) my_t = 5
> > }
> >
> > instead of:
> >
> > (lldb) expr -T -- my_s
> > (S<A>) $1 = {
> >   (A) my_t = 5
> > }
> >
> > so I am in favor of presenting the template parameter type with the most specific name it was given in the overall template type name.
>
> OK, we get this wrong today. I’ll try to look into it.
>
> What’s your take on the debug info representation for the templated class type? The tentative patch introduces a typedef that declares S<A> as a typedef for S<int>. The typedef doesn’t exist in the code, thus I find it a bit of a lie to the debugger. I was more in favour of something like:
>
> DW_TAG_variable
>   DW_AT_type: -> DW_TAG_structure_type
>                    DW_AT_name: S<A>
>                    DW_AT_specification: -> DW_TAG_structure_type
>                                              DW_AT_name: S<int>
>
> This way the canonical type is kept in the debug information, and the declaration type is a real class type aliasing the canonical type. But I’m not sure debuggers can digest this kind of aliasing.

I assume this is more a discussion to have with the dwarf committee to get a sense of what consumers want, etc.

If not for that, I would assume using a typedef (you could add DW_AT_artificial in either of your proposals to make it sort of clear) would work for free on any debugger today, whereas the structure_type might cause some confusion/problems. (Testing on a few current consumers would be helpful.)

- David


>
> From the debugger's standpoint, the functional concern is that if you do
something more real, like:
>
> typedef int A;
> template <typename T>
> struct S
> {
> T my_t;
> };
>
> I want to make sure that the type of my_t is given as "A" not as "int".
The reason for that is that it is not uncommon to have data formatters
that trigger off the typedef name. This happens when you use some common
underlying type like "int" but the value has some special meaning when it
is formally an "A", and you want to use the data formatters to give it an
appropriate presentation. Since the data formatters work by matching type
name, starting from the most specific on down, it is important that the
typedef name be preserved.
>
> However, it would be really odd to see:
>
> (lldb) expr -T -- my_s
> (S<int>) $1 = {
> (A) my_t = 5
> }
>
> instead of:
>
> (lldb) expr -T -- my_s
> (S<A>) $1 = {
> (A) my_t = 5
> }
>
> so I am in favor of presenting the template parameter type with the most
specific name it was given in the overall template type name.

OK, we get this wrong today. I’ll try to look into it.

What’s your take on the debug info representation for the templated class
type? The tentative patch introduces a typedef that declares S<A> as a
typedef for S<int>. The typedef doesn’t exist in the code, thus I find it
a bit of a lie to the debugger. I was more in favour of something like :

DW_TAG_variable
DW_AT_type: -> DW_TAG_structure_type
               DW_AT_name: S<A>
               DW_AT_specification: -> DW_TAG_structure_type
                                         DW_AT_name: S<int>

This way the canonical type is kept in the debug information, and the
declaration type is a real class type aliasing the canonical type. But I’m
not sure debuggers can digest this kind of aliasing.

Fred

Why introduce the extra typedef? S<A> should have a template parameter
entry pointing to A which points to int. The info should all be there
without any extra stuff. Or if you think something is missing, please
provide a more complete example.
--paulr
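(What paulr describes — a template parameter DIE pointing at the typedef — sketched in the DIE notation used earlier in the thread:)

  DW_TAG_structure_type
    DW_AT_name: S<A>
    DW_TAG_template_type_parameter
      DW_AT_name: T
      DW_AT_type: -> DW_TAG_typedef "A"
                       DW_AT_type: -> int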


> > > From the debugger's standpoint, the functional concern is that if you do something more real, like:
> > >
> > > typedef int A;
> > > template <typename T>
> > > struct S
> > > {
> > >   T my_t;
> > > };
> > >
> > > I want to make sure that the type of my_t is given as "A" not as "int". The reason for that is that it is not uncommon to have data formatters that trigger off the typedef name. This happens when you use some common underlying type like "int" but the value has some special meaning when it is formally an "A", and you want to use the data formatters to give it an appropriate presentation. Since the data formatters work by matching type name, starting from the most specific on down, it is important that the typedef name be preserved.
> > >
> > > However, it would be really odd to see:
> > >
> > > (lldb) expr -T -- my_s
> > > (S<int>) $1 = {
> > >   (A) my_t = 5
> > > }
> > >
> > > instead of:
> > >
> > > (lldb) expr -T -- my_s
> > > (S<A>) $1 = {
> > >   (A) my_t = 5
> > > }
> > >
> > > so I am in favor of presenting the template parameter type with the most specific name it was given in the overall template type name.
> >
> > OK, we get this wrong today. I’ll try to look into it.
> >
> > What’s your take on the debug info representation for the templated class type? The tentative patch introduces a typedef that declares S<A> as a typedef for S<int>. The typedef doesn’t exist in the code, thus I find it a bit of a lie to the debugger. I was more in favour of something like:
> >
> > DW_TAG_variable
> >   DW_AT_type: -> DW_TAG_structure_type
> >                    DW_AT_name: S<A>
> >                    DW_AT_specification: -> DW_TAG_structure_type
> >                                              DW_AT_name: S<int>
> >
> > This way the canonical type is kept in the debug information, and the declaration type is a real class type aliasing the canonical type. But I’m not sure debuggers can digest this kind of aliasing.
> >
> > Fred
>
> Why introduce the extra typedef? S<A> should have a template parameter entry pointing to A which points to int. The info should all be there without any extra stuff. Or if you think something is missing, please provide a more complete example.

My immediate concern here would be either loss of information or bloat when using that with type units (either bloat because each instantiation with differently spelled (but identical) parameters is treated as a separate type - or loss when the types are considered the same and all but one are dropped at link time)

> > >
> > > From the debugger's standpoint, the functional concern is that if you do
> > something more real, like:
> > >
> > > typedef int A;
> > > template <typename T>
> > > struct S
> > > {
> > > T my_t;
> > > };
> > >
> > > I want to make sure that the type of my_t is given as "A" not as "int".
> > The reason for that is that it is not uncommon to have data formatters
> > that trigger off the typedef name. This happens when you use some common
> > underlying type like "int" but the value has some special meaning when it
> > is formally an "A", and you want to use the data formatters to give it an
> > appropriate presentation. Since the data formatters work by matching type
> > name, starting from the most specific on down, it is important that the
> > typedef name be preserved.
> > >
> > > However, it would be really odd to see:
> > >
> > > (lldb) expr -T -- my_s
> > > (S<int>) $1 = {
> > > (A) my_t = 5
> > > }
> > >
> > > instead of:
> > >
> > > (lldb) expr -T -- my_s
> > > (S<A>) $1 = {
> > > (A) my_t = 5
> > > }
> > >
> > > so I am in favor of presenting the template parameter type with the most
> > specific name it was given in the overall template type name.
> >
> > OK, we get this wrong today. I’ll try to look into it.
> >
> > What’s your take on the debug info representation for the templated class
> > type? The tentative patch introduces a typedef that declares S<A> as a
> > typedef for S<int>. The typedef doesn’t exist in the code, thus I find it
> > a bit of a lie to the debugger. I was more in favour of something like :
> >
> > DW_TAG_variable
> > DW_AT_type: -> DW_TAG_structure_type
> > DW_AT_name: S<A>
> > DW_AT_specification: -> DW_TAG_structure_type
> > DW_AT_name: S<int>
> >
> > This way the canonical type is kept in the debug information, and the
> > declaration type is a real class type aliasing the canonical type. But I’m
> > not sure debuggers can digest this kind of aliasing.
> >
> > Fred
>
> Why introduce the extra typedef? S<A> should have a template parameter
> entry pointing to A which points to int. The info should all be there
> without any extra stuff. Or if you think something is missing, please
> provide a more complete example.
My immediate concern here would be either loss of information or bloat
when using that with type units (either bloat because each instantiation
with differently spelled (but identical) parameters is treated as a separate
type - or loss when the types are considered the same and all but one are
dropped at link time)

You'll need to unpack that more because I'm not following the concern.
If the typedefs are spelled differently, don't they count as different types?
DWARF wants to describe the program as-written, and there's no S<int> written
in the program.

The template type information is stored in 2 places:

- In the template argument DIEs that you mention (I didn’t touch this one)
- In the name of the template type (this is the one the patch is modifying)

It is my understanding that the debugger will use the latter more than the template argument DIEs. Thus my goal is to register both S<A> and S<int> as valid types so that we don’t lose any information compared to what was there before.

Fred
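(Schematically, those two places, in the same DIE notation:)

  DW_TAG_structure_type
    DW_AT_name: S<int>                 <- place 2: the composed type name
    DW_TAG_template_type_parameter     <- place 1: the argument DIE
      DW_AT_name: T
      DW_AT_type: -> int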

> > > >
> > > > From the debugger's standpoint, the functional concern is that if
you do
> > > something more real, like:
> > > >
> > > > typedef int A;
> > > > template <typename T>
> > > > struct S
> > > > {
> > > > T my_t;
> > > > };
> > > >
> > > > I want to make sure that the type of my_t is given as "A" not as
"int".
> > > The reason for that is that it is not uncommon to have data
formatters
> > > that trigger off the typedef name. This happens when you use some
common
> > > underlying type like "int" but the value has some special meaning
when it
> > > is formally an "A", and you want to use the data formatters to give
it an
> > > appropriate presentation. Since the data formatters work by matching
type
> > > name, starting from the most specific on down, it is important that
the
> > > typedef name be preserved.
> > > >
> > > > However, it would be really odd to see:
> > > >
> > > > (lldb) expr -T -- my_s
> > > > (S<int>) $1 = {
> > > > (A) my_t = 5
> > > > }
> > > >
> > > > instead of:
> > > >
> > > > (lldb) expr -T -- my_s
> > > > (S<A>) $1 = {
> > > > (A) my_t = 5
> > > > }
> > > >
> > > > so I am in favor of presenting the template parameter type with
the most
> > > specific name it was given in the overall template type name.
> > >
> > > OK, we get this wrong today. I’ll try to look into it.
> > >
> > > What’s your take on the debug info representation for the templated
class
> > > type? The tentative patch introduces a typedef that declares S<A> as
a
> > > typedef for S<int>. The typedef doesn’t exist in the code, thus I
find it
> > > a bit of a lie to the debugger. I was more in favour of something
like :
> > >
> > > DW_TAG_variable
> > > DW_AT_type: -> DW_TAG_structure_type
> > > DW_AT_name: S<A>
> > > DW_AT_specification: -> DW_TAG_structure_type
> > > DW_AT_name: S<int>
> > >
> > > This way the canonical type is kept in the debug information, and the
> > > declaration type is a real class type aliasing the canonical type.
But I’m
> > > not sure debuggers can digest this kind of aliasing.
> > >
> > > Fred
> >
> > Why introduce the extra typedef? S<A> should have a template parameter
> > entry pointing to A which points to int. The info should all be there
> > without any extra stuff. Or if you think something is missing, please
> > provide a more complete example.
> My immediate concern here would be either loss of information or bloat
> when using that with type units (either bloat because each instantiation
> with differently spelled (but identical) parameters is treated as a
separate
> type - or loss when the types are considered the same and all but one are
> dropped at link time)

> You'll need to unpack that more because I'm not following the concern.
> If the typedefs are spelled differently, don't they count as different types?
> DWARF wants to describe the program as-written, and there's no S<int> written in the program.

Maybe not in this TU, but possibly in another TU? Or by the user.

  void func(S<int>);
  ...
  typedef int A;
  S<A> s;
  func(s); // calls the same function

The user probably wants to be able to call void func with S<int> or S<A>
(and, actually, in theory, with S<B> where B is another typedef of int, but
that'll /really/ require DWARF consumer support and/or new DWARF wording).
We can't emit these as completely independent types - it would be verbose
(every instantiation with different typedefs would be a whole separate type
in the DWARF, not deduplicated by type units, etc) and wrong (the debugger
wouldn't know these are actually the same type so wouldn't allow function
calls, etc).

- David

> > > >
> > > > From the debugger's standpoint, the functional concern is that if you do
> > > something more real, like:
> > > >
> > > > typedef int A;
> > > > template <typename T>
> > > > struct S
> > > > {
> > > > T my_t;
> > > > };
> > > >
> > > > I want to make sure that the type of my_t is given as "A" not as "int".
> > > The reason for that is that it is not uncommon to have data formatters
> > > that trigger off the typedef name. This happens when you use some common
> > > underlying type like "int" but the value has some special meaning when it
> > > is formally an "A", and you want to use the data formatters to give it an
> > > appropriate presentation. Since the data formatters work by matching type
> > > name, starting from the most specific on down, it is important that the
> > > typedef name be preserved.
> > > >
> > > > However, it would be really odd to see:
> > > >
> > > > (lldb) expr -T -- my_s
> > > > (S<int>) $1 = {
> > > > (A) my_t = 5
> > > > }
> > > >
> > > > instead of:
> > > >
> > > > (lldb) expr -T -- my_s
> > > > (S<A>) $1 = {
> > > > (A) my_t = 5
> > > > }
> > > >
> > > > so I am in favor of presenting the template parameter type with the most
> > > specific name it was given in the overall template type name.
> > >
> > > OK, we get this wrong today. I’ll try to look into it.
> > >
> > > What’s your take on the debug info representation for the templated class
> > > type? The tentative patch introduces a typedef that declares S<A> as a
> > > typedef for S<int>. The typedef doesn’t exist in the code, thus I find it
> > > a bit of a lie to the debugger. I was more in favour of something like :
> > >
> > > DW_TAG_variable
> > > DW_AT_type: -> DW_TAG_structure_type
> > > DW_AT_name: S<A>
> > > DW_AT_specification: -> DW_TAG_structure_type
> > > DW_AT_name: S<int>
> > >
> > > This way the canonical type is kept in the debug information, and the
> > > declaration type is a real class type aliasing the canonical type. But I’m
> > > not sure debuggers can digest this kind of aliasing.
> > >
> > > Fred
> >
> > Why introduce the extra typedef? S<A> should have a template parameter
> > entry pointing to A which points to int. The info should all be there
> > without any extra stuff. Or if you think something is missing, please
> > provide a more complete example.
> My immediate concern here would be either loss of information or bloat
> when using that with type units (either bloat because each instantiation
> with differently spelled (but identical) parameters is treated as a separate
> type - or loss when the types are considered the same and all but one are
> dropped at link time)
You'll need to unpack that more because I'm not following the concern.
If the typedefs are spelled differently, don't they count as different types?
DWARF wants to describe the program as-written, and there's no S<int> written
in the program.

Maybe not in this TU, but possibly in another TU? Or by the user.

void func(S<int>);
...
typedef int A;
S<A> s;
func(s); // calls the same function

The user probably wants to be able to call void func with S<int> or S<A>

Sure.

(and, actually, in theory, with S<B> where B is another typedef of int, but
that'll /really/ require DWARF consumer support and/or new DWARF wording).

Not DWARF wording. DWARF doesn't say when you can and can't call something;
that's a debugger feature and therefore a debugger decision.

We can't emit these as completely independent types - it would be verbose
(every instantiation with different typedefs would be a whole separate type
in the DWARF, not deduplicated by type units, etc) and wrong

Yes, "typedef int A;" creates a synonym/alias not a new type, so S<A> and S<int>
describe the same type from the C++ perspective, so you don't want two complete
descriptions with different names, because that really would be describing them
as separate types. What wrinkles my brow is having S<int> be the "real"
description even though it isn't instantiated that way in the program. I wonder
if it should be marked artificial... but if you do instantiate S<int> in another
TU then you don't want that. Huh. It also seems weird to have this:
  DW_TAG_typedef
    DW_AT_name "S<A>"
    DW_AT_type -> S<int>
but I seem to be coming around to thinking that's the most viable way to have
a single actual instantiated type, and still have the correct names of things
(because DWARF is all about the name "as it appears in the source program.")

> > > > >
> > > > > From the debugger's standpoint, the functional concern is that
if you do
> > > > something more real, like:
> > > > >
> > > > > typedef int A;
> > > > > template <typename T>
> > > > > struct S
> > > > > {
> > > > > T my_t;
> > > > > };
> > > > >
> > > > > I want to make sure that the type of my_t is given as "A" not as
"int".
> > > > The reason for that is that it is not uncommon to have data
formatters
> > > > that trigger off the typedef name. This happens when you use some
common
> > > > underlying type like "int" but the value has some special meaning
when it
> > > > is formally an "A", and you want to use the data formatters to
give it an
> > > > appropriate presentation. Since the data formatters work by
matching type
> > > > name, starting from the most specific on down, it is important
that the
> > > > typedef name be preserved.
> > > > >
> > > > > However, it would be really odd to see:
> > > > >
> > > > > (lldb) expr -T -- my_s
> > > > > (S<int>) $1 = {
> > > > > (A) my_t = 5
> > > > > }
> > > > >
> > > > > instead of:
> > > > >
> > > > > (lldb) expr -T -- my_s
> > > > > (S<A>) $1 = {
> > > > > (A) my_t = 5
> > > > > }
> > > > >
> > > > > so I am in favor of presenting the template parameter type with
the most
> > > > specific name it was given in the overall template type name.
> > > >
> > > > OK, we get this wrong today. I’ll try to look into it.
> > > >
> > > > What’s your take on the debug info representation for the
templated class
> > > > type? The tentative patch introduces a typedef that declares S<A>
as a
> > > > typedef for S<int>. The typedef doesn’t exist in the code, thus I
find it
> > > > a bit of a lie to the debugger. I was more in favour of something
like :
> > > >
> > > > DW_TAG_variable
> > > > DW_AT_type: -> DW_TAG_structure_type
> > > > DW_AT_name: S<A>
> > > > DW_AT_specification: -> DW_TAG_structure_type
> > > > DW_AT_name: S<int>
> > > >
> > > > This way the canonical type is kept in the debug information, and
the
> > > > declaration type is a real class type aliasing the canonical type.
But I’m
> > > > not sure debuggers can digest this kind of aliasing.
> > > >
> > > > Fred
> > >
> > > Why introduce the extra typedef? S<A> should have a template
parameter
> > > entry pointing to A which points to int. The info should all be
there
> > > without any extra stuff. Or if you think something is missing,
please
> > > provide a more complete example.
> > My immediate concern here would be either loss of information or bloat
> > when using that with type units (either bloat because each
instantiation
> > with differently spelled (but identical) parameters is treated as a
separate
> > type - or loss when the types are considered the same and all but one
are
> > dropped at link time)
> You'll need to unpack that more because I'm not following the concern.
> If the typedefs are spelled differently, don't they count as different
types?
> DWARF wants to describe the program as-written, and there's no S<int>
written
> in the program.
>
> Maybe not in this TU, but possibly in another TU? Or by the user.
>
> void func(S<int>);
> ...
> typedef int A;
> S<A> s;
> func(s); // calls the same function
>
> The user probably wants to be able to call void func with S<int> or S<A>

Sure.

> (and, actually, in theory, with S<B> where B is another typedef of int,
but
> that'll /really/ require DWARF consumer support and/or new DWARF
wording).

Not DWARF wording. DWARF doesn't say when you can and can't call something;
that's a debugger feature and therefore a debugger decision.

What I mean is we'd need some new DWARF to help explain which types are
equivalent (or the debugger would have to do a lot of spelunking to try to
find structurally equivalent types - "S<B>" and "S<A>", go look through
their DW_TAG_template_type_params, see if they are typedefs to the same
underlying type, etc... )

> We can't emit these as completely independent types - it would be verbose
> (every instantiation with different typedefs would be a whole separate
type
> in the DWARF, not deduplicated by type units, etc) and wrong

Yes, "typedef int A;" creates a synonym/alias not a new type, so S<A> and
S<int>
describe the same type from the C++ perspective, so you don't want two
complete
descriptions with different names, because that really would be describing
them
as separate types. What wrinkles my brow is having S<int> be the "real"
description even though it isn't instantiated that way in the program. I
wonder
if it should be marked artificial... but if you do instantiate S<int> in
another
TU then you don't want that. Huh. It also seems weird to have this:
  DW_TAG_typedef
    DW_AT_name "S<A>"
    DW_AT_type -> S<int>
but I seem to be coming around to thinking that's the most viable way to
have
a single actual instantiated type, and still have the correct names of
things

Yep - it's the only way I can think of giving this information in a way
that's likely to work with existing consumers. It would probably be
harmless to add DW_AT_artificial to the DW_TAG_typedef, if that's any help
to any debug info consumer.

That said, I'm not opposed to proposing something to DWARF to define some
more 'proper' way to describe this.

One other open question is then, when, if ever, to reference the
DW_TAG_typedef rather than the underlying type? Do we just reference it
whenever the user writes using that name?

  void f(S<A>);
  ...
  void f(S<B>) { ... }

etc... (this would be just as possible/we could maybe treat it the same as
if the user wrote "void f(A); ... void f(B) { ... }")

> > > > > >
> > > > > > From the debugger's standpoint, the functional concern is that if you do
> > > > > something more real, like:
> > > > > >
> > > > > > typedef int A;
> > > > > > template <typename T>
> > > > > > struct S
> > > > > > {
> > > > > > T my_t;
> > > > > > };
> > > > > >
> > > > > > I want to make sure that the type of my_t is given as "A" not as "int".
> > > > > The reason for that is that it is not uncommon to have data formatters
> > > > > that trigger off the typedef name. This happens when you use some common
> > > > > underlying type like "int" but the value has some special meaning when it
> > > > > is formally an "A", and you want to use the data formatters to give it an
> > > > > appropriate presentation. Since the data formatters work by matching type
> > > > > name, starting from the most specific on down, it is important that the
> > > > > typedef name be preserved.
> > > > > >
> > > > > > However, it would be really odd to see:
> > > > > >
> > > > > > (lldb) expr -T -- my_s
> > > > > > (S<int>) $1 = {
> > > > > > (A) my_t = 5
> > > > > > }
> > > > > >
> > > > > > instead of:
> > > > > >
> > > > > > (lldb) expr -T -- my_s
> > > > > > (S<A>) $1 = {
> > > > > > (A) my_t = 5
> > > > > > }
> > > > > >
> > > > > > so I am in favor of presenting the template parameter type with the most
> > > > > specific name it was given in the overall template type name.
> > > > >
> > > > > OK, we get this wrong today. I’ll try to look into it.
> > > > >
> > > > > What’s your take on the debug info representation for the templated class
> > > > > type? The tentative patch introduces a typedef that declares S<A> as a
> > > > > typedef for S<int>. The typedef doesn’t exist in the code, thus I find it
> > > > > a bit of a lie to the debugger. I was more in favour of something like :
> > > > >
> > > > > DW_TAG_variable
> > > > > DW_AT_type: -> DW_TAG_structure_type
> > > > > DW_AT_name: S<A>
> > > > > DW_AT_specification: -> DW_TAG_structure_type
> > > > > DW_AT_name: S<int>
> > > > >
> > > > > This way the canonical type is kept in the debug information, and the
> > > > > declaration type is a real class type aliasing the canonical type. But I’m
> > > > > not sure debuggers can digest this kind of aliasing.
> > > > >
> > > > > Fred
> > > >
> > > > Why introduce the extra typedef? S<A> should have a template parameter
> > > > entry pointing to A which points to int. The info should all be there
> > > > without any extra stuff. Or if you think something is missing, please
> > > > provide a more complete example.
> > > My immediate concern here would be either loss of information or bloat
> > > when using that with type units (either bloat because each instantiation
> > > with differently spelled (but identical) parameters is treated as a separate
> > > type - or loss when the types are considered the same and all but one are
> > > dropped at link time)
> > You'll need to unpack that more because I'm not following the concern.
> > If the typedefs are spelled differently, don't they count as different types?
> > DWARF wants to describe the program as-written, and there's no S<int> written
> > in the program.
> >
> > Maybe not in this TU, but possibly in another TU? Or by the user.
> >
> > void func(S<int>);
> > ...
> > typedef int A;
> > S<A> s;
> > func(s); // calls the same function
> >
> > The user probably wants to be able to call void func with S<int> or S<A>
> Sure.
>
> > (and, actually, in theory, with S<B> where B is another typedef of int, but
> > that'll /really/ require DWARF consumer support and/or new DWARF wording).
>
> Not DWARF wording. DWARF doesn't say when you can and can't call something;
> that's a debugger feature and therefore a debugger decision.
>
What I mean is we'd need some new DWARF to help explain which types are
equivalent (or the debugger would have to do a lot of spelunking to try
to find structurally equivalent types - "S<B>" and "S<A>", go look through
their DW_TAG_template_type_params, see if they are typedefs to the same
underlying type, etc... )
>
>
> > We can't emit these as completely independent types - it would be verbose
> > (every instantiation with different typedefs would be a whole separate type
> > in the DWARF, not deduplicated by type units, etc) and wrong
>
> Yes, "typedef int A;" creates a synonym/alias not a new type, so S<A> and S<int>
> describe the same type from the C++ perspective, so you don't want two complete
> descriptions with different names, because that really would be describing them
> as separate types. What wrinkles my brow is having S<int> be the "real"
> description even though it isn't instantiated that way in the program. I wonder
> if it should be marked artificial... but if you do instantiate S<int> in another
> TU then you don't want that. Huh. It also seems weird to have this:
> DW_TAG_typedef
> DW_AT_name "S<A>"
> DW_AT_type -> S<int>
> but I seem to be coming around to thinking that's the most viable way to have
> a single actual instantiated type, and still have the correct names of things

*mostly* correct; this still loses "A" as the type of the data member. Crud.
But I haven't come up with a way to get that back without basically instantiating
S<A> and S<int> separately.

>
Yep - it's the only way I can think of giving this information in a way that's
likely to work with existing consumers. It would probably be harmless to add
DW_AT_artificial to the DW_TAG_typedef, if that's any help to any debug info
consumer.

Hmmm no, S<A> is not the artificial name; some debuggers treat DW_AT_artificial
as meaning "don't show this to the user." But S<A> is the name we *do* want to
show to the user.

That said, I'm not opposed to proposing something to DWARF to define some more
'proper' way to describe this.

Yah. I've been thinking about the DW_AT_specification idea too, which would be
something like this:
    DW_TAG_class_type
      DW_AT_name "S<A>"
      DW_AT_specification -> S<int>

      DW_TAG_template_type_parameter
        DW_AT_name "T"
        DW_AT_type -> A

The problem with this is you don't know where T is used in the template, so
you *still* don't know when to use A as the type of "field". Also it's kind
of an abuse of DW_AT_specification. If we can't get A as the type of "field"
then the typedef is more straightforward and understandable.

Maybe I'll pop a note to the DWARF committee for a broader set of opinions.

One other open question is then, when, if ever, to reference the DW_TAG_typedef
rather than the underlying type? Do we just reference it whenever the user
writes using that name?

void f(S<A>);
...
void f(S<B>) { ... }

etc... (this would be just as possible/we could maybe treat it the same as if
the user wrote "void f(A); ... void f(B) { ... }")

That's what I would do, and I think is more conformant to the DWARF spec.
--paulr

> > > > > > >
> > > > > > > From the debugger's standpoint, the functional concern is
that if you do
> > > > > > something more real, like:
> > > > > > >
> > > > > > > typedef int A;
> > > > > > > template <typename T>
> > > > > > > struct S
> > > > > > > {
> > > > > > > T my_t;
> > > > > > > };
> > > > > > >
> > > > > > > I want to make sure that the type of my_t is given as "A"
not as "int".
> > > > > > The reason for that is that it is not uncommon to have data
formatters
> > > > > > that trigger off the typedef name. This happens when you use
some common
> > > > > > underlying type like "int" but the value has some special
meaning when it
> > > > > > is formally an "A", and you want to use the data formatters to
give it an
> > > > > > appropriate presentation. Since the data formatters work by
matching type
> > > > > > name, starting from the most specific on down, it is important
that the
> > > > > > typedef name be preserved.
> > > > > > >
> > > > > > > However, it would be really odd to see:
> > > > > > >
> > > > > > > (lldb) expr -T -- my_s
> > > > > > > (S<int>) $1 = {
> > > > > > > (A) my_t = 5
> > > > > > > }
> > > > > > >
> > > > > > > instead of:
> > > > > > >
> > > > > > > (lldb) expr -T -- my_s
> > > > > > > (S<A>) $1 = {
> > > > > > > (A) my_t = 5
> > > > > > > }
> > > > > > >
> > > > > > > so I am in favor of presenting the template parameter type
with the most
> > > > > > specific name it was given in the overall template type name.
> > > > > >
> > > > > > OK, we get this wrong today. I’ll try to look into it.
> > > > > >
> > > > > > What’s your take on the debug info representation for the
templated class
> > > > > > type? The tentative patch introduces a typedef that declares
S<A> as a
> > > > > > typedef for S<int>. The typedef doesn’t exist in the code,
thus I find it
> > > > > > a bit of a lie to the debugger. I was more in favour of
something like :
> > > > > >
> > > > > > DW_TAG_variable
> > > > > > DW_AT_type: -> DW_TAG_structure_type
> > > > > > DW_AT_name: S<A>
> > > > > > DW_AT_specification: -> DW_TAG_structure_type
> > > > > > DW_AT_name: S<int>
> > > > > >
> > > > > > This way the canonical type is kept in the debug information,
and the
> > > > > > declaration type is a real class type aliasing the canonical
type. But I’m
> > > > > > not sure debuggers can digest this kind of aliasing.
> > > > > >
> > > > > > Fred
> > > > >
> > > > > Why introduce the extra typedef? S<A> should have a template
parameter
> > > > > entry pointing to A which points to int. The info should all be
there
> > > > > without any extra stuff. Or if you think something is missing,
please
> > > > > provide a more complete example.
> > > > My immediate concern here would be either loss of information or
bloat
> > > > when using that with type units (either bloat because each
instantiation
> > > > with differently spelled (but identical) parameters is treated as
a separate
> > > > type - or loss when the types are considered the same and all but
one are
> > > > dropped at link time)
> > > You'll need to unpack that more because I'm not following the
concern.
> > > If the typedefs are spelled differently, don't they count as
different types?
> > > DWARF wants to describe the program as-written, and there's no
S<int> written
> > > in the program.
> > >
> > > Maybe not in this TU, but possibly in another TU? Or by the user.
> > >
> > > void func(S<int>);
> > > ...
> > > typedef int A;
> > > S<A> s;
> > > func(s); // calls the same function
> > >
> > > The user probably wants to be able to call void func with S<int> or
S<A>
> > Sure.
> >
> > > (and, actually, in theory, with S<B> where B is another typedef of
int, but
> > > that'll /really/ require DWARF consumer support and/or new DWARF
wording).
> >
> > Not DWARF wording. DWARF doesn't say when you can and can't call
something;
> > that's a debugger feature and therefore a debugger decision.
> >
> What I mean is we'd need some new DWARF to help explain which types are
> equivalent (or the debugger would have to do a lot of spelunking to try
> to find structurally equivalent types - "S<B>" and "S<A>", go look
through
> their DW_TAG_template_type_params, see if they are typedefs to the same
> underlying type, etc... )
> >
> >
> > > We can't emit these as completely independent types - it would be
verbose
> > > (every instantiation with different typedefs would be a whole
separate type
> > > in the DWARF, not deduplicated by type units, etc) and wrong
> >
> > Yes, "typedef int A;" creates a synonym/alias not a new type, so S<A>
and S<int>
> > describe the same type from the C++ perspective, so you don't want two
complete
> > descriptions with different names, because that really would be
describing them
> > as separate types. What wrinkles my brow is having S<int> be the
"real"
> > description even though it isn't instantiated that way in the
program. I wonder
> > if it should be marked artificial... but if you do instantiate S<int>
in another
> > TU then you don't want that. Huh. It also seems weird to have this:
> > DW_TAG_typedef
> > DW_AT_name "S<A>"
> > DW_AT_type -> S<int>
> > but I seem to be coming around to thinking that's the most viable way
to have
> > a single actual instantiated type, and still have the correct names of
things

*mostly* correct; this still loses "A" as the type of the data member.

For the DW_TAG_template_type_parameter, you mean? No, it wouldn't.

(as a side note, if you do actually have a data member (or any other
mention) of the template parameter type, neither Clang nor GCC really get
that 'right' - "template<typename T> struct foo { T t; }; foo<int> f;" - in
both Clang and GCC, the type of the 't' member of foo<int> is a direct
reference to the "int" DIE, not to the DW_TAG_template_type_parameter for
"T" -> int)

Crud.
But I haven't come up with a way to get that back without basically
instantiating
S<A> and S<int> separately.

> >
> Yep - it's the only way I can think of giving this information in a way
that's
> likely to work with existing consumers. It would probably be harmless to
add
> DW_AT_artificial to the DW_TAG_typedef, if that's any help to any debug
info
> consumer.

Hmmm no, S<A> is not the artificial name;

It's not the artificial name, but it is an artificial typedef.

some debuggers treat DW_AT_artificial
as meaning "don't show this to the user."

In some sense that's what I want - we never wrote the typedef in the source
so I wouldn't want to see it rendered in the "list of typedefs" (or even
probably in the list of types, maybe).

But S<A> is the name we *do* want to
show to the user.

Maybe. Sometimes. But there could be many such aliases for the type. (&
many more that were never written in the source code, but are still valid
in the source language (every other typedef of int, every other way to name
the int type (decltype, etc)))

> That said, I'm not opposed to proposing something to DWARF to define
some more
> 'proper' way to describe this.

Yah. I've been thinking about the DW_AT_specification idea too, which
would be
something like this:
    DW_TAG_class_type
      DW_AT_name "S<A>"
      DW_AT_specification -> S<int>

      DW_TAG_template_type_parameter
        DW_AT_name "T"
        DW_AT_type -> A

The problem with this is you don't know where T is used in the template, so
you *still* don't know when to use A as the type of "field". Also it's kind
of an abuse of DW_AT_specification. If we can't get A as the type of
"field"
then the typedef is more straightforward and understandable.

It's still a lot of DWARF to emit for every way the user has named the template & I'm not sure how much value it provides - are there use cases you have in mind that would benefit from the increased fidelity of knowing which template argument corresponds to the way the user wrote the type?

(& what would we emit if the user named the type in some other more exotic
way: int func(); template<typename T> struct S { }; ... S<decltype(func())>
s; )

> > > > > >
> > > > > > From the debugger's standpoint, the functional concern is that if you do
> > > > > something more real, like:
> > > > > >
> > > > > > typedef int A;
> > > > > > template <typename T>
> > > > > > struct S
> > > > > > {
> > > > > > T my_t;
> > > > > > };
> > > > > >
> > > > > > I want to make sure that the type of my_t is given as "A" not as "int".
> > > > > The reason for that is that it is not uncommon to have data formatters
> > > > > that trigger off the typedef name. This happens when you use some common
> > > > > underlying type like "int" but the value has some special meaning when it
> > > > > is formally an "A", and you want to use the data formatters to give it an
> > > > > appropriate presentation. Since the data formatters work by matching type
> > > > > name, starting from the most specific on down, it is important that the
> > > > > typedef name be preserved.
> > > > > >
> > > > > > However, it would be really odd to see:
> > > > > >
> > > > > > (lldb) expr -T -- my_s
> > > > > > (S<int>) $1 = {
> > > > > > (A) my_t = 5
> > > > > > }
> > > > > >
> > > > > > instead of:
> > > > > >
> > > > > > (lldb) expr -T -- my_s
> > > > > > (S<A>) $1 = {
> > > > > > (A) my_t = 5
> > > > > > }
> > > > > >
> > > > > > so I am in favor of presenting the template parameter type with the most
> > > > > specific name it was given in the overall template type name.
> > > > >
> > > > > OK, we get this wrong today. I’ll try to look into it.
> > > > >
> > > > > What’s your take on the debug info representation for the templated class
> > > > > type? The tentative patch introduces a typedef that declares S<A> as a
> > > > > typedef for S<int>. The typedef doesn’t exist in the code, thus I find it
> > > > > a bit of a lie to the debugger. I was more in favour of something like :
> > > > >
> > > > > DW_TAG_variable
> > > > > DW_AT_type: -> DW_TAG_structure_type
> > > > > DW_AT_name: S<A>
> > > > > DW_AT_specification: -> DW_TAG_structure_type
> > > > > DW_AT_name: S<int>
> > > > >
> > > > > This way the canonical type is kept in the debug information, and the
> > > > > declaration type is a real class type aliasing the canonical type. But I’m
> > > > > not sure debuggers can digest this kind of aliasing.
> > > > >
> > > > > Fred
> > > >
> > > > Why introduce the extra typedef? S<A> should have a template parameter
> > > > entry pointing to A which points to int. The info should all be there
> > > > without any extra stuff. Or if you think something is missing, please
> > > > provide a more complete example.
> > > My immediate concern here would be either loss of information or bloat
> > > when using that with type units (either bloat because each instantiation
> > > with differently spelled (but identical) parameters is treated as a separate
> > > type - or loss when the types are considered the same and all but one are
> > > dropped at link time)
> > You'll need to unpack that more because I'm not following the concern.
> > If the typedefs are spelled differently, don't they count as different types?
> > DWARF wants to describe the program as-written, and there's no S<int> written
> > in the program.
> >
> > Maybe not in this TU, but possibly in another TU? Or by the user.
> >
> > void func(S<int>);
> > ...
> > typedef int A;
> > S<A> s;
> > func(s); // calls the same function
> >
> > The user probably wants to be able to call void func with S<int> or S<A>
> Sure.
>
> > (and, actually, in theory, with S<B> where B is another typedef of int, but
> > that'll /really/ require DWARF consumer support and/or new DWARF wording).
>
> Not DWARF wording. DWARF doesn't say when you can and can't call something;
> that's a debugger feature and therefore a debugger decision.
>
What I mean is we'd need some new DWARF to help explain which types are
equivalent (or the debugger would have to do a lot of spelunking to try
to find structurally equivalent types - "S<B>" and "S<A>", go look through
their DW_TAG_template_type_params, see if they are typedefs to the same
underlying type, etc... )
>
>
> > We can't emit these as completely independent types - it would be verbose
> > (every instantiation with different typedefs would be a whole separate type
> > in the DWARF, not deduplicated by type units, etc) and wrong
>
> Yes, "typedef int A;" creates a synonym/alias not a new type, so S<A> and S<int>
> describe the same type from the C++ perspective, so you don't want two complete
> descriptions with different names, because that really would be describing them
> as separate types. What wrinkles my brow is having S<int> be the "real"
> description even though it isn't instantiated that way in the program. I wonder
> if it should be marked artificial... but if you do instantiate S<int> in another
> TU then you don't want that. Huh. It also seems weird to have this:
> DW_TAG_typedef
> DW_AT_name "S<A>"
> DW_AT_type -> S<int>
> but I seem to be coming around to thinking that's the most viable way to have
> a single actual instantiated type, and still have the correct names of things

*mostly* correct; this still loses "A" as the type of the data member.

For the DW_TAG_template_type_parameter, you mean? No, it wouldn't.

(as a side note, if you do actually have a data member (or any other mention) of
the template parameter type, neither Clang nor GCC really get that 'right' -
"template<typename T> struct foo { T t; }; foo<int> f;" - in both Clang and GCC,
the type of the 't' member of foo<int> is a direct reference to the "int" DIE, not
to the DW_TAG_template_type_parameter for "T" -> int)

Huh. And DWARF doesn't say you should point to the template_type_parameter...
I thought it did, but no. Okay, so nothing is lost, but it feels desirable
to me that uses of the template parameter should cite it in the DWARF as well.
But I guess we can leave that part of the debate for another time.

Crud.
But I haven't come up with a way to get that back without basically instantiating
S<A> and S<int> separately.

>
Yep - it's the only way I can think of giving this information in a way that's
likely to work with existing consumers. It would probably be harmless to add
DW_AT_artificial to the DW_TAG_typedef, if that's any help to any debug info
consumer.

Hmmm no, S<A> is not the artificial name;

It's not the artificial name, but it is an artificial typedef.

If the source only says S<A>, then the entire S<int> description is artificial,
because *that's not what the user wrote*. So both the typedef and the class type
are artificial. Gah. Let's forget artificial here.

some debuggers treat DW_AT_artificial
as meaning "don't show this to the user."

In some sense that's what I want - we never wrote the typedef in the source
so I wouldn't want to see it rendered in the "list of typedefs" (or even
probably in the list of types, maybe).

But S<A> is the name we *do* want to
show to the user.

Maybe. Sometimes. But there could be many such aliases for the type. (& many
more that were never written in the source code, but are still valid in the
source language (every other typedef of int, every other way to name the int
type (decltype, etc)))

But you *lose* cases where the typedef is the *same* *everywhere*. And in
many cases that typedef is a valuable thing, not the trivial rename we've
been bandying about. This is a more real example:

typedef int int4 __attribute__((ext_vector_type(4)));
template<typename T> struct TypeTraits {};
template<>
struct TypeTraits<int4> {
  static unsigned MysteryNumber;
};
unsigned TypeTraits<int4>::MysteryNumber = 3U;

Displaying "TypeTraits<int __attribute__((ext_vector_type(4)))>" is much
worse than "TypeTraits<int4>" (and not just because the latter is shorter).
More to the point, having the debugger *complain* when the user says
something like "ptype TypeTraits<int4>" is a problem.

Reducing debug-info size is a worthy goal, but don't degrade the debugging
experience to get there.
--paulr