DWARF: Preferred names in templates

Context: https://reviews.llvm.org/D91311

So, this preferred name feature is designed to print names in a more user-centric way (e.g. “std::vector<std::string, …>” instead of “std::vector<std::basic_string<char, …>, …>”).

But this causes some divergence in the DWARF - the textual string says std::string, but the DW_TAG_template_type_parameter says std::basic_string<char…

This isn’t fundamentally problematic, kind of: there’s a bunch of ways the full string name of a template won’t match perfectly between producers, so consumers basically have to do some structural equivalence testing, as far as I know. I’m not sure exactly how much, though. GCC’s and LLVM’s default debug info doesn’t include structural descriptions of template parameters on template declarations, so consumers would have to normalize the string rather than discard the string’s argument representation and rely solely on the structural representation. In that case only a very advanced normalization, one that parsed std::string, did a lookup, resolved through typedefs and alias templates, and then used the resulting string, would succeed here. I haven’t tested gdb or lldb to see if/how they cope with this situation, but I would assume it’s not good.
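To make the normalization point concrete, here is a minimal Python sketch (not taken from gdb or lldb). The alias table, the shortened type names without default template arguments, and the helper names normalize_spacing and resolve_aliases are all assumptions made for illustration. It shows that spacing-level normalization alone leaves the two spellings different, while a normalizer that also looks up and expands aliases makes them agree.

```python
import re

# Hypothetical alias table; a real consumer would have to build this by
# looking up typedefs and alias templates in the debug info.
ALIASES = {"std::string": "std::basic_string<char>"}

# Matches a (possibly qualified) identifier such as std::vector or char.
_QUALIFIED_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)*")


def normalize_spacing(name):
    """Drop whitespace around '<', '>' and ',' (a cheap, purely textual pass)."""
    return re.sub(r"\s*([<>,])\s*", r"\1", name)


def resolve_aliases(name, aliases, max_passes=8):
    """Expand known aliases until nothing changes (bounded to avoid cycles)."""
    for _ in range(max_passes):
        expanded = _QUALIFIED_NAME.sub(
            lambda m: aliases.get(m.group(0), m.group(0)), name
        )
        if expanded == name:
            break
        name = expanded
    return name


preferred = normalize_spacing("std::vector<std::string >")
canonical = normalize_spacing("std::vector<std::basic_string<char> >")
```

Here preferred and canonical still differ after the spacing pass; only resolve_aliases(preferred, ALIASES) produces the canonical spelling, which is the "very advanced" kind of normalization a string-based consumer would need.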

So I think the only good solution here is to suppress use of preferred names when printing type names for debug info?

It might be nice to have use of preferred names available under a flag, for when you’re building the codebase with one compiler and/or you just want to do more experimentation with the feature. That could maybe be taken further (I have a prototype patch) to use the preferred names/types in the structural representation as well, which I’d imagine would break mixed clang/gcc debug info with most consumers, though maybe it’d fall out OK for lldb when building ASTs. Not sure it’s worth it, but I think I have some reasonable attempt at this. (There’s one issue around cases of template declarations not carrying preferred names, discussed on the review itself.)

Thoughts, feelings, perspectives?

  • Dave

Sony already has a private patch to not desugar type names in the DW_TAG_template_type_parameter name. IIRC we were finding that e.g. enums decayed to the underlying integer, which caused loss of useful information.

It kind of feels like D91311 is heading in a similar direction, and really ought to be using the preferred name in the parameter DIEs as well as the parent, for consistency.

Full disclosure, our debugger throws away the template parameter list in the parent name and reconstructs it from the parameter DIEs. So, what the compiler emits in the parent name is not interesting to our debugger. (And wasn’t there a suggestion at some point to eliminate the template parameter list in the parent name anyway? Because they could lead to an explosion of really long strings in the debug info, and those strings aren’t particularly user-friendly in the first place.)
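As a rough illustration of that kind of reconstruction (a sketch only, not the actual debugger code), the Python below models a DIE as a small dataclass and rebuilds a type's name from its DW_TAG_template_type_parameter children. The Die class, its fields, rebuild_name, and the simplified names without default template arguments are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Die:
    """Toy stand-in for a DWARF DIE; 'type' models the DW_AT_type reference."""
    tag: str
    name: str = ""
    type: Optional["Die"] = None
    children: List["Die"] = field(default_factory=list)


def rebuild_name(die):
    """Ignore the textual argument list and rebuild it from parameter DIEs."""
    params = [c for c in die.children
              if c.tag == "DW_TAG_template_type_parameter"]
    if not params:
        return die.name  # nothing structural to rebuild from
    base = die.name.split("<", 1)[0]
    args = ", ".join(rebuild_name(p.type) if p.type else "void" for p in params)
    return f"{base}<{args}>"


char_t = Die("DW_TAG_base_type", "char")
basic_string = Die(
    "DW_TAG_class_type", "basic_string<char>",
    children=[Die("DW_TAG_template_type_parameter", "_CharT", type=char_t)],
)
# The parent name uses the preferred spelling; the parameter DIE does not.
vector = Die(
    "DW_TAG_class_type", "vector<std::string>",
    children=[Die("DW_TAG_template_type_parameter", "_Tp", type=basic_string)],
)
```

With this model, rebuild_name(vector) yields "vector<basic_string<char>>" whatever spelling the producer put in the parent name, which is why the parent string stops mattering to a consumer built this way. The sketch only handles type parameters; non-type and template template parameters would need their own cases.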

–paulr

Sony already has a private patch to not desugar type names in the DW_TAG_template_type_parameter name. IIRC we were finding that e.g. enums decayed to the underlying integer, which caused loss of useful information.

Ah, hmm. If you’ve got any particular reproductions I’d love to take a look at something like that.

I guess you don’t have any compatibility needs - between DWARF produced by different producers (old Clang + new Clang, GCC + Clang, etc)? Otherwise these differences, I would think, would make matching entities difficult? Or do you have code in the consumers to look through sugar (typedefs, alias templates, etc) when doing structural equivalence matching between entities described in different CUs? Some other solutions or reason that issue doesn’t come up in your context?

It kind of feels like D91311 is heading in a similar direction, and really ought to be using the preferred name in the parameter DIEs as well as the parent, for consistency.

Ish - but it’s one thing for diagnostics to do this, it’s a bit trickier for DWARF to do it. I agree we should resolve the consistency, but due to the need for consistency with other compilers (& even just older clang) I suspect the thing to do is to avoid the preferred name - so that structural equivalence can be tested without needing to know which types are transparent for the purpose of such equivalence, and which are not?

Full disclosure, our debugger throws away the template parameter list in the parent name and reconstructs it from the parameter DIEs. So, what the compiler emits in the parent name is not interesting to our debugger.

Yeah, I suspect consumers have to do something like this - though I guess GDB and LLDB do some kind of canonicalization of the string representation (owing to the lack of structural representation on declarations).

(And wasn’t there a suggestion at some point to eliminate the template parameter list in the parent name anyway? Because they could lead to an explosion of really long strings in the debug info, and those strings aren’t particularly user-friendly in the first place.)

Yeah, that’s where this came up. I’ve developed a patch for clang that allows choosing one of 3 options: (1) the way things are now, (2) strip certain template name strings of their template parameter lists, or (3) instead of stripping those lists from those names, add a prefix to flag the name as one that /should/ be able to be stripped. Then I have a patch to llvm-dwarfdump --verify that, upon detecting that prefix, attempts to rebuild the full name and checks that it matches. I found this mismatch when running this checking mode over a clang build inside Google, which uses libc++. (I’ve already got this checking mode clear on the clang and llvm-dwarfdump builds in my open source build, which happens not to use libc++, which is nice to know. That’s modulo one issue with type suffixes on non-type template parameters: they are applied only in certain cases, and it’s hard/impossible to detect those cases in DWARF, so I always add the suffixes and have a local patch to clang to always put them on the decorated name, to get past these cases and keep them from noising up the results.)
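The shape of that check can be sketched in a few lines of Python. This is not the llvm-dwarfdump patch: the marker string, the verify_name helper, and the idea of passing the parameter types as already-rendered strings are assumptions made purely for illustration.

```python
# Hypothetical marker; the real prefix used by the patch is not given here.
MARKER = "__rebuildable__"


def verify_name(die_name, param_type_names):
    """Return None if the name checks out (or is not flagged), else a message."""
    if not die_name.startswith(MARKER):
        return None  # option (1)-style name: nothing to verify
    expected = die_name[len(MARKER):]
    base = expected.split("<", 1)[0]
    rebuilt = f"{base}<{', '.join(param_type_names)}>"
    if rebuilt != expected:
        return f"rebuilt name '{rebuilt}' does not match '{expected}'"
    return None


ok = verify_name(MARKER + "vector<int>", ["int"])
mismatch = verify_name(MARKER + "vector<std::string>",
                       ["std::basic_string<char>"])
```

Here ok is None, while mismatch carries a message, which is the kind of report a preferred name in the parent string would trigger when the parameter DIEs still describe std::basic_string<char>.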

  • Dave

I agree that it seems like the solution is to not use preferred names for debug info.

David and I chatted offline, and he was able to come up with a scenario that simulates the mixed debug info case, where one compiler supports preferred names and the other does not, and indeed LLDB has problems in this case. From what I can tell this is because we use the DW_AT_name from the parent; we don’t attempt to reconstruct the template parameters from the children’s DW_TAG_template_type_parameter entries.
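A toy Python model of why name-keyed matching breaks in the mixed case (this is not LLDB code; the CU names, the simplified type names, and group_by_name are hypothetical):

```python
from collections import defaultdict


def group_by_name(cus):
    """Group type DIEs across CUs using only the parent's DW_AT_name string."""
    groups = defaultdict(list)
    for cu_name, type_names in cus.items():
        for type_name in type_names:
            groups[type_name].append(cu_name)
    return dict(groups)


# Same C++ type, described by two producers that spell the name differently.
mixed = {
    "cu_with_preferred_names.o": ["vector<std::string>"],
    "cu_without_preferred_names.o": ["vector<std::basic_string<char>>"],
}
groups = group_by_name(mixed)
```

The single type ends up in two separate groups, so a consumer that keys on DW_AT_name alone treats the two descriptions as unrelated types.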

Besides the fact that LLDB does not handle the mixed case well, it just seems more desirable to have consistent naming.

Yep, finally coming back to this - I’ve disabled the use of preferred names in debug info here: https://github.com/llvm/llvm-project/commit/2ff049b12ee3fb60581835a28bf9d0acc1723f23