__PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

A request to change Clang’s PRETTY_FUNCTION output to fix an issue.

The code used in pretty function is also used to generate debug info in
DWARF etc

.

(I believe). See, for instance the comment on this bug report:

https://sourceware.org/bugzilla/show_bug.cgi?id=21492#c1

Clang shows these arguments with the same DW_AT_name, X<1>, but they’re distinct types

https://godbolt.org/z/0Ik0Hl

Pretty function output, for auto… args(variadic just so that outputs display together):

template <auto… V>
constexpr char const* auto_name()
{
return PRETTY_FUNCTION;
}

enum e { a, b, c=b };
enum class E { a, b, c=b };

char const* nameU() { return auto_name<a, b, c, e(3)>(); }
char const* nameS() { return auto_name<E::a, E::b, E::c, E{3}>(); }

Gives output with these strings embedded;

nameU:

<a, b, b, 3>

nameS:

<E::a, E::b, E::b, 3>

The first two enumerated values, a & b, are returned with no loss of type info as a, b.

The next enumerated value, c, is a duplicate label for b’s value, and is reported as b.
The final value is not enumerated - there is no enum label with value 3 -

it is reported with an unnecessary loss of type info
as plain digits 3.

The non-enumerated value should be reported with a cast or constructor:

unscoped e: 3 → (e)3 or e(3)

scoped E: 3 → (E)3 or E(3) or E{3}

(init-list E{3} only for scoped enum and only since c++17)

Question 1:

Will it be acceptable to change the output for non-enumerated values as so?

This will retain type info and allow round-trip without loss.

Either C-style cast (e)3 or function-cast / type-constructor e(3) - seem ok
(gcc uses C-style casts and Clang uses this same style in source labels).

For the duplicate-valued labels case there is no loss of type info,
but there is a loss of information about the label used for the auto arg.
Presumably, the code works from the supplied underlying-type value
and picks the first label with the given value. I’d guess that Clang knows
the label used in a direct instantiation and could return it in that case.
(c.f. named member-pointer args have to retain the id for reporting).

Question 2:

Is it possible / acceptable to retain the label in the duplicate enum value case?

Can anyone point me to the relevant code?

Thanks, Will

A request to change Clang’s PRETTY_FUNCTION output to fix an issue.

The code used in pretty function is also used to generate debug info in
DWARF etc

.

(I believe). See, for instance the comment on this bug report:

21492 – string comparison of class name from DWARF leads to false ABI incompatibility report

Clang shows these arguments with the same DW_AT_name, X<1>, but they’re distinct types

https://godbolt.org/z/0Ik0Hl

Pretty function output, for auto… args(variadic just so that outputs display together):

template <auto… V>
constexpr char const* auto_name()
{
return PRETTY_FUNCTION;
}

enum e { a, b, c=b };
enum class E { a, b, c=b };

char const* nameU() { return auto_name<a, b, c, e(3)>(); }
char const* nameS() { return auto_name<E::a, E::b, E::c, E{3}>(); }

Gives output with these strings embedded;

nameU:

<a, b, b, 3>

nameS:

<E::a, E::b, E::b, 3>

The first two enumerated values, a & b, are returned with no loss of type info as a, b.

The next enumerated value, c, is a duplicate label for b’s value, and is reported as b.
The final value is not enumerated - there is no enum label with value 3 -

it is reported with an unnecessary loss of type info
as plain digits 3.

The non-enumerated value should be reported with a cast or constructor:

unscoped e: 3 → (e)3 or e(3)

scoped E: 3 → (E)3 or E(3) or E{3}

(init-list E{3} only for scoped enum and only since c++17)

Question 1:

Will it be acceptable to change the output for non-enumerated values as so?

In the case where the template parameter involves a deduced type (auto or decltype(auto) or eventually a deduced template specialization type), it would make a lot of sense to include the actual type in the printed template argument if it differs from the type of the formatted argument string. But I think we should leave out the type in all other cases, as it would make the template argument string more verbose without (necessarily) providing any more information.

This will retain type info and allow round-trip without loss.

Either C-style cast (e)3 or function-cast / type-constructor e(3) - seem ok
(gcc uses C-style casts and Clang uses this same style in source labels).

For the duplicate-valued labels case there is no loss of type info,
but there is a loss of information about the label used for the auto arg.
Presumably, the code works from the supplied underlying-type value
and picks the first label with the given value. I’d guess that Clang knows
the label used in a direct instantiation and could return it in that case.
(c.f. named member-pointer args have to retain the id for reporting).

In the case of duplicated enumerators, there simply is no distinction between ‘b’ and ‘c’ – they are both merely names for the value 1 of type ‘e’. auto_name() and auto_name() are the same function and therefore must return the same string.

Question 2:

Is it possible / acceptable to retain the label in the duplicate enum value case?

No, per the above, we cannot and must not do that.

Can anyone point me to the relevant code?

This is printIntegral, at the top of lib/AST/TemplateBase.cpp. The “else” case at the bottom would need to print out additional text representing the cast when needed.

Thanks Richard.

Ok; ignore Question 2 - you are absolutely right (what I was thinking).Printing different labels for the same type defeats the point of this request.

OTOH, Question 1 is a request for Clang to print different output for different types.

The linked bug report is an example of where it is important to distinguish types.

Here’s a summary of suggested output vs current output for Clang, GCC, MSVC.
https://godbolt.org/z/QEae7F

Plain, unscoped enum

enum e { a, b, c=b };
auto_name<a, e(1), c, e(3)>

Suggested: a, b, b, (e)3
Clang: a, b, b, 3
GCC: (e)0, (e)1, (e)1, (e)3
MSCV: 0, 1, 1, 3

Scoped / opaque enum:
enum class E { a, b, c=b };
auto_name<E::a, E{1}, E::c, E{3}>

Suggested: E::a, E::b, E::b, (E)3
Clang: E::a, E::b, E::b, 3
GCC: (E)0, (E)1, (E)1, (E)3
MSCV: 0, 1, 1, 3

None of these are ideal -

  • Clang loses type info for non-enumerated values.
  • GCC does not distinguish enumerated / non-enumerated.
  • MSVC loses type all info, does not distinguish.

Interestingly, it turns out that GCC’s behaviour is a bug!
(The intent of gcc code is to print the suggested output.
I intend to submit a bug report and a patch.)

You are concerned that this suggested change would

make the template argument string more verbose without (necessarily) providing any more information.

Yet the information is necessary in order to distinguish different types.
This change is needed for correctness. Verbosity is a secondary concern.
Clang’s current behaviour here could be treated as a bug
(it has been described as “seriously suboptimal” by a gcc dev).

For Clang, the increased verbosity is only for non-enumerated values.
Enum-templates would be expected to mostly use enumerated values.
Also, verbosity becomes more similar for enumerated / non-enumerated
the number of characters in the identifier label vs the number of digits.

I’m now preferring C-style cast notation over C++ function-style
as it simplifies output and parsing code.

Please reflect on this and consider - is this suggested change acceptable?

Thanks Richard.

Ok; ignore Question 2 - you are absolutely right (what I was thinking).Printing different labels for the same type defeats the point of this request.

OTOH, Question 1 is a request for Clang to print different output for different types.

I think you might be under the impression that I said that we shouldn’t address this, which was very much not my intent. What I tried to say was that we should emit a cast in the case of a deduced template parameter type, when the type of the argument expression we’d otherwise print is different from that of the parameter. That should be sufficient to ensure that Clang prints different outputs for different types. Note that special-casing enums is the wrong way to fix that problem, as the problem is not specific to enums. Consider, for example:

template struct X {};
X<(short)0> xa;
X<(long)0> xb;

The types of xa and xb both pretty-print as X<0> currently.

There are (at least) two natural ways to fix this: by always printing a cast when the type of the argument isn’t the same as that of the parameter (which is the unnecessary verbosity that I was concerned with, and is what we’d get if we generalized the proposed approach for enumerations to address the whole problem), or by doing something smarter that depends on whether the template parameter was deduced (which is what I’m suggesting is the right way to address this).

The linked bug report is an example of where it is important to distinguish types.

Here’s a summary of suggested output vs current output for Clang, GCC, MSVC.
https://godbolt.org/z/QEae7F

Plain, unscoped enum

enum e { a, b, c=b };
auto_name<a, e(1), c, e(3)>

Suggested: a, b, b, (e)3
Clang: a, b, b, 3
GCC: (e)0, (e)1, (e)1, (e)3
MSCV: 0, 1, 1, 3

Scoped / opaque enum:
enum class E { a, b, c=b };
auto_name<E::a, E{1}, E::c, E{3}>

Suggested: E::a, E::b, E::b, (E)3
Clang: E::a, E::b, E::b, 3
GCC: (E)0, (E)1, (E)1, (E)3
MSCV: 0, 1, 1, 3

None of these are ideal -

  • Clang loses type info for non-enumerated values.
  • GCC does not distinguish enumerated / non-enumerated.
  • MSVC loses type all info, does not distinguish.

Interestingly, it turns out that GCC’s behaviour is a bug!
(The intent of gcc code is to print the suggested output.
I intend to submit a bug report and a patch.)

You are concerned that this suggested change would

make the template argument string more verbose without (necessarily) providing any more information.

Yet the information is necessary in order to distinguish different types.
This change is needed for correctness.

For what it’s worth, I don’t see a correctness concern here. Anything assuming our pretty-printed type names are unique is wrong. The bug you referenced is a bug caused by a tool getting this wrong and using the pretty type name string inappropriately, and the contributors on that bug report seem happy to acknowledge that.

That said, as a quality of implementation matter, we should try to ensure our pretty-printed type names are unique anyway wherever possible.

Verbosity is a secondary concern.
Clang’s current behaviour here could be treated as a bug
(it has been described as “seriously suboptimal” by a gcc dev).

For Clang, the increased verbosity is only for non-enumerated values.
Enum-templates would be expected to mostly use enumerated values.
Also, verbosity becomes more similar for enumerated / non-enumerated
the number of characters in the identifier label vs the number of digits.

I’m now preferring C-style cast notation over C++ function-style
as it simplifies output and parsing code.

Please reflect on this and consider - is this suggested change acceptable?

As noted above, the problem you described, while valid, does not justify the solution you’re suggesting. However, for enums specifically, I think there’s another argument to consider (one that I think you’ve not made yet): passing a plain integer as a template argument to an enumeration template parameter won’t compile, so our pretty-printed type is not valid C++. It’s reasonable to consider how valuable it is to fix that, compared with the value of having a shorter pretty-printed type name, but as you point out, we’re already paying the cost of including the enum type name in the template argument in the case where an scoped enumeration is named, and an integer cast to enum type would be of comparable length. So I think the change you suggested is a good idea for that reason.

Thanks again Richard for a second more detailed response setting out the broader view

(I had failed to fully interpret / misinterpreted your first response as mild pushback
apologies for that; I should’ve asked for the clarification)
(also, I shouldn’t have argued ‘correctness’ here; that was disingenuous).

So, in the broader view, the question is:

How best to pretty print general non-type template args.

Now, as non-types may be subsumed by LiteralType literals in C++20
we should look ahead to pretty printing of such general literal types
as the broader context for discussion or decisions on changes.

Non-type template args
Until C++17:

  • an integral type;
  • an enumeration type;
  • std::nullptr_t; (since C++11)
  • a pointer type (to object or to function);
  • a pointer to member type (to member object or to member function);
  • lvalue reference type (to object or to function);
    For C++20:
  • A type that satisfies LiteralType, has strong structural equality …

Pretty-print design goals:

General: minimise special-casing
Round-trip: the printed value can be plugged back into code
Low-verbosity: avoid unnecessary info or repetition
more?

The general literal type can be one of the C++17 built-ins
as above (all scalar types, apart from lvalue reference I think)
or an aggregate or class that satisfies the LiteralType requirements.

So, in general, we will now have to deal with non-scalar types

implying multiple initializers and init-lists.

Question: is there a single ‘canonical’ init-list for a LiteralType?

This is a place where the proposal to use parens for aggregates will help,
as special-casing initialization syntax, for brace or paren, is awkward:

P0960 Allow initializing aggregates from a parenthesized list of values
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0960r1.html
If this proposal is accepted then that would suggest to prefer paren-init.

Literals of the existing types remain awkward non-types -
‘function-style’ paren-init doesn’t work directly for multi-token types
(like ‘long long’) or for compound types (pointer/ref).
An added decltype can solve that. Sticking with C-style cast is simpler.

Here’s a suggestion (assuming P0960 for non-scalars):

Scalar:
Known type T; print as ‘value’
Deduced type; print as ‘(T)value’
Non-scalar:
Known type T; print as ‘(init…)’
Deduced type; print as ‘T(init…)’

‘Known type’ means non-deduced or T implicit in value

(for values of scalar type like enumerator or pointer-to-member).

(lvalue reference remains a special case).

In practice, I don’t know if compilers can easily distinguish
whether a value’s type is known or deduced, at the leaf-level.
If not then this change will be more involved than just updating
low-level printing functions.

Now let’s focus back on integral and enumeration types.

Integral / enum types

Answering my own question:

Is there a single ‘canonical’ init-list for a LiteralType?

=> No.

If LiteralType was further required to be structured-bindable,

and to have a corresponding constructor for the bound types,
then the compiler could select that as the canonical constructor:

Given a value v of type T
auto [a,b,c…]{v} structured binding
T(a,b,c…) canonical construction, reconstructs v

This provides a unique default choice for pretty-print output.

The ‘reconstructing’ constructor is likely to optimise nicely.

A requirement to be structured-bindable seems natural
(despite structured-binding not being constexpr yet).

Aggregates satisfy the reconstructing constructor requirement.

It feels more onerous to require general LiteralType classes to
provide such a constructor.