Hi everyone!
I have been working on a Clang GSoC project whose objective is to reduce the loss of type sugar when dealing with template specializations.
The problem we are trying to solve here is old and well-known:
template <class T> auto foo(T) -> T;
int x = foo(std::string("hello"));
On the last line you would get this diagnostic: error: no viable conversion from 'std::basic_string<char>' to 'int'
I.e. the infamous basic_string, which you never wrote and had no reason to even know existed, appears in the diagnostics.
The problem here is that we always instantiate templates using the "simplest" type, stripped of all decorations.
std::string is just a typedef for std::basic_string<char>, and typedefs are always just a decoration, which I will refer to from here on as "type sugar".
The simplest type, which I will refer to from here on as the "canonical type", is used to instantiate the template, and the important reason for that is performance: we don't want a different instance for each possible way to decorate a type, because it would be slow and waste much more memory.
Suppose the user created their own type alias: using MyString = std::basic_string<char>.
We don't want foo<std::string> and foo<MyString> to refer to two different instances of the function template.
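To make the point about aliases concrete, here is a minimal C++17 sketch. The function template `identity` is a name made up for this illustration (it has a body so that each specialization has an address); `MyString` mirrors the alias above. It only shows standard language behaviour: every spelling names the same canonical type, and therefore the same specialization.

```cpp
#include <string>
#include <type_traits>

// Alias mirroring the MyString example from the text.
using MyString = std::basic_string<char>;

// Hypothetical function template, defined so its specializations have addresses.
template <class T> T identity(T value) { return value; }

// Aliases are sugar only: all three spellings name one canonical type.
static_assert(std::is_same_v<std::string, std::basic_string<char>>);
static_assert(std::is_same_v<MyString, std::string>);

// Both spellings therefore select the very same specialization.
static_assert(&identity<std::string> == &identity<MyString>);
```

Since only one specialization exists, whatever is instantiated inside it has no record of which spelling the caller used.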
Note that Clang already has a hacky, incomplete solution for a very narrow subset of this problem: the clang::preferred_name attribute.
It allows you, with a lot of inconvenience as described in the link, to associate type aliases for particular specializations with the template definition; the type printer will then use them as the preferred name of that specialization. libc++ uses this attribute, but notably it can't deal with the MyString case: it will print std::string instead.
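For readers who have not seen the attribute, here is a small sketch of how it is typically applied. The namespace `demo`, the macro `DEMO_PREFERRED_NAME`, and the stand-in `basic_string` are all hypothetical names for this illustration; the macro guard is only there so the snippet also compiles on compilers that do not know the attribute.

```cpp
#include <type_traits>

// Apply the attribute only where the compiler reports support for it.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::preferred_name)
#    define DEMO_PREFERRED_NAME(alias) [[clang::preferred_name(alias)]]
#  endif
#endif
#ifndef DEMO_PREFERRED_NAME
#  define DEMO_PREFERRED_NAME(alias)
#endif

namespace demo { // hypothetical stand-in for a standard library
template <class CharT> struct basic_string;

// The aliases must exist before the definition that names them.
using string = basic_string<char>;
using wstring = basic_string<wchar_t>;

template <class CharT>
struct DEMO_PREFERRED_NAME(string) DEMO_PREFERRED_NAME(wstring) basic_string {
  CharT first{};
};
} // namespace demo

// A user alias declared later; the template definition cannot list it.
using MyString = demo::basic_string<char>;
static_assert(std::is_same_v<MyString, demo::string>);
```

The inconvenience is visible here: the library author has to enumerate the preferred aliases up front, so an alias like `MyString` that is introduced afterwards has no way to take part.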
Alright, so with the problem and the limited existing solutions explained, here is my proposal.
We will implement a new pass that will add back the lost type sugar when performing member access on a template specialization.
First though, note that we will not change the situation with regard to the template instantiations themselves; they will still only be performed with the canonical type. I.e. when instantiating this template with std::string:
template <class T> auto foo() {
int n = T();
}
We will still get the diagnostic: error: no viable conversion from 'std::basic_string<char>' to 'int'.
But we will fix the diagnostic from the initial example so it will print: no viable conversion from 'std::string' (aka 'basic_string<char>') to 'int'.
To implement this new pass, we will leverage an existing feature of the AST: when Clang performs substitution as part of template instantiation, it leaves behind type sugar nodes which indicate what was substituted, i.e. which template parameter was replaced by which canonical type.
Implementing this as a new kind of TreeTransform, upon member access we will build a naming context used to map template parameters to sugared template arguments. Upon encountering these substitution nodes, we will get the replaced type, look it up in the naming context, and replace the replacement type with the sugared argument we found, rebuilding the whole type.
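The lookup-and-replace step can be sketched with a toy model. To be clear, none of this is Clang code: `Type`, `NamingContext`, `make`, `print`, `resugar` and `demoResugar` are hypothetical names invented here, template parameters are identified by name for simplicity, and the `Subst` node merely stands in for the substitution sugar node described above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy model only: these are NOT Clang's AST classes.
struct Type;
using TypePtr = std::shared_ptr<const Type>;

struct Type {
  enum Kind { Builtin, Alias, Pointer, Subst } kind;
  std::string name; // Builtin/Alias: spelled name; Subst: replaced parameter
  TypePtr inner;    // Alias: underlying; Pointer: pointee; Subst: replacement
};

TypePtr make(Type::Kind kind, std::string name, TypePtr inner = nullptr) {
  return std::make_shared<const Type>(
      Type{kind, std::move(name), std::move(inner)});
}

// Prints the type as written, or with all sugar stripped when canonical is set.
std::string print(const TypePtr &t, bool canonical = false) {
  switch (t->kind) {
  case Type::Builtin: return t->name;
  case Type::Alias:   return canonical ? print(t->inner, true) : t->name;
  case Type::Pointer: return print(t->inner, canonical) + " *";
  case Type::Subst:   return print(t->inner, canonical);
  }
  return {};
}

// Maps a template parameter to the argument as the user wrote it.
using NamingContext = std::map<std::string, TypePtr>;

// Rebuilds the type, swapping each substitution's replacement for the
// sugared argument found in the naming context.
TypePtr resugar(const TypePtr &t, const NamingContext &ctx) {
  switch (t->kind) {
  case Type::Builtin:
    return t;
  case Type::Alias:
  case Type::Pointer:
    return make(t->kind, t->name, resugar(t->inner, ctx));
  case Type::Subst: {
    auto found = ctx.find(t->name);
    if (found == ctx.end())
      return t; // unknown parameter: leave the node untouched
    return make(Type::Subst, t->name, found->second);
  }
  }
  return t;
}

// Models the member type `T *` of a hypothetical Box<T>, accessed via Box<MyString>.
std::string demoResugar() {
  TypePtr canonicalString = make(Type::Builtin, "std::basic_string<char>");
  TypePtr member =
      make(Type::Pointer, "", make(Type::Subst, "T", canonicalString));
  NamingContext ctx{{"T", make(Type::Alias, "MyString", canonicalString)}};
  TypePtr sugared = resugar(member, ctx);
  assert(print(member) == "std::basic_string<char> *");
  assert(print(sugared, true) == print(member, true)); // canonical type unchanged
  return print(sugared);
}
```

In this model `demoResugar()` returns `MyString *`: the spelling changes while the canonical type stays exactly the same, which is the property the real pass has to preserve.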
Note that this pass will only rebuild types, not expressions, but we will hook this up in the right places in Sema so that member access expressions will benefit from this, i.e. the expression will be built with the type already resugared, instead of resugaring the expression itself later.
Note as well that we will never emit new declarations as part of resugaring, in order to avoid their heavy cost.
Example: when resugaring a typedef instantiated from a dependent one, instead of emitting a TypedefDecl redeclaration with the new resugared type, we will modify TypedefType so that it can carry an underlying type different from the declared one, but still canonically the same.
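At the source level, the typedef case looks roughly like the following C++17 sketch. `Box` and `value_type` are hypothetical names chosen for illustration; the asserts only demonstrate standard behaviour (the canonical types agree), not how any particular Clang version spells the type in diagnostics.

```cpp
#include <string>
#include <type_traits>

// Hypothetical class template with a dependent member typedef.
template <class T> struct Box {
  using value_type = T; // underlying type depends on T
  T value;
};

using MyString = std::string; // user alias, sugar over std::basic_string<char>

// Member access on a specialization: the typedef and the data member are
// both instantiated from dependent declarations.
using Elem = Box<MyString>::value_type;
static_assert(std::is_same_v<Elem, MyString>);
static_assert(std::is_same_v<decltype(Box<MyString>{}.value), std::string>);

// Uncommenting the next line is an error; the spelling of the source type
// in that diagnostic is what resugaring aims to improve.
// int n = Box<MyString>{}.value;
```

Only one `Box<std::basic_string<char>>::value_type` declaration exists regardless of spelling, which is why carrying the resugared underlying type on the type node, rather than redeclaring the typedef, keeps the cost down.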
The work-in-progress implementation is in an advanced state feature-wise, which I wanted to get to before writing this RFC so that folks would have something to toy around with to explore the effects we want to achieve.
The Compiler Explorer team added an experimental Clang variant based upon my branch (kudos!) and here is the link to a small demo: https://godbolt.org/z/o8aePToMf
It still has a few bugs, and has not been worked on from the performance side yet.
Right now this resugaring is always performed, eagerly, unless turned off by a frontend flag, as you can see on the demo.
There is also the WIP PR where I have been developing this feature: https://reviews.llvm.org/D127695
The PR is not clean enough for detailed review (please don't complain about the stray dumps and other litter strewn around!), but I think the big picture is already there and can be discussed.
There is lots that can be discussed on the performance side:
- Do we do it eagerly or lazily? Lazily would make sense if we wanted to perform this resugaring only for diagnostics, though we would have to carry more information around in the AST, increasing memory use. But some users, like the ROOT project, want this functionality so they can use type sugar with semantics. This can also be used to preserve a typedef which has attributes like gnu::aligned attached. Other such examples are the magic typedefs that ObjC specifies (NSInteger and friends). One problem that perhaps makes those two languages cooperate badly when mixed together in ObjC++ is that these are not preserved in templates. If this resugaring had to be performed for every type so that it survives to CodeGen, doing it lazily would not buy much and would perhaps even be detrimental, so this would at least need to be controlled by a switch.
- We can implement a new kind of dependence, substitution dependence, so that we can avoid recursing into types which don't contain substitution nodes.
- We can implement caching to avoid repeated resugaring of the same thing.
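The substitution dependence idea from the list above can be sketched on a toy type tree. Again this is not Clang code: `Node`, `containsSubst`, `resugar` and `visitsFor` are hypothetical names, and the flag is simply computed when a node is built, which is an assumption about how such a bit could be maintained.

```cpp
#include <cassert>
#include <memory>

// Toy model only: these are NOT Clang's AST classes.
struct Node;
using NodePtr = std::shared_ptr<const Node>;

struct Node {
  enum Kind { Leaf, Wrapper, Subst } kind;
  NodePtr inner;      // null for Leaf
  bool containsSubst; // hypothetical "substitution dependence" bit
};

NodePtr leaf() {
  return std::make_shared<const Node>(Node{Node::Leaf, nullptr, false});
}
NodePtr wrap(const NodePtr &inner) {
  // A wrapper (think pointer or reference) inherits the bit from its child.
  return std::make_shared<const Node>(
      Node{Node::Wrapper, inner, inner->containsSubst});
}
NodePtr subst(const NodePtr &replacement) {
  return std::make_shared<const Node>(Node{Node::Subst, replacement, true});
}

// Rebuilds only the subtrees that can contain something to resugar;
// `visited` counts how many nodes were inspected.
NodePtr resugar(const NodePtr &node, const NodePtr &sugaredArg, int &visited) {
  ++visited;
  if (!node->containsSubst)
    return node; // nothing below can change: reuse the node as is
  if (node->kind == Node::Subst)
    return subst(sugaredArg);
  return wrap(resugar(node->inner, sugaredArg, visited));
}

// Convenience helper: number of nodes inspected while resugaring `type`.
int visitsFor(const NodePtr &type) {
  int visited = 0;
  resugar(type, leaf(), visited);
  return visited;
}
```

With this flag, a type without substitution nodes is dismissed after inspecting its root, no matter how deeply nested it is, while a type containing a substitution is walked only along the path that leads to it.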
Note that the patch is getting big, and there is also a stack in there with other interesting stuff that needs to progress as well. Only Richard Smith has been reviewing the stack consistently, and even then I think he hasn't looked at the main patch yet because, as you know, he is quite busy, and it's maybe a bit unfair to put all of this burden on his desk.
So review volunteers are certainly welcome; we have about two months to merge this whole thing.
But even just testing this and noticing bugs and other limitations I might have missed is certainly helpful as well, so I can already consider feature requests and bug reports, especially if they come with reduced test cases.
Note that I have been working on type sugar preservation for some time, including some of the prerequisites here.
Last year we merged https://reviews.llvm.org/D110216, which modified the type deduction machinery to preserve type sugar. That already improved diagnostics quite a lot, and was a prerequisite so that we have type sugar to resugar with. That patch also implemented retention of type sugar for auto deduction, which previously would only keep the canonical type as well.
There is still work left to complete this, which you can find on the stack. For example, we don't have a mechanism in Clang to combine the type sugar of different deductions, so one current limitation is that we will arbitrarily pick the type sugar of the value of the first return statement in the case of return type deduction.
That mechanism is implemented by https://reviews.llvm.org/D111283 and expanded by https://reviews.llvm.org/D130308, and then on https://reviews.llvm.org/D111509 we hook this mechanism into many other places where we had two types we wanted to reduce to one, but were making arbitrary choices about what sugar to preserve.
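A small C++17 sketch of the situation where two deductions carry different sugar. The aliases `Meters` and `Feet` and the function `pick` are hypothetical; the asserts only show that deduction succeeds because both returns agree canonically, and say nothing about which spelling a given Clang version prints.

```cpp
#include <type_traits>

// Hypothetical aliases: two different sugars over the same canonical type.
using Meters = int;
using Feet = int;

// Both return statements deduce the canonical type int, so deduction is
// consistent, but one deduction is spelled Meters and the other Feet.
constexpr auto pick(bool metric) {
  if (metric)
    return Meters{1};
  return Feet{3};
}

static_assert(std::is_same_v<decltype(pick(true)), int>);
static_assert(pick(true) == 1 && pick(false) == 3);
```

Neither `Meters` nor `Feet` is obviously the right name for the return type here, which is the kind of case a mechanism for combining sugar has to decide on instead of keeping whichever came first.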
So there is still lots that can be talked about here, design decisions and implementation choices, but I think this post is big enough!
Thanks everyone! Special thanks to the mentors @vvassilev and @zygoloid !
PS:
One frequently asked question I get about this is: what is the relationship of this with "Strong Typedefs"?
Well, that depends on what one means by "Strong Typedef". Without a concrete proposal, I think in general the term refers to some extension of the kind of aliasing you can get with enum classes, i.e. a kind of type alias which can implicitly or explicitly convert to/from the underlying type, but that represents a different type in the canonical sense, so that it can be overloaded on and such.
Resugaring, though, deals with type sugar: we are resugaring a type to produce another one which is still canonically the same, so they can't really be overloaded against each other.
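The distinction can be shown with a short C++17 sketch. `Alias`, `Strong` and `describe` are hypothetical names; the enum class is used only as the closest existing analogue of a strong typedef, as mentioned above.

```cpp
#include <type_traits>

using Alias = int;          // type sugar: canonically the same type as int
enum class Strong : int {}; // a distinct type whose underlying type is int

static_assert(std::is_same_v<Alias, int>);
static_assert(!std::is_same_v<Strong, int>);
static_assert(std::is_same_v<std::underlying_type_t<Strong>, int>);

// A canonically distinct type can be overloaded on alongside int...
constexpr int describe(int) { return 0; }
constexpr int describe(Strong) { return 1; }
// ...whereas adding `constexpr int describe(Alias)` would be a redefinition
// of describe(int), because the alias is only sugar.

static_assert(describe(Alias{7}) == 0);
static_assert(describe(Strong{7}) == 1);
```

Resugaring only ever changes which spelling of `int` (here `Alias` or `int`) is attached to a type; it never creates something like `Strong`.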
The one connection here is that, if we had a Strong Typedef implemented, and this applies to enums as well, and it was instantiated from a declaration with a dependent underlying type, then we can resugar it so that it appears as if the underlying type had the lost type sugar added back when accessing it as explained above.
If instead Strong Typedef is understood in colloquial terms, then we are making typedefs stronger because we are not losing them so much.