Recovering the spelling of a typedef

Hi Everyone,

We’re trying to integrate the CERN ROOT framework with Julia, both
of which use LLVM/Clang for C++ interoperability. As such, we’re hoping
to harmonize the versions of clang used in both projects. One major
obstacle to this currently is a patch that the ROOT folks are carrying
to support their I/O system which uses the structure of C++ classes to
determine the on-disk format. The patch as is isn’t really in a form
that could be submitted upstream, but we’re hoping to solicit some advice
to come up with a solution that would be acceptable to clang, and not
require any local code patches.

With that in mind, let us describe the problem:

As mentioned, ROOT uses the structure of C++ classes to determine its
I/O format. The one wrinkle is that sometimes the I/O storage
format and the in-memory format are not exactly the same. In particular,
ROOT has a

typedef double Double32_t;

where if this typedef appears in a struct that is serialized to disk,
it indicates that the member should be stored with 32-bit precision on disk,
but with 64-bit precision in memory.

That’s only for I/O information; for anything regarding symbols we
need these two to share their instantiation data.

I.e. we want to distinguish the types of D<double>::m and
D<Double32_t>::m (and also D<vector<Double32_t>>::m and D<vector<double>>::m) in

template <class T>
struct D {
    using type = std::remove_reference<D>;
    T m;
    static int s;
};

But &D<double>::s must be the same as &D<Double32_t>::s; more importantly:

void f(D<double>);

must be callable as f(D<Double32_t>{}). That is (IIRC) in contrast to what
the C++ committee discussed for strong typedefs.
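
The symbol-level requirements above are what standard C++ already gives for a plain typedef, which is also why the language itself cannot tell the two spellings apart. A minimal sketch in ordinary C++, assuming nothing from ROOT beyond the typedef; the helper names `same_static_address` and `call_with_typedef` are made up for illustration:

```cpp
#include <type_traits>
#include <vector>

typedef double Double32_t;  // the typedef quoted above

template <class T>
struct D {
    T m;
    static int s;
};
template <class T> int D<T>::s = 0;

// Returns 1 so a caller can confirm this overload was selected.
inline int f(D<double>) { return 1; }

// A typedef is only an alias, so both spellings name one specialization.
static_assert(std::is_same<D<double>, D<Double32_t>>::value,
              "same type, hence same symbols");
static_assert(std::is_same<D<std::vector<double>>,
                           D<std::vector<Double32_t>>>::value,
              "also true when the typedef is nested");

// Both spellings refer to a single static data member.
inline bool same_static_address() {
    return &D<double>::s == &D<Double32_t>::s;
}

// f(D<double>) accepts an argument spelled with the typedef.
inline int call_with_typedef() { return f(D<Double32_t>{}); }
```

Because the compiler treats the two spellings as one type, any distinction for I/O has to come from the AST's type sugar rather than from the type system.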

Can you elaborate on how this typedef information is used for I/O? Do you mean that it is used by some clang plugin that examines the AST, or something else?

Yes, precisely. I am not fully versed in the details (Axel, Philippe, please correct any inaccuracies), but essentially you can request an object to be written to or read from disk, and ROOT will look up the corresponding class and compute the appropriate disk format (for which it needs to distinguish between double and Double32_t for any members). ROOT uses a C++ interpreter/JIT (a custom one for a very long time, now transitioning to LLVM/Clang) for interactivity and introspection, so it has the ASTs for all classes in the system available.

In simple cases, this information is already available as type sugar nodes. Consider this AST dump:

typedef double Double32_t;
struct Foo { Double32_t f; };

|-TypedefDecl 0xd3af50 <t.cpp:1:1, col:16> col:16 referenced Double32_t 'double'
| `-BuiltinType 0xd09d50 'double'
`-CXXRecordDecl 0xd3afa0 <line:2:1, col:28> col:8 struct Foo definition
  |-CXXRecordDecl 0xd3b0c0 <col:1, col:8> col:8 implicit struct Foo
  `-FieldDecl 0xd3b190 <col:14, col:25> col:25 f 'Double32_t':'double'

Template instantiation uses the canonical, desugared types, though. You can see it from this dump:

typedef double Double32_t;
template <typename T> struct Bar { T f; };
template struct Bar<Double32_t>;

`-ClassTemplateSpecializationDecl 0xc3b490 <line:3:1, col:31> col:17 struct Bar definition
  |-TemplateArgument type 'double'
  |-CXXRecordDecl 0xc3b688 prev 0xc3b490 <line:2:23, col:30> col:30 implicit struct Bar
  `-FieldDecl 0xc3b758 <col:36, col:38> col:38 f 'double':'double'

Does ROOT need a way to push the type sugar nodes through template instantiation? I seem to recall that there are reasons why it’s hard to do that from an implementation standpoint, but it would also help us get better diagnostics when rinsing “std::string” through a template type parameter, for example.

Yes, in the very simple cases, no patch is needed, but yes, ROOT needs
to be able to look through templates which is where the problem comes
in.

From: "Keno Fischer via cfe-dev" <cfe-dev@lists.llvm.org>
To: "Reid Kleckner" <rnk@google.com>
Cc: "clang developer list" <cfe-dev@lists.llvm.org>
Sent: Tuesday, July 26, 2016 1:02:40 PM
Subject: Re: [cfe-dev] Recovering the spelling of a typedef

> Yes, in the very simple cases, no patch is needed, but yes, ROOT needs
> to be able to look through templates which is where the problem comes
> in.

What does your patch do?

The core problem here is that if you have:

> typedef double Double32_t;
> template <typename T> struct Bar { T f; };
> template struct Bar<Double32_t>;

then Bar<Double32_t> and Bar<double> have the same instantiation. You can have lots of different names from many different contexts. How many of these do you track and which name do you want to use?

-Hal

Hi Reid,

> Does ROOT need a way to push the type sugar nodes through template instantiation?

Yes absolutely. This is the essence of the patch (for the non-templated case indeed the information was already there).

We do have users using all sorts of ‘nesting’ whether it is in the template parameter or the types of the members.

For example:

template <class T> struct D
{
    using type = std::remove_reference<D>;
    std::map< UseKeyAdapt<T>, ValueKeyAdapt<T> > m1;
    std::vector< T> m2;
    // Or maybe even
    // T<Double32_t> m3;
    // T<vector<Double32_t>> m4;
    static int s;
};

or

     D< vector<Double32_t> >

or any variation thereof (the more nesting and indirection, the harder it is to recover the information with first-order help). In all those cases, the user expects the underlying floating-point value to be stored on disk with single precision. [Often the user needs to carry out the calculation in double precision to keep the errors as small as possible, but at the end of the calculation, due to those errors, the result is only known to single precision, and thus the user can safely halve the size of the file by storing only single precision.]
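
To make the storage argument concrete, here is a rough sketch of narrowing values to single precision when writing and widening them again when reading. This is not ROOT's I/O code; `pack_as_float` and `unpack_to_double` are hypothetical helpers, and the halving assumes the usual 4-byte float and 8-byte double.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: narrow in-memory doubles to floats for storage.
inline std::vector<unsigned char> pack_as_float(const std::vector<double>& values) {
    std::vector<unsigned char> bytes(values.size() * sizeof(float));
    for (std::size_t i = 0; i < values.size(); ++i) {
        const float narrowed = static_cast<float>(values[i]);  // precision is lost here
        std::memcpy(bytes.data() + i * sizeof(float), &narrowed, sizeof(float));
    }
    return bytes;
}

// Hypothetical helper: widen stored floats back to doubles after reading.
inline std::vector<double> unpack_to_double(const std::vector<unsigned char>& bytes) {
    std::vector<double> values(bytes.size() / sizeof(float));
    for (std::size_t i = 0; i < values.size(); ++i) {
        float stored;
        std::memcpy(&stored, bytes.data() + i * sizeof(float), sizeof(float));
        values[i] = stored;  // exact widening conversion
    }
    return values;
}
```

Values that are exactly representable as a float survive the round trip unchanged; others come back rounded to the nearest float, which is acceptable when the result is only meaningful to single precision anyway.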

Thanks,
Philippe.

Hi Hal,

as Philippe mentioned, the patch is used to force sugar nodes through template instantiation.
I think for the ROOT use case, one needs to be careful to only think about this in the context of starting from
the fields of a class/struct. I don’t think ROOT has any problem with re-doing the template instantiation when
it needs to compute the disk layout, but we would need to be sure that all the required information is indeed
retained and that there is an API for doing so.

Keno

From: "Keno Fischer" <kfischer@college.harvard.edu>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "clang developer list" <cfe-dev@lists.llvm.org>, "Reid Kleckner" <rnk@google.com>
Sent: Thursday, July 28, 2016 1:48:40 PM
Subject: Re: [cfe-dev] Recovering the spelling of a typedef

> Hi Hal,
>
> as Philippe mentioned the patch is used to force through sugar nodes through template instantiation.
> I think for the ROOT use case, one needs to be careful to only think about this in the context of starting from
> of fields of a class/struct. I don't think ROOT has any problem with re-doing the template instantiation when
> it needs to compute the disk layout, but we would need to be sure that all the required information is indeed
> retained and that there is an API for doing so.

I don't understand exactly how this would work. You might just need to produce the patch so we can discuss something concrete.

-Hal

Hi,

> I don't understand exactly how this would work. You might just need to produce the patch so we can discuss something concrete.

I attached the 3 commits that implement this patch (but they are not in a ready-to-push-upstream state :)). Essentially, what they do is add the ability to instantiate a template based on a typedef and have this typedef propagated all the way through. The code in the rest of clang still would not use this ability.

> then Bar<Double32_t> and Bar<double> have the same instantiation.
> You can have lots of different names from many different contexts. How many of these do you track and which name do you want to use?

In our own code, we keep track of what the user requested, for example (s)he may have requested any of:

    Foo<double,double>
    Foo<Double32_t,double>
    Foo<double,Double32_t>
    Foo<Double32_t,Double32_t>

We keep one representation of the class for each of those instantiations selected by the users. To generate those instantiations, we explicitly construct (or tweak) a TemplateParameterList and call, for example, Sema::SubstDefaultTemplateArgumentIfAvailable. [This is because we also need the requested type to be reflected in any default parameter that contains it.]
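
The bookkeeping described here, one representation per requested spelling, can be pictured with a toy registry. This sketch does not use clang or ROOT at all: `LayoutRegistry`, `MemberLayout`, and the assumption that `Foo<A,B>` has one member `a` of type `A` and one member `b` of type `B` are all invented for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one member's storage width on disk.
struct MemberLayout {
    std::string name;
    int bits_on_disk;
};

// Hypothetical registry keyed by the spelling the user asked for,
// not by the canonical type (which is identical for all four spellings).
class LayoutRegistry {
public:
    // Record one requested instantiation; each argument is "double" or "Double32_t".
    void request(const std::string& first, const std::string& second) {
        std::vector<MemberLayout> members;
        members.push_back(MemberLayout{"a", bits_for(first)});
        members.push_back(MemberLayout{"b", bits_for(second)});
        layouts_["Foo<" + first + "," + second + ">"] = members;
    }

    // Throws std::out_of_range if the spelling was never requested.
    const std::vector<MemberLayout>& lookup(const std::string& spelling) const {
        return layouts_.at(spelling);
    }

    std::size_t size() const { return layouts_.size(); }

private:
    // The typedef spelling selects 32-bit storage; anything else is stored as 64-bit.
    static int bits_for(const std::string& arg) {
        return arg == "Double32_t" ? 32 : 64;
    }

    std::map<std::string, std::vector<MemberLayout>> layouts_;
};
```

Requesting all four spellings listed above yields four distinct entries, even though the compiler sees a single type `Foo<double,double>`.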

Cheers,
Philippe.

0001-Intentionally-ugly-but-minimally-invasive-hack.patch (11.2 KB)

0002-Add-support-for-default-template-parameters-that-con.patch (3.64 KB)

0003-Extend-SubstTemplateTypeParmType-to-support-non-cano.patch (2.86 KB)

Instantiating the same template multiple times with canonically-equivalent template arguments with different type sugar will lead to an incoherent AST; I don’t see any way we can support that.

Also, your requirements do not appear to be coherent. You want T<Double32_t> and T<double> to be the same type, and also be distinguishable. So what happens here:

template <typename T> T f(T, T);
auto a = f(T<Double32_t>(), T<double>());

? We need to make an (at best) arbitrary choice. So, while we may be able to improve the situation for you, you need to accept that what you’re asking for is fundamentally best-effort, rather than a sound extension to the language.

With that in mind, it seems to me that the problem you’re seeing is loss of type sugar when forming the type of a member of a class template specialization. Specifically, given ‘T<Double32_t>’, clang preserves the type sugar, but once you access the ‘m’ member, the type information is taken solely from the instantiation, and the type sugar is gone.

However, the type sugar is not entirely gone: the type of ‘m’ in this case is not ‘double’, it’s a type sugar node that says the type is canonically double, but non-canonically it’s the template type parameter at depth 0, index 0. So, when forming the type of the expression ‘T<Double32_t>::m’, we could perform a resugaring step, where we would walk the type of ‘m’ and replace each SubstTemplateTypeParmType with one that records the sugared template argument from ‘T<Double32_t>’. That should allow you to preserve the difference between double and Double32_t across template instantiation in more cases, and improve our diagnostic quality too.
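
The resugaring step can be sketched with a toy type model. None of the names below are clang's classes: `Type`, `make`, `spell`, and `resugar` are invented for illustration, and the `SubstParam` node only loosely mirrors the role of SubstTemplateTypeParmType (it remembers which parameter index was substituted and what replaced it).

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: a tiny stand-in for a sugared type tree.
struct Type;
using TypePtr = std::shared_ptr<const Type>;

struct Type {
    enum Kind { Builtin, Typedef, SubstParam, Vector };
    Kind kind;
    std::string name;  // Builtin/Typedef: spelling as written
    unsigned index;    // SubstParam: index of the substituted template parameter
    TypePtr inner;     // Typedef: underlying type; SubstParam: replacement; Vector: element
};

inline TypePtr make(Type::Kind kind, std::string name, unsigned index, TypePtr inner) {
    return std::make_shared<Type>(Type{kind, std::move(name), index, std::move(inner)});
}

// Spell a type the way it was written, keeping typedef names.
inline std::string spell(const TypePtr& t) {
    switch (t->kind) {
        case Type::Builtin:
        case Type::Typedef:    return t->name;
        case Type::SubstParam: return spell(t->inner);
        case Type::Vector:     return "vector<" + spell(t->inner) + ">";
    }
    return "";
}

// Walk the member's type and point each SubstParam at the sugared
// template argument that the user actually wrote.
inline TypePtr resugar(const TypePtr& t, const std::vector<TypePtr>& sugared_args) {
    switch (t->kind) {
        case Type::SubstParam:
            return make(Type::SubstParam, "", t->index, sugared_args.at(t->index));
        case Type::Vector:
            return make(Type::Vector, "", 0, resugar(t->inner, sugared_args));
        case Type::Builtin:
        case Type::Typedef:
            return t;
    }
    return t;
}
```

In this model, a member whose instantiated type is `vector<SubstParam(0 -> double)>` spells as `vector<double>`, and after `resugar` with the argument list `{Double32_t}` it spells as `vector<Double32_t>`, while the canonical type underneath is unchanged.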

Some of what your patches do seems like a good step in this direction; in particular, we would need to allow SubstTemplateTypeParmType to have a non-canonical sugared substituted type in addition to its canonical type.

Hi Richard,

> template <typename T> T f(T, T);
> auto a = f(T<Double32_t>(), T<double>());
>
> ? We need to make an (at best) arbitrary choice.

Indeed, in this case the result can only be ambiguous :(. Thankfully, since the intent of the tag is persistency, I expect that the user would seldom rely on auto to define the type, and thus most sources of ambiguity are likely to be avoided.

> So, while we may be able to improve the situation for you, you need to accept that what you’re asking for is fundamentally best-effort, rather than a sound extension to the language.

I absolutely agree :)

> With that in mind, it seems to me that the problem you’re seeing is loss of type sugar when forming the type of a member of a class template specialization.

Yes and a few more cases, including the type of the template parameters when they are defaulted.

> So, when forming the type of the expression ‘T<Double32_t>::m’, we could perform a resugaring step, where we would walk the type of ‘m’

Where resugaring gets a bit complicated is when there are some layers of indirection. For example, a case like:

template <typename Collection> class Wrapper
{
    Collection fMainData;
    typename Collection::value_t fMaxValue;
    std::vector<typename Collection::value_t> fInterestingValues;
};
Wrapper<vector<Double32_t> > userData;

where userData.fMaxValue and userData.fInterestingValues ought to have the type Double32_t and vector<Double32_t> respectively.
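
For contrast, this is what plain C++ sees in the indirection case: the member typedef resolves to the canonical type, so the language offers no hook for recovering Double32_t. In this sketch `MyCollection` is a hypothetical stand-in that provides the `value_t` member used in the example above (std::vector itself only provides `value_type`).

```cpp
#include <type_traits>
#include <vector>

typedef double Double32_t;

// Hypothetical collection exposing a value_t member typedef.
template <typename T>
struct MyCollection {
    typedef T value_t;
    std::vector<T> data;
};

template <typename Collection>
class Wrapper {
public:
    Collection fMainData;
    typename Collection::value_t fMaxValue;
    std::vector<typename Collection::value_t> fInterestingValues;
};

using UserData = Wrapper<MyCollection<Double32_t>>;

// Through the layer of indirection, the members are indistinguishable
// from ones declared with plain double.
static_assert(std::is_same<decltype(UserData::fMaxValue), double>::value,
              "fMaxValue is canonically double");
static_assert(std::is_same<decltype(UserData::fInterestingValues),
                           std::vector<double>>::value,
              "fInterestingValues is canonically vector<double>");
```

Any tool that wants to report Double32_t for these members has to follow the sugar through `Collection::value_t` back to the template argument as written.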

Thanks,
Philippe.