RFC: Extra source location information for dependent types

Template instantiation for class templates and types is starting to shape up. One problem I'm seeing now is that we don't retain enough source location information in the type system to do a good job with instantiation. For example, here's a C++ template metafunction:

   template<typename MetaFun, typename T>
   struct apply1 {
     typedef typename MetaFun::template apply<T>::type type;
   };

We have no source location information for that nested type "type" except the location of the declarator. This leads to poor placement of the '^' in error messages, e.g.,

test/SemaTemplate/metafun-apply.cpp:25:53: error: 'apply' following the 'template' keyword does not refer to a template
   typedef typename MetaFun::template apply<T>::type type;
                                                     ^
The caret is under the declarator ("type"), which is far away from the actual error. If this were in non-template code, the error message would point to "apply", but we lost that source information when we built the uniqued type.

I believe that we have to fix this problem, and to do so we're going to need to introduce more source locations into our representation of types. Unfortunately, adding source locations to types has some serious downsides:

   - The representation of each type gets larger
   - Since source locations will need to be part of each type's profile, we'll be creating many more unique types
   - With more unique types, we'll spend more time in the type-uniquing process.

For these reasons, I think we need to introduce source locations into the type system in a few places as possible. In particular, I suggest that source locations are only available in the dependent form of a type. Thus, the type used to represent "typename MetaFun::template apply<T>::type" will have source locations for the keyword "template", the template name "apply", and the trailing type "type" only if the types depend somehow on a template parameter. Hence, when we're outside of a template we won't run into any of the downsides described above.

Comments? Concerns?

  - Doug

Hey Doug,

Some comments below...

I believe that we have to fix this problem, and to do so we're going
to need to introduce more source locations into our representation of
types. Unfortunately, adding source locations to types has some
serious downsides:

  - The representation of each type gets larger
  - Since source locations will need to be part of each type's
profile, we'll be creating many more unique types
  - With more unique types, we'll spend more time in the type-
uniquing process.

Does this imply we could no longer compare two QualType's directly (to test for equivalence)? If not, Sema performance might also be effected. Since most types are never used, maybe this wouldn't be a big deal...

Since this info is only used for error diagnostics (for a small percentage of types), it would be great if we could generate this extra info lazily. This would still add some overhead to remember all the type refs, however it's worth considering. Since PTH has all the tokens & source locations pre-computed, maybe it could be extended to do token caching in general (which would support the lazy generation). Definitely not a quick hack:-)

For these reasons, I think we need to introduce source locations into
the type system in a few places as possible. In particular, I suggest
that source locations are only available in the dependent form of a
type. Thus, the type used to represent "typename MetaFun::template
apply<T>::type" will have source locations for the keyword "template",
the template name "apply", and the trailing type "type" only if the
types depend somehow on a template parameter. Hence, when we're
outside of a template we won't run into any of the downsides described
above.

As you probably know, we already have a couple types that aren't unique (VariableArrayType and DependentSizedArrayType). Both of these have SourceLocation info contained within their expression. That said, what specific Type classes do you anticipate adding source location information? If the classes are self-contained and don't effect the semantics of the type system in general, there may not be a need for any fancy infrastructure (or sweeping changes).

snaroff

Hey Doug,

Some comments below...

I believe that we have to fix this problem, and to do so we're going
to need to introduce more source locations into our representation of
types. Unfortunately, adding source locations to types has some
serious downsides:

- The representation of each type gets larger
- Since source locations will need to be part of each type's
profile, we'll be creating many more unique types
- With more unique types, we'll spend more time in the type-
uniquing process.

Does this imply we could no longer compare two QualType's directly (to test for equivalence)? If not, Sema performance might also be effected. Since most types are never used, maybe this wouldn't be a big deal...

No, we'll still be able to compare two (canonical) QualTypes directly. That's an extremely important aspect of our type system that we must not compromise.

Since this info is only used for error diagnostics (for a small percentage of types), it would be great if we could generate this extra info lazily. This would still add some overhead to remember all the type refs, however it's worth considering. Since PTH has all the tokens & source locations pre-computed, maybe it could be extended to do token caching in general (which would support the lazy generation). Definitely not a quick hack:-)

Interesting. We'd have to be able to re-parse non-trivial types (within templates, nonetheless!) to get this information. Not a quick hack, but it could be a good solution.

For these reasons, I think we need to introduce source locations into
the type system in a few places as possible. In particular, I suggest
that source locations are only available in the dependent form of a
type. Thus, the type used to represent "typename MetaFun::template
apply<T>::type" will have source locations for the keyword "template",
the template name "apply", and the trailing type "type" only if the
types depend somehow on a template parameter. Hence, when we're
outside of a template we won't run into any of the downsides described
above.

As you probably know, we already have a couple types that aren't unique (VariableArrayType and DependentSizedArrayType).

The latter of these will, eventually, need to be uniqued so that we can match template declarations, e.g.,

  template<int N> void foo(char (&str)[N]); // #1
  template<int M> void foo(char (&str)[N]) { ... } // #2

We need to be able to compare those DependentSizedArrayType nodes to realize that #2 matches #1.

Both of these have SourceLocation info contained within their expression. That said, what specific Type classes do you anticipate adding source location information? If the classes are self-contained and don't effect the semantics of the type system in general, there may not be a need for any fancy infrastructure (or sweeping changes).

We'll need to add SourceLocations to nearly every compound type, because you can build just about any type out of other, dependent types and we need the location information during template instantiation. This means everything from FunctionProtoTypes to PointerTypes would have SourceLocations.

  - Doug

Template instantiation for class templates and types is starting to
shape up. One problem I'm seeing now is that we don't retain enough
source location information in the type system to do a good job with
instantiation.

I'm fine with this on general principles, and actually have assumed that we would get this at some point someday anyway. I always assumed that it would take the form of an optional flag to sema that enables full source location info for all types. This would cause sema to create a "PointerTypeWithLoc" type instead of a PointerType node. The former would just be a sugared version of the type whose canonical version would desugar down to the pointer type, so all semantic clients would be fine. This would be incredibly useful for some clients, such as a refactoring tool.

As we've previously discussed this would also be a good way to shake out bugs where clients might be looking at the non-canonical version of a type accidentally: with this optional flag, no types would be canonical :slight_smile:

Both of these have SourceLocation info contained within their
expression. That said, what specific Type classes do you anticipate
adding source location information? If the classes are self-
contained and don't effect the semantics of the type system in
general, there may not be a need for any fancy infrastructure (or
sweeping changes).

We'll need to add SourceLocations to nearly every compound type,
because you can build just about any type out of other, dependent
types and we need the location information during template
instantiation. This means everything from FunctionProtoTypes to
PointerTypes would have SourceLocations.

I am really concerned about doing this unconditionally, and by default. I'm actually much less concerned about doing this for dependent types and those derived from a dependent type. Is there a constant time operation to tell if a random type is dependent?

It would probably be best to implement this as an optional opt-in thing as described above, and then consider turning it on for dependent types. What do you think?

-Chris

Template instantiation for class templates and types is starting to
shape up. One problem I'm seeing now is that we don't retain enough
source location information in the type system to do a good job with
instantiation.

I'm fine with this on general principles, and actually have assumed that we would get this at some point someday anyway. I always assumed that it would take the form of an optional flag to sema that enables full source location info for all types. This would cause sema to create a "PointerTypeWithLoc" type instead of a PointerType node. The former would just be a sugared version of the type whose canonical version would desugar down to the pointer type, so all semantic clients would be fine. This would be incredibly useful for some clients, such as a refactoring tool.

As much as I dislike the idea of having a flag that changes the AST representation, I do think this makes sense.

As we've previously discussed this would also be a good way to shake out bugs where clients might be looking at the non-canonical version of a type accidentally: with this optional flag, no types would be canonical :slight_smile:

Yep.

Both of these have SourceLocation info contained within their
expression. That said, what specific Type classes do you anticipate
adding source location information? If the classes are self-
contained and don't effect the semantics of the type system in
general, there may not be a need for any fancy infrastructure (or
sweeping changes).

We'll need to add SourceLocations to nearly every compound type,
because you can build just about any type out of other, dependent
types and we need the location information during template
instantiation. This means everything from FunctionProtoTypes to
PointerTypes would have SourceLocations.

I am really concerned about doing this unconditionally, and by default. I'm actually much less concerned about doing this for dependent types and those derived from a dependent type. Is there a constant time operation to tell if a random type is dependent?

Yes, isDependentType is constant time.

It would probably be best to implement this as an optional opt-in thing as described above, and then consider turning it on for dependent types. What do you think?

I think that makes sense. We'll want to do the performance tests each way and weigh the advantages/disadvantages.

   - Doug

Whether a type is dependent is cached in the field
clang::Type::Dependent, so it's very fast to check.

-Eli

I'm fine with this on general principles, and actually have assumed that we would get this at some point someday anyway. I always assumed that it would take the form of an optional flag to sema that enables full source location info for all types. This would cause sema to create a "PointerTypeWithLoc" type instead of a PointerType node. The former would just be a sugared version of the type whose canonical version would desugar down to the pointer type, so all semantic clients would be fine. This would be incredibly useful for some clients, such as a refactoring tool.

As much as I dislike the idea of having a flag that changes the AST representation, I do think this makes sense.

In a refactoring tool, this is very useful: you can parse the code the "fast" way to build whole program symbol databases etc. When you decide you need to do rewriting on a specific file, you can go back and reparse it in "full loc" mode, etc.

I am really concerned about doing this unconditionally, and by default. I'm actually much less concerned about doing this for dependent types and those derived from a dependent type. Is there a constant time operation to tell if a random type is dependent?

Yes, isDependentType is constant time.

Awesome.

It would probably be best to implement this as an optional opt-in thing as described above, and then consider turning it on for dependent types. What do you think?

I think that makes sense. We'll want to do the performance tests each way and weigh the advantages/disadvantages.

Sounds great!

-Chris