Type matcher madness

Given Sam's open review, I've been pondering how to implement a type
matcher (so far we've punted on that).
The obvious idea would be to have a type() function that returns a
Matcher<QualType>. Apart from some technical hoops to jump through I
have a patch ready for that.

This leads to a different design decision, though. We'll want type()
to return a BindableMatcher, and for that we need to implement type
binding. This is not quite straight forward, as types are split into
QualType and Type nodes in the AST, and only the Type nodes model the
inheritance hierarchy. My understanding is that this is for
performance reasons, so that every Type* exists only once, and
equality on types can be implemented by a pointer comparison.

The question is: how much of that do we want to express in the matcher language.
I see two main options:
1. Fully model that relationship. When somebody writes matchers for
types, they will look like this:
qualType(arrayType(hasElementType(qualType(asString("int"))))

2. Model QualTypes in the matchers, and allow going "through" to the
underlying type; this will kill the type safety for the matchers, but
will make the matcher expressions more concise:
arrayType(hasElementType(asString("int")))

The problem is that this means in (2):
type(hasElementType(asString("int")))
must be allowed by the type system, even though not every type is an array type

Thoughts?
/Manuel

I would have expected to be able to type arrayType(“int”) this seems possible with a bit of overloading and/or implicit conversions. For example:

inline auto qualType(StringRef s) → decltype(qualType(asString(s))) { return qualType(asString(s)); }

inline auto hasElement(StringRef s) → decltype(hasElement(qualType(s))) { return hasElement(qualType(s)); }

inline auto arrayType(StringRef s) → decltype(arrayType(hasElement(s))) { return arrayType(hasElement(s)); }

would trivially expand arrayType(“int”) into arrayType(hasElement(qualType(asString(StringRef(“int”))))) auto-magically. And it’s much nicer for the user/reader.

On the other hand, if having a qualType for the array is necessary, then I am not opposed to qualType(arrayType(“int”)) and this should patch your hole whilst still providing a succinct interface…

… even though it might be simpler to just have arrayType return a bindable “QualType” directly, would it not ?

– Matthieu

Is there a fundamental reason why we can’t have the best of both? That is, a matcher which matches Type or QualType, and constrains its inner matchers to match ArrayType.

+mdiamond, who is interested in contributing here

It seems to me that the distinction between QualType and Type is an implementation detail that we shouldn’t bother users with. If we make everything a QualType, the users will have the same functionality in a more easily readable format. I don’t really see any disadvantages to this approach. You mentioned that this would “kill the type safety for the matchers”–what did you mean exactly?

It seems to me that the distinction between QualType and Type is an
implementation detail that we shouldn't bother users with. If we make
everything a QualType, the users will have the same functionality in a more
easily readable format. I don't really see any disadvantages to this
approach. You mentioned that this would "kill the type safety for the
matchers"--what did you mean exactly?

The question is specifically: how do we intend to do the binding, and
what types will the matchers have.

One possibility:

BindableMatcher<QualType> type(Matcher<QualType>...)

Then we implement all things that match on a specific type to go
through QualType. But, to get type safety, we'll want to have matchers
that actually break compilation if you use them in a wrong context,
like:

Matcher<RecordType> hasConstFields();

If we only use QualType, as far as I can tell we cannot do that.

Thoughts?
/Manuel

We discussed this offline a bit, and we basically came to the conclusion that we should support both Type and QualType with auto-conversions from Matcher to Matcher. We will only support binding QualTypes, but this shouldn’t be a big deal since every Type is wrapped by a QualType. Are there any objections to this approach?

We discussed this offline a bit, and we basically came to the conclusion
that we should support both Type and QualType with auto-conversions from
Matcher<Type> to Matcher<QualType>. We will only support binding QualTypes,
but this shouldn't be a big deal since every Type is wrapped by a QualType.
Are there any objections to this approach?

So, Chandler got curious which led to a longish design argument :slight_smile:

We noticed a problem with the automatic conversion, specifically given
for example:
hasType(pointerType(pointee(intType())))

... and given that type node matchers will in general have similar
signatures as other matchers, so we'd have something like:
Matcher<Stmt> hasType(Matcher<QualType>)
Matcher<Type> pointerType(Matcher<PointerType>)
Matcher<PointerType> pointee(Matcher<QualType>)
Matcher<QualType> intType()

Now Chandler pointed out that the interesting part is that there's a
difference how pointer qualifications work on the different levels of
the stack. More specifically, clang's type hierarchy is designed so
that type equality is given by Type-pointer-equality, which leads to
interesting attributes for composite types.

The gist of the argument was, that the example expression above should
match an int * const, but shouldn't match a const int *, as the inner
qualification difference would make the outer types differ.

My main concern was that users will not expect this, as this runs
counter to how matcher work normally - if you do *not* specify
something, by default you'll get what matches *any* value.

Now we'll only see what problems users have when we actually start
giving them something to play around with. So I think the argument
that erring on the side of a stronger match is the safer bet is
actually a good one.

One thing that I'm not sure we clarified in the end, is that I still
see two options (given that we don't want the above example to match a
const int *):
Do we want it to match a int * const, or not. I'd rather keep the
outermost layer symmetric to the inside, but I understand that there
is the argument that they are not symmetric in the language, and thus
shouldn't be in the matchers.

The way to have the outermost layer behave differently, would be to
say something like:
Matcher<Stmt> hasType(RelaxedQualTypeMatcher)
where an implicit conversion would convert a Matcher<Type> into a
RelaxedQualTypeMatcher, which would then be equivalent to writing
hasType(qualified(...)) where qualified has the signature:
Matcher<QualType> qualified(Matcher<Type>)

If we could live with the default behavior on the outside being strict
matching we wouldn't need that and could simply use Matcher<Type> ->
Matcher<QualType> conversion to mean "only matches the type without
any qualifiers.

Thoughts?
/Manuel