Identifiying (successfully resolved) overloaded function references.

Hello.

We would like to identify the DeclRefExpr and MemberExpr AST nodes
that have been created by looking up an overloaded function reference.
As far as we can tell, there is no way to extract this information
from the current AST. (Is such an impression correct?)

Hence, we would add a Boolean bitfield flag to AST nodes
    DeclRefExpr and MemberExpr
so as to be able to recognize those that originally were
    UnresolvedLookupExpr and UnresolvedMemberExpr
and have been changed by Sema method
    Sema::FixOverloadedFunctionReference().

The new bitfield will be called, e.g.,
IsResolvedOverloadedFunctionReference.

The addition of the Boolean flag is not going to impact on the memory
footprint of the AST, because for both DeclRefExpr and MemberExpr nodes
the new flag can be packed with 3 already existing Boolean bitfields.

Would such a change be welcome?

Regards,
Enea.

We would like to identify the DeclRefExpr and MemberExpr AST nodes
that have been created by looking up an overloaded function reference.
As far as we can tell, there is no way to extract this information
from the current AST. (Is such an impression correct?)

Yes, this information is lost because the OverloadExpr is thrown
away during resolution. It wouldn't be difficult to preserve it, though,
maybe in some sort of side table on the ASTContext.

That won't catch every instance of overload resolution, though,
because operators are resolved without creating intermediate notes.
Is that acceptable?

Hence, we would add a Boolean bitfield flag to AST nodes
   DeclRefExpr and MemberExpr
so as to be able to recognize those that originally were
   UnresolvedLookupExpr and UnresolvedMemberExpr
and have been changed by Sema method
   Sema::FixOverloadedFunctionReference().

The new bitfield will be called, e.g.,
IsResolvedOverloadedFunctionReference.

The addition of the Boolean flag is not going to impact on the memory
footprint of the AST, because for both DeclRefExpr and MemberExpr nodes
the new flag can be packed with 3 already existing Boolean bitfields.

I have no problem with this.

John.

We would like to identify the DeclRefExpr and MemberExpr AST nodes
that have been created by looking up an overloaded function reference.
As far as we can tell, there is no way to extract this information
from the current AST. (Is such an impression correct?)

Yes, this information is lost because the OverloadExpr is thrown
away during resolution. It wouldn't be difficult to preserve it, though,
maybe in some sort of side table on the ASTContext.

That won't catch every instance of overload resolution, though,
because operators are resolved without creating intermediate notes.
Is that acceptable?

Good point.

Besides operators, we were also missing the case of constructor calls
implied by CXXConstructExpr (and derived node CXXTemporaryObjectExpr)
and CXXNewExpr nodes: these too require an additional bit (again, no
space consumed).

As for the side table in the ASTContext: that sounds really interesting
and we may consider that option in the near future.
For our immediate needs, it should be enough to track the (successfully
resolved) function/method references that had more than a single
candidate. So we opted for a bitfield named "HadMultipleCandidates".

In order to track the value of the flag from the place where overloading
is resolved to the place where AST nodes are created, the flag needs to
be added to helper classes InitializationSequence and
UserDefinedConversionSequence.

Please find attached a patch for review.

Regards,
Enea.

HadMultipleCandidates.patch (47.7 KB)

We would like to identify the DeclRefExpr and MemberExpr AST nodes
that have been created by looking up an overloaded function reference.
As far as we can tell, there is no way to extract this information
from the current AST. (Is such an impression correct?)

[...]

Please find attached a patch for review.

No comments?
OK to commit?

Enea.

It looks fine to me if you're sure this is useful to you. Please do remove
it if you find that it doesn't actually help.

John.

OK, committed in r141171.

Regarding its usefulness, our immediate use of this flag is to
distinguish those explicit casts applied to function (pointer)
references that are meant to "implement" overload resolution.

/***********************************************
Start of example that can be skipped if not really interested in
details, but please do read below example if skipping
***********************************************/

As an example, consider the following chunk of code:

bool operator< (const S&, const S&);
bool operator< (const T&, const T&);

typedef bool (*CMP)(const S&, const S&);

template <typename Iter, typename Pred>
bool is_sorted(Iter first, Iter last, Pred cmp);

void test(const std::vector<S>& vs) {
  is_sorted(vs.begin(), vs.end(), static_cast<CMP>(operator<));
}

Here the static_cast is needed, because there are multiple candidates
when resolving "operator<". Otherwise, if there were a single candidate,
then the cast would not have been needed.

The AST in clang (for the relevant portion of code) for both the cases
(single candidate or multiple candidates) is

    (CXXStaticCastExpr 0x444c960 <col:35, col:61> 'CMP':'_Bool (*)(const
struct S &, const struct S &)' static_cast<CMP> <NoOp>
      (ImplicitCastExpr 0x444c948 <col:52, <invalid sloc>> '_Bool
(*)(const struct S &, const struct S &)' <FunctionToPointerDecay>
        (DeclRefExpr 0x444c920 <col:52, <invalid sloc>> '_Bool (const
struct S &, const struct S &)' lvalue Function 0x442f1d0 'operator<'
'_Bool (const struct S &, const struct S &)')))))

In the two cases, we have two NoOp explicit casts and the added flag
allows us to distinguish between them.

/***********************************************
End of example that can be skipped
***********************************************/

Would there be objections to the "splitting" of the current NoOp cast
kind into a few, slightly different flavors of NoOp providing a clearer
distinction, that will be useful for source code based tools?

For instance, in the case of the example above, the explicit NoOp
static_cast kind could be replaced (when there are indeed multiple
candidates) by an "OverloadResolution" cast kind.

Another example is that of (usually implicit) NoOp casts resulting from
qualification conversions, i.e., when the "const" qualifier is added to
a modifiable lvalue argument expression passed to a function by const&:
in this case, we would opt for a "QualificationConversion" cast kind.

The final goal would be to have a NoOp cast kind only for those casts
that are NoOp under all points of view: no change for the type, no
change for the value and no change for the lvalue-ness of the argument
expression.

Clearly, the new "source code based" cast kinds will be treated in
other compilation phases as if they were plain NoOp.
(If deemed useful, for clarity, we can even uses names such as
NoOp_OverloadResolution and NoOp_QualificationConversion).

Opinions?

Enea.

I have no problem with sub-dividing NoOp, but I seem to remember
Doug having opinions in this area, so I've CC'ed him.

John.

Would there be objections to the "splitting" of the current NoOp cast
kind into a few, slightly different flavors of NoOp providing a clearer
distinction, that will be useful for source code based tools?

For instance, in the case of the example above, the explicit NoOp
static_cast kind could be replaced (when there are indeed multiple
candidates) by an "OverloadResolution" cast kind.

Another example is that of (usually implicit) NoOp casts resulting from
qualification conversions, i.e., when the "const" qualifier is added to
a modifiable lvalue argument expression passed to a function by const&:
in this case, we would opt for a "QualificationConversion" cast kind.

The final goal would be to have a NoOp cast kind only for those casts
that are NoOp under all points of view: no change for the type, no
change for the value and no change for the lvalue-ness of the argument
expression.

Clearly, the new "source code based" cast kinds will be treated in
other compilation phases as if they were plain NoOp.
(If deemed useful, for clarity, we can even uses names such as
NoOp_OverloadResolution and NoOp_QualificationConversion).

I have no problem with sub-dividing NoOp, but I seem to remember
Doug having opinions in this area, so I've CC'ed him.

I'm fine with subdividing NoOp further.