Strange AST behavior with C++ casts

Hello @clang.

I'm working on a fix for a strange behavior of clang's
AST regarding casts in C++.

This issue has been pointed out before, mainly
by Abramo Bagnara and Enea Zaffanella, but
I'll briefly recap. Consider the following code:

int func() {
return (int)0.5; // but also int(0.5) or static_cast<int>(0.5)
}

For this function, clang produces this AST:
int func() (CompoundStmt
(ReturnStmt
  (CStyleCastExpr 'int' functional cast to int <NoOp> // Or CXXFunctionalCastExpr or CXXStaticCastExpr
    (ImplicitCastExpr 'int' <FloatingToIntegral>
      (FloatingLiteral 'double' 5.000000e-01)))))

As you can see, the AST node for the explicit CStyle cast (or the static_cast, or the functional cast) is marked as
NoOp, and the actual cast seems to be performed by an implicit cast expression that pops out of nowhere.

What we would like to have is an AST like this, that is also what happens for a c-style cast in C mode:

int func() (CompoundStmt
(ReturnStmt
  (CStyleCastExpr 'int' functional cast to int <FloatingToIntegral>
    (FloatingLiteral 'double' 5.000000e-01)))))

The main point here is that there seems to be an extra and useless node in the AST.
This behavior introduces problems in some applications where we want, for example, to warn the user on
useless casts or force the user to not write code that involves implicit casts. In general, this breaks clients that want
an high AST source-code fidelity.

The first question that arises is: is this behavior intended? If so, why?

Anyway, as I've inspected the code it seems to be accidental. The "wrong" behavior comes
from TryStaticImplicitCast(), that is indirectly called by CheckStaticCast().
IMHO, the function name already suggests that something's wrong, as in this case
we are checking an "explicit" cast.

This function tries to see if a static cast is applicable, and to do so, builds an initialization sequence that when performed creates the wrong ImplicitCastExpr node. It generally works as expected because,
as pointed out by a nearby comment, a static_cast of expression e to type T can be
done if "T t(e);" is well-formed. However, I think the initialization shouldn't really be performed
as if that code had really been written. At this point, InitializationSequence::Perform() should
have been told that it's an explicit cast.

So, to recap:
- Have I understood correctly why this issue happens?
- Is it ok if I try to fix it? i.e. will this change in the AST cause problems in other parts of the compiler? (in this case, I would also fix them of course)
- Any suggestions to how to fix the issue?

Thank you,
Nicola

This function tries to see if a static cast is applicable, and to do so, builds an initialization sequence that when performed creates the wrong ImplicitCastExpr node. It generally works as expected because,
as pointed out by a nearby comment, a static_cast of expression e to type T can be
done if "T t(e);" is well-formed. However, I think the initialization shouldn't really be performed
as if that code had really been written. At this point, InitializationSequence::Perform() should
have been told that it's an explicit cast.

It has been; it's part of the InitializedEntity.

So, to recap:
- Have I understood correctly why this issue happens?

Well, it's not accidental. C's semantics are easy enough to check with a function that, logically, just computes a cast kind for the conversion, adding extra implicit casts in the rare cases where they're required. C++'s semantics aren't that simple; we can generate a lot of expressions for one of those casts, and the outermost node is not always a cast at all.

- Is it ok if I try to fix it? i.e. will this change in the AST cause problems in other parts of the compiler? (in this case, I would also fix them of course)

- Any suggestions to how to fix the issue?

Well, you could teach the explicit cast code to strip off an outermost ImplicitCastExpr and use its semantics for the ExplicitCastExpr; that would generate tidier ASTs in some common cases. It still won't stop your code from potentially needing to handle arbitrary stuff "implementing" a NoOp explicit cast, though.

John.

Hello @clang.

I'm working on a fix for a strange behavior of clang's
AST regarding casts in C++.

This issue has been pointed out before, mainly
by Abramo Bagnara and Enea Zaffanella, but
I'll briefly recap. Consider the following code:

int func() {
return (int)0.5; // but also int(0.5) or static_cast<int>(0.5)
}

For this function, clang produces this AST:
int func() (CompoundStmt
(ReturnStmt
(CStyleCastExpr 'int' functional cast to int <NoOp> // Or CXXFunctionalCastExpr or CXXStaticCastExpr
(ImplicitCastExpr 'int' <FloatingToIntegral>
(FloatingLiteral 'double' 5.000000e-01)))))

As you can see, the AST node for the explicit CStyle cast (or the static_cast, or the functional cast) is marked as
NoOp, and the actual cast seems to be performed by an implicit cast expression that pops out of nowhere.

What we would like to have is an AST like this, that is also what happens for a c-style cast in C mode:

int func() (CompoundStmt
(ReturnStmt
(CStyleCastExpr 'int' functional cast to int <FloatingToIntegral>
(FloatingLiteral 'double' 5.000000e-01)))))

The main point here is that there seems to be an extra and useless node in the AST.
This behavior introduces problems in some applications where we want, for example, to warn the user on
useless casts or force the user to not write code that involves implicit casts. In general, this breaks clients that want
an high AST source-code fidelity.

The first question that arises is: is this behavior intended? If so, why?

No, not really... I agree with your analysis that it's an accident.

Anyway, as I've inspected the code it seems to be accidental. The "wrong" behavior comes
from TryStaticImplicitCast(), that is indirectly called by CheckStaticCast().
IMHO, the function name already suggests that something's wrong, as in this case
we are checking an "explicit" cast.

This function tries to see if a static cast is applicable, and to do so, builds an initialization sequence that when performed creates the wrong ImplicitCastExpr node. It generally works as expected because,
as pointed out by a nearby comment, a static_cast of expression e to type T can be
done if "T t(e);" is well-formed. However, I think the initialization shouldn't really be performed
as if that code had really been written. At this point, InitializationSequence::Perform() should
have been told that it's an explicit cast.

So, to recap:
- Have I understood correctly why this issue happens?
- Is it ok if I try to fix it? i.e. will this change in the AST cause problems in other parts of the compiler? (in this case, I would also fix them of course)
- Any suggestions to how to fix the issue?

It sounds like you're correctly understanding what is happening.
Extending InitializationSequence::Perform to optionally return a cast
kind instead of actually performing the last cast in its sequence
seems reasonable. Just try and make sure your patch isn't too
invasive.

-Eli

To give you one example: a cast to a class type using a user-defined conversion constructor will be a noop cast expression containing a CXXConstructExpr (or something similar). There is not really a good way you can embed that functionality in the cast expression itself.

Sebastian

Well, it's not accidental. C's semantics are easy enough to check with a function that, logically, just computes a cast kind for the conversion, adding extra implicit casts in the rare cases where they're required. C++'s semantics aren't that simple; we can generate a lot of expressions for one of those casts, and the outermost node is not always a cast at all.

To give you one example: a cast to a class type using a user-defined
conversion constructor will be a noop cast expression containing a
CXXConstructExpr (or something similar). There is not really a good way
you can embed that functionality in the cast expression itself.

Hm, if I got it you mean something like:
struct C {
C(int);
};

C func() {
  return static_cast<C>(0);
}

which produces this AST:
C func() (CompoundStmt
  (ReturnStmt
    (CXXConstructExpr 'struct C''void (const struct C &) throw()' elidable
      (MaterializeTemporaryExpr 'const struct C' lvalue
        (ImplicitCastExpr 'const struct C' <NoOp>
          (CXXStaticCastExpr 'struct C' static_cast<struct C> <ConstructorConversion>
            (CXXConstructExpr 'struct C''void (int)'
              (IntegerLiteral 'int' 0))))))))

I don't find anything wrong in this AST, mainly because the kind of the static cast node
is ConstructorConversion instead of NoOp.
There's a NoOp cast outside but that's another thing. Actually, this is the only case where
TryStaticImplicitCast() doesn't do the ImplicitCastExpr trick,
maybe because it creates the CXXConstructExpr instead.
In this case, the CXXConstructExpr is useful because it stores info
that could not be stored directly in the cast node.
However, in the other cases the ImplicitCastExpr node doesn't carry any additional information.
Imho is redundant, tell me if you don't agree.

Let me be clear: we don't want to get rid of every NoOp cast nor every implicit cast in the AST. That's just
a specific corner case in which we think this node is useless.

Would it be ok if I modify the function to not create the ImplicitCastExpr node in those cases,
relying on the external explicit cast node instead, as suggested (I think) by Eli?

Sebastian

Thanks,
Nicola