Tentative typeid Parser/AST implementation

Hi,

Attached patch implements parsing and AST support for typeid. It's not complete in that the type of a typeid expression is const void instead of const std::type_info, but otherwise everything should be correct. However, since this is my first big change of the parser and AST, I might well have overlooked something.

The work raises some questions about our handling of the C++ support library, but that's for its own thread.

Sebastian

typeid.patch (12.6 KB)

Hi Sebastian,

Sebastian Redl wrote:

Attached patch implements parsing and AST support for typeid. It's not complete in that the type of a typeid expression is const void instead of const std::type_info, but otherwise everything should be correct. However, since this is my first big change of the parser and AST, I might well have overlooked something.

Your approach is not wrong, but I think that is better for consistency if you implement typeid in a similar way as sizeof/alignof is implemented.
sizeof for type calls Action.ActOnSizeOfAlignOfTypeExpr (creating a SizeOfAlignOfTypeExpr node), while sizeof for expression calls Action.ActOnUnaryOp (creating a UnaryOperator expression node).

-Argiris

Argiris Kirtzidis wrote:

Your approach is not wrong, but I think that is better for consistency if you implement typeid in a similar way as sizeof/alignof is implemented.
sizeof for type calls Action.ActOnSizeOfAlignOfTypeExpr (creating a SizeOfAlignOfTypeExpr node), while sizeof for expression calls Action.ActOnUnaryOp (creating a UnaryOperator expression node).

I saw that, started implementing it that way ... and then decided that the AST would be prettier if I did it my way.

But if that's not such a good idea, I can change it.

Sebastian

Hi Sebastian,

Attached patch implements parsing and AST support for typeid. It's not
complete in that the type of a typeid expression is const void instead of
const std::type_info, but otherwise everything should be correct. However,
since this is my first big change of the parser and AST, I might well have
overlooked something.

Great! Some comments follow:

+class CXXTypeidExpr : public Expr {

Please provide documentation for this expression type, along with an
example of C++ code where it would show up.

+public:
+ enum OperandKind {
+ OpType,
+ OpExpr
+ };

Personally, I think the approach taken by alignof and sizeof is not
ideal. If one wants to look through the AST to find sizeof nodes, for
example, you currently have to do a strange two-step of detecting
UnaryOperator or SizeofAlignOfTypeExpr and then check auxiliary fields
to figure out if you have 'sizeof'. The control flow is very messy.
I'd rather see this split out into SizeofExpr and AlignOfExpr classes,
each of which has a similar implementation to Sebastian's
CXXTypeidExpr.

  - Doug

Doug Gregor wrote:

Personally, I think the approach taken by alignof and sizeof is not
ideal. If one wants to look through the AST to find sizeof nodes, for
example, you currently have to do a strange two-step of detecting
UnaryOperator or SizeofAlignOfTypeExpr and then check auxiliary fields
to figure out if you have 'sizeof'. The control flow is very messy.
I'd rather see this split out into SizeofExpr and AlignOfExpr classes,
each of which has a similar implementation to Sebastian's
CXXTypeidExpr.
  
Good idea, that looks like a better approach for sizeof/alignof.

-Argiris

Hi Sebastian,

I have another question for you. :slight_smile: In this node, you're taking the approach of handling exprs and types with the same AST node class. In sizeof/alignof, we have different nodes for the expr/type case.

I agree with Doug that splitting up UnaryOperator (and friends) is goodness, but I'm not sure that having the node handle both expr and type subexpressions is a good approach. What do you think?

-Chris

Chris Lattner wrote:

Hi Sebastian,

I have another question for you. :slight_smile: In this node, you're taking the approach of handling exprs and types with the same AST node class. In sizeof/alignof, we have different nodes for the expr/type case.

I agree with Doug that splitting up UnaryOperator (and friends) is goodness, but I'm not sure that having the node handle both expr and type subexpressions is a good approach. What do you think?

I could have made a TypeidExpr and a TypeidOfTypeExpr and TypeidOfExprExpr that derive from it, but I felt that it was overkill.

Sebastian

Doug Gregor wrote:

Hi Sebastian,

Attached patch implements parsing and AST support for typeid. It's not
complete in that the type of a typeid expression is const void instead of
const std::type_info, but otherwise everything should be correct. However,
since this is my first big change of the parser and AST, I might well have
overlooked something.
    
Great! Some comments follow:
  

Thanks.

+class CXXTypeidExpr : public Expr {

Please provide documentation for this expression type, along with an
example of C++ code where it would show up.
  

Done.

+public:
+ enum OperandKind {
+ OpType,
+ OpExpr
+ };
+
+private:
+ OperandKind OpKind;

Why not just have a single bit:

  bool OperandIsExpression : 1;

?
  

Done.

+CXXTypeidExpr*
+CXXTypeidExpr::CreateImpl(llvm::Deserializer& D, ASTContext& C) {
+ QualType Ty = QualType::ReadVal(D);
+ OperandKind OpKind = static_cast<OperandKind>(OpKind);

Something isn't right here... the argument to the static_cast should
be an integer value read from D, right?
  

Right. Fixed while rewriting to use a single bit.

+ if (Tok.is(tok::r_paren))
+ RParenLoc = ConsumeParen();
+ else
+ MatchRHSPunctuation(tok::r_paren, LParenLoc);

I believe this can just be:

  RParenLoc = MatchRHSPunctuation(tok::r_paren, LParenLoc);

(This happens twice in ParseCXXTypeid)
  

I copied the code from ParseParenExpression.
Done.

Here's an updated patch. It now looks up std::type_info and fails if it can't find it. This means that the type of the expression is an lvalue of type const std::type_info, as it should be. Semantic checking is thus complete.

Sebastian

typeid.patch (19.5 KB)

I don't really care one way or the other, as long as we're consistent. Do you think one approach is better than the other? If you like merging the two together, would you be willing to change sizeof/alignof to match?

-Chris

Chris Lattner wrote:

I don't really care one way or the other, as long as we're consistent. Do you think one approach is better than the other? If you like merging the two together, would you be willing to change sizeof/alignof to match?

I'll do that.

Sebastian

One trick I found while looking at this... instead of having a bit for
isTypeOp, you can split the SourceRange into two SourceLocations:

  SourceLocation OpLoc; // location of the 'typeid', 'sizeof', or 'alignof'
  SourceLocation RParenLoc; // location of the right paren, if this
we're storing a type

When RParenLoc is a valid location, the Operand is a type and
RParenLoc will be the source location of the right paren. When
RParenLoc is invalid, the Operand is an expression (and the
expression's getSourceRange will tell us where the end of the
expression is). getSourceRange gets slightly more complicated, but
we'll save ourselves 4 bytes per typeid/sizeof/alignof.

  - Doug

Doug Gregor wrote:

One trick I found while looking at this... instead of having a bit for
isTypeOp, you can split the SourceRange into two SourceLocations:

  SourceLocation OpLoc; // location of the 'typeid', 'sizeof', or 'alignof'
  SourceLocation RParenLoc; // location of the right paren, if this
we're storing a type

When RParenLoc is a valid location, the Operand is a type and
RParenLoc will be the source location of the right paren. When
RParenLoc is invalid, the Operand is an expression (and the
expression's getSourceRange will tell us where the end of the
expression is). getSourceRange gets slightly more complicated, but
we'll save ourselves 4 bytes per typeid/sizeof/alignof.
  

I can do that for sizeof/alignof, but not really for typeid, because the right paren is part of typeid's own syntax. In other words, "typeid a + 2" is illegal code. I'd have to do some ugly trickery to have a valid typeid parsed as a parenthesized expression - do I really want to do this?

Also, we don't save anything for sizeof/alignof, because there's already a boolean (isSizeof) there, and I can collapse that to a single bit, which gives me plenty of space for the isTypeOperand bit.

Sebastian

Doug Gregor wrote:

One trick I found while looking at this... instead of having a bit for
isTypeOp, you can split the SourceRange into two SourceLocations:

SourceLocation OpLoc; // location of the 'typeid', 'sizeof', or 'alignof'
SourceLocation RParenLoc; // location of the right paren, if this
we're storing a type

When RParenLoc is a valid location, the Operand is a type and
RParenLoc will be the source location of the right paren. When
RParenLoc is invalid, the Operand is an expression (and the
expression's getSourceRange will tell us where the end of the
expression is). getSourceRange gets slightly more complicated, but
we'll save ourselves 4 bytes per typeid/sizeof/alignof.

I can do that for sizeof/alignof, but not really for typeid, because the
right paren is part of typeid's own syntax. In other words, "typeid a + 2"
is illegal code. I'd have to do some ugly trickery to have a valid typeid
parsed as a parenthesized expression - do I really want to do this?

Oh, ick... then we certainly don't want to do it for typeid.

Also, we don't save anything for sizeof/alignof, because there's already a
boolean (isSizeof) there, and I can collapse that to a single bit, which
gives me plenty of space for the isTypeOperand bit.

I guess that depends... if you're planning to make sizeof and alignof
different AST nodes, then we can eliminate the bit for isSizeOf and
make the structures smaller than they used to be. If sizeof and
alignof are the same AST node with an isSizeOf bit, then of course the
isTypeOperand bit is free.

  - Doug

Doug Gregor wrote:

Okay, thanks for doing this!

  - Doug

Chris Lattner wrote:

Chris Lattner wrote:

Hi Sebastian,

I have another question for you. :slight_smile: In this node, you're taking the approach of handling exprs and types with the same AST node class. In sizeof/alignof, we have different nodes for the expr/type case.

I agree with Doug that splitting up UnaryOperator (and friends) is goodness, but I'm not sure that having the node handle both expr and type subexpressions is a good approach. What do you think?

I could have made a TypeidExpr and a TypeidOfTypeExpr and TypeidOfExprExpr that derive from it, but I felt that it was overkill.

I don't really care one way or the other, as long as we're consistent. Do you think one approach is better than the other? If you like merging the two together, would you be willing to change sizeof/alignof to match?

-Chris

Here's the patch. Passes all regressions, but this was an ugly, wide-reaching change, so perhaps some people can take a look at it before I commit. I do feel the code is improved now.

Sebastian

sizeofalignof.patch (40.9 KB)

This is a nice savings, but I wouldn't worry about it right now. I'd rather design the ASTs for clarity and simplicity, and we can micro-optimize them later when clang is more "complete". For example, it would be easy to steal a few bits from Stmt to hold bitfields for the subclasses if needed.

-Chris

Chris Lattner wrote:

Chris Lattner wrote:

Hi Sebastian,

I have another question for you. :slight_smile: In this node, you're taking the
approach of handling exprs and types with the same AST node class. In
sizeof/alignof, we have different nodes for the expr/type case.

I agree with Doug that splitting up UnaryOperator (and friends) is
goodness, but I'm not sure that having the node handle both expr and type
subexpressions is a good approach. What do you think?

I could have made a TypeidExpr and a TypeidOfTypeExpr and
TypeidOfExprExpr that derive from it, but I felt that it was overkill.

I don't really care one way or the other, as long as we're consistent. Do
you think one approach is better than the other? If you like merging the
two together, would you be willing to change sizeof/alignof to match?

-Chris

Here's the patch. Passes all regressions, but this was an ugly,
wide-reaching change, so perhaps some people can take a look at it before I
commit. I do feel the code is improved now.

You're right this was an ugly change! :slight_smile:

Thanks for taking it on though, patch looks good to me.

- Daniel