Patch proposal for __builtin_offsetof construct.

Hi,

I'm the student who is trying to fix the problem mentioned in
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-February/007833.html .

I followed the guidelines described in the previous messages, creating a
dedicated class OffsetOfExpr which doesn't contain any sub-expression.
The offsetof base type is encoded in a TypeSourceInfo field, while
the sub-components are stored in a SmallVector of
OffsetOfExpr::OffsetOfNode (a public struct similar to Action::OffsetOfComponent).

Note that the goal of this patch is solely to avoid the unsuitable use of
UnaryOperator for OffsetOf expressions: namely, it is not meant to provide a
solution to other FIXME's (e.g., type-dependent OffsetOfExpr are still not handled).
This patch passes all clang tests (it refers to revision 100962), but since
I'm only a beginner in clang development I am note sure about the code I wrote,
so any comments and/or suggestions are really welcome.

Regards,
Roberto.

OffsetOfExpr.patch (29.1 KB)

Hi,

I'm the student who is trying to fix the problem mentioned in
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-February/007833.html .

I followed the guidelines described in the previous messages, creating a
dedicated class OffsetOfExpr which doesn't contain any sub-expression.
The offsetof base type is encoded in a TypeSourceInfo field, while
the sub-components are stored in a SmallVector of
OffsetOfExpr::OffsetOfNode (a public struct similar to Action::OffsetOfComponent).

Note that the goal of this patch is solely to avoid the unsuitable use of
UnaryOperator for OffsetOf expressions: namely, it is not meant to provide a
solution to other FIXME's (e.g., type-dependent OffsetOfExpr are still not handled).
This patch passes all clang tests (it refers to revision 100962), but since
I'm only a beginner in clang development I am note sure about the code I wrote,
so any comments and/or suggestions are really welcome.

Thanks for tackling this! Even if you're not handling type-dependent __builtin_offsetof expressions, having a proper AST for __builtin_offsetof improves Clang and will make it easier to add support for type-dependent __builtin_offsetof in the future.

Some comments inline.

For reference, your attachment shows up inline, which makes it hard to apply/test. Please follow the instructions

  http://llvm.org/docs/DeveloperPolicy.html#patches

to configure Thunderbird to send patches in a more Apple-Mail-friendly way.

Index: include/clang/AST/Expr.h

--- include/clang/AST/Expr.h (revisione 100962)
+++ include/clang/AST/Expr.h (copia locale)
@@ -991,6 +985,78 @@
  virtual child_iterator child_end();
};

+/// OffsetOfExpr
+class OffsetOfExpr : public Expr {
+

Please provide some documentation for this AST node, including what it represents and an example of code that will be represented as an OffsetOfExpr.

+public:
+ // __builtin_offsetof(type, identifier(.identifier|[expr])*)
+ // FIXME: Type-dependent case ignored.
+ struct OffsetOfNode {
+ SourceLocation LocStart, LocEnd;
+ bool isBrackets; // true if [expr], false if .ident
+ union {
+ FieldDecl *MemberDecl;
+ Expr *IndexExpr;
+ } U;
+ };

You could save a word of storage per OffsetOfNode by mangling the "isBrackets" bit into the lowest bit of the FieldDecl or Expr pointer. llvm::PointerUnion<FieldDecl *, Expr *> will do this for you.

But, see below for more comments on this representation.

+private:
+ typedef llvm::SmallVector<OffsetOfNode, 4> Components;
+ // Base type;
+ TypeSourceInfo *TSInfo;
+ SourceLocation Loc;
+ // Keep track of the various subcomponents (e.g. instance of OffsetOfNode).
+ Components Comps;

We're trying to eliminate all SmallVectors (and other malloc-allocated memory) from the AST representation. Please put a count of the number of components into this structure, then allocate storage for the OffsetOfNode structures after the OffsetOfExpr node (you'll want to make the OffsetOfExpr constructor private, then add a Create memory to do allocation + construction).

+public:
+
+ OffsetOfExpr(QualType type, SourceLocation l,
+ TypeSourceInfo *tsi, const Components &comps)
+ : Expr(OffsetOfExprClass, type, false, false),
+ TSInfo(tsi), Loc(l), Comps(comps) {}
+
+ /// \brief Build an empty offsetof.
+ explicit OffsetOfExpr(EmptyShell Empty)
+ : Expr(OffsetOfExprClass, Empty), TSInfo(0), Comps(0) {}
+
+ /// getOperatorLoc - Return the location of the operator.
+ SourceLocation getOperatorLoc() const { return Loc; }
+ void setOperatorLoc(SourceLocation L) { Loc = L; }
+
+ TypeSourceInfo *getTypeSourceInfo() const {
+ return TSInfo;
+ }
+ void setTypeSourceInfo(TypeSourceInfo *tsi) {
+ TSInfo = tsi;
+ }
+
+ const OffsetOfNode* getComponents() const {
+ return &Comps[0];
+ }
+ void setComponents(Components comps) {
+ Comps = comps;
+ }
+
+ size_t getNumComponents() const {
+ return Comps.size();
+ }
+
+ virtual SourceRange getSourceRange() const {
+ return SourceRange(Loc, Comps.back().LocEnd);
+ }
+ virtual SourceLocation getExprLoc() const { return Loc; }
+
+ static bool classof(const Stmt *T) {
+ return T->getStmtClass() == OffsetOfExprClass;
+ }
+
+ static bool classof(const OffsetOfExpr *) { return true; }
+
+ // Iterators
+ virtual child_iterator child_begin();
+ virtual child_iterator child_end();
+};
+
/// SizeOfAlignOfExpr - [C99 6.5.3.4] - This is for sizeof/alignof, both of
/// types and expressions.
class SizeOfAlignOfExpr : public Expr {
Index: include/clang/AST/StmtNodes.def

--- include/clang/AST/StmtNodes.def (revisione 100962)
+++ include/clang/AST/StmtNodes.def (copia locale)
@@ -78,6 +78,7 @@
EXPR(CharacterLiteral , Expr)
EXPR(ParenExpr , Expr)
EXPR(UnaryOperator , Expr)
+EXPR(OffsetOfExpr , Expr)
EXPR(SizeOfAlignOfExpr , Expr)
EXPR(ArraySubscriptExpr , Expr)
EXPR(CallExpr , Expr)
Index: tools/CIndex/CXCursor.cpp

--- tools/CIndex/CXCursor.cpp (revisione 100962)
+++ tools/CIndex/CXCursor.cpp (copia locale)
@@ -139,7 +139,8 @@
  case Stmt::StringLiteralClass:
  case Stmt::CharacterLiteralClass:
  case Stmt::ParenExprClass:
- case Stmt::UnaryOperatorClass:
+ case Stmt::UnaryOperatorClass:
+ case Stmt::OffsetOfExprClass:
  case Stmt::SizeOfAlignOfExprClass:
  case Stmt::ArraySubscriptExprClass:
  case Stmt::BinaryOperatorClass:
Index: lib/Frontend/StmtXML.cpp

--- lib/Frontend/StmtXML.cpp (revisione 100962)
+++ lib/Frontend/StmtXML.cpp (copia locale)
@@ -125,6 +125,7 @@
    void VisitFloatingLiteral(FloatingLiteral *Node);
    void VisitStringLiteral(StringLiteral *Str);
    void VisitUnaryOperator(UnaryOperator *Node);
+ void VisitOffsetOfExpr(OffsetOfExpr *Node);
    void VisitSizeOfAlignOfExpr(SizeOfAlignOfExpr *Node);
    void VisitMemberExpr(MemberExpr *Node);
    void VisitExtVectorElementExpr(ExtVectorElementExpr *Node);
@@ -260,7 +261,6 @@
  case UnaryOperator::Real: return "__real";
  case UnaryOperator::Imag: return "__imag";
  case UnaryOperator::Extension: return "__extension__";
- case UnaryOperator::OffsetOf: return "__builtin_offsetof";
  }
}

@@ -308,6 +308,10 @@
  Doc.addAttribute("op_code", getOpcodeStr(Node->getOpcode()));
}

+void StmtXML::OffsetOfExpr(OffsetOfExpr *Node) {
+ DumpExpr(Node);
+}
+
void StmtXML::VisitSizeOfAlignOfExpr(SizeOfAlignOfExpr *Node) {
  DumpExpr(Node);
  Doc.addAttribute("is_sizeof", Node->isSizeOf() ? "sizeof" : "alignof");
Index: lib/Sema/TreeTransform.h

--- lib/Sema/TreeTransform.h (revisione 100962)
+++ lib/Sema/TreeTransform.h (copia locale)
@@ -3828,6 +3828,12 @@

template<typename Derived>
Sema::OwningExprResult
+TreeTransform<Derived>::TransformOffsetOfExpr(OffsetOfExpr *E) {
+ return SemaRef.Owned(new (SemaRef.Context) OffsetOfExpr(*E));
+}

Please add a FIXME here so that we don't forget to add the template-instantiation logic later.

Index: lib/Sema/SemaExpr.cpp

--- lib/Sema/SemaExpr.cpp (revisione 100962)
+++ lib/Sema/SemaExpr.cpp (copia locale)
@@ -18,6 +18,7 @@
#include "clang/AST/ASTContext.h"
#include "clang/AST/DeclObjC.h"
#include "clang/AST/DeclTemplate.h"
+#include "clang/AST/Expr.h"
#include "clang/AST/ExprCXX.h"
#include "clang/AST/ExprObjC.h"
#include "clang/Basic/PartialDiagnostic.h"
@@ -6401,9 +6402,6 @@
  Expr *Input = (Expr *)InputArg.get();
  QualType resultType;
  switch (Opc) {
- case UnaryOperator::OffsetOf:
- assert(false && "Invalid unary operator");
- break;

  case UnaryOperator::PreInc:
  case UnaryOperator::PreDec:
@@ -6571,8 +6569,8 @@
                                                  SourceLocation RPLoc) {
  // FIXME: This function leaks all expressions in the offset components on
  // error.
- // FIXME: Preserve type source info.
- QualType ArgTy = GetTypeFromParser(argty);
+ TypeSourceInfo *ArgTInfo;
+ QualType ArgTy = GetTypeFromParser(argty, &ArgTInfo);
  assert(!ArgTy.isNull() && "Missing type argument!");

When ArgTy.isNull(), we should just return with an error.

  bool Dependent = ArgTy->isDependentType();
@@ -6592,6 +6590,7 @@
  Expr* Res = new (Context) ImplicitValueInitExpr(ArgTyPtr);
  Res = new (Context) UnaryOperator(Res, UnaryOperator::Deref,
                                    ArgTy, SourceLocation());
+ llvm::SmallVector<OffsetOfExpr::OffsetOfNode, 4> Comps(NumComponents);

  // offsetof with non-identifier designators (e.g. "offsetof(x, a.b[c])") are a
  // GCC extension, diagnose them.
@@ -6608,83 +6607,63 @@
                            diag::err_offsetof_incomplete_type))
      return ExprError();

- // FIXME: Dependent case loses a lot of information here. And probably
- // leaks like a sieve.
+ QualType CurrentType = ArgTy;
    for (unsigned i = 0; i != NumComponents; ++i) {
      const OffsetOfComponent &OC = CompPtr[i];
      if (OC.isBrackets) {
- // Offset of an array sub-field. TODO: Should we allow vector elements?
- const ArrayType *AT = Context.getAsArrayType(Res->getType());
- if (!AT) {
- Res->Destroy(Context);
+ const ArrayType *AT = Context.getAsArrayType(CurrentType);
+ if(!AT)
          return ExprError(Diag(OC.LocEnd, diag::err_offsetof_array_type)
- << Res->getType());
- }
+ << CurrentType);

        // FIXME: C++: Verify that operator isn't overloaded.

- // Promote the array so it looks more like a normal array subscript
- // expression.
- DefaultFunctionArrayLvalueConversion(Res);
-
        // C99 6.5.2.1p1
        Expr *Idx = static_cast<Expr*>(OC.U.E);
- // FIXME: Leaks Res
        if (!Idx->isTypeDependent() && !Idx->getType()->isIntegerType())
          return ExprError(Diag(Idx->getLocStart(),
                                diag::err_typecheck_subscript_not_integer)
- << Idx->getSourceRange());
-
- Res = new (Context) ArraySubscriptExpr(Res, Idx, AT->getElementType(),
- OC.LocEnd);
- continue;
+ << Idx->getSourceRange());
+ CurrentType = AT->getElementType();
+ Comps[i].U.IndexExpr = Idx;
      }
-
- const RecordType *RC = Res->getType()->getAs<RecordType>();
- if (!RC) {
- Res->Destroy(Context);
- return ExprError(Diag(OC.LocEnd, diag::err_offsetof_record_type)
- << Res->getType());
- }
-
- // Get the decl corresponding to this.
- RecordDecl *RD = RC->getDecl();
- if (CXXRecordDecl *CRD = dyn_cast<CXXRecordDecl>(RD)) {
- if (!CRD->isPOD() && !DidWarnAboutNonPOD &&
+ else {
+ const RecordType *RC = CurrentType->getAs<RecordType>();
+ if (!RC)
+ return ExprError(Diag(OC.LocEnd, diag::err_offsetof_array_type)
+ << CurrentType);

This should be diag::err_offsetof_record_type?

+ RecordDecl *RD = RC->getDecl();
+ if (CXXRecordDecl *CRD = dyn_cast<CXXRecordDecl>(RD)) {
+ if (!CRD->isPOD() && !DidWarnAboutNonPOD &&
            DiagRuntimeBehavior(BuiltinLoc,
                                PDiag(diag::warn_offsetof_non_pod_type)
                                  << SourceRange(CompPtr[0].LocStart, OC.LocEnd)
- << Res->getType()))
+ << CurrentType))
          DidWarnAboutNonPOD = true;
- }
+ }
+ LookupResult R(*this, OC.U.IdentInfo, OC.LocStart, LookupMemberName);
+ LookupQualifiedName(R, RD);
+ FieldDecl *MemberDecl = R.getAsSingle<FieldDecl>();
+ if (!MemberDecl)
+ return ExprError(Diag(BuiltinLoc, diag::err_no_member)
+ << OC.U.IdentInfo << RD << SourceRange(OC.LocStart, OC.LocEnd));

- LookupResult R(*this, OC.U.IdentInfo, OC.LocStart, LookupMemberName);
- LookupQualifiedName(R, RD);
-
- FieldDecl *MemberDecl = R.getAsSingle<FieldDecl>();
- // FIXME: Leaks Res
- if (!MemberDecl)
- return ExprError(Diag(BuiltinLoc, diag::err_no_member)
- << OC.U.IdentInfo << RD << SourceRange(OC.LocStart, OC.LocEnd));
-
- // FIXME: C++: Verify that MemberDecl isn't a static field.
- // FIXME: Verify that MemberDecl isn't a bitfield.
- if (cast<RecordDecl>(MemberDecl->getDeclContext())->isAnonymousStructOrUnion()) {
- Res = BuildAnonymousStructUnionMemberReference(
- OC.LocEnd, MemberDecl, Res, OC.LocEnd).takeAs<Expr>();
- } else {
- PerformObjectMemberConversion(Res, /*Qualifier=*/0,
- *R.begin(), MemberDecl);
- // MemberDecl->getType() doesn't get the right qualifiers, but it
- // doesn't matter here.
- Res = new (Context) MemberExpr(Res, false, MemberDecl, OC.LocEnd,
- MemberDecl->getType().getNonReferenceType());
+ // FIXME: C++: Verify that MemberDecl isn't a static field.
+ // FIXME: Verify that MemberDecl isn't a bitfield.
+ if (cast<RecordDecl>(MemberDecl->getDeclContext())->isAnonymousStructOrUnion())
+ CurrentType = MemberDecl->getType();

This isn't going to work. When we find a member of an anonymous struct or union, we need to introduce OffsetOfNodes for the FieldDecls that represent the anonymous unions. Imagine, e.g.,

  struct A {
    struct {
      union { int i; float f; };
      bool isInt;
    };
  };

If I ask for the offset of "i" in A, we should (semantically!) get an OffsetOfNode referring to the anonymous struct, another OffsetOfNode for the anonymous union, and then, finally, the OffsetOfNode for "i". So you can have more OffsetOfNodes in the AST than there were components parsed by the parser, since we're adding implicit ones.

+ else
+ CurrentType = MemberDecl->getType().getNonReferenceType();
+ Comps[i].U.MemberDecl = MemberDecl;
      }
+ Comps[i].LocStart = OC.LocStart;
+ Comps[i].LocEnd = OC.LocEnd;
    }
+
  }

- return Owned(new (Context) UnaryOperator(Res, UnaryOperator::OffsetOf,
- Context.getSizeType(), BuiltinLoc));
+ return Owned(new (Context) OffsetOfExpr(Context.getSizeType(), BuiltinLoc,
+ ArgTInfo, Comps));
}

Aside from the anonymous struct/union issue, this is a big improvement.

Index: lib/AST/ExprConstant.cpp

--- lib/AST/ExprConstant.cpp (revisione 100962)
+++ lib/AST/ExprConstant.cpp (copia locale)
@@ -16,7 +16,9 @@
#include "clang/AST/CharUnits.h"
#include "clang/AST/RecordLayout.h"
#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/TypeLoc.h"
#include "clang/AST/ASTDiagnostic.h"
+#include "clang/AST/Expr.h"
#include "clang/Basic/Builtins.h"
#include "clang/Basic/TargetInfo.h"
#include "llvm/ADT/SmallString.h"
@@ -221,7 +223,7 @@
  APValue VisitStmt(Stmt *S) {
    return APValue();
  }
-
+
  APValue VisitParenExpr(ParenExpr *E) { return Visit(E->getSubExpr()); }
  APValue VisitDeclRefExpr(DeclRefExpr *E);
  APValue VisitPredefinedExpr(PredefinedExpr *E) { return APValue(E); }
@@ -821,6 +823,7 @@

  bool VisitCallExpr(CallExpr *E);
  bool VisitBinaryOperator(const BinaryOperator *E);
+ bool VisitOffsetOfExpr(const OffsetOfExpr *E);
  bool VisitUnaryOperator(const UnaryOperator *E);
  bool VisitConditionalOperator(const ConditionalOperator *E);

@@ -1365,20 +1368,56 @@
  return Success(Info.Ctx.getTypeSizeInChars(SrcTy).getQuantity(), E);
}

+bool IntExprEvaluator::VisitOffsetOfExpr(const OffsetOfExpr *E) {
+ CharUnits Result;
+ const OffsetOfExpr::OffsetOfNode *Comps = E->getComponents();
+ unsigned n = E->getNumComponents();
+ if (n == 0)
+ return false;
+ QualType CurrentType = E->getTypeSourceInfo()->getType();
+ for (unsigned i = 0; i != n; ++i) {
+ const OffsetOfExpr::OffsetOfNode& ON = Comps[i];
+ if (ON.isBrackets) {
+ Expr *Idx = static_cast<Expr*>(ON.U.IndexExpr);
+ APSInt IdxResult;
+ if (!EvaluateInteger(Idx, IdxResult, Info))
+ return false;
+ const ArrayType *AT = Info.Ctx.getAsArrayType(CurrentType);
+ if (!AT)
+ return false;
+ CurrentType = AT->getElementType();
+ CharUnits ElementSize = Info.Ctx.getTypeSizeInChars(CurrentType);
+ Result += IdxResult.getSExtValue() * ElementSize;
+ }
+ else {
+ FieldDecl *MemberDecl = ON.U.MemberDecl;
+ const RecordType *RT = CurrentType->getAs<RecordType>();
+ if (!RT)
+ return false;
+ RecordDecl *RD = RT->getDecl();
+ const ASTRecordLayout &RL = Info.Ctx.getASTRecordLayout(RD);
+ unsigned i = 0;
+ for (RecordDecl::field_iterator Field = RD->field_begin(),
+ FieldEnd = RD->field_end();
+ Field != FieldEnd; (void)++Field, ++i) {
+ if (*Field == MemberDecl)
+ break;
+ }
+ if (i < RL.getFieldCount())
+ Result += CharUnits::fromQuantity(RL.getFieldOffset(i) / 8);

Use ASTContext::getCharWidth() rather than 8.

+ if (cast<RecordDecl>(MemberDecl->getDeclContext())->isAnonymousStructOrUnion())
+ CurrentType = MemberDecl->getType();

If you follow my advice about expanded references into anonymous structs/unions, you won't need to do anything special here.

Index: lib/AST/StmtProfile.cpp

--- lib/AST/StmtProfile.cpp (revisione 100962)
+++ lib/AST/StmtProfile.cpp (copia locale)
@@ -261,6 +261,10 @@
  ID.AddInteger(S->getOpcode());
}

+void StmtProfiler::VisitOffsetOfExpr(OffsetOfExpr *S) {
+ VisitExpr(S);
+}
+

To profile an OffsetOfExpr, we'll need to visit the type and all of the OffsetOfNodes as well. That way, we can distinguish between different OffsetOfExprs on the same base type.

Index: lib/AST/Expr.cpp

--- lib/AST/Expr.cpp (revisione 100962)
+++ lib/AST/Expr.cpp (copia locale)
@@ -334,7 +334,6 @@
  case Real: return "__real";
  case Imag: return "__imag";
  case Extension: return "__extension__";
- case OffsetOf: return "__builtin_offsetof";
  }
}

@@ -1835,7 +1834,9 @@
    case UnaryOperator::Real:
    case UnaryOperator::Imag:
      return CheckICE(Exp->getSubExpr(), Ctx);
- case UnaryOperator::OffsetOf:
+ }
+ }
+ case Expr::OffsetOfExprClass: {
      // Note that per C99, offsetof must be an ICE. And AFAIK, using
      // Evaluate matches the proposed gcc behavior for cases like
      // "offsetof(struct s{int x[4];}, x[!.0])". This doesn't affect
@@ -1843,7 +1844,6 @@
      // array subscripts that aren't ICEs, and if the array subscripts
      // are ICEs, the value of the offsetof must be an integer constant.
      return CheckEvalInICE(E, Ctx);
- }
  }
  case Expr::SizeOfAlignOfExprClass: {
    const SizeOfAlignOfExpr *Exp = cast<SizeOfAlignOfExpr>(E);
@@ -2582,6 +2582,15 @@
Stmt::child_iterator UnaryOperator::child_begin() { return &Val; }
Stmt::child_iterator UnaryOperator::child_end() { return &Val+1; }

+// OffsetOfExpr
+Stmt::child_iterator OffsetOfExpr::child_begin() {
+ return child_iterator();
+}
+
+Stmt::child_iterator OffsetOfExpr::child_end() {
+ return child_iterator();
+}

This is a tricky area. child_begin()/child_end() are supposed to provide iterators that cover all of the Expr*'s (actually, Stmt*'s) that are subexpressions of the expression node. With OffsetOfExpr, those Expr*'s aren't currently in contiguous memory.

In DesignatedInitExpr, which is very similar to OffsetOfExpr, we actually did a little dance where we have an array of Expr* nodes that are all of the subexpressions. Then, each node was either non-Expr* data (e.g., a FieldDecl*) or it was an index into the array of Expr* nodes. child_begin()/child_end() just pointed to the beginning/end of that Expr* array.

It's a bit clumsy to implement, but that's what we're stuck with at the moment.

Overall, this is a great start on __builtin_offsetof, thanks! I look forward to seeing a revised patch.

  - Doug

Douglas Gregor ha scritto:

Hi,

I'm the student who is trying to fix the problem mentioned in
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-February/007833.html .
    

[...]

Index: include/clang/AST/Expr.h

--- include/clang/AST/Expr.h (revisione 100962)
+++ include/clang/AST/Expr.h (copia locale)
@@ -991,6 +985,78 @@
    

[...]

+public:
+ // __builtin_offsetof(type, identifier(.identifier|[expr])*)
+ // FIXME: Type-dependent case ignored.
+ struct OffsetOfNode {
+ SourceLocation LocStart, LocEnd;
+ bool isBrackets; // true if [expr], false if .ident
+ union {
+ FieldDecl *MemberDecl;
+ Expr *IndexExpr;
+ } U;
+ };
    
You could save a word of storage per OffsetOfNode by mangling the "isBrackets" bit into the lowest bit of the FieldDecl or Expr pointer. llvm::PointerUnion<FieldDecl *, Expr *> will do this for you.

But, see below for more comments on this representation.
  

In Order to have Expr*'s in contiguous memory, I replaced *IndexExpr
field with an unsigned integer ExprIndex, which represents the index of the
corresponding Expr* in the array of subscript expressions: in this way, I can't
use a PointerUnion, or not?

Index: lib/AST/StmtProfile.cpp

--- lib/AST/StmtProfile.cpp (revisione 100962)
+++ lib/AST/StmtProfile.cpp (copia locale)
@@ -261,6 +261,10 @@
  ID.AddInteger(S->getOpcode());
}

+void StmtProfiler::VisitOffsetOfExpr(OffsetOfExpr *S) {
+ VisitExpr(S);
+}
+
    
To profile an OffsetOfExpr, we'll need to visit the type and all of the OffsetOfNodes as well. That way, we can distinguish between different OffsetOfExprs on the same base type.

Ok, but I have a doubt: VisitExpr(S) visits recursively all the sub-expressions,
thus I think that I haven't to explicitly visit such expressions while I'm iterating on
OffsetOfNodes array for declarations visit... Or not?

Overall, this is a great start on __builtin_offsetof, thanks! I look forward to seeing a revised patch.

  - Doug
  

Thanks for your comments! I tried to follow your advices...
I attached a new patch which refers to revision no. 102050.

Regards,
Roberto.

OffsetOfExpr0422.patch (34.4 KB)

Douglas Gregor ha scritto:

Hi,

I'm the student who is trying to fix the problem mentioned in
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-February/007833.html .
   

[...]

Index: include/clang/AST/Expr.h

--- include/clang/AST/Expr.h (revisione 100962)
+++ include/clang/AST/Expr.h (copia locale)
@@ -991,6 +985,78 @@
   

[...]

+public:
+ // __builtin_offsetof(type, identifier(.identifier|[expr])*)
+ // FIXME: Type-dependent case ignored.
+ struct OffsetOfNode {
+ SourceLocation LocStart, LocEnd;
+ bool isBrackets; // true if [expr], false if .ident
+ union {
+ FieldDecl *MemberDecl;
+ Expr *IndexExpr;
+ } U;
+ };
   
You could save a word of storage per OffsetOfNode by mangling the "isBrackets" bit into the lowest bit of the FieldDecl or Expr pointer. llvm::PointerUnion<FieldDecl *, Expr *> will do this for you.

But, see below for more comments on this representation.

In Order to have Expr*'s in contiguous memory, I replaced *IndexExpr
field with an unsigned integer ExprIndex, which represents the index of the
corresponding Expr* in the array of subscript expressions: in this way, I can't
use a PointerUnion, or not?

Correct. PointerUnion won't help here. Instead, I played more fun tricks by using the lower two bits of a pointer-sized integer as a descriminator. For a FieldDecl*, we mask off those bits and cast the type; for an expression index, we shift away those bits.

Index: lib/AST/StmtProfile.cpp

--- lib/AST/StmtProfile.cpp (revisione 100962)
+++ lib/AST/StmtProfile.cpp (copia locale)
@@ -261,6 +261,10 @@
ID.AddInteger(S->getOpcode());
}

+void StmtProfiler::VisitOffsetOfExpr(OffsetOfExpr *S) {
+ VisitExpr(S);
+}
+
   
To profile an OffsetOfExpr, we'll need to visit the type and all of the OffsetOfNodes as well. That way, we can distinguish between different OffsetOfExprs on the same base type.

Ok, but I have a doubt: VisitExpr(S) visits recursively all the sub-expressions,
thus I think that I haven't to explicitly visit such expressions while I'm iterating on
OffsetOfNodes array for declarations visit... Or not?

Right. VisitStmt(S) will get the subexpressions, but it won't get the field declarations.

Overall, this is a great start on __builtin_offsetof, thanks! I look forward to seeing a revised patch.

  - Doug

Thanks for your comments! I tried to follow your advices...
I attached a new patch which refers to revision no. 102050.

This is great. I've committed a modified version of your patch, where I compressed the storage of OffsetOfNode a bit and added support for template instantiation and precompiled headers. Since we already had __builtin_offsetof that somewhat worked, everything had to go in at once. Here's what I ended up doing:

  - Implemented the bit-mangled encoding I described above, to shrink the size of OffsetOfNode. Added accessors/constructors to OffsetOfNode, so that the representation is hidden.
  - Simplified the pointer arithmetic inside OffsetOfExpr.
  - Added support for templates:
    - OffsetOfNode now has an "Identifier", for the case where the offsetof designates a field but we can't look up the field yet (since the record type is dependent).
    - Reorganized semantic checking so that it copes with dependent types.
    - Added template instantiation logic
  - Added support for reading/writer OffsetOfExpr to precompiled headers

The commit is here:

  http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20100426/029847.html
  
Thanks for working on this!

  - Doug