Lambda expr AST representation

Despite what is written in C++11 5.1.2p7:

The lambda-expression’s compound-statement yields the function-body
(8.4) of the function call operator, but for purposes of name lookup
(3.4), determining the type and value of this (9.3.2) and transforming
id-expressions referring to non-static class members into class member
access expressions using (*this) (9.3.1), the compound-statement is
considered in the context of the lambda-expression.

currently clang in its AST insert DeclRefExpr instead of correct
MemberExpr, as the following typescript shows:

$ cat p.cc
int f(int a) {
  return [a]()->int { return a; }();
}
$ _clang++ -cc1 -ast-dump -std=c++0x p.cc
typedef __int128 __int128_t;
typedef unsigned __int128 __uint128_t;
typedef __va_list_tag __builtin_va_list[1];
int f(int a) (CompoundStmt 0x4629a50 <p.cc:1:14, line:3:1>
  (ReturnStmt 0x4629a30 <line:2:3, col:35>
    (CXXOperatorCallExpr 0x46299b0 <col:10, col:35> 'int'
      (ImplicitCastExpr 0x4629998 <col:34, col:35> 'auto (*)(void) const
-> int' <FunctionToPointerDecay>
        (DeclRefExpr 0x4629910 <col:34, col:35> 'auto (void) const ->
int' lvalue CXXMethod 0x4629580 'operator()' 'auto (void) const -> int'))
      (ImplicitCastExpr 0x4629a18 <col:10, col:33> 'const class <lambda
at p.cc:2:10>' <NoOp>
        (LambdaExpr 0x4629748 <col:10, col:33> 'class <lambda at p.cc:2:10>'
          (ImplicitCastExpr 0x46296b0 <col:11> 'int' <LValueToRValue>
            (DeclRefExpr 0x4629688 <col:11> 'int' lvalue ParmVar
0x45fbf00 'a' 'int'))
          (CompoundStmt 0x4629728 <col:21, col:33>
            (ReturnStmt 0x4629708 <col:23, col:30>
              (ImplicitCastExpr 0x46296f0 <col:30> 'int' <LValueToRValue>
                (DeclRefExpr 0x46296c8 <col:30> 'const int' lvalue
ParmVar 0x45fbf00 'a' 'int')))))))))

Although I'm aware that these DeclRefExpr are handled especially in
CodeGen I think that this behavior should be considered a defect of AST.

Also, I've noted that lambda class fields generated for captured
variables are anonymous instead to have the name of the captured variable.

Do we agree that these two points should be fixed or there are reasons
I'm missing for current state?

Despite what is written in C++11 5.1.2p7:

The lambda-expression’s compound-statement yields the function-body
(8.4) of the function call operator, but for purposes of name lookup
(3.4), determining the type and value of this (9.3.2) and transforming
id-expressions referring to non-static class members into class member
access expressions using (*this) (9.3.1), the compound-statement is
considered in the context of the lambda-expression.

currently clang in its AST insert DeclRefExpr instead of correct
MemberExpr, as the following typescript shows:

$ cat p.cc
int f(int a) {
  return [a]()->int { return a; }();
}
$ _clang++ -cc1 -ast-dump -std=c++0x p.cc
typedef __int128 __int128_t;
typedef unsigned __int128 __uint128_t;
typedef __va_list_tag __builtin_va_list[1];
int f(int a) (CompoundStmt 0x4629a50 <p.cc:1:14, line:3:1>
  (ReturnStmt 0x4629a30 <line:2:3, col:35>
    (CXXOperatorCallExpr 0x46299b0 <col:10, col:35> 'int'
      (ImplicitCastExpr 0x4629998 <col:34, col:35> 'auto (*)(void) const
-> int' <FunctionToPointerDecay>
        (DeclRefExpr 0x4629910 <col:34, col:35> 'auto (void) const ->
int' lvalue CXXMethod 0x4629580 'operator()' 'auto (void) const -> int'))
      (ImplicitCastExpr 0x4629a18 <col:10, col:33> 'const class <lambda
at p.cc:2:10>' <NoOp>
        (LambdaExpr 0x4629748 <col:10, col:33> 'class <lambda at p.cc:2:10>'
          (ImplicitCastExpr 0x46296b0 <col:11> 'int' <LValueToRValue>
            (DeclRefExpr 0x4629688 <col:11> 'int' lvalue ParmVar
0x45fbf00 'a' 'int'))
          (CompoundStmt 0x4629728 <col:21, col:33>
            (ReturnStmt 0x4629708 <col:23, col:30>
              (ImplicitCastExpr 0x46296f0 <col:30> 'int' <LValueToRValue>
                (DeclRefExpr 0x46296c8 <col:30> 'const int' lvalue
ParmVar 0x45fbf00 'a' 'int')))))))))

Although I'm aware that these DeclRefExpr are handled especially in
CodeGen I think that this behavior should be considered a defect of AST.

Despite the reference to "transforming id-expressions" in the
standard, the ASTs were intentionally designed the way they are
because an expression in a lambda acts more like a reference to the
original variable in terms of semantic analysis than some sort of
member reference expression. The alternative involves some entirely
new AST nodes to keep around the relevant semantic information, and
substantial benefit.

Also, I've noted that lambda class fields generated for captured
variables are anonymous instead to have the name of the captured variable.

That's what the standard says, sorry.

-Eli

[...]

Also, I've noted that lambda class fields generated for captured
variables are anonymous instead to have the name of the captured variable.

That's what the standard says, sorry.

Does it? That doesn't match my understanding, which is that the
standard doesn't specify the names. Can you point at that text in the
standard?

-- James

Despite what is written in C++11 5.1.2p7:

The lambda-expression’s compound-statement yields the function-body
(8.4) of the function call operator, but for purposes of name lookup
(3.4), determining the type and value of this (9.3.2) and transforming
id-expressions referring to non-static class members into class member
access expressions using (*this) (9.3.1), the compound-statement is
considered in the context of the lambda-expression.

currently clang in its AST insert DeclRefExpr instead of correct
MemberExpr, as the following typescript shows:

$ cat p.cc
int f(int a) {
  return [a]()->int { return a; }();
}
$ _clang++ -cc1 -ast-dump -std=c++0x p.cc
typedef __int128 __int128_t;
typedef unsigned __int128 __uint128_t;
typedef __va_list_tag __builtin_va_list[1];
int f(int a) (CompoundStmt 0x4629a50 <p.cc:1:14, line:3:1>
  (ReturnStmt 0x4629a30 <line:2:3, col:35>
    (CXXOperatorCallExpr 0x46299b0 <col:10, col:35> 'int'
      (ImplicitCastExpr 0x4629998 <col:34, col:35> 'auto (*)(void) const
-> int' <FunctionToPointerDecay>
        (DeclRefExpr 0x4629910 <col:34, col:35> 'auto (void) const ->
int' lvalue CXXMethod 0x4629580 'operator()' 'auto (void) const -> int'))
      (ImplicitCastExpr 0x4629a18 <col:10, col:33> 'const class <lambda
at p.cc:2:10>' <NoOp>
        (LambdaExpr 0x4629748 <col:10, col:33> 'class <lambda at p.cc:2:10>'
          (ImplicitCastExpr 0x46296b0 <col:11> 'int' <LValueToRValue>
            (DeclRefExpr 0x4629688 <col:11> 'int' lvalue ParmVar
0x45fbf00 'a' 'int'))
          (CompoundStmt 0x4629728 <col:21, col:33>
            (ReturnStmt 0x4629708 <col:23, col:30>
              (ImplicitCastExpr 0x46296f0 <col:30> 'int' <LValueToRValue>
                (DeclRefExpr 0x46296c8 <col:30> 'const int' lvalue
ParmVar 0x45fbf00 'a' 'int')))))))))

Although I'm aware that these DeclRefExpr are handled especially in
CodeGen I think that this behavior should be considered a defect of AST.

Despite the reference to "transforming id-expressions" in the
standard, the ASTs were intentionally designed the way they are
because an expression in a lambda acts more like a reference to the
original variable in terms of semantic analysis than some sort of
member reference expression.

I'm amazed by this phrase: my message is specifically oriented to have a
proper built AST under a semantic analysis point of view. AFAIK the
reference to captured variables are *indeed* references to record field
and not to original variable: e.g. if original variable captured by
value changes after lambda class (closure type) instance generation and
before operator() call the value that should be seen is the field of
lambda class instance and not the value of captured variable.

I'm missing something?

The alternative involves some entirely
new AST nodes to keep around the relevant semantic information, and
from my perspective that would just bloat the AST without any
substantial benefit.

Can you explain which semantic information?

Also, I've noted that lambda class fields generated for captured
variables are anonymous instead to have the name of the captured variable.

That's what the standard says, sorry.

Sorry I missed that (if someone is interested it is specified in 5.1.2p14)

Hmm... maybe the current implementation makes more sense to me because
I implemented large parts of it, but I don't think of references to
captured variables like normal member variables... I think of them
more like references to the original variable from the perspective of
inside the lambda. The whole implementation was a bi colored by the
existing implementation of the Apple blocks extension, where it isn't
as clear-cut that there's actually an in-memory object containing the
relevant members.

There are two reasons we'd need new AST nodes: one, we have to treat
the "implicit this" differently from the implicit this for normal
class members, and two, we would need a different kind of member
reference expression to track the original variable referred to.

-Eli

5.1.2/14: “For each entity captured by copy, an unnamed non-static data member is declared in the closure type”.

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a pseudo-name) of
the captured variable ("this" to refer to captured this)

2) the FieldDecl uses a bit to represent the fact that fields are the
fields of the closure type (this means they are actually unnamed)

In this way the source pretty printing is easily doable, the semantic
info is accurate, no new AST node is needed, CodeGen is simpler (it does
not need to map DeclRefExpr to MemberExpr).

I've forgot something?

That could work... although it would be a bit tricky to find the
original captured variable given a MemberExpr of this sort.

-Eli

Thanks for the reference.

That wording makes lots of room under "as-if", as all that's required
is that valid user code can't use (or be affected by) the names used
for the closure type. I see nothing in 5.1.2/14 that prevents an
implementation naming the captured "foo" as "__captured__foo" or
similar. Well-defined code can't tell the difference between an
unnamed member and a member with a reserved name, AFAIK. There don't
even have to _be_ the specified members (and hopefully implementations
will be smart enough not to make closure types which hold nothing but
references to n local variables, when they could just re-use the stack
frame).

Most of the wording around closure types is meant to mean (very)
roughly "implementations can do what they like, within limits, and
users can't look".

That said, so long as the AST is rich enough for tools to navigate
from the capture type's members back to what they captured from, I
don't think this is particularly hard to work with. Even if Clang's
AST gave somewhat more friendly names, they still wouldn't match the
original names exactly, so that work would have to be done anyway.

-- James

I've thought to that, but I failed to imagine a case where this is needed.

Also I've thought that this info would pertain to capture list and not
to field. If we need it we might save the FieldDecl* together with the
VarDecl* in LambdaExpr::Capture.

[…]

Also, I’ve noted that lambda class fields generated for captured
variables are anonymous instead to have the name of the captured
variable.

That’s what the standard says, sorry.

Does it? That doesn’t match my understanding, which is that the
standard doesn’t specify the names. Can you point at that text in the
standard?

5.1.2/14: “For each entity captured by copy, an unnamed non-static data
member is declared in the closure type”.

Thanks for the reference.

That wording makes lots of room under “as-if”, as all that’s required
is that valid user code can’t use (or be affected by) the names used
for the closure type. I see nothing in 5.1.2/14 that prevents an
implementation naming the captured “foo” as “__captured__foo” or
similar. Well-defined code can’t tell the difference between an
unnamed member and a member with a reserved name, AFAIK.

Yeah, I agree that that would be fine. 5.1.2/3 generally restricts similar changes to the closure type, but this case isn’t observable behavior, so it seems permissible. What goal are you working towards here? (Is this for the benefit of a debugger?)

There don’t even have to be the specified members (and hopefully implementations
will be smart enough not to make closure types which hold nothing but
references to n local variables, when they could just re-use the stack
frame).

Well, the rule I quoted only applies to capture by copy, where there really do have to be the specified members, under 5.1.2/14 and 5.1.2/3.

Despite what is written in C++11 5.1.2p7:

The lambda-expression’s compound-statement yields the function-body
(8.4) of the function call operator, but for purposes of name lookup
(3.4), determining the type and value of this (9.3.2) and transforming
id-expressions referring to non-static class members into class member
access expressions using (*this) (9.3.1), the compound-statement is
considered in the context of the lambda-expression.

currently clang in its AST insert DeclRefExpr instead of correct
MemberExpr, as the following typescript shows:

$ cat p.cc
int f(int a) {
return [a]()->int { return a; }();
}
$ _clang++ -cc1 -ast-dump -std=c++0x p.cc
typedef __int128 __int128_t;
typedef unsigned __int128 __uint128_t;
typedef __va_list_tag __builtin_va_list[1];
int f(int a) (CompoundStmt 0x4629a50 <p.cc:1:14, line:3:1>
(ReturnStmt 0x4629a30 <line:2:3, col:35>
   (CXXOperatorCallExpr 0x46299b0 <col:10, col:35> 'int'
     (ImplicitCastExpr 0x4629998 <col:34, col:35> 'auto (*)(void) const
-> int' <FunctionToPointerDecay>
       (DeclRefExpr 0x4629910 <col:34, col:35> 'auto (void) const ->
int' lvalue CXXMethod 0x4629580 'operator()' 'auto (void) const -> int'))
     (ImplicitCastExpr 0x4629a18 <col:10, col:33> 'const class <lambda
at p.cc:2:10>' <NoOp>
       (LambdaExpr 0x4629748 <col:10, col:33> 'class <lambda at p.cc:2:10>'
         (ImplicitCastExpr 0x46296b0 <col:11> 'int' <LValueToRValue>
           (DeclRefExpr 0x4629688 <col:11> 'int' lvalue ParmVar
0x45fbf00 'a' 'int'))
         (CompoundStmt 0x4629728 <col:21, col:33>
           (ReturnStmt 0x4629708 <col:23, col:30>
             (ImplicitCastExpr 0x46296f0 <col:30> 'int' <LValueToRValue>
               (DeclRefExpr 0x46296c8 <col:30> 'const int' lvalue
ParmVar 0x45fbf00 'a' 'int')))))))))

Although I'm aware that these DeclRefExpr are handled especially in
CodeGen I think that this behavior should be considered a defect of AST.

Despite the reference to "transforming id-expressions" in the
standard, the ASTs were intentionally designed the way they are
because an expression in a lambda acts more like a reference to the
original variable in terms of semantic analysis than some sort of
member reference expression.

I'm amazed by this phrase: my message is specifically oriented to have a
proper built AST under a semantic analysis point of view. AFAIK the
reference to captured variables are *indeed* references to record field
and not to original variable: e.g. if original variable captured by
value changes after lambda class (closure type) instance generation and
before operator() call the value that should be seen is the field of
lambda class instance and not the value of captured variable.

I'm missing something?

The alternative involves some entirely
new AST nodes to keep around the relevant semantic information, and
from my perspective that would just bloat the AST without any
substantial benefit.

Can you explain which semantic information?

Hmm... maybe the current implementation makes more sense to me because
I implemented large parts of it, but I don't think of references to
captured variables like normal member variables... I think of them
more like references to the original variable from the perspective of
inside the lambda. The whole implementation was a bi colored by the
existing implementation of the Apple blocks extension, where it isn't
as clear-cut that there's actually an in-memory object containing the
relevant members.

There are two reasons we'd need new AST nodes: one, we have to treat
the "implicit this" differently from the implicit this for normal
class members, and two, we would need a different kind of member
reference expression to track the original variable referred to.

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a pseudo-name) of
the captured variable ("this" to refer to captured this)

2) the FieldDecl uses a bit to represent the fact that fields are the
fields of the closure type (this means they are actually unnamed)

In this way the source pretty printing is easily doable, the semantic
info is accurate, no new AST node is needed, CodeGen is simpler (it does
not need to map DeclRefExpr to MemberExpr).

I've forgot something?

That could work... although it would be a bit tricky to find the
original captured variable given a MemberExpr of this sort.

I've thought to that, but I failed to imagine a case where this is needed.

It matters a lot for features that care more about the results of name lookup than the underlying semantics. For example, libclang's clang_findReferencesInFile, which finds all of the references to a given declaration, would need to introduce new code to map the fields of implicitly-generated MemberExprs back to references to a normal variable declaration. In general, these tools expect (reasonably, IMO) that a local variable or static data member will be referenced with DeclRefExpr, while a non-static data member will be referenced with a MemberExpr. That's actually a very nice invariant. Doing as you suggest would complicate the invariants for these clients, forcing them to deal specifically with lambda captures (which they otherwise wouldn't have to consider). And if we have to have the complication somewhere, I'd rather it be with the more intelligent clients that care about semantics, rather than the clients that only care about cross-referencing.

Also I've thought that this info would pertain to capture list and not
to field. If we need it we might save the FieldDecl* together with the
VarDecl* in LambdaExpr::Capture.

I'd be perfectly fine with adding the FieldDecl* into LambdaExpr::Capture, so it's easy to map between the two.

Elsewhere, you said:

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a pseudo-name) of
the captured variable ("this" to refer to captured this)

As noted elsewhere, we can't do this exactly. The __name bit could work, but I'd prefer that we simply keep these as anonymous fields, because the __ names really don't help that much.

2) the FieldDecl uses a bit to represent the fact that fields are the
fields of the closure type (this means they are actually unnamed)

There's no need for this bit; simply check whether the DeclContext of the FieldDecl is a lambda class. From that lambda class, you can get at the LambdaExpr::Captures, and therefore the names and original VarDecls.

  - Doug

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer to
captured this)

2) the FieldDecl uses a bit to represent the fact that fields
are the fields of the closure type (this means they are
actually unnamed)

In this way the source pretty printing is easily doable, the
semantic info is accurate, no new AST node is needed, CodeGen
is simpler (it does not need to map DeclRefExpr to
MemberExpr).

I've forgot something?

That could work... although it would be a bit tricky to find the
original captured variable given a MemberExpr of this sort.

I've thought to that, but I failed to imagine a case where this is
needed.

It matters a lot for features that care more about the results of
name lookup than the underlying semantics. For example, libclang's
clang_findReferencesInFile, which finds all of the references to a
given declaration, would need to introduce new code to map the fields
of implicitly-generated MemberExprs back to references to a normal
variable declaration. In general, these tools expect (reasonably,
IMO) that a local variable or static data member will be referenced
with DeclRefExpr, while a non-static data member will be referenced
with a MemberExpr. That's actually a very nice invariant. Doing as
you suggest would complicate the invariants for these clients,
forcing them to deal specifically with lambda captures (which they
otherwise wouldn't have to consider). And if we have to have the
complication somewhere, I'd rather it be with the more intelligent
clients that care about semantics, rather than the clients that only
care about cross-referencing.

I have a rather different perspective for libclang that IMHO is more
accurate (and congruent with semantic): the variable in body references
the capture list entry (implicit or explicit) while the capture list
entry references the captured variable.

Also I've thought that this info would pertain to capture list and
not to field. If we need it we might save the FieldDecl* together
with the VarDecl* in LambdaExpr::Capture.

I'd be perfectly fine with adding the FieldDecl* into
LambdaExpr::Capture, so it's easy to map between the two.

Elsewhere, you said:

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a pseudo-name)
of the captured variable ("this" to refer to captured this)

As noted elsewhere, we can't do this exactly. The __name bit could
work, but I'd prefer that we simply keep these as anonymous fields,
because the __ names really don't help that much.

I apologize, it seems I've missed the elsewhere where this has been
clarified. Can you explain that? What's the problem having fake names
(marked as fake)?

It would be a pity to not have such names in FieldDecl, this would make
pretty printer job very easy.

2) the FieldDecl uses a bit to represent the fact that fields are
the fields of the closure type (this means they are actually
unnamed)

There's no need for this bit; simply check whether the DeclContext of
the FieldDecl is a lambda class. From that lambda class, you can get
at the LambdaExpr::Captures, and therefore the names and original
VarDecls.

Yes this is definitely easily doable if indeed we can't have name in
FieldDecl.

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer to
captured this)

2) the FieldDecl uses a bit to represent the fact that fields
are the fields of the closure type (this means they are
actually unnamed)

In this way the source pretty printing is easily doable, the
semantic info is accurate, no new AST node is needed, CodeGen
is simpler (it does not need to map DeclRefExpr to
MemberExpr).

I've forgot something?

That could work... although it would be a bit tricky to find the
original captured variable given a MemberExpr of this sort.

I've thought to that, but I failed to imagine a case where this is
needed.

It matters a lot for features that care more about the results of
name lookup than the underlying semantics. For example, libclang's
clang_findReferencesInFile, which finds all of the references to a
given declaration, would need to introduce new code to map the fields
of implicitly-generated MemberExprs back to references to a normal
variable declaration. In general, these tools expect (reasonably,
IMO) that a local variable or static data member will be referenced
with DeclRefExpr, while a non-static data member will be referenced
with a MemberExpr. That's actually a very nice invariant. Doing as
you suggest would complicate the invariants for these clients,
forcing them to deal specifically with lambda captures (which they
otherwise wouldn't have to consider). And if we have to have the
complication somewhere, I'd rather it be with the more intelligent
clients that care about semantics, rather than the clients that only
care about cross-referencing.

I have a rather different perspective for libclang that IMHO is more
accurate (and congruent with semantic): the variable in body references
the capture list entry (implicit or explicit) while the capture list
entry references the captured variable.

You're saying it's "more accurate" because it matches more closely with the as-i" implementation written in the standard, and I don't dispute that. What I dispute is that exactly modeling the as-if rule in the standard is the right thing for Clang. We're not bound to implement lambda classes via exactly that as-i" rule, and we probably shouldn't do so for [&] lambdas anyway (because it's silly to store several references to local variables in the same stack frame). And, as noted above, our ASTs are meant to describe the language (which they certainly do, even if not strictly based on the as-if rule) and are meant to be usable by clients. Your suggestion may make some trivial improvement in the former (for those who want to think in terms of the as-if rule), but complicate other clients. That's not a good trade-off.

Also, don't forget blocks. Blocks use a very different implementation of the capture mechanism. Would you also want to rework the blocks capture ASTs to reflect how block captures are actually handled? They're different yet again from lambdas, so you'll end up with different AST schemes for captures in lambdas and blocks. Today, we use a simple representation that works well for both lambdas and blocks.

Elsewhere, you said:

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a pseudo-name)
of the captured variable ("this" to refer to captured this)

As noted elsewhere, we can't do this exactly. The __name bit could
work, but I'd prefer that we simply keep these as anonymous fields,
because the __ names really don't help that much.

I apologize, it seems I've missed the elsewhere where this has been
clarified. Can you explain that?

The __ names still require you to detect that you're in a lambda and then remove the '__' from the name. You can't simply remove __ from every name, or you'll get bogus results from various system classes that actually do use __.

What's the problem having fake names
(marked as fake)?

I don't know what "marked as fake" means. If it means introducing some other notion of name into NamedDecl, it's very certainly not worth the additional complexity in the AST. If you aren't introducing a new kind of name, you really haven't marked the name as being "fake" in any meaningful way.

It would be a pity to not have such names in FieldDecl, this would make
pretty printer job very easy.

Adding a

  VarDecl *getCapturedVariable(FieldDecl *)

for lambda classes would make the pretty printer's job trivial. Besides, what's a pretty-printer doing printing the generated lambda class?

  - Doug

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer to
captured this)

2) the FieldDecl uses a bit to represent the fact that
fields are the fields of the closure type (this means they
are actually unnamed)

In this way the source pretty printing is easily doable,
the semantic info is accurate, no new AST node is needed,
CodeGen is simpler (it does not need to map DeclRefExpr to
MemberExpr).

I've forgot something?

That could work... although it would be a bit tricky to find
the original captured variable given a MemberExpr of this
sort.

I've thought to that, but I failed to imagine a case where this
is needed.

It matters a lot for features that care more about the results
of name lookup than the underlying semantics. For example,
libclang's clang_findReferencesInFile, which finds all of the
references to a given declaration, would need to introduce new
code to map the fields of implicitly-generated MemberExprs back
to references to a normal variable declaration. In general, these
tools expect (reasonably, IMO) that a local variable or static
data member will be referenced with DeclRefExpr, while a
non-static data member will be referenced with a MemberExpr.
That's actually a very nice invariant. Doing as you suggest would
complicate the invariants for these clients, forcing them to deal
specifically with lambda captures (which they otherwise wouldn't
have to consider). And if we have to have the complication
somewhere, I'd rather it be with the more intelligent clients
that care about semantics, rather than the clients that only care
about cross-referencing.

I have a rather different perspective for libclang that IMHO is
more accurate (and congruent with semantic): the variable in body
references the capture list entry (implicit or explicit) while the
capture list entry references the captured variable.

You're saying it's "more accurate" because it matches more closely
with the as-i" implementation written in the standard, and I don't
dispute that. What I dispute is that exactly modeling the as-if rule
in the standard is the right thing for Clang. We're not bound to
implement lambda classes via exactly that as-i" rule, and we probably
shouldn't do so for [&] lambdas anyway (because it's silly to store
several references to local variables in the same stack frame). And,
as noted above, our ASTs are meant to describe the language (which
they certainly do, even if not strictly based on the as-if rule) and
are meant to be usable by clients. Your suggestion may make some
trivial improvement in the former (for those who want to think in
terms of the as-if rule), but complicate other clients. That's not a
good trade-off.

I'm rather unconvinced this is a good choice: what about AST Matchers
that want to find references to captured variable? I'm sure there are
also other example of things that become rather complex if we permit
such a large deviation from correct semantics to AST representation (and
as far as I know this is almost unprecedented).

1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer to
captured this)

As noted elsewhere, we can't do this exactly. The __name bit
could work, but I'd prefer that we simply keep these as anonymous
fields, because the __ names really don't help that much.

I apologize, it seems I've missed the elsewhere where this has
been clarified. Can you explain that?

The __ names still require you to detect that you're in a lambda and
then remove the '__' from the name. You can't simply remove __ from
every name, or you'll get bogus results from various system classes
that actually do use __.

I meant to name the fields exactly with the captured name (without __
prefix) and "this" for captured this. These names would be hidden to lookup.

It would be a pity to not have such names in FieldDecl, this would
make pretty printer job very easy.

Adding a

VarDecl *getCapturedVariable(FieldDecl *)

for lambda classes would make the pretty printer's job trivial.
Besides, what's a pretty-printer doing printing the generated lambda
class?

The body of operator() of lambda class should be printed.

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer to
captured this)

2) the FieldDecl uses a bit to represent the fact that
fields are the fields of the closure type (this means they
are actually unnamed)

In this way the source pretty printing is easily doable,
the semantic info is accurate, no new AST node is needed,
CodeGen is simpler (it does not need to map DeclRefExpr to
MemberExpr).

I've forgot something?

That could work... although it would be a bit tricky to find
the original captured variable given a MemberExpr of this
sort.

I've thought to that, but I failed to imagine a case where this
is needed.

It matters a lot for features that care more about the results
of name lookup than the underlying semantics. For example,
libclang's clang_findReferencesInFile, which finds all of the
references to a given declaration, would need to introduce new
code to map the fields of implicitly-generated MemberExprs back
to references to a normal variable declaration. In general, these
tools expect (reasonably, IMO) that a local variable or static
data member will be referenced with DeclRefExpr, while a
non-static data member will be referenced with a MemberExpr.
That's actually a very nice invariant. Doing as you suggest would
complicate the invariants for these clients, forcing them to deal
specifically with lambda captures (which they otherwise wouldn't
have to consider). And if we have to have the complication
somewhere, I'd rather it be with the more intelligent clients
that care about semantics, rather than the clients that only care
about cross-referencing.

I have a rather different perspective for libclang that IMHO is
more accurate (and congruent with semantic): the variable in body
references the capture list entry (implicit or explicit) while the
capture list entry references the captured variable.

You're saying it's "more accurate" because it matches more closely
with the as-i" implementation written in the standard, and I don't
dispute that. What I dispute is that exactly modeling the as-if rule
in the standard is the right thing for Clang. We're not bound to
implement lambda classes via exactly that as-i" rule, and we probably
shouldn't do so for [&] lambdas anyway (because it's silly to store
several references to local variables in the same stack frame). And,
as noted above, our ASTs are meant to describe the language (which
they certainly do, even if not strictly based on the as-if rule) and
are meant to be usable by clients. Your suggestion may make some
trivial improvement in the former (for those who want to think in
terms of the as-if rule), but complicate other clients. That's not a
good trade-off.

I'm rather unconvinced this is a good choice: what about AST Matchers
that want to find references to captured variable?

Those are simple today, because they're just DeclRefExprs. What you're asking for would complicate those matchers, because you'd have to match an implicitly-generated MemberExpr and map from the FieldDecl back to the VarDecl.

I'm sure there are
also other example of things that become rather complex if we permit
such a large deviation from correct semantics to AST representation (and
as far as I know this is almost unprecedented).

If you want to make this case, you'll need to come up with those examples. None of the clients of the current AST are particularly complex, and those that I've enumerated (and the one you've listed above) are simpler with the current AST representation.

I see that you've dropped my comment about blocks. It is important, however. Having two different ways of expressing closures is unfortunate, and forced on us by the various dialects we support, but representing their captures in very different ways in the AST would be a lot more unfortunate (and is avoidable).

1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer to
captured this)

As noted elsewhere, we can't do this exactly. The __name bit
could work, but I'd prefer that we simply keep these as anonymous
fields, because the __ names really don't help that much.

I apologize, it seems I've missed the elsewhere where this has
been clarified. Can you explain that?

The __ names still require you to detect that you're in a lambda and
then remove the '__' from the name. You can't simply remove __ from
every name, or you'll get bogus results from various system classes
that actually do use __.

I meant to name the fields exactly with the captured name (without __
prefix) and "this" for captured this. These names would be hidden to lookup.

"Hidden to lookup" is going to mean changing name lookup in a few places. That's rather unfortunate.

It would be a pity to not have such names in FieldDecl, this would
make pretty printer job very easy.

Adding a

VarDecl *getCapturedVariable(FieldDecl *)

for lambda classes would make the pretty printer's job trivial.
Besides, what's a pretty-printer doing printing the generated lambda
class?

The body of operator() of lambda class should be printed.

… and in that pretty-printing, we should print out the name of the captured variable, not a member expression referring to the field corresponding to the captured variable. Your suggestion either changes nothing (because an implicit member expression to the field prints just the field name), or complicates matters (if we don't
have the field names match up with the captured variable names). Why is this an interesting case to discuss?

Fundamentally, you're arguing from the perspective of "purity of representation." There are two problems here: the pure representation you want isn't really "pure", in the sense that you want to represent an as-if rule that doesn't have to be the actual implementation approach. The second problem is that you're not considering the effect of this change on clients, which (so far) look like they would get *more* complex than what we have now.

  - Doug

I'd suggest a slightly different path:

1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer
to captured this)

2) the FieldDecl uses a bit to represent the fact that
fields are the fields of the closure type (this means
they are actually unnamed)

In this way the source pretty printing is easily
doable, the semantic info is accurate, no new AST node
is needed, CodeGen is simpler (it does not need to map
DeclRefExpr to MemberExpr).

I've forgot something?

That could work... although it would be a bit tricky to
find the original captured variable given a MemberExpr of
this sort.

I've thought to that, but I failed to imagine a case where
this is needed.

It matters a lot for features that care more about the
results of name lookup than the underlying semantics. For
example, libclang's clang_findReferencesInFile, which finds
all of the references to a given declaration, would need to
introduce new code to map the fields of implicitly-generated
MemberExprs back to references to a normal variable
declaration. In general, these tools expect (reasonably, IMO)
that a local variable or static data member will be
referenced with DeclRefExpr, while a non-static data member
will be referenced with a MemberExpr. That's actually a very
nice invariant. Doing as you suggest would complicate the
invariants for these clients, forcing them to deal
specifically with lambda captures (which they otherwise
wouldn't have to consider). And if we have to have the
complication somewhere, I'd rather it be with the more
intelligent clients that care about semantics, rather than
the clients that only care about cross-referencing.

I have a rather different perspective for libclang that IMHO
is more accurate (and congruent with semantic): the variable in
body references the capture list entry (implicit or explicit)
while the capture list entry references the captured variable.

You're saying it's "more accurate" because it matches more
closely with the as-i" implementation written in the standard,
and I don't dispute that. What I dispute is that exactly modeling
the as-if rule in the standard is the right thing for Clang.
We're not bound to implement lambda classes via exactly that
as-i" rule, and we probably shouldn't do so for [&] lambdas
anyway (because it's silly to store several references to local
variables in the same stack frame). And, as noted above, our ASTs
are meant to describe the language (which they certainly do, even
if not strictly based on the as-if rule) and are meant to be
usable by clients. Your suggestion may make some trivial
improvement in the former (for those who want to think in terms
of the as-if rule), but complicate other clients. That's not a
good trade-off.

I'm rather unconvinced this is a good choice: what about AST
Matchers that want to find references to captured variable?

Those are simple today, because they're just DeclRefExprs. What
you're asking for would complicate those matchers, because you'd have
to match an implicitly-generated MemberExpr and map from the
FieldDecl back to the VarDecl.

There is a misunderstanding: I meant the difficulty to discriminate
between real DeclRefExpr and fake one (references to captured variables)
in AST Matchers. Every time the matcher get a DeclRefExpr should do
something special to understand that.

I'm sure there are also other example of things that become rather
complex if we permit such a large deviation from correct semantics
to AST representation (and as far as I know this is almost
unprecedented).

If you want to make this case, you'll need to come up with those
examples. None of the clients of the current AST are particularly
complex, and those that I've enumerated (and the one you've listed
above) are simpler with the current AST representation.

I see that you've dropped my comment about blocks. It is important,
however. Having two different ways of expressing closures is
unfortunate, and forced on us by the various dialects we support, but
representing their captures in very different ways in the AST would
be a lot more unfortunate (and is avoidable).

I've dropped it only because I'm very ignorant about that, not to
dismiss its relevance. I definitely trust you about that.

It would be a pity to not have such names in FieldDecl, this
would make pretty printer job very easy.

Adding a

VarDecl *getCapturedVariable(FieldDecl *)

for lambda classes would make the pretty printer's job trivial.
Besides, what's a pretty-printer doing printing the generated
lambda class?

The body of operator() of lambda class should be printed.

… and in that pretty-printing, we should print out the name of the
captured variable, not a member expression referring to the field
corresponding to the captured variable. Your suggestion either
changes nothing (because an implicit member expression to the field
prints just the field name), or complicates matters (if we don't have
the field names match up with the captured variable names). Why is
this an interesting case to discuss?

I was only making the point that representing references to captured
variable with a MemberExpr does not make pretty printer job more complex
wrt current status, *if* fields have a name.

Fundamentally, you're arguing from the perspective of "purity of
representation." There are two problems here: the pure representation
you want isn't really "pure", in the sense that you want to represent
an as-if rule that doesn't have to be the actual implementation
approach. The second problem is that you're not considering the
effect of this change on clients, which (so far) look like they would
get *more* complex than what we have now.

My understand is that codegen and analysis would be rather simplified,
libclang should consider the existence of captured variables (cause they
exist ;-), tooling would be grateful if AST is conformant to semantics.
About reference collectors I definitely don't think they should confuse
references to captured variabile with references to variables, they are
not the same thing.

However it is not my intention to insist too much, once the presence in
AST of DeclRefExpr that are not real DeclRefExpr is understood and
accepted despite its cost, I've little arguments beyond that.

I’d suggest a slightly different path:

  1. the closure type FieldDecl has the name (actually a
    pseudo-name) of the captured variable (“this” to refer to
    captured this)

  2. the FieldDecl uses a bit to represent the fact that
    fields are the fields of the closure type (this means they
    are actually unnamed)

In this way the source pretty printing is easily doable,
the semantic info is accurate, no new AST node is needed,
CodeGen is simpler (it does not need to map DeclRefExpr to
MemberExpr).

I’ve forgot something?

That could work… although it would be a bit tricky to find
the original captured variable given a MemberExpr of this
sort.

I’ve thought to that, but I failed to imagine a case where this
is needed.

It matters a lot for features that care more about the results
of name lookup than the underlying semantics. For example,
libclang’s clang_findReferencesInFile, which finds all of the
references to a given declaration, would need to introduce new
code to map the fields of implicitly-generated MemberExprs back
to references to a normal variable declaration. In general, these
tools expect (reasonably, IMO) that a local variable or static
data member will be referenced with DeclRefExpr, while a
non-static data member will be referenced with a MemberExpr.
That’s actually a very nice invariant. Doing as you suggest would
complicate the invariants for these clients, forcing them to deal
specifically with lambda captures (which they otherwise wouldn’t
have to consider). And if we have to have the complication
somewhere, I’d rather it be with the more intelligent clients
that care about semantics, rather than the clients that only care
about cross-referencing.

I have a rather different perspective for libclang that IMHO is
more accurate (and congruent with semantic): the variable in body
references the capture list entry (implicit or explicit) while the
capture list entry references the captured variable.

You’re saying it’s “more accurate” because it matches more closely
with the as-i" implementation written in the standard, and I don’t
dispute that. What I dispute is that exactly modeling the as-if rule
in the standard is the right thing for Clang. We’re not bound to
implement lambda classes via exactly that as-i" rule, and we probably
shouldn’t do so for [&] lambdas anyway (because it’s silly to store
several references to local variables in the same stack frame). And,
as noted above, our ASTs are meant to describe the language (which
they certainly do, even if not strictly based on the as-if rule) and
are meant to be usable by clients. Your suggestion may make some
trivial improvement in the former (for those who want to think in
terms of the as-if rule), but complicate other clients. That’s not a
good trade-off.

I’m rather unconvinced this is a good choice: what about AST Matchers
that want to find references to captured variable? I’m sure there are
also other example of things that become rather complex if we permit
such a large deviation from correct semantics to AST representation (and
as far as I know this is almost unprecedented).

Most simple cases “just work” if captured variables are represented as DeclRefExprs referring to the actual variable. For instance: cross-referencing, renaming, a “replace with initializer” refactoring, etc. These tools need no knowledge of lambdas or blocks to work correctly right now. Essentially, we want to make it easy to find all uses of a declaration, and requiring all consumers to check whether the variable is a capture and then to map to the captured entity is not easy.

I don’t find your argument about standards-conformance to be persuasive. The capturing names used inside the lambda do refer to the variables declared outside, right up until the transformation in [expr.prim.lambda]p17 is applied. But Clang’s AST is designed to represent the program before desugaring transformations are applied – we don’t prematurely lower things to make semantic analysis easier – so there’s certainly an argument that the current representation is more faithful.

Finally, it would seem truly strange to rewrite odr-uses as MemberExprs but leave non-odr-use mentions of variables as DeclRefExprs, and that is the natural consequence of the approach you are suggesting. (And, as Doug points out, rewriting references to entities captured by reference would seem like a bad idea too.)

  1. the closure type FieldDecl has the name (actually a
    pseudo-name) of the captured variable (“this” to refer to
    captured this)

As noted elsewhere, we can’t do this exactly. The __name bit
could work, but I’d prefer that we simply keep these as anonymous
fields, because the __ names really don’t help that much.

I apologize, it seems I’ve missed the elsewhere where this has
been clarified. Can you explain that?

The __ names still require you to detect that you’re in a lambda and
then remove the ‘__’ from the name. You can’t simply remove __ from
every name, or you’ll get bogus results from various system classes
that actually do use __.

I meant to name the fields exactly with the captured name (without __
prefix) and “this” for captured this. These names would be hidden to lookup.

How would the AST refer to that captured ‘this’? (implicit this)->“this”->whatever doesn’t work, because you lose source fidelity if the original reference to “this” was implicit. Again, this seems to add more complexity.

It would be a pity to not have such names in FieldDecl, this would
make pretty printer job very easy.

Adding a

VarDecl *getCapturedVariable(FieldDecl *)

for lambda classes would make the pretty printer’s job trivial.
Besides, what’s a pretty-printer doing printing the generated lambda
class?

The body of operator() of lambda class should be printed.

So you only need the fields to have names because you’re proposing that we build MemberExprs referring to them, right? That is a complexity not present in the current design.