I'd suggest a slightly different path:
1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer to
captured this)
2) the FieldDecl uses a bit to represent the fact that
fields are the fields of the closure type (this means they
are actually unnamed)
In this way the source pretty printing is easily doable,
the semantic info is accurate, no new AST node is needed,
CodeGen is simpler (it does not need to map DeclRefExpr to
MemberExpr).
I've forgot something?
That could work... although it would be a bit tricky to find
the original captured variable given a MemberExpr of this
sort.
I've thought to that, but I failed to imagine a case where this
is needed.
It matters a lot for features that care more about the results
of name lookup than the underlying semantics. For example,
libclang's clang_findReferencesInFile, which finds all of the
references to a given declaration, would need to introduce new
code to map the fields of implicitly-generated MemberExprs back
to references to a normal variable declaration. In general, these
tools expect (reasonably, IMO) that a local variable or static
data member will be referenced with DeclRefExpr, while a
non-static data member will be referenced with a MemberExpr.
That's actually a very nice invariant. Doing as you suggest would
complicate the invariants for these clients, forcing them to deal
specifically with lambda captures (which they otherwise wouldn't
have to consider). And if we have to have the complication
somewhere, I'd rather it be with the more intelligent clients
that care about semantics, rather than the clients that only care
about cross-referencing.
I have a rather different perspective for libclang that IMHO is
more accurate (and congruent with semantic): the variable in body
references the capture list entry (implicit or explicit) while the
capture list entry references the captured variable.
You're saying it's "more accurate" because it matches more closely
with the as-i" implementation written in the standard, and I don't
dispute that. What I dispute is that exactly modeling the as-if rule
in the standard is the right thing for Clang. We're not bound to
implement lambda classes via exactly that as-i" rule, and we probably
shouldn't do so for [&] lambdas anyway (because it's silly to store
several references to local variables in the same stack frame). And,
as noted above, our ASTs are meant to describe the language (which
they certainly do, even if not strictly based on the as-if rule) and
are meant to be usable by clients. Your suggestion may make some
trivial improvement in the former (for those who want to think in
terms of the as-if rule), but complicate other clients. That's not a
good trade-off.
I'm rather unconvinced this is a good choice: what about AST Matchers
that want to find references to captured variable?
Those are simple today, because they're just DeclRefExprs. What you're asking for would complicate those matchers, because you'd have to match an implicitly-generated MemberExpr and map from the FieldDecl back to the VarDecl.
I'm sure there are
also other example of things that become rather complex if we permit
such a large deviation from correct semantics to AST representation (and
as far as I know this is almost unprecedented).
If you want to make this case, you'll need to come up with those examples. None of the clients of the current AST are particularly complex, and those that I've enumerated (and the one you've listed above) are simpler with the current AST representation.
I see that you've dropped my comment about blocks. It is important, however. Having two different ways of expressing closures is unfortunate, and forced on us by the various dialects we support, but representing their captures in very different ways in the AST would be a lot more unfortunate (and is avoidable).
1) the closure type FieldDecl has the name (actually a
pseudo-name) of the captured variable ("this" to refer to
captured this)
As noted elsewhere, we can't do this exactly. The __name bit
could work, but I'd prefer that we simply keep these as anonymous
fields, because the __ names really don't help that much.
I apologize, it seems I've missed the elsewhere where this has
been clarified. Can you explain that?
The __ names still require you to detect that you're in a lambda and
then remove the '__' from the name. You can't simply remove __ from
every name, or you'll get bogus results from various system classes
that actually do use __.
I meant to name the fields exactly with the captured name (without __
prefix) and "this" for captured this. These names would be hidden to lookup.
"Hidden to lookup" is going to mean changing name lookup in a few places. That's rather unfortunate.
It would be a pity to not have such names in FieldDecl, this would
make pretty printer job very easy.
Adding a
VarDecl *getCapturedVariable(FieldDecl *)
for lambda classes would make the pretty printer's job trivial.
Besides, what's a pretty-printer doing printing the generated lambda
class?
The body of operator() of lambda class should be printed.
… and in that pretty-printing, we should print out the name of the captured variable, not a member expression referring to the field corresponding to the captured variable. Your suggestion either changes nothing (because an implicit member expression to the field prints just the field name), or complicates matters (if we don't
have the field names match up with the captured variable names). Why is this an interesting case to discuss?
Fundamentally, you're arguing from the perspective of "purity of representation." There are two problems here: the pure representation you want isn't really "pure", in the sense that you want to represent an as-if rule that doesn't have to be the actual implementation approach. The second problem is that you're not considering the effect of this change on clients, which (so far) look like they would get *more* complex than what we have now.
- Doug