for-range; or: distinct source and semantic ASTs

Hi,

Once again I'm pondering the C++0x for-range loop. I'm not sure how it would best fit into Clang.

Some background: for-range looks like this:

for (type name : collection) body;

where collection can be an array, an initializer list, or an arbitrary object for which begin() and end() are overloaded.
The standard specifies the semantics of for-range in terms of a rewrite (names starting with __ denote invented, unique variables):

1) If collection is an array of size N:

auto&& __a = collection;
for (auto __p = __a+0; __p != __a+N; ++__p) {
  type name(*__p);
  body;
}

2) If collection is an initializer list or object:

auto&& __c = collection;
for (auto __it = begin(__c), __e = end(__c); __it != __e; ++__it) {
  type name(*__it);
  body;
}
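
To make the two rewrites concrete, here is a minimal sketch in plain C++11 that puts each hand-written rewrite next to the built-in loop it is supposed to match. The function names are made up for illustration, and the invented variables are spelled without the __ prefix because such names are reserved in user code.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Case 1, built-in form: for-range over an array of known size N.
template <std::size_t N>
int sum_array_builtin(int (&collection)[N]) {
  int total = 0;
  for (int x : collection) total += x;
  return total;
}

// Case 1, rewritten by hand following the array rewrite.
template <std::size_t N>
int sum_array_rewritten(int (&collection)[N]) {
  int total = 0;
  auto&& a = collection;  // binds as a reference to the array
  for (auto p = a + 0; p != a + N; ++p) {
    int x(*p);
    total += x;
  }
  return total;
}

// Case 2, built-in form: for-range over an object.
int sum_vector_builtin(std::vector<int>& collection) {
  int total = 0;
  for (int x : collection) total += x;
  return total;
}

// Case 2, rewritten by hand. begin and end are called unqualified, so
// argument-dependent lookup finds std::begin and std::end here.
int sum_vector_rewritten(std::vector<int>& collection) {
  int total = 0;
  auto&& c = collection;
  for (auto it = begin(c), e = end(c); it != e; ++it) {
    int x(*it);
    total += x;
  }
  return total;
}
```

Both versions of each function should return the same value for any input, which is the property the rewrite is meant to guarantee.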

Clients interested in the semantics of the code (e.g. CodeGen) would be easier to satisfy if the AST contained a representation of the rewrite. It would also be easier to implement in Sema, because Sema can just construct the rewrite and validate it using existing routines. CodeGen could also just use existing routines.
Another client that is really interested in the semantics would be a search for function usage. Say I have this:

struct mycollection { ... };
mycollection::iterator begin(mycollection&);
mycollection::iterator end(mycollection&);

And now I want to find all references to the begin function. Then a for-range loop over mycollection is such a use, because it calls this function.
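
A small sketch of why the loop counts as a use: the free begin and end functions below bump a counter each time they are called, and a for-range loop over the collection triggers both. The members of mycollection, the counters, and the sum function are assumptions made up for this example.

```cpp
#include <vector>

// Hypothetical counters, only here to make the implicit calls observable.
static int begin_calls = 0;
static int end_calls = 0;

// Assumed layout for illustration: the collection has no member
// begin()/end(), so the loop falls back to the free functions below.
struct mycollection {
  typedef std::vector<int>::iterator iterator;
  std::vector<int> items;
};

mycollection::iterator begin(mycollection& c) {
  ++begin_calls;
  return c.items.begin();
}

mycollection::iterator end(mycollection& c) {
  ++end_calls;
  return c.items.end();
}

// Nothing in this function names begin or end, yet both are called.
int sum(mycollection& c) {
  int total = 0;
  for (int x : c) total += x;
  return total;
}
```

A tool that looks for references to begin only among explicit call expressions would miss the call made inside sum.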

On the other hand, clients interested in the source representation (e.g. the pretty printer) want the original code, which is not easy to regenerate from the rewritten form. (For example, the declaration of the loop variable has gained an initializer.)

Unfortunately, I don't think our AST supports having distinct "source view" and "semantic view" tree visitation strategies, does it? How would I best implement such a case?

Sebastian

> Once again I'm pondering the C++0x for-range loop. I'm not sure how it would best fit into Clang.

Cool

> And now I want to find all references to the begin function. Then a for-range loop over mycollection is such a use, because it calls this function.
>
> On the other hand, clients interested in the source representation (e.g. the pretty printer) want the original code, which is not easy to regenerate from the rewritten form. (For example, the declaration of the loop variable has gained an initializer.)
> Unfortunately, I don't think our AST supports having distinct "source view" and "semantic view" tree visitation strategies, does it? How would I best implement such a case?

I think it is best to store the source-level representation in the AST. The two clients that should "lower" it are the CFG and CodeGen. If there are non-obvious things that Sema does that would be bad to reconstruct (e.g. name lookups), then those should be cached as instance variables in the AST node.
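
As a rough illustration of that shape, here is a toy node that keeps the source-level pieces and caches the lookup results next to them. ToyFunctionDecl and ToyForRangeStmt are hypothetical stand-ins, not Clang classes, and strings take the place of real sub-trees to keep the sketch short. It only covers the non-array case.

```cpp
#include <string>

// Hypothetical stand-in for a resolved function declaration.
struct ToyFunctionDecl {
  std::string name;
};

// Hypothetical for-range node: stores what the user wrote, plus the
// results of the lookups done during semantic analysis.
struct ToyForRangeStmt {
  // Source-level pieces.
  std::string var_type;
  std::string var_name;
  std::string collection;
  std::string body;
  // Cached lookup results, so clients do not have to redo name lookup.
  const ToyFunctionDecl* begin_fn;
  const ToyFunctionDecl* end_fn;

  // Source view: what a pretty printer would emit.
  std::string print() const {
    return "for (" + var_type + " " + var_name + " : " + collection + ") " +
           body;
  }

  // Semantic view: the expansion a lowering client would produce on demand.
  std::string lower() const {
    return "auto&& __c = " + collection + "; for (auto __it = " +
           begin_fn->name + "(__c), __e = " + end_fn->name +
           "(__c); __it != __e; ++__it) { " + var_type + " " + var_name +
           "(*__it); " + body + " }";
  }

  // A find-references client can consult the cached declarations directly.
  bool references(const ToyFunctionDecl& fn) const {
    return begin_fn == &fn || end_fn == &fn;
  }
};
```

The printer never sees the expansion, while lowering and reference search both work from the cached declarations.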

-Chris

> Clients interested in the semantics of the code (e.g. CodeGen) would be easier to satisfy if the AST contained a representation of the rewrite. It would also be easier to implement in Sema, because Sema can just construct the rewrite and validate it using existing routines. CodeGen could also just use existing routines.

Sema will have to do the lookup and overload resolution for begin, end, !=, and ++__it, and that information will have to be stored in the AST somewhere.

> Another client that is really interested in the semantics would be a search for function usage. Say I have this:
>
> struct mycollection { ... };
> mycollection::iterator begin(mycollection&);
> mycollection::iterator end(mycollection&);
>
> And now I want to find all references to the begin function. Then a for-range loop over mycollection is such a use, because it calls this function.

I think this is a completely separable issue.

> On the other hand, clients interested in the source representation (e.g. the pretty printer) want the original code, which is not easy to regenerate from the rewritten form. (For example, the declaration of the loop variable has gained an initializer.)
>
> Unfortunately, I don't think our AST supports having distinct "source view" and "semantic view" tree visitation strategies, does it? How would I best implement such a case?

There is one case where our AST has a distinct "source view" and "semantic view", which is in InitListExpr when dealing with designated initializers. However, I'd rather not repeat this pattern.

I suggest storing the source view in the AST. Then, for the non-array case, have expressions for the begin() initialization, end() initialization, != expression, and ++ expression, possibly using OpaqueValueExprs as stand-ins for __c, __e, and __it. We've done this kind of thing in a few other places, e.g., the expression needed to copy a non-POD block variable and the bool-conversion/result-conversion expression for the GNU x ?: y extension.
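
The stand-in idea can be sketched with a toy expression tree: the != and ++ expressions are built once against named placeholders, and a lowering client supplies the actual values later. This is only loosely analogous to OpaqueValueExpr and does not use Clang's API. ToyExpr, the helper functions, and the use of plain integers as iterators are all assumptions for illustration.

```cpp
#include <map>
#include <memory>
#include <string>

// Toy expression node. Opaque is a named stand-in whose value is only
// supplied when the expression is evaluated. Hypothetical, not Clang's class.
struct ToyExpr {
  enum Kind { Literal, Opaque, NotEqual, Add };
  Kind kind;
  int value;                          // used by Literal
  std::string slot;                   // used by Opaque
  std::shared_ptr<ToyExpr> lhs, rhs;  // used by NotEqual and Add
};

typedef std::shared_ptr<ToyExpr> ExprPtr;
typedef std::map<std::string, int> Bindings;

ExprPtr literal(int v) {
  ExprPtr e = std::make_shared<ToyExpr>();
  e->kind = ToyExpr::Literal;
  e->value = v;
  return e;
}

ExprPtr opaque(const std::string& slot) {
  ExprPtr e = std::make_shared<ToyExpr>();
  e->kind = ToyExpr::Opaque;
  e->slot = slot;
  return e;
}

ExprPtr binary(ToyExpr::Kind kind, ExprPtr lhs, ExprPtr rhs) {
  ExprPtr e = std::make_shared<ToyExpr>();
  e->kind = kind;
  e->lhs = lhs;
  e->rhs = rhs;
  return e;
}

// Evaluates an expression, resolving each stand-in through the bindings.
int eval(const ToyExpr& e, const Bindings& env) {
  switch (e.kind) {
    case ToyExpr::Literal:  return e.value;
    case ToyExpr::Opaque:   return env.at(e.slot);
    case ToyExpr::NotEqual: return eval(*e.lhs, env) != eval(*e.rhs, env);
    case ToyExpr::Add:      return eval(*e.lhs, env) + eval(*e.rhs, env);
  }
  return 0;
}

// Plays the role of a lowering client. Assumes first <= last.
int count_iterations(int first, int last) {
  // Built once, against stand-ins rather than real variables.
  ExprPtr cond = binary(ToyExpr::NotEqual, opaque("__it"), opaque("__e"));
  ExprPtr step = binary(ToyExpr::Add, opaque("__it"), literal(1));

  Bindings env;
  env["__it"] = first;
  env["__e"] = last;
  int n = 0;
  while (eval(*cond, env)) {
    ++n;
    env["__it"] = eval(*step, env);  // rebind the stand-in each iteration
  }
  return n;
}
```

The stored expressions never mention concrete variables, so the node can keep the source view as its children and still carry everything needed for lowering.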

  - Doug