Lookahead vs. Tentative Parsing

Since C++ lambda expressions and Objective-C message expressions can each start with the same two tokens (l_square followed by identifier), it can take a lookahead of three tokens to differentiate the two cases. There are other instances, especially in C++, where further lookahead, or even semantic analysis, may be required. In the case above, I guess that lookahead is enough to decide which parsing path to take, but in general, what are the criteria for deciding to use lookahead vs. tentative parsing?

With tentative parsing, are diagnostics suppressed until the parsing has been committed?

- John

Since C++ lambda expressions and Objective-C message expressions can
each start with the same two tokens (l_square followed by identifier),
it can take a lookahead of three tokens to differentiate the two cases.

I’m playing around with implementing this myself. But why would you need lookahead when these only appear in distinct languages? Currently there’s an “if Objective-C, start parsing a message” check; I was just going to add an “if C++, start parsing a lambda” check after it.

Are you working on this too?

Objective-C++ exists and is important to a lot of our developers; any implementation of lambdas in Clang will eventually need to handle both.

John.

Objective-C++ exists and is important to a lot of our developers; any implementation of lambdas in Clang will eventually need to handle both.

Ah, Ok then. Though I assume lambdas aren’t a high priority for Apple at the moment, given the presence of Blocks. Is that the case? That’s why I thought I’d start looking at it myself as it seems like a feature the wider community might have more interest/motivation to work on than the Apple folks.

(I at least assume that a solution similar to the one I outlined wouldn’t make it terribly inconvenient to improve later to allow for ObjC++ lambdas)

I suppose the question is whether “getLang().CPlusPlus0x” is true for Objective-C++, or only actual C++0x? Well for now the ObjC case for messages comes first - so it won’t pollute the C++0x lambda case until someone wants to add in Obj-C++ lambdas.

Objective-C++ exists and is important to a lot of our developers; any implementation of lambdas in Clang will eventually need to handle both.

Ah, Ok then. Though I assume lambdas aren’t a high priority for Apple at the moment, given the presence of Blocks. Is that the case? That’s why I thought I’d start looking at it myself as it seems like a feature the wider community might have more interest/motivation to work on than the Apple folks.

I don’t think we’ll openly weep if the first patches we see don’t work in Objective-C++, but don’t do anything that would make it impossible.

I suppose the question is whether “getLang().CPlusPlus0x” is true for Objective-C++, or only actual C++0x?

Objective-C (and its revisions and supplements) is orthogonal to C++. We can and do support using the Objective-C extensions in code whose “base” standard is C++0x.

Well for now the ObjC case for messages comes first - so it won’t pollute the C++0x lambda case until someone wants to add in Obj-C++ lambdas.

That seems like a reasonable approach.

John.

I suppose the question is whether “getLang().CPlusPlus0x” is true for Objective-C++, or only actual C++0x?

Objective-C (and its revisions and supplements) is orthogonal to C++. We can and do support using the Objective-C extensions in code whose “base” standard is C++0x.

More formally, that is to say that getLang().CPlusPlus0x and getLang().ObjC can both be true at the same time?

Well for now the ObjC case for messages comes first - so it won’t pollute the C++0x lambda case until someone wants to add in Obj-C++ lambdas.

That seems like a reasonable approach.

Great - thanks. (but perhaps John will get there before me - we’ll see. I’ve just been looking at it the last few days as an experiment/foray into the Clang/LLVM codebase)

- David

Yes.

John.
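
The point that the two flags are independent can be sketched as a small dispatch table. This is only an illustration: `LangFlags`, `BracketPlan`, and `planForLeadingBracket` are hypothetical names invented here, not Clang's actual `LangOptions` or parser code. It shows that the "Objective-C first, then C++0x" ordering discussed above is unambiguous as long as only one flag is set, and that the combined mode is the one case that needs real disambiguation.

```cpp
// Hypothetical stand-in for the language options discussed in the thread.
// The two flags are independent, so all four combinations are possible.
struct LangFlags {
  bool CPlusPlus0x;
  bool ObjC;
};

// What a parser could do on seeing '[' at the start of an expression.
enum class BracketPlan {
  NotAnExpressionStart, // neither extension enabled
  ParseObjCMessage,     // Objective-C only
  ParseLambda,          // C++0x only
  Disambiguate          // Objective-C++0x: both readings are possible
};

// Single-expression body so it is usable as a C++11 constexpr function.
constexpr BracketPlan planForLeadingBracket(LangFlags lang) {
  return (lang.ObjC && lang.CPlusPlus0x) ? BracketPlan::Disambiguate
         : lang.ObjC                     ? BracketPlan::ParseObjCMessage
         : lang.CPlusPlus0x              ? BracketPlan::ParseLambda
                                         : BracketPlan::NotAnExpressionStart;
}
```

With only one flag set the plan is fixed up front; with both set, the parser has to look at the tokens after the `[` before choosing a path.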

We even support the new ARC Objective-C feature in all modes as well, meaning that folks can write code in Objective-ARC++'0x if they so choose.

-Chris

Yes. My parsing patch will be ready for review today, and next is AST generation.

Since C++ lambda expressions and Objective-C message expressions can
each start with the same two tokens (l_square followed by identifier),
it can take a lookahead of three tokens to differentiate the two cases.
There are other instances, especially in C++, where further lookahead,
or even semantic analysis, may be required. In the case above, I guess
that lookahead is enough to decide which parsing path to take, but in
general, what are the criteria for deciding to use lookahead vs.
tentative parsing?

Lookahead is more efficient than tentative parsing. Typically, use lookahead where possible to catch all of the common cases, then fall back to tentative parsing for the tricky cases.

With tentative parsing, are diagnostics suppressed until the parsing has
been committed?

No. Make sure that tentative parsing doesn't generate any diagnostics or perform any semantic analysis.

  - Doug
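
A minimal sketch of the "lookahead first, tentative parsing as a fallback" pattern, assuming a toy token stream. Everything here (`Tok`, `TokenStream`, `classifyByLookahead`, `tryParseLambdaIntroducer`, `classify`) is hypothetical and is not Clang's parser API; the capture-list grammar is deliberately loose, and treating every failed trial parse as a message send is a simplifying assumption. The trial parse returns a plain `bool` and emits nothing, in line with the advice that tentative parsing should not produce diagnostics or do semantic analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy token kinds. l_square and identifier echo the thread; the rest are
// invented for this sketch.
enum class Tok { l_square, r_square, identifier, comma, equal, amp, kw_this, eof };

enum class BracketKind { Lambda, ObjCMessage, Unknown };

// Hypothetical token stream. Peeking and backtracking both index into the
// same buffer, so neither one re-lexes anything.
class TokenStream {
public:
  explicit TokenStream(std::vector<Tok> toks) : toks_(std::move(toks)) {}

  // Look n tokens ahead without consuming (n == 0 is the current token).
  Tok peek(std::size_t n = 0) const {
    std::size_t i = pos_ + n;
    return i < toks_.size() ? toks_[i] : Tok::eof;
  }
  Tok consume() {
    Tok t = peek();
    if (pos_ < toks_.size()) ++pos_;
    return t;
  }

  // Tentative parsing: remember a position, then either commit or revert.
  void beginTentative() { marks_.push_back(pos_); }
  void commit() { assert(!marks_.empty()); marks_.pop_back(); }
  void revert() {
    assert(!marks_.empty());
    pos_ = marks_.back();
    marks_.pop_back();
  }
  std::size_t position() const { return pos_; }

private:
  std::vector<Tok> toks_;
  std::size_t pos_ = 0;
  std::vector<std::size_t> marks_;
};

// Fast path: decide from at most three tokens, '[' included.
BracketKind classifyByLookahead(const TokenStream &ts) {
  assert(ts.peek(0) == Tok::l_square && "caller must be at '['");
  Tok t1 = ts.peek(1);
  Tok t2 = ts.peek(2);
  if (t1 == Tok::r_square || t1 == Tok::equal)
    return BracketKind::Lambda;        // "[]" or "[="
  if (t1 == Tok::identifier) {
    if (t2 == Tok::r_square || t2 == Tok::comma)
      return BracketKind::Lambda;      // "[x]" or "[x,"
    if (t2 == Tok::identifier)
      return BracketKind::ObjCMessage; // "[receiver selector"
  }
  return BracketKind::Unknown;         // e.g. "[&" or "[this": need more context
}

// Slow path: try to consume "[ capture, capture, ... ]". Reports success or
// failure only; it never emits a diagnostic.
bool tryParseLambdaIntroducer(TokenStream &ts) {
  if (ts.consume() != Tok::l_square) return false;
  if (ts.peek() == Tok::r_square) { ts.consume(); return true; }
  for (;;) {
    Tok t = ts.consume();
    if (t == Tok::amp) {
      if (ts.peek() == Tok::identifier) ts.consume(); // "&x"; bare "&" also ok
    } else if (t != Tok::equal && t != Tok::identifier && t != Tok::kw_this) {
      return false;
    }
    Tok sep = ts.consume();
    if (sep == Tok::r_square) return true;
    if (sep != Tok::comma) return false;
  }
}

// Lookahead for the common cases, tentative parsing for the rest. The stream
// is always rewound, so the caller can do the real parse from the '['.
BracketKind classify(TokenStream &ts) {
  BracketKind kind = classifyByLookahead(ts);
  if (kind != BracketKind::Unknown) return kind;
  ts.beginTentative();
  bool isLambda = tryParseLambdaIntroducer(ts);
  ts.revert();
  // Toy assumption: anything that is not a lambda introducer is a message.
  return isLambda ? BracketKind::Lambda : BracketKind::ObjCMessage;
}
```

In this toy, `[x]` and `[obj doThing]` are settled by the fast path alone, while something like `[&x, this]` falls through to the trial parse, which rewinds the stream before returning.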

Since C++ lambda expressions and Objective-C message expressions can
each start with the same two tokens (l_square followed by identifier),
it can take a lookahead of three tokens to differentiate the two cases.
There are other instances, especially in C++, where further lookahead,
or even semantic analysis, may be required. In the case above, I guess
that lookahead is enough to decide which parsing path to take, but in
general, what are the criteria for deciding to use lookahead vs.
tentative parsing?

Lookahead is more efficient than tentative parsing.

Actually there's not much difference in efficiency; both use the same preprocessor cache. What matters is how many tokens ahead you go, not which mechanism you use to get those tokens.

-Argiris

I've been playing with this over the weekend. Many thanks to everyone who worked on it. Being able to put Objective-C objects in C++ collections and have memory management (even with weak references) Just Work™ is amazing! It's probably going to let me delete about a thousand lines of code.

David