Decoupling semantics from parsing

Hello,

I am currently a math/physics student at UCSD and am looking for a way to get involved in Clang. Is there a desire or interest to decouple semantic analysis from syntax parsing? Please let me know.

Thanks,
-McCorney

What do you mean by that? In general, you need to do some semantic analysis to parse C++
(e.g., what is x * y; ? It is a declaration if x names a type and a multiplication
otherwise). How you could get involved depends on what you would be interested in
working on. One of the first steps is probably to gain familiarity with the codebase.
Familiarity with the standard(s) is also going to be required, depending on the area
you choose.

Bruno

Hello,

  1. What I meant was very clear: currently, type checking is embedded into the parsing logic and AST node creation and it would be really clean if they were decoupled.
  2. Yep, I have been spending time with the codebase for the past couple of years … just never contributed.

-McCorney

Earlier in Clang’s life, the parser did not depend on semantic analysis (lib/Parse did not depend on lib/Sema). However, my understanding is that as C++ support was added, it became clear that this was awkward, so in r112244, John removed the virtual ‘Action’ interface that Sema implemented and made Parse depend directly on Sema. I wasn’t around at the time, so I don’t know the exact motivations, but from what I can tell, Clang has intentionally moved away from the kind of model you are proposing.

There are two somewhat-separable subjects here.

The first is doing parsing without doing semantic analysis. C's grammar is
formally context-sensitive, but it is possible to parse a C token sequence
into an ambiguous syntax tree (which would simply contain both valid parses
of e.g. size_t *x; in statement context) without semantic information.
Clang has never been written to do this; the abstraction layer we used to
have between Parser and Sema still had queries like “does this name resolve
to a type” which had to be answered before parsing could continue. Building
ambiguous parse trees can be useful for source tools but creates a lot of
complexity for a compiler, which has always been Clang’s primary mission.
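
To make the idea of an ambiguous syntax tree concrete, here is a minimal sketch of a parser that keeps both readings of a statement shaped like "A * B ;" and leaves the choice to a later pass. Everything in it (Reading, parseAmbiguous, resolve) is a hypothetical toy and does not correspond to any Clang API.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model only: these names are hypothetical and are not Clang's AST.
enum class Reading { Declaration, Multiplication };

// Parse a statement shaped like "A * B ;" with no semantic information.
// Both readings are syntactically valid, so both are returned.
// Any other token shape yields no readings in this toy.
std::vector<Reading> parseAmbiguous(const std::vector<std::string> &toks) {
  if (toks.size() != 4 || toks[1] != "*" || toks[3] != ";")
    return {};
  return {Reading::Declaration, Reading::Multiplication};
}

// A later pass picks one reading once it knows which names are types.
// Assumes `toks` was accepted by parseAmbiguous.
Reading resolve(const std::vector<std::string> &toks,
                const std::set<std::string> &typeNames) {
  return typeNames.count(toks[0]) != 0 ? Reading::Declaration
                                       : Reading::Multiplication;
}
```

With the tokens size_t, *, x, ; the parse step returns both readings, and resolve selects the declaration only if size_t is in the set of known type names. A compiler that already has that set available while parsing gains nothing from carrying the second reading around.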

The second is how information is exchanged between the parser and
semantic analysis. Clang’s parser used to call its semantic analysis
layer through an abstracted interface, but we never had a useful
alternative implementation, and the sheer breadth of the interactions
required for C++ (just because there are so many new grammatical
productions) made the interface increasingly unwieldy (and hard to
imagine providing an alternative implementation of), so we killed it off.
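
As a rough illustration of that kind of abstracted interface, the sketch below has a parser step that cannot proceed until the semantic layer answers a "does this name resolve to a type" query through a virtual call. The class and function names (Actions, ScopeActions, classify) are made up for this example and are not the interface Clang actually had.

```cpp
#include <set>
#include <string>

// Hypothetical interface; the names are illustrative, not Clang's.
class Actions {
public:
  virtual ~Actions() = default;
  // The parser must have this answered before it can continue "A * B;".
  virtual bool isTypeName(const std::string &name) const = 0;
};

// One possible implementation: a flat set of declared type names.
class ScopeActions : public Actions {
  std::set<std::string> types_;

public:
  void declareType(const std::string &name) { types_.insert(name); }
  bool isTypeName(const std::string &name) const override {
    return types_.count(name) != 0;
  }
};

// Toy parser step: decide how to parse a statement from its first identifier.
std::string classify(const std::string &first, const Actions &actions) {
  return actions.isTypeName(first) ? "declaration" : "expression";
}
```

Even in this toy the parser is not independent of semantic analysis; the dependency is only routed through a virtual call. Each additional grammatical production that needs semantic input adds another method to the interface, which is the growth described above.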

Also, in C there’s a massive performance optimization available if you can
combine the lookup performed by the lexer (to check whether something is a
macro and/or a keyword) with the identifier lookups performed by the parser
(in these ambiguous-parse cases) and semantic analysis (for actual name
resolution). We wouldn’t want an abstraction layer to interfere with that
optimization. (Unfortunately, this optimization loses a lot of its
effectiveness in C++ because there’s so much non-lexical lookup of
unqualified names.)
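
A minimal sketch of the shared-lookup idea, under the assumption that each identifier spelling maps to one record that the lexer, parser, and semantic analysis all reuse. The names (IdentInfo, IdentTable, Token, lex) are hypothetical, and the record is far simpler than what a real compiler would store.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical per-identifier record shared by all phases.
struct IdentInfo {
  bool isKeyword = false;
  bool isMacro = false;
  bool namesType = false; // maintained by semantic analysis
};

class IdentTable {
  std::unordered_map<std::string, IdentInfo> map_;

public:
  // The only hash lookup: the entry is created on first use.
  IdentInfo &get(const std::string &spelling) { return map_[spelling]; }
};

// The lexer stores a pointer to the record in the token, so later phases
// read keyword, macro, and type information without hashing again.
// Pointers to unordered_map elements stay valid when the table grows.
struct Token {
  std::string spelling;
  IdentInfo *info;
};

Token lex(IdentTable &table, const std::string &spelling) {
  return Token{spelling, &table.get(spelling)};
}
```

Here the parser can answer "is this a type name" by reading token.info->namesType directly. An abstraction layer that hides the record behind a separate query would force a second lookup per identifier.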

John.