Lenient lexing/parsing of code snippets

Hello awesome clang community!

The combination of clang/llvm is so powerful that I'm sure what I'm
about to ask is simple. Unfortunately I've worked for about a month
and gotten nowhere. After RTFMing as much as possible, I'm reaching
out here.

I'm trying to use cfe to lex/parse code snippets. The snippets will be
complete functions but they will be taken mostly out of their original
context. I'd like to generate an AST for the code. I've tried several
different ways of doing this on my own. The major problem is that
when taken out of context, most of the variables are undefined. I've
tried iteratively compiling the code and modifying it after each
compilation by adding declarations for the missing variables (which I
catch using a DiagnosticConsumer). This is cumbersome and, actually,
mostly just does not work.

Is there a way to do this? Did I miss some obvious set of
documentation somewhere? If this isn't already doable, can any of the
experts here recommend some best practices for something like this?

Any help would be greatly appreciated. Thanks for everything that you
are doing to make the CFE as great a toolset as it is!


This problem doesn't sound solvable as stated. When a code snippet is taken out of its context, you have no information about the types of any variables (or the definitions of most of the types). This makes constructing an AST impossible. You don't say what language you're using, but if it's C++, then you can't even tell from a snippet whether a+b is an arithmetic operation or a method invocation; even if it's C, you don't know what the type promotion should be.
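
To make the a+b point concrete, here is a minimal sketch. The `Path` type and the two helper functions are hypothetical names invented for this illustration. The expression `a + b` is spelled identically in both helpers, yet it is built-in arithmetic in one and a call to a user-defined function in the other, and only the declarations (which a bare snippet lacks) tell the two apart.

```cpp
#include <string>

// Hypothetical type, introduced only for this illustration.
struct Path {
    std::string text;
};

// User-defined operator+: for Path operands, `a + b` is a function call.
Path operator+(const Path& a, const Path& b) {
    return Path{a.text + "/" + b.text};
}

// Same token sequence `a + b`, resolved as built-in integer addition.
int add_ints(int a, int b) {
    return a + b;
}

// Same token sequence `a + b`, resolved as a call to operator+ above.
Path add_paths(const Path& a, const Path& b) {
    return a + b;
}
```

Calling `add_ints(2, 3)` performs an integer addition, while `add_paths(Path{"usr"}, Path{"lib"})` runs the overloaded operator and joins the two strings with a slash.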

I think, to answer the question that you want to be asking, we need to know what you want to do with the AST.


Take a look at this interesting Summer of Code project. I think it goes
in the same direction as yours: flexible parsing of incomplete code.


Thank you Guilherme and David for your responses!

As you rightly pointed out, David, this is not a "normal" thing to do.
And, again as you rightly deciphered, the target language is C++.

All might not be lost, though. My ultimate goal is not so detailed
that I would need to know every detail of the program. In fact, David,
your example is quite illustrative of what I do NOT need to know.


y = x + b;

I don't particularly care whether that is a method invocation, or
which type conversions apply. I am mostly concerned with analyzing
the names of variables and very high-level control flow.
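
As a rough illustration of how far names and control-flow keywords can be recovered with no type information at all, here is a minimal sketch of a purely lexical scan. It is a hand-rolled approximation and not clang's lexer. The names `ScanResult` and `scanSnippet` are hypothetical, and the sketch assumes the input contains no comments, string literals, or preprocessor directives. Non-control-flow keywords such as `int` are simply reported as identifiers.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct ScanResult {
    std::set<std::string> identifiers;     // every word that is not a control-flow keyword
    std::vector<std::string> controlFlow;  // control-flow keywords, in source order
};

// True for characters that may continue an identifier.
static bool isWordChar(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Scans the snippet character by character; no parsing, no type lookup.
ScanResult scanSnippet(const std::string& src) {
    static const std::set<std::string> kControl = {
        "if", "else", "for", "while", "do", "switch",
        "case", "break", "continue", "return", "goto"};

    ScanResult out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalpha(c) || c == '_') {
            // Collect one identifier-like word.
            std::size_t start = i;
            while (i < src.size() && isWordChar(src[i])) ++i;
            std::string word = src.substr(start, i - start);
            if (kControl.count(word)) {
                out.controlFlow.push_back(word);
            } else {
                out.identifiers.insert(word);
            }
        } else if (std::isdigit(c)) {
            // Skip a numeric literal, including suffixes such as the u in 10u.
            while (i < src.size() && (isWordChar(src[i]) || src[i] == '.')) ++i;
        } else {
            ++i;  // punctuation or whitespace
        }
    }
    return out;
}
```

For the input `if (x > 0) { y = x + b; } else { return y; }`, the scan records the keywords if, else, return in that order and the names b, x, y. It says nothing about nesting or scope, which is where a real parse tree would still be needed.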

Ideally, I would really just like to have a parse tree, although I've
yet to find a way to get that without clang knowing much more about
the types than I can possibly tell it.

I will look very closely at the GSOC output, Guilherme. Based on what
little I've seen so far, it looks incredibly promising. Thank you for
digging that up for me.

Based on the additional information that I sent here (which I should
have provided in my first message, sorry!), please let me know if I am
missing something obvious.

Again, I appreciate your willingness to let me tap into the collective
expertise on this list!


Hi Will, I don’t think you’re missing anything obvious. Clang’s parse tree is the result of semantic analysis, at which point a + b is fully resolved (type promotion, overload resolution, etc.). Clang knows how to skip function bodies, for example, but that’s much more clearly defined than what you’re trying to do.