ANTLR?

We are looking for an open source C++ parser other than g++ if possible. Clang would be great but its C++ support is still some way away and we need something that works or nearly works now. Does anyone have any experience with ANTLR for parsing C++ and for extending their C++ parser? Any other feedback on ANTLR in general would be welcome too. Thanks,

–Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve

P.S. Sorry for the spam. I know this question is not directly LLVM related but it is peripherally related and this list is the best source I could think of for C++ parsing experience.

Hi,

I’ve not got any experience using ANTLR to parse C++. However, you will find that ANTLR only has a C code generator and NOT a C++ one. Over the years numerous people have requested a C++ code generation template, but alas there is still only a C one. Just a heads up.

Granville

2009/7/11 Vikram S. Adve <vadve@cs.uiuc.edu>

That sounds like a problem. Just so I understand, do you mean there isn’t the run-time support etc. to write back ends for the C++ language, or that the compiler IR is also somehow insufficient to write a code generator?

–Vikram

You need a lot more than a traditional parser to parse C++: you have to do full template instantiation, partial template specialization, etc. just to be able to parse it correctly. The "best answer" if you need something ASAP is to use Elsa.
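
A minimal Python sketch of why a grammar alone is not enough: the C++ statement `a * b;` declares a pointer `b` if `a` names a type, and is a multiplication otherwise. The function name `classify_statement` and the `known_types` set are hypothetical stand-ins for a real front end's symbol table; in real C++, answering "is this name a type?" can itself require template instantiation, which this toy does not attempt.

```python
# Toy illustration only: not Elsa, clang, or any real C++ front end.
# `classify_statement` and `known_types` are hypothetical names for this sketch.

def classify_statement(tokens, known_types):
    """Classify a statement shaped like `X * Y ;`.

    If X names a type, the statement declares Y as a pointer to X.
    Otherwise it is a multiplication expression whose result is discarded.
    """
    if len(tokens) != 4 or tokens[1] != "*" or tokens[3] != ";":
        raise ValueError("this sketch only handles statements shaped like 'X * Y ;'")
    return "declaration" if tokens[0] in known_types else "expression"


tokens = ["a", "*", "b", ";"]
print(classify_statement(tokens, known_types={"a"}))   # `a` was declared as a type
print(classify_statement(tokens, known_types=set()))   # `a` is an ordinary variable
```

The same token sequence gets two different parses depending on semantic information, which is why the parser and the type checker cannot be cleanly separated for C++.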

-Chris

When you create a parser via ANTLR you specify the output language of the resulting recursive descent parser. At the moment, to my knowledge, there is no C++ output template, so you would have to generate the parser as C code, for which a template exists.

The runtime support should be there, at least partially, but it won’t use things like exceptions, nor will it have a very modular design (obviously, though in reality the C target is pretty good). It would be best for you to post on the ANTLR mailing list, although this request has been posted several times already. Jim Idle, who wrote the C target, is the last person I recall saying he was going to look at creating a proper C++ target. After a quick search I found the following mail archive (http://markmail.org/message/lv2v272mi6njzx5m#query:antlr%20c%2B%2B%20target+page:1+mid:lv2v272mi6njzx5m+state:results), and judging by its date it matches my recollection of the last time the request was posted.

For this very reason I actually started using things like Spirit (http://spirit.sourceforge.net/) and Coco/R (http://ssw.jku.at/coco/) rather than ANTLR as my parser generation tool when I needed something that worked well with C++.

I take it, though, that the main reason you are thinking about using ANTLR is the availability of an existing, pretty good C++ grammar?

Granville

PS. I strongly advise you to post this on the ANTLR mailing list (http://www.antlr.org/mailman/listinfo/antlr-interest); things may have changed since I last looked.

2009/7/11 Vikram S. Adve <vadve@cs.uiuc.edu>

Chris, I’m not sure how good this is, but there does exist a C++ grammar for ANTLR that is pretty actively maintained by Sun Microsystems - http://hg.netbeans.org/main/file/tip/cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/cppparser.g

Granville

2009/7/11 Chris Lattner <clattner@apple.com>

Right, I understand that. I was hoping there was such an implementation using ANTLR since it looks like a fairly mature project.

I’m not sure how stable or mature Elsa is (but comments to clarify that would be appreciated.). E.g., a quick scan of their Web page shows the comment that they only have a partial type checker. It also says their template instantiation is incomplete.

–Vikram

Granville,

We actually need more than just a grammar; we really need a full compiler, targeting either native code or C. Our goal is to extend the C++ type system to enable static type checking of “non-interference” between parallel computations and use that to enforce deterministic semantics. We want to build a C++ version of a Java-based language we’ve developed called Deterministic Parallel Java:

http://dpj.cs.uiuc.edu

Thanks for all the pointers. This is very helpful.

–Vikram

Right, I understand that. I was hoping there was such an implementation using ANTLR since it looks like a fairly mature project.

Not that I'm aware of.

I'm not sure how stable or mature Elsa is (but comments to clarify that would be appreciated.). E.g., a quick scan of their Web page shows the comment that they only have a partial type checker. It also says their template instantiation is incomplete.

Elsa definitely has its share of problems, but it is the best answer if you don't want to use G++.

-Chris

I'm not sure if it is mature enough to meet your needs, but I've been working on a fork of Elsa for a while now. It is definitely a work in progress. It is pretty much stock Elsa, except:

* It uses clang's file handling, preprocessor, and error reporting mechanisms.
* It uses the LLVM back end for optimization and code generation.
* It handles just about all of C now and a bit of C++.
* The driver program is gcc-ish and supports many of the gcc command line arguments.
* It can call all of the LLVM code generators out of the box.

I can't really characterize how much C++ it can do yet: I've been more interested in getting it to work with LLVM and the clang bits. I know it can parse a lot of C++. I'm missing support for code generation for some of the C++ parts of the AST.

It is a part-time project, so my progress hasn't been as fast as I'd like. Your DPJ project sounds exactly like what I've been hoping to do eventually with my project, except I'd also like to throw in heterogeneous multiprocessing support.

http://ellcc.org

Most of the information, such as it is, is on the wiki: http://ellcc.org/wiki

-Rich

For an LL(1) parser, it might be a little bit difficult to parse a complex grammar like C++, but it might work.

ANTLR worked great when the rest of the code was written in Java, but it was a little bit painful when using other languages like Python.

I worked with it two years ago. I guess it may have improved since then.

Haohui

For an LL(1) parser, it might be a little bit difficult to parse a complex grammar like C++, but it might work.

ANTLR is an LL(*) parser generator, i.e. the generated parser resolves parsing ambiguities by looking ahead as many tokens as required to choose the correct alternative. Of course, you can fix the lookahead for ANTLR within the preamble of the grammar.
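
To make the difference between fixed and arbitrary lookahead concrete, here is a rough Python sketch. It is not how ANTLR works internally (ANTLR builds lookahead automata; this is a plain linear scan), and the two rules, the `MODIFIERS` set and the function `choose_alternative` are all made up for illustration. Fixing the lookahead (to my understanding, the `k` option in an ANTLR grammar) corresponds to passing an integer `k` here.

```python
# Hypothetical toy grammar with two alternatives sharing an unbounded prefix:
#
#     class_decl : modifier* 'class' ...
#     func_decl  : modifier* 'func'  ...

MODIFIERS = {"static", "inline", "extern"}  # made-up shared prefix tokens


def choose_alternative(tokens, k=None):
    """Return the rule to predict, or None if no decision can be made.

    k=None scans ahead as far as needed, in the spirit of LL(*).
    An integer k inspects at most k tokens, like fixed LL(k) lookahead.
    """
    window = tokens if k is None else tokens[:k]
    for tok in window:
        if tok == "class":
            return "class_decl"
        if tok == "func":
            return "func_decl"
        if tok not in MODIFIERS:
            return None  # neither alternative can match this input
    return None  # lookahead exhausted before a distinguishing token


tokens = ["static", "inline", "func", "f"]
print(choose_alternative(tokens, k=1))  # one token is not enough to decide
print(choose_alternative(tokens))       # unbounded scan reaches 'func'
```

With `k=1` the sketch behaves like an LL(1) parser and cannot decide, because both alternatives can start with `static`; no fixed `k` suffices for every input, since the modifier prefix can be arbitrarily long.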

As I mentioned earlier I think the C++ grammar I linked to is pretty good, but then Vikram wants something more end-to-end based on his response to my comment.

Granville

2009/7/11 Mai, Haohui <haohui.mai@gmail.com>

Granville Barnett wrote:

Hi,

I've not got any experience using ANTLR to parse C++. However, you will find that ANTLR only has a C code generator and NOT a C++ one. Over the years numerous people have requested a C++ code generation template, but alas there is still only a C one. Just a heads up.

That is true of ANTLR v3, but ANTLR 2.7.7 supports C++ code generation directly. You just don't get all the nifty features and support tools that are written for v3.

I’ve done extensive research on the subject, and if you want to parse ALL of C++ there are only two options: g++ or the Edison Design Group C++ front end. Both have projects designed to make this easier: LLVM (as you know) and Rose (http://www.rosecompiler.org/), which works with the EDG compiler. AspectC++, OpenC++, ANTLR, and TLX all work with only a portion of the grammar.

Thanks to everyone who has responded to my questions about a C++ parser. My original impression matched what David wrote below, which is that only g++ and EDG have complete, standards-compliant C++ front ends. Unfortunately, both licenses are less than ideal from our POV. Also, we need both a front end and a code generator. On the other hand, we don't need full compliance *today*: something that is close 3 months from today and nearly fully compliant in 6-9 months would work for us, so we are going to take a closer look at the plans for clang.

--Vikram