LLVM grammar for ANTLR

Has anyone written an ANTLR grammar for LLVM, that is, an ANTLR
grammar that parses LLVM instructions? Is an LLVM grammar available
for any other parsing tool?


Hello Surinder,

The existing hand-written parser is callable from almost anywhere, so the only
reason you'd need a parser of your own would be to extend it. Originally it
was written using Flex and Bison, but Chris Lattner rewrote it from scratch to
catch more errors at the parsing stage.

The only feature I've found to be missing from the existing llvm-as utility is
an include function that automatically detects files it has already included
and pulls each one in only once. This would allow the use of headers instead
of having to manually copy and paste their contents into the source.
What were you planning on doing with your LLVM parser, if I may ask?
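
For illustration only, the include-once behaviour described above could be
sketched roughly as follows in Python. The directive syntax (a comment line of
the form `; include "name"`) and the function name are assumptions made up for
this sketch; llvm-as has no such feature. Files are supplied as an in-memory
dictionary standing in for the file system.

```python
import re

# Hypothetical directive syntax. It is written as an LLVM comment (";")
# so that an unprocessed file would still be valid assembly.
INCLUDE_RE = re.compile(r'^\s*;\s*include\s+"([^"]+)"\s*$')


def expand_includes(name, sources, seen=None):
    """Return the lines of `name` with include directives expanded.

    `sources` maps a file name to its text. Each file is emitted at most
    once, no matter how many times (or how indirectly) it is included.
    """
    if seen is None:
        seen = set()
    if name in seen:
        return []              # already emitted: skip the duplicate
    seen.add(name)             # mark before recursing, so cycles terminate
    out = []
    for line in sources[name].splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            out.extend(expand_includes(m.group(1), sources, seen))
        else:
            out.append(line)
    return out
```

With a main file that includes two headers, one of which includes the other,
the shared header's declarations appear only once in the expanded output.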



is a grammar for LLVM assembly language written for a YACC-like LALR(1)
parser generator. (The actual grammar rules start at the "define parser
llvm-parser" line and continue to the end of the file.) This was
constructed by starting from the original YACC-based parser for llvm-as
(using a script that extracted raw grammar rules from the output of
bison -v), and then augmenting it to disambiguate several shift/reduce
conflicts and to handle changes to LLVM assembly language after the
lib/AsmParser parser was re-implemented as a recursive-descent parser.

Note that as usual with LR grammars, it contains a fair amount of
left-recursion and would not be directly suitable for ANTLR, which
implements LL(k).
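
To illustrate the left-recursion point: a directly left-recursive rule
A -> A a | b can be rewritten mechanically as A -> b A', A' -> a A' | (empty),
which an LL parser can handle. Below is a minimal Python sketch of that
standard textbook transformation. The ArgList rule in the usage note is a
made-up example, not taken from the grammar above, and indirect left-recursion
is not handled.

```python
def eliminate_direct_left_recursion(nonterminal, alternatives):
    """Remove direct left recursion from one rule.

    `alternatives` is a list of productions, each a list of symbols;
    the empty list [] stands for the empty production.

        A  -> A a1 | ... | b1 | ...
    becomes
        A  -> b1 A' | ...
        A' -> a1 A' | ... | (empty)
    """
    # Tails of the left-recursive alternatives (the a_i above).
    # A useless production A -> A is silently dropped.
    recursive = [alt[1:] for alt in alternatives
                 if len(alt) > 1 and alt[0] == nonterminal]
    # Alternatives that do not start with the nonterminal (the b_i above).
    others = [alt for alt in alternatives
              if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: others}
    tail = nonterminal + "_tail"   # helper nonterminal; the name is arbitrary
    return {
        nonterminal: [alt + [tail] for alt in others],
        tail: [alt + [tail] for alt in recursive] + [[]],
    }
```

For example, the rule ArgList -> ArgList ',' Arg | Arg becomes
ArgList -> Arg ArgList_tail and ArgList_tail -> ',' Arg ArgList_tail | (empty).
The transformation changes the shape of the parse tree, so any semantic
actions attached to the original rules would have to be adapted by hand.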


Hi Sam,

Thanks for your reply.

I am implementing my research
(http://www.it.usyd.edu.au/~suri/Detecting%20Buffer%20Over.pdf), a
translation of LLVM to a simple non-deterministic language to detect
buffer overflows. It involves

(1) printing a control flow graph of the basic blocks of a function (easily done)
(2) translating each LLVM statement to a corresponding data flow
language (needs ASTs to traverse)
(3) translating the control flow graph to a regular expression
(4) translating data flow language statements to the non-deterministic language
(5) translating the non-deterministic language to CLP (a Prolog-like solver)
(6) finding a Prolog solution as proof of the presence or absence of buffer
overflows

I have implemented Steps (1) and (2) above by writing a transform pass.
The remaining Steps (3) to (6) have been written in Haskell. Step (2),
the translation from LLVM to the DFL language, has been done by implementing
an AnnotationWriter that emits the DFL form of each LLVM statement. This
is rather crude and causes crashes at the LLVM/Haskell interface. I
am considering dumping the LLVM assembly as-is into the Haskell program and
then using a parser tool to translate the LLVM statements to DFL. The
translation amounts to a set of simple rewrites.
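
As a rough illustration of what one such rewrite might look like, here is a
small sketch, written in Python for brevity rather than Haskell. It recognises
only simple integer binary operations in textual LLVM assembly and rewrites
them to an assignment form. The target notation ("%3 := %1 + %2") is purely
hypothetical, since the DFL syntax is not given here, and the names BinOp,
parse_binop and rewrite are invented for the example.

```python
import re
from typing import NamedTuple, Optional


class BinOp(NamedTuple):
    result: str   # e.g. "%3"
    opcode: str   # "add", "sub" or "mul"
    ty: str       # e.g. "i32"
    lhs: str
    rhs: str


# Matches only lines such as:  %3 = add nsw i32 %1, %2
# Anything else (other opcodes, trailing comments, ...) is not handled.
BINOP_RE = re.compile(
    r'^\s*(%[\w.]+)\s*=\s*(add|sub|mul)\s+(?:(?:nsw|nuw)\s+)*'
    r'(\w+)\s+([^,\s]+)\s*,\s*(\S+)\s*$'
)

SYMBOLS = {"add": "+", "sub": "-", "mul": "*"}


def parse_binop(line: str) -> Optional[BinOp]:
    """Parse one instruction line, or return None if it is not recognised."""
    m = BINOP_RE.match(line)
    return BinOp(*m.groups()) if m else None


def rewrite(op: BinOp) -> str:
    """Emit the instruction in a hypothetical assignment notation."""
    return f"{op.result} := {op.lhs} {SYMBOLS[op.opcode]} {op.rhs}"
```

A real translation would need a full parser for the assembly language; a
regular expression per instruction form is only workable for a small subset.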

You can get further details from my website
http://it.usyd.edu.au/~suri or you can ask me.

Thank you for your interest.


Hello Surinder,

I understand now what you are trying to do. Unfortunately what I was going to
suggest takes things in exactly the opposite direction of what you need.

I have a parser generator based on parsing expression grammars (PEG) that
generates LLVM bitcode by using the LLVM assembly parser to generate all
rules. I was considering adding an LLVM assembly parser to it to simplify the
parsing of actions. This would be exactly the opposite of what you need, since
you essentially need an LLVM backend to generate code in your nondeterministic
language.

Also, I'm unfamiliar with ANTLR since the unusual platforms I support oftentimes
don't have a Java VM to run ANTLR on. The PEG parser generator I use generates
C rather than Haskell. I'm not interested in making another parser generator to
generate Haskell parsing code. If you need a parser in C I might be able to get
something set up for you using PEG.

I'm sorry I wasn't able to help you more,