Formal spec for LLVM IR (Was: LLVM Language Reference Strictness)


Are you working on a grammar of the LLVM syntax or also on a full semantics?

Steve Zdancewic's group at U. Penn. is working on a formal operational semantics for LLVM. It is partially complete and Greg Morrisett at Harvard is planning to build further on it.


Professor, Computer Science
University of Illinois at Urbana-Champaign

Hi VIkram,

Here is roughly my plan.

The first thing is a grammar and English description.

With this I can produce a more complete reference manual and have grammar cross reference and such.

The extended bnf to write the grammar is variant of the one that Pennello and Deremer used in their Translator Writing System (TWS).

Then I want to implement a basic TWS so that this LLVM language can be parsed with a tool and not with ad hoc recursive descent. I've started this but have done just enough so that I can check the grammar and make sure all the non terminals are defined and such and produce cross reference.
I thought about using YACC or ANTLR or some other tools but non of them were what I wanted. YACC is too old and ANTLR has too much other baggage, doesn't really work well with C++ and is a whole other project. This tool is just for LLVM. If the tool can be used for other things later, that would be okay but I'm not interested in starting another general parser generator project.

The next step is to augment the grammar meta language (of the TWS) so that I can write high level grammar transformations and other processing on LLVM IR and translate them into C++ code for LLVM. I would like to be able to express optimizations that look like they would in a paper and not have to be shackled by the expressive limitations of C++.

I also want to augment the grammar so that I can insert descriptive information in it with the eventual ability to produce the reference manual from the augmented grammar.

I'm still at the beginning stage which is a complete grammar but that part is 80% or more done.
I could finish it in a week if I could find a week.

I'm not really sure what they are doing with the formal semantics so I can't say if we are trying to do the same things.

I'm mostly interesting on nailing down the syntax and semantics of LLVM IR and to start to develop some tools that let it be manipulated in a higher level and more intuitive way then just writing C++ code.