LLVM parsers for popular languages? - Python, Rust, Go

IIRC when LLVM came out a bunch of community-contributed parsers were available on your website.

Essentially I want to read in a programming language, prune the AST until it contains only what I define as a “summary”, then convert that AST to that of another language, before finally outputting [code-generating] a compilable/interpretable source [think boilerplate].

Would be good to have Python, Rust and Go.

Are there any LLVM parsers around for these popular languages? - I can write my own, but then I’d need to maintain them against the latest language specs >.<

Thanks for all suggestions

Typically, the AST is language-specific [as concepts in one language often don’t exist at all in another language, e.g. Python lists and dictionaries don’t exist in C], so outputting it as “another language” is not always easy (or even possible without much infrastructure in the target language).

I also have absolutely no idea what you mean by “Summary” - I expect you mean the essential parts of the code, minus error checking and such, but I have a hard time understanding how you would differentiate between the algorithm’s essential parts and “unnecessary error checks” [unless your code understands the algorithm, but then you could just as well write a program that outputs various algorithms in different languages, which would be easier than taking in source in one language and outputting it in another language].


Alec Taylor <alec.taylor6 <at> gmail.com> writes:

Would be good to have Python, Rust and Go. Are there any LLVM parsers
around for these popular languages?

A programming language is much more than a parser and AST. It has
specific semantics, and a runtime (in the case of Python, the runtime is
very large as it hosts a lot of functionality).

So it wouldn't make much sense to have "just a parser".

However, if you are looking for an implementation of a subset of Python
using LLVM, you can take a look at Numba: http://numba.pydata.org/

(disclaimer: I am part of the Numba team)



Thanks, happy to have that confirmed.

With that in mind, will use the AST modules provided by the languages (with the exception of libclang for C++).
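
As a rough illustration of the "prune to a summary" idea using Python's own ast module, here is a minimal sketch. It assumes "summary" means signatures plus docstrings, which is only one guess at the definition. `SOURCE`, `Summarise` and `summarise` are made-up names for this example, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

# Hypothetical input, used only for illustration.
SOURCE = '''
import os

class Greeter:
    """Say hello."""

    def __init__(self, name):
        self.name = name

    def greet(self, loud=False):
        """Return a greeting."""
        text = "hello " + self.name
        return text.upper() if loud else text
'''


class Summarise(ast.NodeTransformer):
    """Keep signatures and docstrings; replace each function body with `...`."""

    def visit_FunctionDef(self, node):
        doc = ast.get_docstring(node, clean=False)
        body = []
        if doc is not None:
            # Re-create the docstring as the first statement of the body.
            body.append(ast.Expr(value=ast.Constant(value=doc)))
        # Placeholder so the pruned function is still valid Python.
        body.append(ast.Expr(value=ast.Constant(value=...)))
        node.body = body
        return node

    # Treat `async def` the same way.
    visit_AsyncFunctionDef = visit_FunctionDef


def summarise(source):
    """Parse, prune, and turn the pruned tree back into source text."""
    tree = Summarise().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(summarise(SOURCE))
```

This only round-trips Python to Python. Emitting another language would need a separate code generator that walks the pruned tree, and that is where the language-specific concepts mentioned earlier in the thread start to matter.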

Antoine: Am aware of Numba, nice job there BTW. So is there a [decoupled] LLVM parser which I can use to read Python files and analyse objects (including computing their attributes in OO and setattr scenarios)?

Alec Taylor <alec.taylor6 <at> gmail.com> writes:

So is there a [decoupled] LLVM parser which I can use to read Python files
and analyse objects (including computing their attributes in OO and setattr
scenarios)?

There isn't. We simply let Python parse the JITted code itself. The parser
is written in C, and is in the CPython code base. If you want to tinker
with that, there is a doc at https://docs.python.org/devguide/compiler.html



No worries, I think the ast module will suffice for now.
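
For the attribute question, a static pass with the ast module can pick up quite a lot. The sketch below is only an illustration under some assumptions: the instance is always called `self`, and only `setattr` calls with a literal string name are resolved. `SOURCE`, `class_attributes` and `compute_name` are hypothetical names invented for the example.

```python
import ast

# Hypothetical input, used only for illustration.
SOURCE = '''
class Point:
    kind = "2d"

    def __init__(self, x, y):
        self.x = x
        self.y: int = y
        setattr(self, "label", "origin")
        setattr(self, compute_name(), 0)  # name only known at run time
'''


def class_attributes(source):
    """Map each class name to the attribute names a static pass can see."""
    found = {}
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        names = set()

        # Class-level assignments such as `kind = "2d"`.
        for stmt in cls.body:
            if isinstance(stmt, ast.Assign):
                targets = stmt.targets
            elif isinstance(stmt, ast.AnnAssign):
                targets = [stmt.target]
            else:
                continue
            names.update(t.id for t in targets if isinstance(t, ast.Name))

        for node in ast.walk(cls):
            # `self.x = ...` and `self.y: int = ...`
            if (isinstance(node, ast.Attribute)
                    and isinstance(node.ctx, ast.Store)
                    and isinstance(node.value, ast.Name)
                    and node.value.id == "self"):
                names.add(node.attr)
            # `setattr(self, "label", ...)` with a literal name only.
            elif (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "setattr"
                    and len(node.args) >= 2
                    and isinstance(node.args[0], ast.Name)
                    and node.args[0].id == "self"
                    and isinstance(node.args[1], ast.Constant)
                    and isinstance(node.args[1].value, str)):
                names.add(node.args[1].value)

        found[cls.name] = sorted(names)
    return found


if __name__ == "__main__":
    print(class_attributes(SOURCE))
```

The second `setattr` call is skipped on purpose: its attribute name is computed at run time, so no purely syntactic analysis can recover it.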

There is also the Pyston project from Dropbox. Presumably that includes a Python parser.

I’m not affiliated with the project.

Yep we have our own parser and we would love to see other people use it. When we looked around at some other Python parsers we didn’t feel like any of them were easy to extract and use on their own, so we wrote our own and I think we were able to keep ours well-separated. There are some things that make parsing Python somewhat difficult to do in a fully project-agnostic way: any syntax errors usually get thrown as user-level exceptions, you probably don’t want to encode the full set of unicode character names into your parser to handle u"\N{POUND SIGN}", and the parser has to support calling back into Python code for supporting custom encodings requested via “# coding” lines.

I think we’ve done a decent job factoring those things out (they get provided by your project via callbacks), but you do have to provide those features or avoid parsing code that would need them. If you can get the job done by working in Python using the ast module, I would recommend that.
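
To make two of those points concrete, here is a small sketch of what CPython's standard library supplies when you stay in Python: the "# coding" lookup and the Unicode name table. It says nothing about how libpypa's callbacks are shaped; `RAW` is a made-up input for the example.

```python
import ast
import io
import tokenize
import unicodedata

# Hypothetical source file as raw bytes: a coding line plus a \N{...} escape.
RAW = b'# coding: latin-1\nprice = u"\\N{POUND SIGN}5"\n'

# The "# coding" line has to be honoured before the text can be decoded.
encoding, _ = tokenize.detect_encoding(io.BytesIO(RAW).readline)

# \N{...} escapes need the full table of Unicode character names.
pound = unicodedata.lookup("POUND SIGN")

# ast.parse accepts bytes and handles both concerns itself.
tree = ast.parse(RAW)
literal = tree.body[0].value.value

if __name__ == "__main__":
    print(encoding, repr(pound), repr(literal))
```

A standalone parser has to get both pieces from somewhere else, which is presumably why they end up as callbacks supplied by the host project.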

Thanks, that looks like an interesting project. How do I build it?

I’ve tried:

$ cd libpypa && mkdir build && cd $_ && cmake … -G 'Unix Makefiles' && make

But that didn’t give me the parser-test binary for experimenting with (as per your README usage).

Hmm I’m not sure; might be best to have the discussion at https://gitter.im/vinzenz/libpypa where you can reach the parser’s author.

Perfect, 20 minutes after joining and everything is working. Will experiment with this option, otherwise will just use the Python ast module and see how far that takes me.

With Go I’ll probably just use their AST module, and I’ll wait a while for Rust (spoke to the guys on IRC, it’s somewhere on their roadmap).