LLVM and PyPy


I've been following these messages and just thought I would mention
a couple of our near-term goals which may be related to what you all are
interested in:

(1) Alkis is really working on building a "toolkit" for implementing virtual
    machines on top of LLVM. This means that different VMs (like JVM, CLI,
    and PyPy) only need to implement their specific runtime requirements,
    and a fast, simple (online or offline) translator to LLVM.
    All the native code generation and runtime optimization would happen
    in the LLVM framework. Is this more or less what you have in mind
    for using LLVM as a back end in PyPy?

    Note that in this view, *all* the decisions about whether or when to
    recompile some unit (e.g., hot functions as in Self) would happen in
    the LLVM framework, independent of what language is being compiled.
    Does that make sense for Python (and for PyPy)?

    Supporting a Psyco-style basic-block-at-a-time compilation model
    (as described by Armin below) on top of this toolkit is not
    something we had considered so far. It would be interesting to see
    how that could be done.

(2) One difficult part in building such a toolkit is to abstract the
    interfaces between code generation and the runtime components
    implemented in the language VM (like GC, exception handling, etc.).
    We have been assuming that these runtime components must be controlled
    by the language VM (e.g., JVM), since their semantics and performance
    constraints are language-specific. The toolkit would only provide
    some common primitives, to interface with the code generator and
    to make these more efficient.

(3) Patrick Meredith is going to be working on a CAML (and perhaps later,
    OCAML) front end to LLVM.

Note that these are all at a very early stage of work.

From: Armin Rigo <arigo@tunes.org>
Subject: [LLVMdev] Re: LLVM and PyPy
Sender: llvmdev-admin@cs.uiuc.edu
Date: Fri, 31 Oct 2003 20:48:40 +0000

Hello Chris,

> These are definitely features that we plan to add, but just haven't gotten
> to yet. In particular, Alkis is working on a Java front-end, which will
> require similar features. In the beginning, we will probably just use a
> conservative collector, eventually adding support for precise GC.


> We already have the capability of doing function-at-a-time code
> generation: what is basic-block at a time generation used for? How do you
> do global optimizations like register allocation?

It is central to Psyco, the Python just-in-time specializer
(http://psyco.sourceforge.net) whose techniques we plan to integrate with
PyPy. Unlike other environments such as Self, which collect execution profiles
during interpretation and use them to recompile whole functions, Psyco has no
interpretation stage: it directly emits a basic block and runs it; the values
found at run-time trigger the compilation of more basic blocks, which are run,
and so on. So each function's machine code is a dynamic network of basic
blocks which are various specialized versions of a bit of the original
function. This network is not statically known, in particular because basic
blocks often have a "switch" exit based on some value or type collected at
run-time. Every new value encountered at this point triggers the compilation
of a new switch case jumping to a new basic block.
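
As a rough illustration of the "switch" exit described above, here is a
minimal Python sketch. It is not Psyco's actual code: the names SwitchExit,
compile_add_for and add are hypothetical, and plain Python callables stand in
for emitted machine code. It only shows the control flow: a case is compiled
the first time a given run-time type is seen, then reused.

```python
import operator


class SwitchExit:
    """Hypothetical basic-block exit that dispatches on a run-time key."""

    def __init__(self, compile_case):
        self.compile_case = compile_case  # called once per newly seen key
        self.cases = {}                   # key -> specialized block
        self.compilations = 0             # how many cases were compiled

    def jump(self, key, *args):
        block = self.cases.get(key)
        if block is None:
            # New value at the switch: compile a new case and remember it.
            block = self.compile_case(key)
            self.cases[key] = block
            self.compilations += 1
        return block(*args)


def compile_add_for(type_pair):
    """Stand-in for the code generator (assumption: callables, not machine code)."""
    if type_pair == (int, int):
        return int.__add__      # specialized integer addition
    if type_pair == (str, str):
        return str.__add__      # specialized string concatenation
    return operator.add         # generic fallback for other type pairs


def add(switch, a, b):
    # The key is the pair of types observed at run time.
    return switch.jump((type(a), type(b)), a, b)
```

Calling add twice with integers compiles one case and reuses it; a later call
with strings adds a second case to the same exit, so the network of blocks
grows only along the paths that are actually executed.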

We will also certainly consider Self-style recompilations, as they allow more
aggressive optimizations. (Register allocation in Psyco is done using a simple
round-robin scheme; code generation is very fast.)
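
To make "round-robin" concrete, here is one possible reading as a Python
sketch. This is an assumption about the general idea, not Psyco's
implementation: the class RoundRobinAllocator, the register names and the
spill list are all hypothetical. Registers are handed out in a fixed cyclic
order, and whatever value previously occupied a register is spilled.

```python
class RoundRobinAllocator:
    """Hypothetical allocator: hands out registers in a fixed cyclic order."""

    def __init__(self, registers):
        self.registers = list(registers)
        self.next_index = 0   # position of the next register to hand out
        self.holder = {}      # register -> variable currently held
        self.spilled = []     # variables evicted to memory, in order

    def allocate(self, var):
        reg = self.registers[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.registers)
        previous = self.holder.get(reg)
        if previous is not None:
            # The register is occupied: evict its current value.
            self.spilled.append(previous)
        self.holder[reg] = var
        return reg
```

No liveness analysis or interference graph is involved, which is why such a
scheme is cheap at code generation time, at the cost of sometimes evicting a
value that is still needed.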

> That would be great! We've tossed around the idea of creating C bindings
> for LLVM, which would make interfacing from other languages easier than
> going directly to the C++ API, but we just haven't had a chance to yet.
> Maybe you guys would be interested in helping with that project?

Well, as the C++ API is nice and clean it is probably simpler to bind it
directly to Python. We would probably go for Boost-Python, which makes C++
objects directly accessible to Python. But nothing is sure about this; maybe
driving LLVM from LLVM code is closer to our needs. Is there a specific
interface to do that? Is it possible to extract from LLVM the required code
only, and link it with the final executable? In my experience, there are a
few limitations of C that require explicit assembly code, like building calls
dynamically (i.e. the caller's equivalent of varargs).

A bientot,