I've been following these messages and just thought I would mention
a couple of our near-term goals which may be related to what you all are
discussing:
(1) Alkis is really working on building a "toolkit" for implementing virtual
machines on top of LLVM. This means that different VMs (like JVM, CLI,
and PyPy) only need to implement their specific runtime requirements,
and a fast, simple (online or offline) translator to LLVM.
All the native code generation and runtime optimization would happen
in the LLVM framework. Is this more or less what you have in mind
for using LLVM as a back end in PyPy?
Note that in this view, *all* the decisions about whether or when to
recompile some unit (e.g., hot functions as in Self) would happen in
the LLVM framework, independent of what language is being compiled.
Does that make sense for Python (and for PyPy)?
Supporting a Psyco-style basic-block-at-a-time compilation model
(as described by Armin below) on top of this toolkit is not
something we had considered so far. It would be interesting to see
how that could be done.
(2) One difficult part in building such a toolkit is to abstract the
interfaces between code generation and the runtime components
implemented in the language VM (like GC, exception handling, etc.).
We have been assuming that these runtime components must be controlled
by the language VM (e.g., JVM), since their semantics and performance
constraints are language-specific. The toolkit would only provide
some common primitives, to interface with the code generator and
to make these more efficient.
(3) Patrick Meredith is going to be working on a CAML (and perhaps later,
OCAML) front end to LLVM.
Note that these are all at a very early stage of work.
From: Armin Rigo <email@example.com>
Subject: [LLVMdev] Re: LLVM and PyPy
Date: Fri, 31 Oct 2003 20:48:40 +0000
> These are definitely features that we plan to add, but just haven't gotten
> to yet. In particular, Alkis is working on a Java front-end, which will
> require similar features. In the beginning, we will probably just use a
> conservative collector, eventually adding support for precise GC.
> We already have the capability of doing function-at-a-time code
> generation: what is basic-block at a time generation used for? How do you
> do global optimizations like register allocation?
It is central to Psyco, the Python just-in-time specializer
(http://psyco.sourceforge.net) whose techniques we plan to integrate with
PyPy. Unlike other environments like Self, which collect execution profiles
during interpretation and use them to recompile whole functions, Psyco has no
interpretation stage: it directly emits a basic block and runs it; the values
found at run-time trigger the compilation of more basic blocks, which are run,
and so on. So each function's machine code is a dynamic network of basic
blocks which are various specialized versions of a bit of the original
function. This network is not statically known, in particular because basic
blocks often have a "switch" exit based on some value or type collected at
run-time. Every new value encountered at this point triggers the compilation
of a new switch case jumping to a new basic block.
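
To make the "switch" exit concrete, here is a minimal Python sketch of the
idea. It is an illustration only and not Psyco's actual code: the names
SwitchExit and compile_double are hypothetical, and plain Python closures
stand in for emitted machine code. Each new run-time type seen at the exit
triggers the compilation of one more specialized case.

```python
# Illustrative sketch only: NOT Psyco's implementation.
# All names here (SwitchExit, compile_double) are hypothetical.

class SwitchExit:
    """A basic-block exit that dispatches on a value observed at run time."""

    def __init__(self, key_of, compile_block):
        self.key_of = key_of                # extracts the run-time key (here: a type)
        self.compile_block = compile_block  # stands in for the code generator
        self.cases = {}                     # key -> specialized block compiled so far
        self.compilations = 0               # how many blocks were compiled on demand

    def run(self, value):
        key = self.key_of(value)
        block = self.cases.get(key)
        if block is None:
            # A new value reached this exit: compile a new switch case
            # that jumps to a freshly specialized basic block.
            block = self.compile_block(key)
            self.cases[key] = block
            self.compilations += 1
        return block(value)


def compile_double(typ):
    """Hypothetical 'code generator': returns a block specialized for typ."""
    if typ is int:
        return lambda x: x << 1   # integer-only fast path
    if typ is str:
        return lambda s: s + s    # string concatenation path
    return lambda x: x * 2        # generic fallback


exit_ = SwitchExit(type, compile_double)
results = [exit_.run(v) for v in (3, 4, "ab", 2.5, 5)]
```

In this sketch the set of cases is not known ahead of time: it grows only as
new types actually show up, which is why the network of blocks cannot be
determined statically.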
We will also certainly consider Self-style recompilations, as they allow more
aggressive optimizations. (Register allocation in Psyco is done using a simple
round-robin scheme; code generation is very fast.)
> That would be great! We've tossed around the idea of creating C bindings
> for LLVM, which would make interfacing from other languages easier than
> going directly to the C++ API, but we just haven't had a chance to yet.
> Maybe you guys would be interested in helping with that project?
Well, as the C++ API is nice and clean it is probably simpler to bind it
directly to Python. We would probably go for Boost-Python, which makes C++
objects directly accessible to Python. But nothing is sure about this; maybe
driving LLVM from LLVM code is closer to our needs. Is there a specific
interface to do that? Is it possible to extract from LLVM the required code
only, and link it with the final executable? In my experience, there are a
few limitations of C that require explicit assembly code, like building calls
dynamically (i.e. the caller's equivalent of varargs).