LLVM and PyPy

> Hello Chris,
>
> We have been investigating your project and its good documentation,
> and are very impressed. If we understood your goals correctly,
> this seems like a good match for our ongoing and active PyPy project,
> a reimplementation of the Python language in Python.

Cool. We are all big fans of Python here. :)

> We'll definitely try using LLVM as our low-level backend.
> But actually we are contacting you now to ask you if you'd be interested
> in some bidirectional collaboration.

I've read up a bit on PyPy, and it looks like LLVM could be a nice way to
get the JIT-type interface that you would like. Also, making use of the
LLVM optimizer can make your statically generated code nice and fast. :)

> Maybe a bit more background on what we have done and who we are:
>
> - we are an open international group of individuals collaborating mostly
>   in our free time. We are very involved with research, open source
>   communities and especially the Python community.
>
> - during the course of four one-week meetings (which we call development
>   "sprints") we have built a rather complete interpreter and can already
>   translate parts of it to C or Lisp code (using a control-flow representation
>   which is actually very similar to LLVM code, hence our enthusiasm!)
>
> - we very recently submitted a funding proposal to the European Union:
>
>     http://codespeak.net/svn/pypy/trunk/doc/funding/proposal/part_b.pdf
>
>   and you may find these two chapters particularly interesting:
>
>     http://codespeak.net/pypy/index.cgi?doc/funding/B1.0_objectives
>     http://codespeak.net/pypy/index.cgi?doc/funding/B6.0_detailed

It sounds like LLVM could be a good implementation strategy for your
goals. You're right that developing a JIT from scratch is a lot of work. :)

> On the technical level we are interested in learning about, and maybe
> collaborating with, efforts to support very-high-level language features in
> LLVM (e.g. walking the stack for garbage collection), fine-grained runtime
> code generation (generating code only one basic block at a time),

These are definitely features that we plan to add, but just haven't gotten
to yet. In particular, Alkis is working on a Java front-end, which will
require similar features. In the beginning, we will probably just use a
conservative collector, eventually adding support for precise GC.

We already have the capability of doing function-at-a-time code
generation: what is basic-block at a time generation used for? How do you
do global optimizations like register allocation?

> and possibly also contribute
> a PowerPC back-end, and Python bindings for LLVM.

That would be great! We've tossed around the idea of creating C bindings
for LLVM, which would make interfacing from other languages easier than
going directly to the C++ API, but we just haven't had a chance to yet.
Maybe you guys would be interested in helping with that project?

> On another level, some of the PyPy core developers are actually also
> involved with the 'codespeak' site, which aims at connecting interesting
> open source projects and providing new collaborative development services.
> The PyPy project is extensively using subversion, which is a very interesting
> (and stable) alternative to cvs. So if you need any help with setting
> up some publicly accessible infrastructure the codespeak guys will
> certainly welcome you.

At this point, we're working like crazy to get important features
implemented in LLVM. We certainly acknowledge that CVS has severe
deficiencies, but in the near future we'll probably stay with it.
Perhaps after SVN 1.0 comes out... :)

> Feel free to forward this mail to the LLVM mailing list, btw. We are
> just interested in making some first contact, entering a productive
> discussion and - who knows - starting some interesting collaboration!

Done. In general, that's a good place to discuss all kinds of issues like
this. Please let us know what your plans are and what the next step is.
Perhaps C bindings would be the most logical starting place?

-Chris

Hi Chris,

[Chris Lattner Fri, Oct 31, 2003 at 10:58:45AM -0600]

> Hello Chris,
>
> We have been investigating your project and its good documentation,
> and are very impressed. If we understood your goals correctly,
> this seems like a good match for our ongoing and active PyPy project,
> a reimplementation of the Python language in Python.

> Cool. We are all big fans of Python here. :)

That's good because we might want to recode some LLVM functionalities
in Python :)

> We'll definitely try using LLVM as our low-level backend.
> But actually we are contacting you now to ask you if you'd be interested
> in some bidirectional collaboration.

> I've read up a bit on PyPy, and it looks like LLVM could be a nice way to
> get the JIT-type interface that you would like. Also, making use of the
> LLVM optimizer can make your statically generated code nice and fast. :)

Yes, but we would also want to dynamically emit and execute LLVM code.
But a static translation is indeed our first goal :)

> We've tossed around the idea of creating C bindings
> for LLVM, which would make interfacing from other languages easier than
> going directly to the C++ API, but we just haven't had a chance to yet.
> Maybe you guys would be interested in helping with that project?

Thinking some more about it, we would probably try to translate our PyPy
implementation into LLVM code and also generate some glue LLVM code
which allows us to programmatically drive LLVM from Python. Is LLVM
able to "drive" itself? I mean, can LLVM low-level object code
generate more LLVM low-level object code and then execute it?

This would fit nicely with PyPy because we are running ourselves (in
'abstract interpretation' mode) in order to generate a low-level
representation of ourselves. This low-level representation is already
close to LLVM's low-level view. So if the LLVM code gets executed
(being a Python interpreter) it should be able to just-in-time-compile
new LLVM code and execute it. With our architecture, for such a JIT we
could reuse a good part of the code we already have for generating our
low-level representation. It's a rather self-referential thing (also
see our logo: http://codespeak.net/pypy/ :-).
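
To make that control flow a bit more concrete, here is a tiny, purely
illustrative Python sketch; the built-in compile()/eval() pair merely stands in
for "emit new low-level code at run time and execute it", and nothing here is a
real LLVM or PyPy interface:

    # Illustrative sketch only: an "interpreter" that, once a piece of user
    # code gets hot, emits code for it and runs that from then on -- the same
    # shape as a translated PyPy driving LLVM while it runs.

    counts = {}      # how often each expression was seen
    compiled = {}    # expressions we have "JIT-compiled" so far

    def run(expr, env):
        counts[expr] = counts.get(expr, 0) + 1
        if expr in compiled:                      # already compiled: call it directly
            return eval(compiled[expr], {}, env)
        if counts[expr] > 2:                      # hot: emit real code and keep it
            compiled[expr] = compile(expr, "<jit>", "eval")
            return eval(compiled[expr], {}, env)
        return eval(expr, {}, env)                # otherwise "interpret" slowly

    for i in range(5):
        print(run("x * x + 1", {"x": i}))         # 1, 2, 5, 10, 17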

> On another level, some of the PyPy core developers are actually also
> involved with the 'codespeak' site, which aims at connecting interesting
> open source projects and providing new collaborative development services.
> The PyPy project is extensively using subversion, which is a very interesting
> (and stable) alternative to cvs. So if you need any help with setting
> up some publicly accessible infrastructure the codespeak guys will
> certainly welcome you.

> At this point, we're working like crazy to get important features
> implemented in LLVM. We certainly acknowledge that CVS has severe
> deficiencies, but in the near future we'll probably stay with it.
> Perhaps after SVN 1.0 comes out... :)

then we may want to mirror your cvs repo to subversion :)
The reason is that we want to provide consistent versions of all
the libraries/modules/projects we use. And subversion makes
this rather easy (if the other project is svn-controlled, too),
e.g. you can say 'I want to follow the HEAD version of LLVM
in this branch' or 'I want to use this stable version of LLVM
for my own stable release'. Then you can just issue 'svn up' and
you will have the desired versions in your working copy.
However, I can understand that you don't want to consider
subversion right now, so I will stop advertising it now :)

cheers,

    holger

> Cool. We are all big fans of Python here. :)

> That's good because we might want to recode some LLVM functionalities
> in Python :)

As long as it makes sense. Needless duplication of effort is never a good
idea...

> I've read up a bit on PyPy, and it looks like LLVM could be a nice way to
> get the JIT-type interface that you would like. Also, making use of the
> LLVM optimizer can make your statically generated code nice and fast. :)

> Yes, but we would also want to dynamically emit and execute LLVM code.
> But a static translation is indeed our first goal :)

Of course. We can do both. In fact, we can even emit C code, which will
be useful initially if you're working on PowerPC machines.

> Thinking some more about it, we would probably try to translate our PyPy
> implementation into LLVM code and also generate some glue LLVM code
> which allows us to programmatically drive LLVM from Python. Is LLVM
> able to "drive" itself? I mean, can LLVM low-level object code
> generate more LLVM low-level object code and then execute it?

Yes, this should certainly be possible. Kind of like what the
Jalapeno/Jikes JVM does with Java. The point about the C bindings is that
they will allow a nice interface between the parts written in Python and
the parts written in C++. It doesn't make sense for you to rewrite all of
LLVM in Python, especially since the interface for building LLVM code is
pretty clean.

> This would fit nicely with PyPy because we are running ourselves (in
> 'abstract interpretation' mode) in order to generate a low-level
> representation of ourselves. This low-level representation is already
> close to LLVM's low-level view. So if the LLVM code gets executed
> (being a Python interpreter) it should be able to just-in-time-compile
> new LLVM code and execute it. With our architecture, for such a JIT we

Makes a lot of sense.

> At this point, we're working like crazy to get important features
> implemented in LLVM. We certainly acknowledge that CVS has severe
> deficiencies, but in the near future we'll probably stay with it.
> Perhaps after SVN 1.0 comes out... :)

> then we may want to mirror your cvs repo to subversion :)

That is obviously no problem. :)

> The reason is that we want to provide consistent versions of all
> the libraries/modules/projects we use. And subversion makes

Makes sense. If it is publicly accessible and stable, perhaps we can
add information about it on the LLVM pages for others who would prefer to
work with SVN...

-Chris

Hello Chris,

> These are definitely features that we plan to add, but just haven't gotten
> to yet. In particular, Alkis is working on a Java front-end, which will
> require similar features. In the beginning, we will probably just use a
> conservative collector, eventually adding support for precise GC.

Great!

> We already have the capability of doing function-at-a-time code
> generation: what is basic-block at a time generation used for? How do you
> do global optimizations like register allocation?

It is central to Psyco, the Python just-in-time specializer
(http://psyco.sourceforge.net) whose techniques we plan to integrate with
PyPy. Unlike other environments like Self, which collect execution profiles
during interpretation and use them to recompile whole functions, Psyco has no
interpretation stage: it directly emits a basic block and runs it; the values
found at run-time trigger the compilation of more basic blocks, which are run,
and so on. So each function's machine code is a dynamic network of basic
blocks which are various specialized versions of a bit of the original
function. This network is not statically known, in particular because basic
blocks often have a "switch" exit based on some value or type collected at
run-time. Every new value encountered at this point triggers the compilation
of a new switch case jumping to a new basic block.

We will also certainly consider Self-style recompilations, as they allow more
aggressive optimizations. (Register allocation in Psyco is done using a simple
round-robin scheme; code generation is very fast.)
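
As a purely illustrative sketch of that growing-switch idea (this is not
Psyco's actual code; Psyco emits machine code, not Python closures), one
compilation site could be pictured like this:

    # Illustrative sketch only: a "switch" exit that grows one new
    # specialized case per argument-type pair observed at run time,
    # with no separate interpretation or profiling stage.

    def compile_case(type_a, type_b):
        # Pretend "compiler": each closure stands in for a freshly emitted
        # machine-code block specialized for the observed argument types.
        if type_a is int and type_b is int:
            return lambda a, b: a + b                # block for int + int
        if type_a is str and type_b is str:
            return lambda a, b: "%s%s" % (a, b)      # block for string concatenation
        return lambda a, b: a + b                    # generic fallback block

    class SwitchExit:
        # One exit of a compiled basic block: a map from run-time types to
        # further specialized blocks, grown lazily as new types show up.
        def __init__(self):
            self.cases = {}

        def run(self, a, b):
            key = (type(a), type(b))
            block = self.cases.get(key)
            if block is None:               # a new type pair: compile a new case
                block = compile_case(*key)
                self.cases[key] = block
            return block(a, b)              # jump to the specialized block

    add_site = SwitchExit()
    print(add_site.run(2, 3))         # first int/int call compiles that case -> 5
    print(add_site.run("a", "b"))     # a new type pair grows the switch -> 'ab'
    print(add_site.run(2.5, 1.0))     # handled by the generic block -> 3.5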

> That would be great! We've tossed around the idea of creating C bindings
> for LLVM, which would make interfacing from other languages easier than
> going directly to the C++ API, but we just haven't had a chance to yet.
> Maybe you guys would be interested in helping with that project?

Well, as the C++ API is nice and clean, it is probably simpler to bind it
directly to Python. We would probably go for Boost.Python, which makes C++
objects directly accessible to Python. But nothing is sure about this; maybe
driving LLVM from LLVM code is closer to our needs. Is there a specific
interface to do that? Is it possible to extract from LLVM the required code
only, and link it with the final executable? In my experience, there are a
few limitations of C that require explicit assembly code, like building calls
dynamically (i.e. the caller's equivalent of varargs).

A bientot,

Armin.

> We already have the capability of doing function-at-a-time code
> generation: what is basic-block at a time generation used for? How do you
> do global optimizations like register allocation?

> It is central to Psyco, the Python just-in-time specializer
> (http://psyco.sourceforge.net) whose techniques we plan to integrate with
> PyPy. Unlike other environments like Self, which collect execution profiles

Ok, makes sense.

> That would be great! We've tossed around the idea of creating C bindings
> for LLVM, which would make interfacing from other languages easier than

> Well, as the C++ API is nice and clean, it is probably simpler to bind it
> directly to Python. We would probably go for Boost.Python, which makes C++
> objects directly accessible to Python. But nothing is sure about this; maybe

Ok, I didn't know the Boost bindings allowed calling C++ code from Python.
In retrospect, that makes a lot of sense. :)

> driving LLVM from LLVM code is closer to our needs. Is there a specific
> interface to do that?

Sure, what exactly do you mean by driving LLVM code from LLVM? The main
interface for executing LLVM code is the ExecutionEngine interface:
http://llvm.cs.uiuc.edu/doxygen/classExecutionEngine.html

There are concrete implementations of this interface for the JIT and for
the interpreter. Note that we will probably need to add some additional
methods to this class to enable all of the functionality that you need
(that's not a problem though :).

> Is it possible to extract from LLVM the required code
> only, and link it with the final executable? In my experience, there are a
> few limitations of C that require explicit assembly code, like building calls
> dynamically (i.e. the caller's equivalent of varargs).

What do you mean by the "required code only"? LLVM itself is very
modular; you only have to link in the libraries that you use. It's also
very easy to slice and dice LLVM code from programs or functions, etc.
For example, the simple 'extract' tool rips a function out of a module
(this is typically useful only when debugging though)...

-Chris

Hello Chris,

> driving LLVM from LLVM code is closer to our needs. Is there a specific
> interface to do that?

> Sure, what exactly do you mean by driving LLVM code from LLVM?

Writing LLVM code that contains calls to the LLVM framework's compilation
routines. Sorry if this is a newbie question, but are we supposed to be able
to use all the classes like ExecutionEngine from LLVM code produced by our
tools (as opposed to by the C++ front-end)? Or would that be a real hack?

In other words, can we write a JIT in LLVM code? I understand this is not
what you have in mind for Java, for example, where you'd rather write the JIT
*for* but not *in* LLVM code. In PyPy we are considering generating different
versions of the low-level code:

The first is a regular Python interpreter (I), similar to the general design
of interpreters written in C. It is the direct translation of the Python
source code of PyPy.

Now consider a clever meta-interpreter (M) that interprets (I), taking as its
argument an input user program (P) in Python. Note that we include the user
program's runtime arguments in (P). Using feedback, (M) could specialize (I)
to some partial information about (P); a typical choice is the user code and
the type of the user variables, but in general it is a more dynamic part of
(P). Now consider (M) itself and specialize it statically for its first
argument (I) for optimization. The result is efficient low-level code that
can dynamically instrument and compile any user program (P). This efficient
low-level code can also be written by hand; it is what I did in Psyco. Now
that I know exactly how such code must be written, it is not difficult to
actually generate it out of the regular Python source code of (I), i.e. PyPy.
We won't actually write (M).
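
As a toy illustration of this kind of specialization (only the flavour of it;
the real thing works on PyPy's low-level flow graphs, and the little expression
language and specializer below are made up for the example):

    # Illustrative sketch only: specializing a tiny interpreter (I) to the
    # static part of its input, producing residual code with the
    # interpretation overhead removed.

    def interpreter(program, x):
        # (I): a tiny interpreter for a made-up expression language.
        acc = x
        for op, arg in program:
            if op == "add":
                acc = acc + arg
            elif op == "mul":
                acc = acc * arg
        return acc

    def specialize(program):
        # Stand-in for (M) already specialized to (I): given the static part
        # of (P), emit a residual Python function instead of interpreting.
        lines = ["def compiled(x):", "    acc = x"]
        for op, arg in program:
            if op == "add":
                lines.append("    acc = acc + %r" % (arg,))
            elif op == "mul":
                lines.append("    acc = acc * %r" % (arg,))
        lines.append("    return acc")
        namespace = {}
        exec("\n".join(lines), namespace)    # "emit" and load the residual code
        return namespace["compiled"]

    program = [("add", 3), ("mul", 2)]       # the static part of (P)
    print(interpreter(program, 5))           # interpreted: (5 + 3) * 2 = 16
    print(specialize(program)(5))            # residual, specialized code: 16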

A bientot,

Armin.

> > driving LLVM from LLVM code is closer to our needs. Is there a specific
> > interface to do that?
>
> Sure, what exactly do you mean by driving LLVM code from LLVM?

> Writing LLVM code that contains calls to the LLVM framework's compilation
> routines.

Oh, I see. :)

> Sorry if this is a newbie question, but are we supposed to be able to
> use all the classes like ExecutionEngine from LLVM code produced by our
> tools (as opposed to by the C++ front-end)? Or would that be a real
> hack? In other words, can we write a JIT in LLVM code?

This is not something that we had considered or planned to do, but there
is no reason it shouldn't work. LLVM compiled code follows the same ABI
as the G++ compiler, so you can even mix and match translation units or
libraries.

> I understand this is not what you have in mind for Java, for example,
> where you'd rather write the JIT *for* but not *in* LLVM code.

Well, that's sort of true. You can ask Alkis for more details, but I think
that he's writing the Java->LLVM converter in Java, which will mean that
the converter is going to be compiled to LLVM as well. He's doing this
work in the context of the Jikes RVM.

> In PyPy we are considering generating different versions of the
> low-level code:
>
> The first is a regular Python interpreter (I), similar to the general design
> of interpreters written in C. It is the direct translation of the Python
> source code of PyPy.
>
> Now consider a clever meta-interpreter (M) that interprets (I), taking as
> its argument an input user program (P) in Python. Note that we include the
> user program's runtime arguments in (P). Using feedback, (M) could
> specialize (I) to some partial information about (P); a typical choice
> is the user code and the type of the user variables, but in general it
> is a more dynamic part of (P). Now consider (M) itself and specialize
> it statically for its first argument (I) for optimization. The result
> is efficient low-level code that can dynamically instrument and compile
> any user program (P). This efficient low-level code can also be written
> by hand; it is what I did in Psyco. Now that I know exactly how such
> code must be written, it is not difficult to actually generate it out of
> the regular Python source code of (I), i.e. PyPy. We won't actually
> write (M).

This should be doable. :) I've read up a little bit on Psyco, but I'm
still not sure I understand the advantage of translating a basic block at
a time. An easier way to tackle the above problem is to use an already
supported language for the bootstrap, but I think the above should work...

-Chris