9 Ideas To Better Support Source Language Developers

A while back I promised to provide some feedback on useful extensions to
LLVM to better support source language writers (i.e. those _using_ LLVM,
not developing it). Below is a list of the ideas I've come up with so
far.

As I get more of XPL's compiler done, I'll start diving into each of
these areas. I'm posting early in the hopes that discussion will bear
some fruit. In discussing these things, I'm mostly interested in
learning whether any of the following ideas should or should not be part
of LLVM as opposed to part of XPS.

DISCLAIMER:
If any of the following items are already implemented, I missed it! So,
please enlighten me!

NOTE:
If you respond to this, please respond to each item in a separate
message to the list. That way we can keep track of different topics on
different discussion threads. I doubt you'll want to do that, however:
these are all great ideas and should just be adopted without further
discussion! :)))) <kidding!>

The following items are ranked roughly in order of importance to _me_.
Feel free to rank them for your needs -- it would be interesting to see
what's important to others.

Hello Reid and LLVMers,

...

10. Basic support for distributed computations.

A while back I promised to provide some feedback on useful extensions to
LLVM to better support source language writers (i.e. those _using_ LLVM,
not developing it). Below is a list of the ideas I've come up with so
far.

Cool! Ideas are always welcome!

If you respond to this, please respond to each item in a separate
message to the list. That way we can keep track of different topics on
different discussion threads.

I'll let you split it up as you see fit. :slight_smile:

------------------------------------------------------------------
1. Definition Import
Source languages are likely to create lots of named type and value
definitions for the memory objects the language manipulates. Redefining
these in every module produces byte code bloat. It would be very useful
for LLVM to natively support some kind of import capability that would
properly declare global types and global values into the module being
considered.

Unfortunately, this would break the ability to take a random LLVM bytecode
file and use it in a self-contained way. In general, the type names and
external declarations are actually stored very compactly, and the
optimizers remove unused ones. Is this really a problem for you in
practice?

Even better would be a way to have this capability supported
as a first-class citizen with some kind of "Import" class and/or
instruction: simply point an Import class to an existing bytecode file
and it causes the global declarations from that bytecode file to be
imported into the current Module.

We already have this: the linker. :slight_smile: Just put whatever you want into an
LLVM bytecode file, then use the LinkModules method (from
llvm/Transforms/Utils/Linker.h) to "import" it. Alternatively, in your
front-end, you could just start with this module instead of an empty one
when you compile a file...

------------------------------------------------------------------
2. Memory Management

My programming system (XPS) has some very powerful and efficient memory
allocation mechanisms built into it. It would be useful to allow users
of LLVM to control how (and more importantly where) memory is allocated
by LLVM.

What exactly would this be used for? Custom allocators for performance?
Or something more important? In general, custom allocators for
performance are actually a bad idea...

------------------------------------------------------------------
3. Code Signing Support

One of the requirements for XPL is that the author and/or distributor of
a piece of software be known before execution and that there is a way to
validate the integrity of the bytecodes. To that end, I'm planning on
providing message digesting and signing on LLVM bytecode files. This is
pretty straightforward to implement. The only question is whether it
really belongs in LLVM or not.

I don't think that this really belongs in LLVM itself: Better would be to
wrap LLVM bytecode files in an application (ie, XPL) specific file format
that includes the digest of the module, the bytecode itself, and whatever
else you wanted to keep with it. That way your tool, when commanded to
load a file, would check the digest, and only if it matches call the LLVM
bytecode loader.
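To make the wrapper idea concrete, here is a minimal sketch in Python. The magic number, header layout, and choice of SHA-256 are all invented for illustration; this is not a real XPL or LLVM format, just the shape of "digest first, payload second, verify before loading":

```python
# A sketch of the wrapper format described above: an application-specific
# container holding a digest plus the raw bytecode. The magic number,
# header layout, and choice of SHA-256 are invented for illustration.
import hashlib
import struct

MAGIC = b"XPLW"  # hypothetical magic number for the wrapper format

def wrap(bytecode: bytes) -> bytes:
    """Wrap raw bytecode with a length field and its SHA-256 digest."""
    digest = hashlib.sha256(bytecode).digest()
    return MAGIC + struct.pack("<I", len(bytecode)) + digest + bytecode

def unwrap(blob: bytes) -> bytes:
    """Verify the digest; only a verified payload reaches the loader."""
    if blob[:4] != MAGIC:
        raise ValueError("not a wrapped module")
    (size,) = struct.unpack("<I", blob[4:8])
    digest, payload = blob[8:40], blob[40:40 + size]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("digest mismatch: refusing to load bytecode")
    return payload  # only now hand this to the LLVM bytecode loader
```

A real deployment would sign the digest with the author's private key rather than storing a bare hash, but the load-time check has the same structure.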

There's one issue with code signing: it thwarts global optimization
because changing the byte code means changing the signature. While the
software's author can always do this, a signed bytecode file could not
be globally optimized into another program without breaking the
signature. It would probably be acceptable to allow LLVM to modify the
bytecode in memory at runtime after decryption and verification of
the signature.

I'm not sure that there is a wonderful solution to this. You could go the
route of having a "trusted" compiler, which has the necessary keys built
into it or something, but I don't know very much about this area.

------------------------------------------------------------------
4. Threading Support

Some low level support for threading is needed. I think there are really
just a very few primitives we need from which higher order things can be
constructed. One is a memory barrier to ensure cache is flushed, etc. so
we can be certain a write to memory has "taken".

Just out of curiosity, what do you need a membar for? The only thing
that I'm aware of it being useful for (besides implementing threading
packages) are Read-Copy-Update algorithms.

This goes beyond the current volatile support and will need to access
specific machine instructions if a native barrier is supported. Another
is a thread forking instruction. I'd like to see TLS supported but that
can probably be constructed from lower level primitives. A nice-to-have
would be critical section support. This could be done similar to java's
monitorenter and monitorexit instructions. If I recall correctly, I
believe this capability is being worked on currently.

Yup, Misha is currently working on adding these capabilities to LLVM. In
the meantime, calling into a pthreads library directly is the preferred
solution. I agree that TLS would be very handy to have.
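As an aside, the higher-order constructs listed above (critical sections, thread barriers) do compose from a plain mutex and a rendezvous primitive, which is why exposing only low-level operations is workable. A sketch in Python, whose threading module stands in here for direct pthreads calls:

```python
# A critical section and a thread barrier built from ordinary
# primitives; Python's threading module stands in for pthreads.
import threading

counter = 0
lock = threading.Lock()          # critical section (cf. monitorenter/exit)
barrier = threading.Barrier(4)   # all four threads rendezvous here

def worker():
    global counter
    barrier.wait()               # nobody proceeds until everyone arrives
    for _ in range(1000):
        with lock:               # enter/leave the critical section
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 4000           # the lock prevents lost updates
```

What a lock cannot replace is the membar itself: under the hood the lock's acquire and release are what issue the barriers, which is exactly why a compiler-level primitive is needed to implement this kind of package.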

------------------------------------------------------------------
5. Fully Developed ByteCode Archives

XPL programs are developed into packages. Packages are the unit of
deployment and as such I need a way to (a) archive several bytecode
files together, (b) index the globals in them, and (c) compress the
whole thing with bzip2. Although LLVM has some support for this today
with the llvm-ar program, I don't believe it supports (b) and (c).

This makes a lot of sense. The LLVM bytecode reader supports loading a
bytecode file from a memory buffer, so I think it would be pretty easy to
implement this. Note that llvm-ar is currently a work-in-progress, but it
might make sense to implement support for this directly in it. After all,
we aren't constrained by what the format of the ".o" files in the .a file
looks like (as long as gccld and llvm-nm support the format).

Note that bytecode files compress to about 50% with bzip2, which means
faster transmission times to their destinations (oh, did I mention that
XPL supports distributed programming? :slight_smile:). The resulting archive program
would be more similar to jar/tar than to ar.

Also note that we are always interested in finding ways to shrink the
bytecode files. Right now they are basically comparable to native
executable sizes, but smaller is always better!
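A rough sketch of the jar-like archive being discussed: members concatenated behind an index of (offset, size) pairs, with the whole stream bzip2-compressed en masse. The header layout here is invented for illustration, not a proposal for the real llvm-ar format:

```python
# A sketch of an indexed, bzip2-compressed bytecode archive: members
# are concatenated behind an index of (offset, size) pairs and the
# whole stream is compressed en masse. The header layout is invented.
import bz2
import json

def create_archive(members):
    """Pack {name: bytes} members into one bzip2-compressed blob."""
    body, index, offset = b"", {}, 0
    for name, data in members.items():
        index[name] = (offset, len(data))
        body += data
        offset += len(data)
    header = json.dumps(index).encode()
    raw = len(header).to_bytes(4, "little") + header + body
    return bz2.compress(raw)

def extract(archive, name):
    """Decompress the archive and pull out one member by name."""
    raw = bz2.decompress(archive)
    hlen = int.from_bytes(raw[:4], "little")
    index = json.loads(raw[4:4 + hlen])
    offset, size = index[name]
    body = raw[4 + hlen:]
    return body[offset:offset + size]
```

This covers all three requirements at once: (a) several files in one archive, (b) an index over their names, and (c) bzip2 over the whole thing.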

------------------------------------------------------------------
6. Incremental Code Generation

The conventional wisdom for compilation is to emit object code (or in
our case the byte code) from a compiler incrementally on a per-function
basis. This is necessary so that one doesn't have to keep the memory for
every function around for the entire compilation. This allows much

That makes sense.

I'm not sure if LLVM supports this now, but I'd like LLVM to be able to
write byte code for an llvm::Function object and then "drop" the
function's body and carry on. It isn't obvious from llvm::Function's
interface if this is supported or not.

This has not yet been implemented, but a very similar thing has:
incremental bytecode loading. The basic idea is that you can load a
bytecode file without all of the function bodies. As you need the
contents of a function body, it is streamed in from the bytecode file.
Misha added this for the JIT.

Doing the reverse seems very doable, but no one has tried it. If you're
interested, take a look at the llvm::ModuleProvider interface and the
implementations of it to get a feeling for how the incremental loader
works.
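The incremental-loading idea can be illustrated without any LLVM machinery: keep the serialized bodies around and only deserialize one the first time it is requested. This Python sketch is a conceptual stand-in for what ModuleProvider does for the JIT, not its real interface:

```python
# Conceptual stand-in for the incremental loader: function bodies are
# kept serialized and only deserialized the first time they are needed.
# This is not the real llvm::ModuleProvider interface, just its idea.
class LazyModule:
    def __init__(self, stored_bodies):
        self._stored = stored_bodies   # name -> serialized body (bytes)
        self._materialized = {}        # name -> deserialized body
        self.loads = 0                 # how many bodies were streamed in

    def function_body(self, name):
        if name not in self._materialized:
            self.loads += 1            # streamed in on first use only
            self._materialized[name] = self._stored[name].decode()
        return self._materialized[name]
```

A caller that only ever touches `main` never pays for the other bodies; the "incremental writer" Reid wants is the mirror image, serializing one body and then dropping it from memory.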

The only drawback to this is the effect on optimization. I would suggest
that after bytecode generation, a function's "body" be replaced with
some kind of summary (annotation?) of interest to optimization passes.

This is very similar to the functionality required for the incremental
loader, so when it gets developed for the loader, the writer could use
similar kinds of interfaces.

Taking the above suggestion to its logical conclusion, it might be
useful to create a general mechanism for passes to leave "tidbits" of
information around for other passes. The Annotation mechanism probably
could be used for this purpose but something a little more formal would
probably be better. It's highly likely there's something like this in
place already that I'm not aware of.

LLVM already has an llvm::Annotation class that does exactly this :slight_smile:

------------------------------------------------------------------
7. Idioms Package

As I learned from Stacker (the hard way), there are certain idioms that
occur in using LLVM over and over again. These idioms need to be either
(a) documented or (b) implemented in a library. I prefer (b) because it
implies (a) ;> Such idioms as if-then-else, for (pre; cond; post),
while(cond), etc. should be just coded into a framework so that compiler
writers have a slightly higher level interface to work with.
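As a sketch of what such an idioms library might offer, here is a hypothetical helper that emits the three-block while(cond) skeleton as a list of textual pseudo-instructions. The instruction strings, and the convention that the condition code defines %c, are placeholders for illustration, not real LLVM API calls:

```python
# A hypothetical idioms helper: the while(cond) skeleton a front end
# wires up again and again, emitted as one call. The textual
# instructions (and the convention that cond defines %c) are
# placeholders, not real LLVM API calls.
def emit_while(unique, cond, body):
    """Return the three-block while pattern: test, body, exit."""
    test, loop, done = f"test{unique}", f"body{unique}", f"exit{unique}"
    return [
        f"br label %{test}",
        f"{test}:",
        *cond,                                      # must define %c
        f"br i1 %c, label %{loop}, label %{done}",
        f"{loop}:",
        *body,
        f"br label %{test}",                        # back-edge
        f"{done}:",
    ]

blocks = emit_while(0, ["%c = icmp slt i32 %i, %n"], ["%i2 = add i32 %i, 1"])
assert blocks[0] == "br label %test0"
assert blocks[-1] == "exit0:"
```

A real version would of course build Instruction objects rather than strings, but the block-wiring pattern it captures is the same.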

Although I like this idea, it's low on my list because I regard LLVM as
_already_ incredibly easy to use as a compiler writer's tool. But, hey,
why stop at "incredibly easy" when there's "amazingly trivial" waiting
in the wings?

Developing a new "front-end helper" library could be interesting! The
only challenge would be to make it general purpose enough that it would
actually be useful for multiple languages.

------------------------------------------------------------------
8. Create a ConstantString class

Constant strings are very common occurrences in XPL and probably are in
other source languages as well. The current implementation of
ConstantArray::get(std::string&) is a bit weak. It creates a
ConstantSInt for every character. What if the strings are long and the
program creates many of them? It seems a little heavyweight to me.

This is something that might make sense to deal with in the future, but it
has a lot of implications in the compiler and optimizer. Look at GCC for
example, there are many optimizations that work on constant strings but
not on arrays of characters or any other type. At this stage in the game,
effort is probably best spent elsewhere. :slight_smile:

On the other hand, adding a hack to the bytecode format to efficiently
encode strings is something that I have been considering: there the effect
of the change is more contained.
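Rough arithmetic behind the concern: with one constant per character plus a "pointer" slot per character, a string costs about twice its own length, versus a length prefix plus raw bytes under a dedicated string encoding. A toy cost model (the slot sizes are assumptions, not measurements):

```python
# Toy cost model for constant strings: one constant per character plus
# a one-byte "pointer" slot per character, versus a length prefix plus
# raw bytes. The slot sizes are assumptions, not measurements.
def per_char_cost(length, slot_bytes=1):
    return length + length * slot_bytes   # char constants + pointer list

def raw_cost(length):
    return 4 + length                     # 32-bit length prefix + bytes

assert per_char_cost(100) == 200          # ~2x the string's own size
assert raw_cost(100) == 104
```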

------------------------------------------------------------------
9. More Native Platforms Supported

To get the platform coverage that I need, I'm making the XPL compiler
use the C back end. It's slower to compile that way, but I'll only need it
for those programs that want to go fully native. The back end support in
LLVM is a bit weak right now in terms of both optimizations available
and platforms supported. This isn't a big priority for me as there is a
viable alternative to native platform support.

Yup, that makes sense. Supporting the CBE will always be a good idea, but
adding new native platforms and improving the ones we already support will be
increasingly important over time. :slight_smile:

------------------------------------------------------------------

I'll do another one of these postings as I get nearer to the end of the
XPL Compiler implementation. There should be lots more ideas by then.
Don't hold your breath :slight_smile:

Cool, keep us informed! :slight_smile:

-Chris

What kind of support? What do you think should be included in LLVM
directly, as opposed to being built on top of it?

-Chris

> ------------------------------------------------------------------
> 1. Definition Import
> Source languages are likely to create lots of named type and value
> definitions for the memory objects the language manipulates. Redefining
> these in every module produces byte code bloat. It would be very useful
> for LLVM to natively support some kind of import capability that would
> properly declare global types and global values into the module being
> considered.

Unfortunately, this would break the ability to take a random LLVM bytecode
file and use it in a self-contained way. In general, the type names and
external declarations are actually stored very compactly, and the
optimizers remove unused ones. Is this really a problem for you in
practice?

I'm trying to get to a "once-and-done" solution on compilation. That is,
a given module is compiled exactly once (per version). There's no such
thing as "include" in XPL, only "import". The difference is that
"import" loads the results of previous compilations (i.e. a bytcode
file). I included it in my list because I thought it would be something
quite handy for other source languages (Java would need it, for
example). The functionality is something like Java's class loader except
its a module loader for LLVM and it doesn't load the function bodies.

> Even better would be a way to have this capability supported
> as a first-class citizen with some kind of "Import" class and/or
> instruction: simply point an Import class to an existing bytecode file
> and it causes the global declarations from that bytecode file to be
> imported into the current Module.

We already have this: the linker. :slight_smile: Just put whatever you want into an
LLVM bytecode file, then use the LinkModules method (from
llvm/Transforms/Utils/Linker.h) to "import" it. Alternatively, in your
front-end, you could just start with this module instead of an empty one
when you compile a file...

Okay, I'll take a look at this and see if it fits the bill.

> ------------------------------------------------------------------
> 2. Memory Management
>
> My programming system (XPS) has some very powerful and efficient memory
> allocation mechanisms built into it. It would be useful to allow users
> of LLVM to control how (and more importantly where) memory is allocated
> by LLVM.

What exactly would this be used for? Custom allocators for performance?
Or something more important? In general, custom allocators for
performance are actually a bad idea...

My memory system can do seamless persistent memory as well (i.e. it's
almost a full-scale OO database). One of my ideas for the "import"
functionality was to simply save the LLVM objects for each module
persistently. Import then takes no longer than an mmap(2) call to load
the LLVM data structures associated with the module into memory. I can't
think of a faster way to do it.
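The mmap idea can be sketched generically: lay the table out in fixed-size records, so a reader can map the file and unpack a single record without parsing the rest. The record format below is invented for illustration and has nothing to do with the actual LLVM object layout:

```python
# A generic sketch of mmap-style import: the symbol table is laid out
# in fixed-size records, so a reader can map the file and unpack one
# record without parsing the rest. The record format is invented.
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<16sI")   # 16-byte name field, 32-bit value

def write_table(path, symbols):
    with open(path, "wb") as f:
        for name, value in symbols:
            f.write(RECORD.pack(name.ljust(16, b"\0"), value))

def lookup(path, index):
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            name, value = RECORD.unpack_from(mapped, index * RECORD.size)
    return name.rstrip(b"\0"), value

path = os.path.join(tempfile.gettempdir(), "xpl_symtab_demo.bin")
write_table(path, [(b"f", 10), (b"g", 20)])
assert lookup(path, 1) == (b"g", 20)
```

Persisting real pointer-rich structures like LLVM's types and constants is much harder than this, since they are shared between modules, which is exactly the difficulty Chris raises below.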

The reason this is so important to me is that I expect to be doing lots
of on the fly compilation. XPL is highly dynamic. What I'm trying to
avoid is the constant recompilation of included things as with C/C++.
The time taken to recompile headers is, in my opinion, just wasted time.
That's why pre-compiled header support exists in so many compilers.

I have also tuned my allocators so that they can do multiple millions of
allocations per second on modest hardware. There's a range of allocators
available, each using different algorithms. Each has space/time
tradeoffs. The performance of "malloc(3)" sucks on most platforms and
sucks on all platforms when there's a lot of memory thrash. None of my
allocators suffer these problems.

Curious: why do you think custom allocators for performance are a bad
idea?

> ------------------------------------------------------------------
> 3. Code Signing Support
>
I don't think that this really belongs in LLVM itself: Better would be to
wrap LLVM bytecode files in an application (ie, XPL) specific file format
that includes the digest of the module, the bytecode itself, and whatever
else you wanted to keep with it. That way your tool, when commanded to
load a file, would check the digest, and only if it matches call the LLVM
bytecode loader.

I'd probably be more inclined to just add an internal global array of
bytes to the LLVM bytecode format. Supporting a new file format means
that I'd have to re-write all the LLVM tools -- not worth the time.

So, I'll implement this myself and not extend LLVM with it.

> ------------------------------------------------------------------
> 4. Threading Support
>
> Some low level support for threading is needed. I think there are really
> just a very few primitives we need from which higher order things can be
> constructed. One is a memory barrier to ensure cache is flushed, etc. so
> we can be certain a write to memory has "taken".

Just out of curiosity, what do you need a membar for? The only thing
that I'm aware of it being useful for (besides implementing threading
packages) are Read-Copy-Update algorithms.

Um, to implement a threading package :slight_smile: I have assumed that, true to
its name, LLVM will only provide the lowest level primitives needed to
implement a threading package, not actually provide a threading package.
I'm sure you don't want to put all the different kinds of
synchronization concepts (mutex, semaphore, barrier, futex, etc.) into
LLVM? All of them need the membar. For that matter, you'll probably
need an efficient thread barrier as well.

> ------------------------------------------------------------------
> 5. Fully Developed ByteCode Archives
>
This makes a lot of sense. The LLVM bytecode reader supports loading a
bytecode file from a memory buffer, so I think it would be pretty easy to
implement this. Note that llvm-ar is currently a work-in-progress, but it
might make sense to implement support for this directly in it. After all,
we aren't constrained by what the format of the ".o" files in the .a file
looks like (as long as gccld and llvm-nm support the format).

But if the file gets compressed, it isn't a .a file any more, right? Or,
were you suggesting that only the archive members get compressed and the
file is otherwise an archive? The problem with that approach is that it
limits the compression somewhat. Think about an archive with 1000
bytecode files each using common declarations. Compressed individually
those common declarations are repeated in each file. Compressed en
masse, only one copy of the common declarations is stored achieving
close to 1000:1 compression for those declarations.

Also note that we are always interested in finding ways to shrink the
bytecode files. Right now they are basically comparable to native
executable sizes, but smaller is always better!

Unfortunately, the answer to that is to utilize higher level
instructions. LLVM is comparable to native because it isn't a whole lot
higher level. Compared with Java, whose byte code knows about things
like classes, LLVM will always be larger because expression of the
higher level concepts in LLVM's relatively low level takes more bytes.

That said, we _should_ strive to minimize

I haven't really looked into the bytecode format in much detail. Are we
doing things like constant string folding? Could the bytecode format be
natively compressed (i.e. not with bz2 or zip but simply be not
duplicating anything in the output)?

> ------------------------------------------------------------------
> 6. Incremental Code Generation
>
> The conventional wisdom for compilation is to emit object code (or in
> our case the byte code) from a compiler incrementally on a per-function
> basis. This is necessary so that one doesn't have to keep the memory for
> every function around for the entire compilation. This allows much

That makes sense.

> I'm not sure if LLVM supports this now, but I'd like LLVM to be able to
> write byte code for an llvm::Function object and then "drop" the
> function's body and carry on. It isn't obvious from llvm::Function's
> interface if this is supported or not.

This has not yet been implemented, but a very similar thing has:
incremental bytecode loading. The basic idea is that you can load a
bytecode file without all of the function bodies.

That's what I want for importing! See item (1) above!

As you need the
contents of a function body, it is streamed in from the bytecode file.
Misha added this for the JIT.

Cool.

Doing the reverse seems very doable, but no one has tried it. If you're
interested, take a look at the llvm::ModuleProvider interface and the
implementations of it to get a feeling for how the incremental loader
works.

Okay, I'll see what I can come up with.

Developing a new "front-end helper" library could be interesting! The
only challenge would be to make it general purpose enough that it would
actually be useful for multiple languages.

You're right, it would need to be useful for multiple languages. Here's
what I'll do: I'll revisit this when I get closer to done on the XPL
compiler. I'm building things now that are somewhat framework oriented.
If there are specific patterns that arise and could be useful, I'll
submit them back to the list at that time for review.

> ------------------------------------------------------------------
> 8. Create a ConstantString class

This is something that might make sense to deal with in the future, but it
has a lot of implications in the compiler and optimizer.

Consider it postponed.

Reid

> > ------------------------------------------------------------------
> > 1. Definition Import
> > Source languages are likely to create lots of named type and value
> > definitions for the memory objects the language manipulates. Redefining
> > these in every module produces byte code bloat. It would be very useful
> > for LLVM to natively support some kind of import capability that would
> > properly declare global types and global values into the module being
> > considered.
>
> Unfortunately, this would break the ability to take a random LLVM bytecode
> file and use it in a self-contained way. In general, the type names and
> external declarations are actually stored very compactly, and the
> optimizers remove unused ones. Is this really a problem for you in
> practice?

I'm trying to get to a "once-and-done" solution on compilation. That is,
a given module is compiled exactly once (per version). There's no such
thing as "include" in XPL, only "import". The difference is that
"import" loads the results of previous compilations (i.e. a bytcode
file). I included it in my list because I thought it would be something
quite handy for other source languages (Java would need it, for
example). The functionality is something like Java's class loader except
its a module loader for LLVM and it doesn't load the function bodies.

Java's import actually does two different things: it makes the
source-level definitions of the imported module available (affecting name
lookup, for example), and it makes the code from the module available.

The first would have to be handled by the source compiler (likewise with
XPL I would assume), the second is in the domain of LLVM. However, in
Java at least, an import doesn't mean that the code for the imported
module should get linked into the importing module. Instead, at runtime
or "link time", all of the bytecode files for the include DAG should get
put together and optimized into the program. If you don't do this (ie, you
try to do it at static compile time), you get into problems when A is
imported by B and C: you get two copies of all of the code in A.

I don't know XPL, but I would assume that it is similar in this respect.

> > ------------------------------------------------------------------
> > 2. Memory Management
> >
> > My programming system (XPS) has some very powerful and efficient memory
> > allocation mechanisms built into it. It would be useful to allow users
> > of LLVM to control how (and more importantly where) memory is allocated
> > by LLVM.
>
> What exactly would this be used for? Custom allocators for performance?
> Or something more important? In general, custom allocators for
> performance are actually a bad idea...

My memory system can do seamless persistent memory as well (i.e. it's
almost a full-scale OO database). One of my ideas for the "import"
functionality was to simply save the LLVM objects for each module
persistently. Import then takes no longer than an mmap(2) call to load
the LLVM data structures associated with the module into memory. I can't
think of a faster way to do it.

Hrm, this is interesting. In the LLVM context it could get tricky,
because things like LLVM types and constants are shared between modules,
but it would still be very interesting. If you could come up with some
clean and non-invasive interfaces for doing this, it's pretty likely that
we would accept them. Everyone appreciates a fast compiler. :slight_smile:

The reason this is so important to me is that I expect to be doing lots
of on the fly compilation. XPL is highly dynamic. What I'm trying to
avoid is the constant recompilation of included things as with C/C++.
The time taken to recompile headers is, in my opinion, just wasted time.
That's why pre-compiled header support exists in so many compilers.

Sure, that makes sense. Using a structured mechanism like an 'import' is
a much better way to do it than the way PCH is implemented in C/C++
compilers. That said, it's still usually better to have the linker
assemble programs than trying to do it at compile time (at compile time
you just remember the dependencies).

I have also tuned my allocators so that they can do multiple millions of
allocations per second on modest hardware. There's a range of allocators
available, each using different algorithms. Each has space/time
tradeoffs. The performance of "malloc(3)" sucks on most platforms and
sucks on all platforms when there's a lot of memory thrash. None of my
allocators suffer these problems. Curious: why do you think custom
allocators for performance are a bad idea?

There are two main reasons. First, there is research showing that many
custom allocators are slower than general purpose allocators, unless they
actually take into consideration special properties of the program (such
as region allocators): "Reconsidering custom memory allocation":
http://citeseer.nj.nec.com/berger01reconsidering.html

Overall, the default std::allocator is quite fast with the GCC runtime.

The other problem, which hits closer to home, is that custom allocators
obscure the behavior of the program, making it harder for the compiler to
do all kinds of neat transformations to the program.
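For reference, a region (arena) allocator, the exception the cited paper carves out, exploits exactly such a program property: every object in the region dies at the same time, so teardown is one bulk release instead of one free per object. A minimal sketch that only models the interface (a real arena hands out raw memory from large pre-reserved blocks):

```python
# A region (arena) allocator sketch: objects allocated in the region
# all die together, so freeing is one bulk release rather than one
# free per object. Only the interface is modeled here.
class Region:
    def __init__(self):
        self._objects = []

    def alloc(self, factory, *args):
        """Allocate an object whose lifetime is tied to this region."""
        obj = factory(*args)
        self._objects.append(obj)
        return obj

    def release(self):
        """Free everything the region ever allocated, in one shot."""
        count = len(self._objects)
        self._objects.clear()
        return count

region = Region()
for i in range(100):
    region.alloc(list, range(i))   # 100 allocations, no individual frees
assert region.release() == 100     # one bulk release covers them all
```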

> > ------------------------------------------------------------------
> > 3. Code Signing Support
> >
> I don't think that this really belongs in LLVM itself: Better would be to
> wrap LLVM bytecode files in an application (ie, XPL) specific file format
> that includes the digest of the module, the bytecode itself, and whatever
> else you wanted to keep with it. That way your tool, when commanded to
> load a file, would check the digest, and only if it matches call the LLVM
> bytecode loader.

I'd probably be more inclined to just add an internal global array of
bytes to the LLVM bytecode format. Supporting a new file format means
that I'd have to re-write all the LLVM tools -- not worth the time.
So, I'll implement this myself and not extend LLVM with it.

Ok, alternatively you could just store the signature in an LLVM global
variable...

> > ------------------------------------------------------------------
> > 4. Threading Support
> >
> > Some low level support for threading is needed. I think there are really
> > just a very few primitives we need from which higher order things can be
> > constructed. One is a memory barrier to ensure cache is flushed, etc. so
> > we can be certain a write to memory has "taken".
>
> Just out of curiousity, what do you need a membar for? The only thing
> that I'm aware of it being useful for (besides implementing threading
> packages) are Read-Copy-Update algorithms.

Um, to implement a threading package :slight_smile: I have assumed that, true to
its name, LLVM will only provide the lowest level primitives needed to
implement a threading package, not actually provide a threading package.
I'm sure you don't want to put all the different kinds of
synchronization concepts (mutex, semaphore, barrier, futex, etc.) into
LLVM? All of them need the membar. For that matter, you'll probably
need an efficient thread barrier as well.

Ok, gotcha. You're right that we're probably not going to expose every
possible sync operation through LLVM, preferring instead to expose
low-level primitives. :slight_smile: Misha knows more about future plans in this
area though.

> > ------------------------------------------------------------------
> > 5. Fully Developed ByteCode Archives
> >
> This makes a lot of sense. The LLVM bytecode reader supports loading a
> bytecode file from a memory buffer, so I think it would be pretty easy to
> implement this. Note that llvm-ar is currently a work-in-progress, but it
> might make sense to implement support for this directly in it. Afterall,
> we aren't constrained by what the format of the ".o" files in the .a file
> look like (as long as gccld and llvm-nm support the format).

But if the file gets compressed, it isn't a .a file any more, right? Or,
were you suggesting that only the archive members get compressed and the
file is otherwise an archive?

I was suggesting the second.

The problem with that approach is that it limits the compression
somewhat. Think about an archive with 1000 bytecode files each using
common declarations. Compressed individually those common declarations
are repeated in each file. Compressed en masse, only one copy of the
common declarations is stored achieving close to 1000:1 compression for
those declarations.

Hrm, I think it's best to try out several possibilities and see which
actually make sense. Using standard formats is usually much better unless
there is a reason to use different formats. That said, the .a file format
is pretty archaic, and using something better would make linking more
efficient as well.

> Also note that we are always interested in finding ways to shrink the
> bytecode files. Right now they are basically comparable to native
> executable sizes, but smaller is always better!

Unfortunately, the answer to that is to utilize higher level
instructions. LLVM is comparable to native because it isn't a whole lot
higher level. Compared with Java, whose byte code knows about things
like classes, LLVM will always be larger because expression of the
higher level concepts in LLVM's relatively low level takes more bytes.

Sure, that makes sense. But even in our current approach there is still
room to encode bytecode files more efficiently.

That said, we _should_ strive to minimize

I haven't really looked into the bytecode format in much detail. Are we
doing things like constant string folding? Could the bytecode format be
natively compressed (i.e. not with bz2 or zip but simply be not
duplicating anything in the output)?

We aren't doing anything obviously stupid in the bytecode format, but
there are probably many things that could be improved. In 1.1 for
example, I took out some unnecessary stuff that shrunk files by about 20%
for a given .ll file. There is probably more room for improvement. Right
now, constant strings are output basically the same as they are represented
(ie, each character is emitted, then the string is a list of "pointers" to
the chars). This is quite wasteful even if each of the "pointers" is
typically one byte anyway.

> Developing a new "front-end helper" library could be interesting! The
> only challenge would be to make it general purpose enough that it would
> actually be useful for multiple languages.

You're right, it would need to be useful for multiple languages. Here's
what I'll do: I'll revisit this when I get closer to done on the XPL
compiler. I'm building things now that are somewhat framework oriented.
If there are specific patterns that arise and could be useful, I'll
submit them back to the list at that time for review.

Cool, sounds great. You could use stacker and xpl as two good (extremely
different) languages, which can hopefully share the same library of code.

-Chris

Hello Chris,

10. Basic support for distributed computations.

What kind of support? What do you think should be included in LLVM
directly, as opposed to being built on top of it?

nice question :)

(don't sue me, I am far from being an expert like you are!)

just imagine that we have a Linux cluster, and we have two functions in
one module (`f' and `g'). If they are about to be executed at one host,
then one is allowed to do very aggressive interprocedural
optimizations between `f' and `g'. However, if `g' should be
"outsourced to" (i.e. "executed at") another host than `f', then one is
prohibited from doing almost any optimizations between `f' and `g'.

Am I right up to here? if `yes' then:

One can hardly build support for distributed calculations on top
of LLVM, because in order to make legal optimizations LLVM must
_know_ where the code is really executed.

Right?..

(Should I reformulate the things above?)

just imagine that we have a Linux cluster, and we have two functions in
one module (`f' and `g'). If they are about to be executed at one host,
then one is allowed to do very aggressive interprocedural
optimizations between `f' and `g'. However, if `g' should be
"outsourced to" (i.e. "executed at") another host than `f', then one is
prohibited from doing almost any optimizations between `f' and `g'.

Typically distributed computing like this is performed at a much higher
level than things like LLVM. Mechanisms like RPC (remote procedure calls)
are used to do things like this, which makes the low-level code look a lot
different than a standard call.

Am I right up to here? if `yes' then:
One can hardly build support for distributed calculations on top
of LLVM, because in order to make legal optimizations LLVM must
_know_ where the code is really executed.

The RPC calls would automatically make the LLVM transformations safe: even
the interprocedural optimizations are conservatively correct in the face
of partial and incomplete programs (if they can't figure out what a piece
of code is doing, they won't break it).

LLVM should support distributed computing as well as, say, C does... not
that C supports it particularly well... :)

-Chris

Hello Chris,

Wednesday, January 7, 2004, 9:37:19 PM, you wrote:

Typically distributed computing like this is performed at a much higher
level than things like LLVM.

almost right, if you mean "distributed computing is usually
implemented as some compiled libs". But if it is part of the
language then it is not as you say :)

Well, Chris, let's forget about traditions (after all, LLVM is a
tradition-breaking thing!). At which level *should* an optimization
like the one I mean be implemented?..

If optimization is a priori restricted to one host, then there is
nothing to discuss. But just imagine that you say: "OK, let's make
basic support for code eval distributed to multiple hosts" :)

Mechanisms like RPC (remote procedure calls)
are used to do things like this, which makes the low-level code look a lot
different than a standard call.

An RPC call has nothing to do with optimization. The code should be
_ready_ before applying RPC.

LLVM should support distributed computing as well as, say, C does... not
that C supports it particularly well... :)

you are right, but you didn't get my point completely. I'd say:

  1. LLVM is OK for distributed computing, because ...it has nothing
     to do with this directly, e.g. like C :)

  2. LLVM could bring a *lot* to distributed computing, if distributed
     computing becomes a *part* of the LLVM concept.

OK, Chris, let's put it another way. Just think of those guys who
implement nice languages for distributed computing
(e.g. http://www.mozart-oz.org/) and ask yourself why
*exactly this* audience should be excited about LLVM.
Why should they find some nice LLVM optimization specific to
applications where _part_ of the code might be EITHER executed at the
current host OR outsourced to some other host because, say, the
current host is just quite busy? -- Maybe you'll find an answer more
suitable for you than the one I try to formulate in Idea 10 :)

Wednesday, January 7, 2004, 9:37:19 PM, you wrote:

Well, Chris, let's forget about traditions (after all, LLVM is a
tradition-breaking thing!). At which level *should* an optimization
like the one I mean be implemented?..

Ok, I thought you were concerned about LLVM breaking the _correctness_ of
distributed programs, sorry. :)

If optimization is a priori restricted to one host, then there is
nothing to discuss. But just imagine that you say: "OK, let's make
basic support for code eval distributed to multiple hosts" :)
  2. LLVM could bring a *lot* to distributed computing, if distributed
     computing becomes a *part* of the LLVM concept.

Sure, that makes sense. It's quite possible that there are things that
make sense to move down to the LLVM level, exposing all kinds of neat
opportunities. If you'd like to look into this, that would be cool! :)

-Chris

My $0.02 worth on this topic ..

I'm also interested in distributed computing as XPL/XPS will support it.
However, I find it unreasonable to expect LLVM to provide any features
in this area. In order to do anything meaningful, LLVM would have to
have some kind of awareness of networks (typically an operating system
concern). That seems at odds with the "low level" principles of LLVM.

Valery, could you be more explicit about what kind of features in LLVM
would support distributed computing? How would code evaluation
distributed to multiple hosts benefit anyone? The two programs only
communicate via a network connection. The only places you can optimize
that network connection are in (a) the kernel and (b) the application.
Case (a) is outside the scope of LLVM and case (b) is supported by LLVM
today. I assume this is obvious so what else were you thinking?

Reid.

My $0.02 worth on this topic ..

and again €0.02 of mine :)

However, I find it unreasonable to expect LLVM to provide
any features in this area. In order to do anything meaningful,
LLVM would have to have some kind of awareness of networks
(typically an operating system concern).
That seems at odds with the "low level" principles of LLVM.

When I look at what we have today -- I agree.
But when I think about what we *should* have -- I don't agree.

To begin with, think of a host with multiple CPUs.
I could formulate my proposal even for this non-networked
case.

There should be an engine and a layer for making dispatching
optimizations at run time. If one CPU is loaded and the code is
"parallelizable", why not send some part of the
calculation to another CPU? This kind of on-the-fly decision will
one day be incorporated in something like LLVM.

Valery, could you be more explicit about what kind of
features in LLVM would support distributed computing?

OK, np, I will try.

Consider this Fibonacci function as the model for our
use case:

f(int n) {
  if(n<2) return 1;
  return f(n-1) + f(n-2);
}

the complexity of this non-optimal version of the Fibonacci
function is O(2^n). The number of calls after start
grows exponentially. It means that we have the CPU loaded
very quickly and we have a lot of calls to be
"outsourced" to other CPUs.

Is it OK up to here, and may I continue using this
model example?

How would code evaluation
distributed to multiple hosts benefit anyone?

For me it sounds like:
"do we need distributed computations at all?"

oh... could we take this as an axiom?.. :)
anyway, if you accept this model example above we could
see how it really could work.

The two programs only communicate via a network connection.
The only places you can optimize that network connection are
in (a) the kernel and (b) the application.
Case (a) is outside the scope of LLVM and case (b) is supported by LLVM
today. I assume this is obvious so what else were you thinking?

Let's consider both cases (a) and (b) in more detail.
Think of the Fib example as the application in your case (b).
We translate the Fib module into LLVM bytecode without
optimization. Now I run this application
with some dispatcher (someday it will be part of the kernel
of an LLVM-based OS). This dispatcher makes some reasonable
optimizations to the LLVM bytecode and starts the
Fib code with an argument provided by the user.
If the argument is small enough, then the function will give us a
result soon, of course. Otherwise it is more interesting:
if the argument is big enough then we load the CPU very quickly and
we have to spill our code out to any other available CPUs.

Before the LLVM project I'd never have discussed something like this at
all, because of the infeasibility of such a realisation. I do
believe that with LLVM such a future is really possible.
I believe as well that OSes with LLVM in the kernel
should appear. Chris, you and others could find
these statements funny today, but tomorrow you could find
more reason in these "strange" statements :)
(Who knows, maybe Chris is laughing now, because it is
more than clear to him)

What should I expand/reformulate?

Valery.

Interesting email address there :)

To begin with, think of a host with multiple CPUs.
I could formulate my proposal even for this non-networked
case.

On the same machine, LLVM definitely needs to support both symmetric and
asymmetric multi-processing. I believe some primitives to support this
are being worked on now by Misha. However, I don't really consider this
"distributed systems" because distribution by a few inches doesn't
really amount to much. In my way of thinking distributed computing
*always* involves a network.

There should be an engine and a layer for making dispatching
optimizations at run time. If one CPU is loaded and the code is
"parallelizable", why not send some part of the
calculation to another CPU? This kind of on-the-fly decision will
one day be incorporated in something like LLVM.

Okay, this kind of "distribution" (what I call parallel computing)
should also definitely be supported by LLVM. There are several
primitives that could be added to LLVM to enable compiler writers to
more easily write parallel programs. However, I don't think LLVM should
get involved in parallel programs that run on multiple computers, only a
single computer (possibly with multiple CPUs).

> Valery, could you be more explicit about what kind of
> features in LLVM would support distributed computing?

OK, np, I will try.

Consider this Fibonacci function as the model for our
use case:

f(int n) {
  if(n<2) return 1;
  return f(n-1) + f(n-2);
}
the complexity of this non-optimal version of the Fibonacci
function is O(2^n). The number of calls after start
grows exponentially. It means that we have the CPU loaded
very quickly and we have a lot of calls to be
"outsourced" to other CPUs.

Is it OK up to here, and may I continue using this
model example?

Yes, but I think the confusion was simply one of terminology. What
you're talking about is what I call parallel computing or
multi-processing (symmetric or asymmetric). This isn't really distributed
computing, although one could think of the operations being "distributed"
across several CPUs on the _same_ computer.

> How would code evaluation
> distributed to multiple hosts benefit anyone?

Okay, now you're talking about "hosts" which I take to mean separate
physical computers with the only way for the "hosts" to communicate is
via a network. Is this what you mean by "host"?

For me it sounds like:
"do we need distributed computations at all?"

No, we don't. Distributed computing would be built on top of LLVM (in my
opinion). But, we _do_ need support for parallel or multi-processing
computation as you described above (again, on a single physical
computer).

oh... could we take this as an axiom?.. :)

I would accept "we need parallel computation support" as an axiom. I
don't think Chris will disagree, but I'll let him chime in on that.

anyway, if you accept this model example above we could
see how it really could work.

Sure ..

> The two programs only communicate via a network connection.
> The only places you can optimize that network connection are
> in (a) the kernel and (b) the application.
> Case (a) is outside the scope of LLVM and case (b) is supported by LLVM
> today. I assume this is obvious so what else were you thinking?

Let's consider both cases (a) and (b) in more detail.
Think of the Fib example as the application in your case (b).
We translate the Fib module into LLVM bytecode without
optimization. Now I run this application
with some dispatcher (someday it will be part of the kernel
of an LLVM-based OS). This dispatcher makes some reasonable
optimizations to the LLVM bytecode and starts the
Fib code with an argument provided by the user.
If the argument is small enough, then the function will give us a
result soon, of course. Otherwise it is more interesting:
if the argument is big enough then we load the CPU very quickly and
we have to spill our code out to any other available CPUs.

okay .. again, this isn't "distributed", just "parallel", and I agree
it's needed.

Before the LLVM project I'd never have discussed something like this at
all, because of the infeasibility of such a realisation. I do
believe that with LLVM such a future is really possible.
I believe as well that OSes with LLVM in the kernel
should appear. Chris, you and others could find
these statements funny today, but tomorrow you could find
more reason in these "strange" statements :)
(Who knows, maybe Chris is laughing now, because it is
more than clear to him)

I would tend to agree. I fully expect LLVM to fill the mission that Java
started: ubiquitous computing. I would expect to see LLVM programs
running on everything from gameboy to supercomputer, possibly with
kernel support. I've toyed with the idea of a Linux kernel module for
LLVM already. It would allow a bytecode file to be executed like any
other program. The JIT would essentially run inside the kernel. A weird
idea, but it could be beneficial in terms of security. The module could
be written in such a way that it examines the bytecode before executing
anything, making sure the bytecode isn't doing anything "weird". Since
only the kernel can create the actual executable for the bytecode, no
malicious bytecode program could run. This assumes that "malicious" is
detectable :)

What should I expand/reformulate?

Nothing at this point. I think I realize where you're coming from now
and agree that _parallel_ computing is a very important next step for
LLVM. Let's see what happens with the current work on synchronization
and threading primitives. These things will be needed to support the
kind of parallelization you're talking about.

Valery.

Best Regards,

Reid.

Interesting email address there :)

unfortunately some email parsers and email clients refuse to work correctly with international conventions :(
follow this URL for more details:
http://www.python.org/doc/current/lib/module-email.Header.html

On the same machine, LLVM definitely needs to support both symmetric and
asymmetric multi-processing. I believe some primitives to support this
are being worked on now by Misha. However, I don't really consider this
"distributed systems" because distribution by a few inches doesn't
really amount to much. In my way of thinking distributed computing
*always* involves a network.

you are right, but bigger steps come after smaller ones.
Let's agree on the simple things and then we can
proceed (and probably die) with the networking case.

Okay, this kind of "distribution" (what I call parallel computing)

(see above)

should also definitely be supported by LLVM. There are several
primitives that could be added to LLVM to enable compiler writers to
more easily write parallel programs. However, I don't think LLVM should
get involved in parallel programs that run on multiple computers, only a
single computer (possibly with multiple CPUs).

Oh, we agreed to a single host so soon? :)

> Is it OK up to here and I could continue using this
> model example?

Yes, but I think the confusion was simply one of terminology. What
you're talking about is what I call parallel computing or
multi-processing (symmetric or asymmetric).

(see above again).

This isn't really distributed computing although
one could think of the operations being "distributed"
across several CPUs on the _same_ computer.

(people speak about distributed computations in even
more boring contexts: "one has transferred a stand-alone
application to a remote PC, executed it, and this already
means distributed computation").

My idea was to consider the Fib example for a single PC with several CPUs, and once you agreed that it makes sense to bring the notion of a CPU into the LLVM layer, I would ask you: "Reid, why should we restrict ourselves to a _single_ PC only?!"

Okay, now you're talking about "hosts" which I take to mean separate
physical computers with the only way for the "hosts" to communicate is
via a network. Is this what you mean by "host"?

We could restrict ourselves to the notion of a host as in
http://www.faqs.org/rfcs/rfc1.html and later RFCs for
TCP/IP.

No, we don't. Distributed computing would be built on top of LLVM (in my
opinion). But, we _do_ need support for parallel or multi-processing
computation as you described above (again, on a single physical
computer).

wait, let's pin down your earlier statement. As far as I understood,
you agree that the case of multiple CPUs on a _single_ host
should be supported at the LLVM layer. Right?

as I see from here:

okay .. again, this isn't "distributed", just "parallel", and I agree
it's needed.

you call this case just "parallel" and agree.

I would tend to agree. I fully expect LLVM to fill the mission that Java
started: ubiquitous computing. I would expect to see LLVM programs
running on everything from gameboy to supercomputer, possibly with
kernel support.

"possibly with kernel support" is a kind of crutch to get a supercomputer with a networking architecture ;)

Reid, couldn't you agree that networking is only a particular interface for getting access to other CPUs?

why then should LLVM be abstracted from supercomputers with
CPUs distributed over a network?

All the benefits one could obtain from "LLVM supporting multiple CPUs at a single host", one might obtain from "LLVM supporting multiple CPUs at multiple hosts". Isn't that logical?

I've toyed with the idea of a Linux kernel module for
LLVM already. [...]

then it is even easier to talk :)

Nothing at this point. I think I realize where you're coming from now
and agree that _parallel_ computing is a very important next step for
LLVM. Let's see what happens with the current work on synchronization
and threading primitives. These things will be needed to support the
kind of parallelization you're talking about.

Right.

...I almost hear the thoughts from Chris:
"guys, instead of trolling about cool things make something cool!"
;)

Hello again Valery,

Valery A.Khamenya wrote:

All the benefits one could obtain from "LLVM supporting multiple CPUs at a single host", one might obtain from "LLVM supporting multiple CPUs at multiple hosts". Isn't that logical?

I see more precisely what you mean, but I don't think it is that straightforward to generalise the benefits of multiple-CPU, single-host programming to multiple CPUs at multiple hosts. I don't think both cases involve the same techniques.

For example, in "single host" configuration you get a very low cost for communicating data because you use memory instead of network. Memory is a low-level primitive medium for communicating data, while network is not a primitive medium because it involves many communication protocols.

Memory is simple to manage when it is on a single machine, but when it turns into "shared" (or "distributed") memory, things can get pretty complicated (because you have, among others, security, latency, integrity and fault-tolerance to take into account).

Moreover, there are many, many ways to implement "distributed" or "parallel" computing. Some approaches are based on distributing data according to the available CPU resources, others on distributing the program control flow according to the proximity of data, and so on.

What would you consider the core primitives of single-host, multi-CPU programming?

Cheers,

-- Se'bastien

I see more precisely what you mean, but I don't think it is that
straightforward to generalise the benefits of multiple-CPU, single-host
programming to multiple CPUs at multiple hosts. I don't think both
cases involve the same techniques.

you are right, just think of shared memory.

For example, in "single host" configuration you get a very low cost for
communicating data because you use memory instead of network. Memory is
[...]

you are right, that's exactly the point.
I'd never say that the difference is tiny.

What would you consider the core primitives of single-host, multi-CPU
programming?

if LLVM were really "low-level", then
I would consider a generalized CALL such a primitive,
i.e. one where not only the address of the subroutine is supplied,
but the address of the host as well.

something like

call host:subr_address

However, LLVM is very high-level!

It is so high-level that I'd propose ...not to include
any such primitives in the LLVM language at all!
(So, Se'bastien, sorry for answering "rather yes" about UPC)

Indeed, let's consider Fib example:

//------------
int f(int n) {
  if(n<2) return 1;
  return f(n-1) + f(n-2);
}
//------------

in this LLVM byte code output:

Hello Valery,

I have some comments regarding your thoughts on LLVM support for distributed computing.

Valery A.Khamenya wrote:

There should be an engine and a layer for making dispatching optimizations at run time. If one CPU is loaded and the code is "parallelizable", why not send some part of the calculation to another CPU? This kind of on-the-fly decision will
one day be incorporated in something like LLVM.

I'm not sure I correctly understand what you mean, but I interpret it as LLVM deciding where the code should be executed, like some load-balancing strategy. In this perspective, I think this should be left up to the higher-level language, or even to the application developer: I don't think incorporating load-balancing strategies directly into LLVM would be interesting, because strategies are high-level patterns.

Consider this Fibonacci function as the model for our
use case:

f(int n) {
  if(n<2) return 1;
  return f(n-1) + f(n-2);
}

the complexity of this non-optimal version of the Fibonacci function is O(2^n). The number of calls after start grows exponentially. It means that we have the CPU loaded very quickly and we have a lot of calls to be "outsourced" to other CPUs.

To me this appears more as an algorithmic design issue: this function could be rewritten in "continuation passing style", and each continuation could be distributed by a load-balancing strategy to the computers sharing CPU resources. Using mechanisms such as "futures" (as in Mozart) makes this easy... but I don't think these features belong to the set of a language's low-level primitives and constructs.

I personally don't think that automating the conversion of an algorithm from non-distributed to distributed execution can be done at a low level, mainly because it involves many high-level constructs. On the other hand, I have heard of languages that try to implement primitives for easy distributed computing, like "Unified Parallel C" (see http://www.gwu.edu/~upc/documentation.html and http://ludo.humanoidz.org/doc/report-en.pdf), which may help to figure out what kind of primitives would be useful for adding distributed computing support into LLVM.

Maybe you were thinking of something similar to UPC primitives added to LLVM?

-- Se'bastien

Hello Se'bastien,

I'm not sure I correctly understand what you mean, but I interpret it
as LLVM deciding where the code should be executed, like some
load-balancing strategy.

in this particular example it was really like that.
However, I've tried to emphasize as well that the decision
"where to execute" is strongly connected with
LLVM optimizations, which become either
applicable or not applicable depending on the result of
the decision.

In this perspective, I think this should be
left up to the higher-level language, or even to the application
developer: I don't think incorporating load balancing strategies
directly into LLVM would be interesting, because strategies are high
level patterns.

1. the balancing strategy does not belong entirely to the application.

2. even though I am no LLVM developer, I do like the minimalist approach to LLVM design -- "throw out everything that might be thrown away", but "don't throw out more!" :) -- How could you express an "optimization for a remote call" in LLVM if "remote" doesn't fit into LLVM?..

To me this appears more as an algorithmic design issue, this function
could be rewritten in "continuation passing style", and each
continuation could be distributed by a load-balancing strategy to the
computers sharing CPU resources. Using mechanisms such as "futures" (as
in Mozart) makes this easy...

1. It looks like you mean here that one *must* change the
code of the Fib function in order to control the
parallelization, right?

2. do you propose to ignore languages that don't support "continuation passing style"?

but I don't think these features
belong to the set of a language's low-level primitives and constructs.

if you don't pollute the Fib function with explicit
optimizations and you use a language like C, then what kind
of level should bring "parallel optimization"
to your Fib code?..

don't forget, I'd like to parallelize any parallelizable
code, like this Fib example written in C

I personally don't think that automating the conversion of an algorithm
from non-distributed to distributed execution can be done at a low
level, mainly because it involves many high-level constructs.

well, don't kill me, but I don't share your religion on this point :)

More practically: let's use the Fib example to get it parallelized
in terms of LLVM. The example is simple enough!

On the
other hand, I have heard of languages that try to implement primitives
for easy distributed computing, like the "Unified Parallel C" (see
http://www.gwu.edu/~upc/documentation.html and
http://ludo.humanoidz.org/doc/report-en.pdf), which may help to figure
out what kind of primitives would be useful for adding distributed
computing support into LLVM.

I agree. (And thank you for the nice links.)

Maybe you were thinking of something similar to UPC primitives
added to LLVM?

rather "yes".

Just a few notes from the peanut gallery... note that I can't really say
what is and is not possible with LLVM, I can just talk about our plans
(which we believe to be possible). If you guys are really interested in
this stuff, try hacking something together and see if it works! :)

Indeed, LLVM will allocate registers and regular memory for us. This
output is free of any strategy for this allocation. We do *not* define
the strategy of this allocation, and we are even quite happy with that!
Therefore I could expect that allocation of CPUs could also be an
internal back-end problem for LLVM, like registers.

In theory that is possible, but it is somewhat problematic, at least
in the current representation. LLVM is designed to provide a convenient
level of abstraction that is good for code representation, optimization,
etc.

Though I am mostly ignorant about leading edge distributed (i.e., loosely
coupled parallel) programming stuff, I _think_ that network latencies are
such that you really need the "big picture" of what a program is doing to
know how to profitably distribute and adapt the program to the environment
it finds itself running in. I believe the current state of the art with
this relies on programmer annotations of various sorts to do this (Vikram
knows a lot more about this than I do, he should chime in :). It's
possible that there is some way to expose this information down to the
LLVM level, allowing neat optimizations to happen, but I don't really
know. :)

In other words, indirectly, I propose to treat CPUs as the same resource
class as memory :) Chris, LLVMers, could you allocate CPU resources as
you do for registers and memory? :P

Our mid-term plans include looking at multithreading/parallel processing
kinds of things, at least for shared memory systems. At the LLVM level,
we are interested in _exposing_ parallelism. In the fib example you are
using, for example, it is pretty easy to show that all of the subcalls to
fib can be run independently of each other (ie, in parallel). Combined
with a suitable runtime library (e.g., pthreads), you could imagine an
LLVM transformation that allows it to run well on a machine with 10K
processors. :)

Actually, maybe the problem is that the science around
phi-functions, SSA and so on is not ready, if we speak about a calculating
machine with several CPUs spread over a network... What could the LLVM gurus say here?

"It's all about the abstraction." :) If you come up with a clean design
that integrates well with LLVM, and makes sense, it's quite possible that
we can integrate it. If it doesn't fit well with LLVM, or, well, doesn't
make sense, then we'll probably wait for something that does. :) That
said, LLVM is actually a really nice platform for playing around with
"experimental" ideas like this, so feel free to start hacking once you
get a concrete idea!

-Chris

Hello Chris,

Thursday, January 8, 2004, 5:28:47 PM, you wrote:

Though I am mostly ignorant about leading edge distributed (i.e., loosely
coupled parallel) programming stuff, I _think_ that network latencies are
such that you really need the "big picture" of what a program is doing to
know how to profitably distribute and adapt the program to the environment
it finds itself running in.

1. Hard disks are much slower than registers, but they are used
quite automatically for swapping. By analogy: if some module/function
is very active for a long time, but does not communicate much with its
environment, then it might be transferred over a slow network to proceed
somewhere else.

2. what about run-time profiling? It might be quite a good basis
for getting *real* info about interactions between functions/modules
without the "big picture" provided by the programmer, who could be
very wrong with his/her predictions about the profile of his/her own
program :) (haven't you sometimes been surprised looking at the profile
of your own program? I am surprised very often)

Our mid-term plans include looking at multithreading/parallel processing
kinds of things, at least for shared memory systems.

cool... guys, I pray you get your "financial air" on a regular basis!
what you do is very important.

At the LLVM level, we are interested in _exposing_ parallelism.

Hm... strange. You mean you're going to define explicitly what to
parallelize and how? Why, then, don't we need similarly unpleasant
things for registers?..

In the fib example you are using, it is pretty easy to show that all of
the subcalls to fib can be run independently of each other (i.e., in
parallel). Combined
with a suitable runtime library (e.g., pthreads), you could imagine an
LLVM transformation that allows it to run well on a machine with 10K
processors. :slight_smile:

Chris, that's clear. But (let me use this annoying analogy between
memory and CPUs again) if today you have a strategy for where to
allocate a variable, then you also need a strategy for whether or not to
start a new thread. Are there any strategies for threads as good as the
ones for memory allocation?

Actually, maybe the problem is that the science is not ready around
phi-functions, SSA and so on, if we speak about a computing machine with
several CPUs spread across a network... What could the LLVM gurus say here?

"It's all about the abstraction." :slight_smile: If you come up with a clean design
that integrates well with LLVM, and makes sense, it's quite possible that
we can integrate it. If it doesn't fit well with LLVM, or, well, doesn't
make sense, then we'll probably wait for something that does. :slight_smile: That
said, LLVM is actually a really nice platform for playing around with
"experimental" ideas like this, so feel free to start hacking once you
get a concrete idea!

Oopsss, Chris, but I was talking about a scientific basis. Abstraction
has a lot to do here as well, but, as a mathematician, I could say: if
the theory is ready (e.g., orthogonal systems of functions), then you
can play with special-purpose applications of that theory (FFT,
wavelets, etc.).

Otherwise, one could die in the debris of heuristics.

SSA, the phi-operator and the like are not empty sounds, are they? I
think there is a lot of reasonable theory around them, which lets you
think about design and adequate implementation instead of thinking about
how to formulate the task, what properties the constructions you use
have, and so on... The theory is either ready (then happy hacking!) or
not ready (then be careful, or hacking leads into big trouble).

Or?..