LLVM IR is a compiler IR

In this email, I argue that LLVM IR is a poor system for building a
Platform, by which I mean any system where LLVM IR would be a
format in which programs are stored or transmitted for subsequent
use on multiple underlying architectures.

LLVM IR initially seems like it would work well here. I myself was
once attracted to this idea. I was even motivated to put a bunch of
my own personal time into making some of LLVM's optimization passes
more robust in the absence of TargetData a while ago, even with no
specific project in mind. There are several things still missing,
but one could easily imagine that this is just a matter of people
writing some more code.

However, there are several ways in which LLVM IR differs from actual
platforms, both high-level VMs like Java or .NET and low-level
ISAs like x86 or ARM.

First, the boundaries of what capabilities LLVM provides are nebulous.
LLVM IR contains:

* Explicitly Target-specific features. These aren't secret;
   x86_fp80's reason for being is pretty clear.

* Target-specific ABI code. In order to interoperate with native
   C ABIs, LLVM requires front-ends to emit target-specific IR.
   Pretty much everyone around here has run into this (see the
   sketch after this list).

* Implicitly Target-specific features. The most obvious examples of
   these are all the different Linkage kinds. These are all basically
   just gateways to features in real linkers, and real linkers vary
   quite a lot. LLVM has its own IR-level Linker, but it doesn't
   do all the stuff that native linkers do.

* Target-specific limitations in seemingly portable features.
   How big can the alignment be on an alloca? Or a GlobalVariable?
   What's the widest supported integer type? LLVM's various backends
   all have different answers to questions like these.
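
To make the ABI point concrete, here is a sketch, from memory of what
clang emits, of how a single C declaration becomes two different
pieces of IR depending on the target (the type name and exact
attributes are illustrative):

    ; C prototype:  struct S { int a, b; };  struct S make(void);
    %struct.S = type { i32, i32 }

    ; x86-64 (SysV ABI): the front-end coerces the 8-byte struct
    ; into a single i64 return value:
    declare i64 @make()

    ; 32-bit Linux/x86: the same function returns void and writes
    ; its result through a hidden sret pointer argument instead:
    declare void @make(%struct.S* sret)

Neither form is usable as-is on the other target, and it is the
front-end, not LLVM, that has to know which one to produce.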

Even ignoring the fact that the quality of the backends in the
LLVM source tree varies widely, the question of "What can LLVM IR do?"
has numerous backend-specific facets. This can be problematic for
producers as well as consumers.

Second, and more fundamentally, LLVM IR is a vague
language. It has:

* Undefined Behavior. LLVM is, at its heart, a C compiler, and
   Undefined Behavior is one of its cornerstones.

   High-level VMs typically raise predictable exceptions when they
   encounter program errors. Physical machines typically document
   their behavior very extensively. LLVM is fundamentally different
   from both: it presents a bunch of rules to follow and then offers
   no description of what happens if you break them.

   LLVM's optimizers are built on the assumption that the rules
   are never broken (see the sketch after this list for one small
   example), so when rules do get broken, the code just goes off
   the rails and runs into whatever happens to be in the way.
   Sometimes it crashes loudly. Sometimes it silently corrupts
   data and keeps running.

   There are some tools that can help locate violations of the
   rules. Valgrind is a very useful tool. But they can't find
   everything. There are even some kinds of undefined behavior for
   which I've never heard anyone propose a method of detection.

* Intentional vagueness. There is a strong preference for defining
   LLVM IR semantics intuitively rather than formally. This is quite
   practical; formalizing a language is a lot of work, it reduces
   future flexibility, and it tends to draw attention to troublesome
   edge cases which could otherwise be largely ignored.

   I've done work to try to formalize parts of LLVM IR, and the
   results have been largely fruitless. I got bogged down in
   edge cases that no one is interested in fixing.

* Inconsistent floating-point arithmetic. Some backends don't fully
   implement IEEE-754 arithmetic rules, even without -ffast-math and
   friends, in order to get better performance.
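
As a small sketch of how the optimizers lean on the undefined-behavior
rules (the exact fold is up to the optimizer, but current versions do
perform folds of this kind):

    ; Signed overflow on an "add nsw" is undefined, so the optimizer
    ; is entitled to assume it never happens and fold this function
    ; to "ret i1 true". If %x happens to be INT_MAX at run time,
    ; there is no defined answer for what the program does.
    define i1 @gt_next(i32 %x) {
      %t = add nsw i32 %x, 1
      %c = icmp sgt i32 %t, %x
      ret i1 %c
    }

Java defines the overflow to wrap, and x86 hardware wraps; LLVM IR
with nsw simply declares the question unanswerable.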

If you're familiar with "write once, debug everywhere" in Java,
consider the situation in LLVM IR, which is fundamentally opposed
to even trying to provide that level of consistency. And if you allow
the optimizer to do subtarget-specific optimizations, you increase
the chances that some bit of undefined behavior or vagueness will be
exposed.

Third, LLVM is a low level system that doesn't represent high-level
abstractions natively. It forces them to be chopped up into lots of
small low-level instructions.

* It makes LLVM's Interpreter really slow. The amount of work
   performed by each instruction is relatively small, so the interpreter
   has to execute a relatively large number of instructions to do simple
   tasks, such as virtual method calls (see the sketch after this list).
   Languages built for interpretation do more with fewer instructions,
   and have lower per-instruction overhead.

* Similarly, it makes really-fast JITing hard. LLVM is fast compared
   to some other static C compilers, but it's not fast compared to
   real JIT compilers. Compiling one LLVM IR level instruction at a
   time can be relatively simple, ignoring the weird stuff, but this
   approach generates comically bad code. Fixing this requires
   recognizing patterns in groups of instructions, and then emitting
   code for the patterns. This works, but it's more involved.

* Lowering high-level language features into low-level code locks
   in implementation details. This is less severe in native code,
   because a compiled blob is limited to a single hardware platform
   as well. But a platform that advertises architecture independence
   while still carrying all the ABI lock-in of high-level-language
   implementation details presents a much more frightening
   backwards-compatibility specter.

* Apple has some LLVM IR transformations for Objective-C; however,
   the transformations have to reverse-engineer the high-level semantics
   out of the lowered code, which is awkward. Further, they're
   reasoning about high-level semantics in a way that isn't guaranteed
   to be safe by LLVM IR rules alone. It works for the kinds of code
   clang generates for Objective-C, but it wouldn't necessarily be
   correct if run on code produced by other front-ends. LLVM IR
   isn't capable of representing the necessary semantics for this
   unless we start embedding Objective-C into it.
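
To make the "chopped up into lots of small low-level instructions"
point concrete, here is roughly what a single C++ virtual call like
obj->run() looks like after clang has lowered it (type and value
names are illustrative):

    %class.Obj = type { i32 (...)** }

    define void @call_run(%class.Obj* %obj) {
      %vptr   = bitcast %class.Obj* %obj to void (%class.Obj*)***
      %vtable = load void (%class.Obj*)*** %vptr
      %slot   = getelementptr void (%class.Obj*)** %vtable, i64 2
      %fn     = load void (%class.Obj*)** %slot
      call void %fn(%class.Obj* %obj)
      ret void
    }

Five instructions just for the dispatch, where a JVM interpreter
executes a single invokevirtual; and the vtable layout baked into
that getelementptr index is exactly the kind of implementation detail
that gets locked in.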

In conclusion, consider the task of writing an independent implementation
of an LLVM IR Platform. The set of capabilities it provides depends on who
you talk to. Semantic details are left to chance. There are rarely
used features which nonetheless require a bunch of complicated
infrastructure to implement. And if you want light-weight execution, you'll
probably need to translate it into something else better suited for it
first. This all doesn't sound very appealing.

LLVM isn't actually a virtual machine. It's widely acknowledged that the
name "LLVM" is a historical artifact which doesn't reliably connote what
LLVM actually grew to be. LLVM IR is a compiler IR.

Dan

Thank you for writing this. First, I’d like to say that I am in 100% agreement with your points. I’ve been tempted many times to write something similar, although what you’ve written has been articulated much better than what I would have said.

When I try to explain to people what LLVM is I say “It’s essentially the back-end of a compiler” - a job it does extremely well. I don’t say “It’s a virtual machine”, because that is a job it doesn’t do very well at all.

I’d like to add a couple of additional items to your list - first, LLVM IR isn’t stable, and it isn’t backwards compatible. Bitcode is not useful as an archival format, because a bitcode file cannot be loaded if it’s even a few months out of sync with the code that loads it. Loading a bitcode file that is years old is hopeless.

Also, bitcode is large compared to Java or CLR bytecode. This isn’t such a big deal, but for people who want to ship code over the network it could be an issue.

I’ve been thinking that it would be a worthwhile project to develop a high-level IR that avoids many of the issues that you raise. Similar in concept to Java bytecode, but without Java’s limitations - for example it would support pass-by-value types. (CLR has this, but it also has limitations). Of course, this IR would of necessity be less flexible than LLVM IR, but you could always dip into IR where needed, just as C programs dip into assembly on occasion.

This hypothetical IR language would include a type system that was rich enough to express all of the DWARF semantics - so that instead of having two parallel representations of every type (one for LLVM’s code generators and one for DWARF), you could instead generate both the LLVM types and the DWARF DI’s from a common representation. This would have a huge savings in both complexity and the size of bitcode files.

Interestingly, I wrote a bytecode language exactly like this for my master's thesis, built atop LLVM. I abandoned the project after graduating, but it had its promising moments.

Hi Talin,

I too agree 100% with Dan's words, and this could be a good pointer
for Jin-Gu Kang to continue on his pursuit for a better
target-independent bitcode.

Also, add your backwards compatibility issue to debug metadata in IR,
in which fields appear or disappear without notice.

But I think you hit a sweet spot here...

This hypothetical IR language would include a type system that was rich
enough to express all of the DWARF semantics - so that instead of having two
parallel representations of every type (one for LLVM's code generators and
one for DWARF), you could instead generate both the LLVM types and the DWARF
DI's from a common representation. This would have a huge savings in both
complexity and the size of bitcode files.

This is a really interesting idea. If you could describe your type
system in terms of Dwarf, you would have both: a rich type system AND
free Dwarf.

However, writing a back-end that would understand such a rich type
system AND language ABIs is out of the question.

We were discussing JIT and trying to come to a solution where JIT
wouldn't be as heavy as it has to be now, to no avail. Unless there is
a higher-level representation (like Java bytecode), JIT will always
suffer.

If you join Dan's well-made points with yours, Jin-Gu's and the
necessity of a decent JIT, it's almost reason enough to split the IR
into higher and lower versions (as proposed last year to deal with
complex type systems and ABIs).

Even some optimisations (maybe even Polly) could benefit from this
higher-level representation, and all current optimisations could still
run on the current, low-level IR.

My tuppence.

cheers,
--renato

Hi Talin,

I too agree 100% with Dan's words, and this could be a good pointer
for Jin-Gu Kang to continue on his pursuit for a better
target-independent bitcode.

Also, add your backwards compatibility issue to debug metadata in IR,
in which fields appear or disappear without notice.

But I think you hit a sweet spot here...

This hypothetical IR language would include a type system that was rich
enough to express all of the DWARF semantics - so that instead of having two
parallel representations of every type (one for LLVM's code generators and
one for DWARF), you could instead generate both the LLVM types and the DWARF
DI's from a common representation. This would have a huge savings in both
complexity and the size of bitcode files.

This is a really interesting idea. If you could describe your type
system in terms of Dwarf, you would have both: a rich type system AND
free Dwarf.

This sounds interesting, but I don't quite understand what a "rich
type system to express all of the DWARF semantics" would be. Could you
show an example program that the rich type system can describe but the
current IR fails to represent? And how does it improve the IR?

Any code with C++ classes, C unions and bit-fields would be much
improved. Talin might help you with non-C++ types.

Basically anything that has to be kludged to be lowered to IR would
benefit from a higher-level IR. ByValue calls, self pointers, RTTI,
virtual inheritance, multiple inheritance.

Dan's argument that IR is unstable is clear when you get to write a
front-end from scratch. The first front-end generated a lot of kludge
to lower C++, since it'd take years to implement it properly in IR
(and all back-ends). The second, third and so on were forced to follow
the same kludge. Non-C++ front-ends suffer even more, since they have
to kludge their languages into a C++-semantic kludge, which has no
1-to-1 relationship with the original semantics of their code.

Search on the list about the topics above and you'll see that there
was a lot of discussion wasted on them for years and years.

In this email, I argue that LLVM IR is a poor system for building a
Platform, by which I mean any system where LLVM IR would be a
format in which programs are stored or transmitted for subsequent
use on multiple underlying architectures.

Hi Dan,

I agree with almost all of the points you make, but not your conclusion. Many of the issues that you point out as problems are actually "features" that a VM like Java doesn't provide. For example, Java doesn't have uninitialized variables on the stack, and LLVM does. LLVM is capable of expressing the zero initialization of variables that is implicit in Java; it just leaves the choice to the frontend.

Many of the other issues that you raise are true, but irrelevant when compared to other VMs. For example, LLVM allows a frontend to produce code that is ABI compatible with native C ABIs. It does this by requiring the frontend to know a lot about the native C ABI. Java doesn't permit this at all, and so LLVM having "this feature" seems like a feature over-and-above what high-level VMs provide. Similarly, the "conditionally" supported features like large and obscurely sized integers simply don't exist in these VMs.

The one key feature that LLVM doesn't have that Java does, and which cannot be added to LLVM "through a small matter of implementation", is verifiable safety. Java-style bytecode verification is not something that LLVM IR permits; you can't really get it in LLVM (without resorting to techniques like SFI).

With all that said, I do think that we have a real issue here. The real issue is that we have people struggling to do things that are "hard" and who see LLVM as the problem. For example:

1. The native client folks trying to use LLVM IR as a portable representation that abstracts arbitrary C calling conventions. This doesn't work because the frontend has to know the C calling conventions of the target.

2. The OpenCL folks trying to turn LLVM into a portable abstraction language by introducing endianness abstractions. This is hard because C is inherently a non-portable language, and this is only scratching the surface of the issues. To really fix this, OpenCL would have to be subset substantially, like the EFI C dialect.

LLVM isn't actually a virtual machine. It's widely acknowledged that the
name "LLVM" is a historical artifact which doesn't reliably connote what
LLVM actually grew to be. LLVM IR is a compiler IR.

It sounds like you're picking a very specific definition of what a VM is. LLVM certainly isn't a high level virtual machine like Java, but that's exactly the feature that makes it a practical target for C-family languages. It isn't LLVM's fault that people want LLVM to magically solve all of C's portability problems.

-Chris

Hi Talin,

I too agree 100% with Dan’s words, and this could be a good pointer
for Jin-Gu Kang to continue on his pursuit for a better
target-independent bitcode.

Also, add your backwards compatibility issue to debug metadata in IR,
in which fields appear or disappear without notice.

But I think you hit a sweet spot here…

This hypothetical IR language would include a type system that was rich
enough to express all of the DWARF semantics - so that instead of having two
parallel representations of every type (one for LLVM’s code generators and
one for DWARF), you could instead generate both the LLVM types and the DWARF
DI’s from a common representation. This would have a huge savings in both
complexity and the size of bitcode files.

This is a really interesting idea. If you could describe your type
system in terms of Dwarf, you would have both: a rich type system AND
free Dwarf.

This sounds interesting, but I don’t quite understand what a “rich
type system to express all of the DWARF semantics” would be. Could you
show an example program that the rich type system can describe but the
current IR fails to represent? And how does it improve the IR?

One thing you would need is the ability to assign names to struct members. Currently LLVM refers to struct members by numerical index, and I wouldn’t want to change that. However, in order to be visible in the debugger, you also have to assign a name to each member. Note that this information doesn’t need to take a lot of space in the module - names are highly compressible, especially fully qualified names (foo.bar.X.Y.Z) where you have a whole bunch of names that start with the same prefix. (In my own frontend, I sort names by frequency, so that the names that are used most often have the lowest assigned IDs. This allows the reference to the name to be stored in fewer bits.)

Similarly, in order to handle inheritance, you would need a way to indicate which fields of the struct were “inherited”. Currently, inheritance is handled by embedding the parent class as the first member of the child class - but the IR level can’t tell whether that first member is inherited or is just a regular member.

Both of these are features that LLVM IR doesn’t need, but which would be nice to have in a higher-level IR based on top of LLVM.
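
Both points are visible in what clang emits today; a rough sketch with
illustrative names:

    %class.Base    = type { i32 (...)**, i32 }    ; vptr, Base::x
    %class.Derived = type { %class.Base, float }  ; parent embedded as
                                                  ; member number 0

    define float* @addr_of_y(%class.Derived* %d) {
      ; Derived::y is just "member number 1" here; neither the member
      ; name nor the fact that member 0 is an inherited base survives
      ; the lowering
      %y = getelementptr %class.Derived* %d, i32 0, i32 1
      ret float* %y
    }

A higher-level IR could keep the numeric indices for code generation
while carrying the member names and the inheritance relationship
alongside them.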

Note that you could also use the “rich type system” for generating reflection data as well. So that’s a third use case.

I better stop right now before I convince myself to do something crazy, like write my own VM.

Talin, if you’re trying to produce a “C virtual machine” you basically need everything in a C AST. You’re scratching the surface of the issues.

-Chris

This sounds interesting, but I don’t quite understand what a “rich
type system to express all of the DWARF semantics” would be. Could you
show an example program that the rich type system can describe but the
current IR fails to represent? And how does it improve the IR?

One thing you would need is the ability to assign names to struct members. Currently LLVM refers to struct members by numerical index, and I wouldn’t want to change that. However, in order to be visible in the debugger, you also have to assign a name to each member. Note that this information doesn’t need to take a lot of space in the module - names are highly compressible, especially fully qualified names (foo.bar.X.Y.Z) where you have a whole bunch of names that start with the same prefix. (In my own frontend, I sort names by frequency, so that the names that are used most often have the lowest assigned IDs. This allows the reference to the name to be stored in fewer bits.)

Talin, if you’re trying to produce a “C virtual machine” you basically need everything in a C AST. You’re scratching the surface of the issues.

I was trying to limit my response to just a few examples - I realize that there’s a ton of things I’ve left out. I guess what I am describing is the same as Microsoft’s “Managed C++” - the ability to compile C++ programs to the CLR. Note that Managed C++ doesn’t give you every possible feature of C++, you can’t take an arbitrary C++ program and expect to compile it in managed mode.

1. The native client folks trying to use LLVM IR as a portable representation that abstracts arbitrary C calling conventions. This doesn't work because the frontend has to know the C calling conventions of the target.

(...)

2. The OpenCL folks trying to turn LLVM into a portable abstraction language by introducing endianness abstractions. This is hard because C is inherently a non-portable language, and this is only scratching the surface of the issues. To really fix this, OpenCL would have to be subset substantially, like the EFI C dialect.

(...)

It sounds like you're picking a very specific definition of what a VM is. LLVM certainly isn't a high level virtual machine like Java, but that's exactly the feature that makes it a practical target for C-family languages. It isn't LLVM's fault that people want LLVM to magically solve all of C's portability problems.

Chris,

This is a very simplistic point of view, and TBH, I'm a bit shocked.

Having a "nicer codebase" and "friendlier community" are two strong
points for LLVM against GCC, but they're too weak to migrate people
from GCC to LLVM.

JIT, "the native client folks", "the openCL folks" are showing how
powerful LLVM could be, if it was a bit more accommodating. Without
those troublesome folks, LLVM is just another compiler, like GCC, and
being completely blunt, it's no better.

The infrastructure for adding new passes is better, but the number and
quality of passes is not. It's far easier to create new back-ends, but
the existing ones are, again, no better in number or quality. The "good
code" is suffering from a diverse community, a large codebase and
company interests, which is not a good forecast for code quality. And
it's not just the IR that has a lot of kludge: the back-ends,
front-ends, DWARF emitter, exception handling and so on, although some
of them are nicer than GCC's, are neither complete nor accurate.

If you want to bet on a "fun community" to drive LLVM, I don't think
you'll go too far. And if you want to discard the OpenCL, JIT and
NativeClient-style community, well, there won't be much of a community
to be any fun...

If you want to win the code-quality battle while working for a big
company, good luck. Part of the GCC community's grumbling is aimed at
companies trying to push selfish code in, and, well, their reasons are
not entirely without merit.

I didn't see people looking for a magic wand in these discussions so far...

In this email, I argue that LLVM IR is a poor system for building a
Platform, by which I mean any system where LLVM IR would be a
format in which programs are stored or transmitted for subsequent
use on multiple underlying architectures.

Hi Dan,

I agree with almost all of the points you make, but not your conclusion. Many of the issues that you point out as problems are actually “features” that a VM like Java doesn’t provide. For example, Java doesn’t have uninitialized variables on the stack, and LLVM does. LLVM is capable of expressing the zero initialization of variables that is implicit in Java; it just leaves the choice to the frontend.

Many of the other issues that you raise are true, but irrelevant when compared to other VMs. For example, LLVM allows a frontend to produce code that is ABI compatible with native C ABIs. It does this by requiring the frontend to know a lot about the native C ABI. Java doesn’t permit this at all, and so LLVM having “this feature” seems like a feature over-and-above what high-level VMs provide. Similarly, the “conditionally” supported features like large and obscurely sized integers simply don’t exist in these VMs.

The one key feature that LLVM doesn’t have that Java does, and which cannot be added to LLVM “through a small matter of implementation”, is verifiable safety. Java-style bytecode verification is not something that LLVM IR permits; you can’t really get it in LLVM (without resorting to techniques like SFI).

With all that said, I do think that we have a real issue here. The real issue is that we have people struggling to do things that are “hard” and who see LLVM as the problem. For example:

  1. The native client folks trying to use LLVM IR as a portable representation that abstracts arbitrary C calling conventions. This doesn’t work because the frontend has to know the C calling conventions of the target.

  2. The OpenCL folks trying to turn LLVM into a portable abstraction language by introducing endianness abstractions. This is hard because C is inherently a non-portable language, and this is only scratching the surface of the issues. To really fix this, OpenCL would have to be subset substantially, like the EFI C dialect.

LLVM isn’t actually a virtual machine. It’s widely acknowledged that the
name “LLVM” is a historical artifact which doesn’t reliably connote what
LLVM actually grew to be. LLVM IR is a compiler IR.

It sounds like you’re picking a very specific definition of what a VM is. LLVM certainly isn’t a high level virtual machine like Java, but that’s exactly the feature that makes it a practical target for C-family languages. It isn’t LLVM’s fault that people want LLVM to magically solve all of C’s portability problems.

I understand that the official goals of the LLVM project are carefully limited. A large number of LLVM users are perfectly happy to live within the envelope of what LLVM provides. At the same time, there are also a fair number of users who are aiming for things that appear to be just outside that envelope. These “near miss” users are looking at Java, at CLR, and constantly asking themselves “did I make the right decision betting on LLVM rather than these other platforms?” Unfortunately, there are frustratingly few choices available in this space, and LLVM happens to be “nearest” conceptually to what these users want to accomplish. But bridging the gap between where they want to go and where LLVM is headed is often quite a challenge, one that is measured in multiple man-years of effort.

Are you willing to give up the ability to interoperate with existing ABIs and code compiled by other compilers?

-Chris

I completely agree, and I’m really interested in LLVM improving to solve these sorts of problems. I’m not sure how this relates to Dan’s email or my response though.

-Chris

1. The native client folks trying to use LLVM IR as a portable representation that abstracts arbitrary C calling conventions. This doesn't work because the frontend has to know the C calling conventions of the target.

(...)

2. The OpenCL folks trying to turn LLVM into a portable abstraction language by introducing endianness abstractions. This is hard because C is inherently a non-portable language, and this is only scratching the surface of the issues. To really fix this, OpenCL would have to be subset substantially, like the EFI C dialect.

(...)

It sounds like you're picking a very specific definition of what a VM is. LLVM certainly isn't a high level virtual machine like Java, but that's exactly the feature that makes it a practical target for C-family languages. It isn't LLVM's fault that people want LLVM to magically solve all of C's portability problems.

Chris,

This is a very simplistic point of view, and TBH, I'm a bit shocked.

I'm sorry, I didn't mean to be offensive.

JIT, "the native client folks", "the openCL folks" are showing how
powerful LLVM could be, if it was a bit more accommodating. Without
those troublesome folks, LLVM is just another compiler, like GCC, and
being completely blunt, it's no better.

I'm not sure what you're getting at here. My email was not intended to say that I'm not interested in LLVM improving - quite the contrary. My email was to rebut Dan's implicit claim that PNaCl and using LLVM as a portable IR is never going to work. I'm arguing in the "OpenCL" and "PNaCl" folks' favor :)

That said, I'm trying to also inject realism. C is an inherently hostile language to try to get portability out of.

The infrastructure for adding new passes is better, but the number and
quality of passes is not. It's far easier to create new back-ends, but
the existing ones are, again, no better in number or quality. The "good
code" is suffering from a diverse community, a large codebase and
company interests, which is not a good forecast for code quality. And
it's not just the IR that has a lot of kludge: the back-ends,
front-ends, DWARF emitter, exception handling and so on, although some
of them are nicer than GCC's, are neither complete nor accurate.

I'm not sure what point you're trying to make here. We all know that LLVM sucks; fortunately, lots of people seem to be motivated to help make it better :)

If you want to bet on a "fun community" to drive LLVM, I don't think
you'll go too far. And if you want to discard the OpenCL, JIT and
NativeClient-style community, well, there won't be much of a community
to be any fun...

I think that we're pretty strongly miscommunicating...

-Chris

All,

I should have chimed in earlier, but have been working on two more side-channel variants of this conversation.
At the beginning the PNaCl team was strongly pushing for trying to keep platform ABI compatibility on all
platforms while taking one portable bitcode stream as input. During the discussions we’ve had over the past few
weeks it became obvious that that is simply not tractable, and we’re trying to work in a slightly different direction
that meets PNaCl’s needs.

PNaCl needs to have one bitcode representation that can be lowered to efficient ABIs on all target platforms.
We’re not constrained to have to use the exact ABI that trusted code would on each of the platforms, however.
For reasons of portability we’ve already eliminated long double support (or made it equivalent to double if you prefer).
And we’re currently proposing to use the platform ABIs, except that structure arguments (including unions, bitfields, etc.)
are always passed in memory. With this set of caveats we think we can meet our objectives for at least x86-32,
x86-64, and ARM. The remaining complexity is making byval usable on all platforms so that we can have one representation.
Of course we’d like community input on the issues we’ve overlooked.
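
For readers less familiar with it, byval is the existing parameter
attribute for passing a structure by copy through a pointer; a minimal
sketch with illustrative names of what "structure arguments are always
passed in memory" means at the IR level:

    %struct.Pair = type { i32, i32 }

    ; the caller makes a hidden copy of the struct and passes its
    ; address; byval tells the code generator to realize the
    ; pass-by-copy in whatever way the target requires
    declare void @consume(%struct.Pair* byval)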

There are other issues of compatibility that still keep us awake at night, of course, and I hope the next developers’
conference will give us the chance to wrangle with some of those.

David

Dan Gohman <gohman@apple.com> writes:

Great post, Dan. Some comments follow.

[snip]

* Target-specific ABI code. In order to interoperate with native
   C ABIs, LLVM requires front-ends to emit target-specific IR.
   Pretty much everyone around here has run into this.

There are places where compatibility with the native C ABI is taken too
far. For instance, some time ago I noted that what the user sets through
Module::setDataLayout is simply ignored. LLVM uses the data layout
required by the native C ABI, which is hardcoded into LLVM's source
code. So I asked: pass the value set by Module::setDataLayout to the
layers that are interested in it, as any user would expect. The response
I got was, in essence, "As you are not working on C/C++, I couldn't care
less about your language's requirements." So I have a two-line patch on
my local copy of LLVM, which has the effect of making the IR code
generated by my compiler portable across Linux/x86 and Windows/x86
(although that was not the reason I wanted the change).
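
For context, the setting in question is the module-level data layout
string; a minimal sketch (the exact string is illustrative) of what a
front-end can request, either through Module::setDataLayout or as a
module header:

    target datalayout = "e-p:32:32:32-i32:32:32-i64:64:64"

The complaint above is that optimization passes consult the layout
hardcoded for the native target instead of this string.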

So it is true that LLVM IR has portability limitations, but not all of
them are intrinsic to the LLVM IR nature.

[snip]

For my own needs, I’d be willing to live with a system that requires extra effort on behalf of the programmer (in the form of __attributes or #pragmas or SWIG) when calling out to code produced by other compilers. In other words, I’d like for the kind of interoperability you describe to be possible, but I don’t necessarily require or expect that it be easy.

– Talin

Here’s my position in a nutshell: the kinds of things that Dan wants LLVM to do should really be a separate sub-project from LLVM proper, built on top of LLVM. I think it’s unrealistic to expect LLVM proper to adopt Dan’s stated objectives - but at the same time, it would be an awful shame if there wasn’t something that could meet his needs, since I think many people other than Dan would benefit from such a thing.

For example, I don’t expect that LLVM IR should suddenly become stable and usable as an archive format, but I think it entirely reasonable that someone could come up with a higher-level IR, translatable into LLVM IR, that does have those qualities. The fact that we have projects that convert JVM bytecode into LLVM IR is proof that such a thing is possible. (Except that any language implementer willing to live with the limitations of the JVM probably wouldn’t be on this mailing list to begin with, but I digress.)

The question I am interested in exploring is whether the goals of the “near miss” users of LLVM are similar enough to each other to be worth having a conversation about how to achieve those goals collaboratively, or whether it is better that we should each continue to struggle with our own problems individually.

Excellent! Assuming that PNaCl has settled on a standard object layout — obviously a hard requirement in any case — this seems like a solid path towards portability.

John.