Greetings & Javascript -> LLVM...

I have a concept for which I’m conducting an initial analysis. The broader idea is to create an LLVM-based JIT runtime that provides a platform amenable to scripting languages, while enforcing an optional sandbox environment when security concerns dictate it (browsers, user preferences). With this approach, the community would gain language independence for browsers, as well as much-needed standardization of tooling support for debugging, refactoring, and even general editing.

The first language I’d like to tackle is ECMAScript / Javascript.

So, aside from all the issues with the strategy (getting buy-in from browser / tooling teams, development community), my first concern is whether it is even possible. Theoretically, it should be possible to express Javascript in LLVM. But a quick review of existing projects indicates that, while LLVM → Javascript has been taken on, what I’m seeking has not been done to date.

Have I missed anything, or is there any reason not to attempt a project like this?

Regards,

Julian Klappenbach

Most of the performance wins for dynamic languages are not from the
kinds of optimizations that LLVM does; you basically gain performance
by doing run-time specialization of dynamic language constructs to
become static, which is something that LLVM really won't help you do,
and which practically speaking is extremely language-specific.

For example, in JavaScript, all numbers are officially doubles, but in
many cases it is profitable to runtime-specialize them to be integers.
Python, on the other hand, distinguishes between floats and integers,
but Python integers overflow into arbitrary-precision integers
on demand.
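
As a rough illustration of the difference (a minimal Python sketch, not how any real engine is implemented; the names `js_add` and `py_add` are hypothetical), a JavaScript-style add can stay on an integer fast path only while the result fits in 32 bits and must otherwise fall back to a double, whereas a Python-style add simply lets the integer grow:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def js_add(a, b):
    """Add two JavaScript-style numbers, using an int32 fast path when possible."""
    if type(a) is int and type(b) is int:
        result = a + b
        if INT32_MIN <= result <= INT32_MAX:
            return result        # guard passed: the value still fits in int32
        return float(result)     # guard failed: fall back to a double
    return float(a) + float(b)   # generic path: everything is a double


def py_add(a, b):
    """Add two Python-style integers; overflow promotes to arbitrary precision."""
    return a + b


small = js_add(1, 2)               # stays an integer
overflowed = js_add(INT32_MAX, 1)  # becomes a double
promoted = py_add(INT32_MAX, 1)    # stays an (arbitrary-precision) integer
```

The two guards encode different language rules, which is why this kind of specialization tends to be written per language rather than shared.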

For another example, in Python, dictionaries can have any hashable
type as their key, but in JavaScript, "objects" (which double as
hashtables) can only have strings as their keys (although numbers get
implicitly converted to strings which is another big language-specific
optimization opportunity). Oh, and in JavaScript "objects" have a
number of additional semantics which prevent them from being actually
used as pure hashtables!
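
A minimal Python sketch of that difference, under simplifying assumptions: `to_property_key` and `JSObjectSketch` are hypothetical names, the key conversion only treats strings and small numbers the way JavaScript would, and the only "additional semantics" modeled is a prototype-chain lookup.

```python
def to_property_key(key):
    """Coerce a key to a string, roughly as JavaScript does for property names.

    Simplified assumption: only strings and small numbers are handled faithfully.
    """
    if isinstance(key, float) and key.is_integer():
        return str(int(key))   # 1.0 and 1 both become "1"
    return str(key)


class JSObjectSketch:
    """Toy model of a JavaScript-style object: string keys plus a prototype."""

    def __init__(self, prototype=None):
        self._props = {}
        self._prototype = prototype

    def set(self, key, value):
        self._props[to_property_key(key)] = value

    def get(self, key):
        name = to_property_key(key)
        obj = self
        while obj is not None:          # walk the prototype chain
            if name in obj._props:
                return obj._props[name]
            obj = obj._prototype
        return None                     # stands in for JavaScript's undefined


# A Python dict keeps 1 and "1" as distinct keys.
py_dict = {1: "int key", "1": "str key"}

# The JavaScript-style object collapses them into one property.
base = JSObjectSketch()
base.set("greet", "hello")
obj = JSObjectSketch(prototype=base)
obj.set(1, "first")
obj.set("1", "second")
```

Here `obj.get("greet")` finds a property that was never stored on `obj` itself, which is one reason such an object cannot be treated as a plain hashtable.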

--Sean Silva

Not necessarily looking for performance gains from LLVM. Instead, the value comes from having a common base platform which can gain language independence, address security concerns, support common tooling (debugging, editing, etc), and perhaps even introduce common language features (annotations / AOP).

I’m envisioning a use case where browsers would utilize this runtime to execute not only javascript, but also python, ruby, etc. Language specific interpreters could be downloaded on the fly to support scripts, and security would be ensured due to the fact that it would be based within the LLVM layer. The LLVM layer would also provide the access point for common browser APIs like access to the DOM and HTML nodes, XSLT, and XHR invocations. This would open up both the browser and current HTML5 application platforms to a wide variety of languages. Some may be quite happy with Javascript, but I’m sure others would be excited to see that stranglehold broken.

And for this purpose, I think LLVM would be well suited. It really depends on how much existing work can be leveraged, and how much interest exists within the community to see this happen.

-jjk

I think a good starting point for you would be the work that has gone
into Native Client.

Thanks, I’ll take a look. If any are interested, please feel free to contact me off list.

-jjk

With this approach, the community would gain language independence for browsers

The browser community is strongly opposed to the idea of having multiple web-facing languages [1].

The first language I'd like to tackle is ECMAScript / Javascript.

You can take a look at the llvm-lua project [2]. However, the JIT speed achieved by llvm-lua is much worse than that of the language-specific LuaJIT.

[1] http://lists.webkit.org/pipermail/webkit-dev/2011-December/018813.html
[2] http://code.google.com/p/llvm-lua/

19.08.2012, 00:39, “Julian Klappenbach” <jklappenbach@gmail.com>:

With this approach, the community would gain language independence for browsers

The browser community is strongly opposed to the idea of having multiple web-facing languages [1].

The browser development community may be opposed, but the general community of developers appears to think otherwise. I’m looking at the number of languages sprouting up that compile to JavaScript (Ceylon, CoffeeScript, etc).

The first language I’d like to tackle is ECMAScript / Javascript.

You can take a look at the llvm-lua project [2]. However, the JIT speed achieved by llvm-lua is much worse than that of the language-specific LuaJIT.

[1] http://lists.webkit.org/pipermail/webkit-dev/2011-December/018813.html
[2] http://code.google.com/p/llvm-lua/

I’ll take a look at the performance aspect of LLVM vs language specific JIT. Thank you, your input is very helpful.

-jjk

I'm envisioning a use case where browsers would utilize this runtime to execute not only javascript, but also python, ruby, etc. Language specific interpreters could be downloaded on the fly to support scripts, and security would be ensured due to the fact that it would be based within the LLVM layer.

It's possible you didn't mean it this way, but it's important to avoid a
common confusion. LLVM IR provides no security. None whatsoever. Code in
LLVM IR has the same level of arbitrary memory access and access to the
enclosing system as C code does. In fact, if anything, LLVM probably
makes the security story worse.

It is possible to use LLVM within an independent sandbox, and various people
are doing that, but that's not a unique property of LLVM.

The LLVM layer would also provide the access point for common browser APIs like access to the DOM and HTML nodes, XSLT, and XHR invocations. This would open up both the browser and current HTML5 application platforms to a wide variety of languages. Some may be quite happy with Javascript, but I'm sure others would be excited to see that stranglehold broken.

I think there's no question that a lot of people want something like this.
However, there are a bunch of challenges. Some of the big ones include:

How are objects (as in object-oriented programming) going to work? Do you
envision the platform providing a generic object model that all high-level
languages will share, or do you envision every language framework building
its own object model on top of a set of primitive operations? This question,
and questions which follow it, will determine what kinds of languages can
be ported to the platform, as well as play a large role in determining how
hard it'll be to make them run efficiently, and how much cross-language
interoperability you can have.

Also, how is GC going to work? How is concurrency going to work? How is
security going to work? How are third-party libraries going to work?

These are some of the big important questions which will form the overall
shape of your design. And it turns out that LLVM itself doesn't provide
any significant help on any of them. So while LLVM may be a useful tool
in the implementation stage, it's probably not where you want to start
in the design stage for your project.

Dan

No, I have simply been evaluating LLVM from the perspective of a common, intermediate format for execution. Security would be implemented in terms of the services / APIs that would be made available to the execution context.
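
One way to picture this is the minimal Python sketch below. The names `ExecutionContext`, `dom.query`, and `xhr.send` are hypothetical, and the sketch illustrates only the API-gating idea; it does nothing to restrict what guest code can do to memory or to the host process.

```python
class ServiceDenied(Exception):
    """Raised when a script asks for a service it was not granted."""


class ExecutionContext:
    """Toy execution context that exposes only allow-listed host services."""

    def __init__(self, services, allowed):
        # Keep only the services named in the allow-list.
        self._services = {
            name: fn for name, fn in services.items() if name in allowed
        }

    def call(self, name, *args):
        if name not in self._services:
            raise ServiceDenied(name)
        return self._services[name](*args)


# Hypothetical host services; real DOM or XHR bindings would live here.
host_services = {
    "dom.query": lambda selector: f"<node matching {selector}>",
    "xhr.send": lambda url: f"response from {url}",
}

# This script is granted DOM access but no network access.
ctx = ExecutionContext(host_services, allowed={"dom.query"})
```

With this setup, `ctx.call("dom.query", "#main")` succeeds, while `ctx.call("xhr.send", ...)` raises `ServiceDenied`.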

All valid points. The interpreter layer, whether it uses LLVM or not, is but one layer of the concept, and one that sits on top of an equally important service layer. But in putting together a stack, I like to take a holistic approach.

But as it stands, earlier comments were on the mark about LLVM and its viability for JIT interpreters. Tracing JIT interpreters have been posting impressive benchmark results against LLVM-based implementations, often ahead by more than 30%. Furthermore, a few of these tracing JIT engines are well architected and language independent. The problem of enabling a common tooling interface for supported languages still exists, but that appears minor compared to the concern of performance.

I really want to thank those of you who took the time to respond; your input has been invaluable.

-jjk