[RFC] WebAssembly Backend

Hello all,

WebAssembly [0] its a new virtual ISA being designed to efficiently
run compiled code in web browsers and other things, starting with
C/C++, and eventually many other languages [1]. WebAssembly
distinguishes itself from other virtual ISAs with optimizations to
reduce download size and decode time, strong portability and
predictability invariants (for example, the base has no undefined
behavior in the C/C++ sense), and participation from several browser
vendors.

We're interested in developing and contributing an LLVM backend to
target this new ISA. There are many interesting technical aspects that
we’re excited to discuss with the LLVM community. Before we get
started though, we need to figure out how to do our development. Most
backends in LLVM were initially submitted in monolithic form and
developed incrementally thereafter. However, we have contributors from
multiple organizations, and a monolithic patch wouldn’t accurately
reflect the separate contributions.

Would the LLVM community be willing to let us start a new target from
scratch within the LLVM tree, following normal LLVM
incremental-development practices? The target would naturally start as
"experimental", excluded from the default build. The code organization
would look like any other backend, with everything under
lib/Target/WebAssembly except for various bits of configury that any
backend needs. We have need of the functionality provided by
SelectionDAG, MI and others, so this will pretty clearly be a backend,
rather than a specialized serialization. Also, the people leading the
project are JF Bastien and Dan Gohman, existing LLVM contributors
familiar with various relevant areas of LLVM.

Additionally, there are opportunities to refactor generic
infrastructure in LLVM to better support the needs of virtual ISAs,
including those in LLVM already and possibly more in the future.
Working in LLVM from the start would make collaboration with the rest
of the community easier.

We look forward to your feedback and questions. Thanks!

        - The WebAssembly Community Group

[0] design/README.md at main · WebAssembly/design · GitHub
[1] design/HighLevelGoals.md at main · WebAssembly/design · GitHub

From: "Dan Gohman" <dan433584@gmail.com>
To: llvmdev@cs.uiuc.edu
Sent: Wednesday, June 17, 2015 11:16:21 AM
Subject: [LLVMdev] [RFC] WebAssembly Backend

Hello all,

WebAssembly [0] its a new virtual ISA being designed to efficiently
run compiled code in web browsers and other things, starting with
C/C++, and eventually many other languages [1]. WebAssembly
distinguishes itself from other virtual ISAs with optimizations to
reduce download size and decode time, strong portability and
predictability invariants (for example, the base has no undefined
behavior in the C/C++ sense), and participation from several browser
vendors.

We're interested in developing and contributing an LLVM backend to
target this new ISA. There are many interesting technical aspects
that
we’re excited to discuss with the LLVM community. Before we get
started though, we need to figure out how to do our development. Most
backends in LLVM were initially submitted in monolithic form and
developed incrementally thereafter. However, we have contributors
from
multiple organizations, and a monolithic patch wouldn’t accurately
reflect the separate contributions.

Would the LLVM community be willing to let us start a new target from
scratch within the LLVM tree, following normal LLVM
incremental-development practices? The target would naturally start
as
"experimental", excluded from the default build. The code
organization
would look like any other backend, with everything under
lib/Target/WebAssembly except for various bits of configury that any
backend needs. We have need of the functionality provided by
SelectionDAG, MI and others, so this will pretty clearly be a
backend,
rather than a specialized serialization. Also, the people leading the
project are JF Bastien and Dan Gohman, existing LLVM contributors
familiar with various relevant areas of LLVM.

Additionally, there are opportunities to refactor generic
infrastructure in LLVM to better support the needs of virtual ISAs,
including those in LLVM already and possibly more in the future.
Working in LLVM from the start would make collaboration with the rest
of the community easier.

I agree, developing the backend in tree is the best path. Specifically, it is the best way for the community to monitor the development and ensure proper test coverage. Obviously, there are some number of files needed for a backend to be minimally functional, and I think that putting those together and submitting them for review as a new experimental backend is perfectly appropriate.

-Hal

From: “Dan Gohman” <dan433584@gmail.com>
To: llvmdev@cs.uiuc.edu
Sent: Wednesday, June 17, 2015 11:16:21 AM
Subject: [LLVMdev] [RFC] WebAssembly Backend

Hello all,

WebAssembly [0] its a new virtual ISA being designed to efficiently
run compiled code in web browsers and other things, starting with
C/C++, and eventually many other languages [1]. WebAssembly
distinguishes itself from other virtual ISAs with optimizations to
reduce download size and decode time, strong portability and
predictability invariants (for example, the base has no undefined
behavior in the C/C++ sense), and participation from several browser
vendors.

We’re interested in developing and contributing an LLVM backend to
target this new ISA. There are many interesting technical aspects
that
we’re excited to discuss with the LLVM community. Before we get
started though, we need to figure out how to do our development. Most
backends in LLVM were initially submitted in monolithic form and
developed incrementally thereafter. However, we have contributors
from
multiple organizations, and a monolithic patch wouldn’t accurately
reflect the separate contributions.

Would the LLVM community be willing to let us start a new target from
scratch within the LLVM tree, following normal LLVM
incremental-development practices? The target would naturally start
as
“experimental”, excluded from the default build. The code
organization
would look like any other backend, with everything under
lib/Target/WebAssembly except for various bits of configury that any
backend needs. We have need of the functionality provided by
SelectionDAG, MI and others, so this will pretty clearly be a
backend,
rather than a specialized serialization. Also, the people leading the
project are JF Bastien and Dan Gohman, existing LLVM contributors
familiar with various relevant areas of LLVM.

Additionally, there are opportunities to refactor generic
infrastructure in LLVM to better support the needs of virtual ISAs,
including those in LLVM already and possibly more in the future.
Working in LLVM from the start would make collaboration with the rest
of the community easier.

I agree, developing the backend in tree is the best path. Specifically, it is the best way for the community to monitor the development and ensure proper test coverage. Obviously, there are some number of files needed for a backend to be minimally functional, and I think that putting those together and submitting them for review as a new experimental backend is perfectly appropriate.

FWIW, I agree.

However, I also suggested this approach, so I’m probably biased. ;] I just wanted to explicitly say that I think this is a great path forward and I’m cautiously optimistic about the entire effort.

Sounds great to me,

-Chris

Would the LLVM community be willing to let us start a new target from
scratch within the LLVM tree, following normal LLVM
incremental-development practices? The target would naturally start as
"experimental", excluded from the default build. The code organization
would look like any other backend, with everything under
lib/Target/WebAssembly except for various bits of configury that any
backend needs. We have need of the functionality provided by
SelectionDAG, MI and others, so this will pretty clearly be a backend,
rather than a specialized serialization. Also, the people leading the
project are JF Bastien and Dan Gohman, existing LLVM contributors
familiar with various relevant areas of LLVM.

+1. I see no real downside to this proposal.

Additionally, there are opportunities to refactor generic
infrastructure in LLVM to better support the needs of virtual ISAs,
including those in LLVM already and possibly more in the future.
Working in LLVM from the start would make collaboration with the rest
of the community easier.

Can you list some of the areas you know are going to need attention? What problems are you seeing? (It might be best to move this to it's own thread or keep this very high level for now.)

Philip

We'll definitely be starting separate threads when we're ready, but
briefly, one of the things we'll need to look into is MC. Right now,
MC's interfaces appear to be tailored for native-binary-style
encodings, so if we're going to use MC, it'll likely need some
refactoring to generalize it.

Dan

Hello all,

WebAssembly [0] its a new virtual ISA being designed to efficiently
run compiled code in web browsers and other things, starting with
C/C++, and eventually many other languages [1]. WebAssembly
distinguishes itself from other virtual ISAs with optimizations to
reduce download size and decode time, strong portability and
predictability invariants (for example, the base has no undefined
behavior in the C/C++ sense), and participation from several browser
vendors.

We're interested in developing and contributing an LLVM backend to
target this new ISA. There are many interesting technical aspects that
we’re excited to discuss with the LLVM community. Before we get
started though, we need to figure out how to do our development. Most
backends in LLVM were initially submitted in monolithic form and
developed incrementally thereafter. However, we have contributors from
multiple organizations, and a monolithic patch wouldn’t accurately
reflect the separate contributions.

Would the LLVM community be willing to let us start a new target from
scratch within the LLVM tree, following normal LLVM
incremental-development practices? The target would naturally start as
"experimental", excluded from the default build. The code organization
would look like any other backend, with everything under
lib/Target/WebAssembly except for various bits of configury that any
backend needs. We have need of the functionality provided by
SelectionDAG, MI and others, so this will pretty clearly be a backend,
rather than a specialized serialization. Also, the people leading the
project are JF Bastien and Dan Gohman, existing LLVM contributors
familiar with various relevant areas of LLVM.

Additionally, there are opportunities to refactor generic
infrastructure in LLVM to better support the needs of virtual ISAs,
including those in LLVM already and possibly more in the future.
Working in LLVM from the start would make collaboration with the rest
of the community easier.

We look forward to your feedback and questions. Thanks!

Hi,

This seems interesting, I have a few questions:

Has the ISA been finalized yet or is it still a work in progress? Will
there be a fixed number of registers?

How will the ISA be transformed to machine code?

Why do you want to develop a full backend as opposed to a simple LLVM
IR translation pass that converts IR directly to WebAssembly?

-Tom

For people following this thread, I've now posted such a patch here:

http://reviews.llvm.org/D10569

Dan

The NVPTX target does this with the existing register allocator by defining 800 physical registers of each type. It means you have to deal with unwanted spill code for those programs that exhaust the 800 registers.

The greedy register allocator doesn’t scale too well along that axis IMO. I think you get something like O(#vregs x #colors-used) behavior. It’s really designed to handle a small, fixed number of physical registers.

Matthias, Quentin: How well do the SSA-based register allocator algorithms work with infinite colors available?

Thanks,
/jakob

We foresee having an infinite number of locals per function, but we plan to pre-color them so that locals whose lifetimes don’t interfere can be merged. We can get clever and do this in an interesting order.

The NVPTX target does this with the existing register allocator by defining 800 physical registers of each type. It means you have to deal with unwanted spill code for those programs that exhaust the 800 registers.

The greedy register allocator doesn’t scale too well along that axis IMO. I think you get something like O(#vregs x #colors-used) behavior. It’s really designed to handle a small, fixed number of physical registers.

Matthias, Quentin: How well do the SSA-based register allocator algorithms work with infinite colors available?

The SSA-based register allocators work in linear time of the number of instructions.

Cheers,
Q.

Hi Jakob,

We foresee having an infinite number of locals per function, but we plan to pre-color them so that locals whose lifetimes don’t interfere can be merged. We can get clever and do this in an interesting order.

The NVPTX target does this with the existing register allocator by defining 800 physical registers of each type. It means you have to deal with unwanted spill code for those programs that exhaust the 800 registers.

The greedy register allocator doesn’t scale too well along that axis IMO. I think you get something like O(#vregs x #colors-used) behavior. It’s really designed to handle a small, fixed number of physical registers.

Matthias, Quentin: How well do the SSA-based register allocator algorithms work with infinite colors available?

The SSA-based register allocators work in linear time of the number of instructions.

For those interested in figures, check out:
http://q-colo01.appspot.com/thesis_slides.pdf slides 33/40 for high level information.
http://q-colo01.appspot.com/thesis.pdf 5.5.2.1 page 109, for more details.

Q.

If you don’t have any register constraints SSA coloring boils down to: “Walk the program in dominance order and assign a fresh register to each def and release the register on a last use”. However you need a way to implement register swaps for phi instructions afterwards although with infinite temp registers this is trivial.
I’d definitely recommend implementing a simple allocator from scratch, maybe reusing the liveness calculation, as you don’t need all the tricks necessary to deal with register constraints, two-address instructions, precolored registers, spill code placement, …

  • Matthias

Hi,

I've read the documentation (looks very interesting!) and have some
questions; apologies if these are answered elsewhere.

That's implementation dependent. Initially, a polyfill to JavaScript because
that's what exists today. We'll also implement a reference interpreter to
validate the spec. Each browser can do what it wants to have fast and secure
native support.

Is there likely to be a desire at some point to produce an LLVM
frontend (i.e. to convert from WebAssembly to LLVM IR), for example
for compiling WebAssembly to native binaries? Furthermore, does it
seem likely that any of the browsers would use LLVM on the client side
for optimisation or JIT execution? Based on the comparison against
LLVM IR, I assume the model is to use WebAssembly as a portable means
of communicating programs but LLVM IR (or some other IR) for
performing optimisations and other transformations (reminds me of this
discussion: http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043719.html
).

I'm also wondering about the C ABI for WebAssembly. It sounds like
this currently isn't specified because dynamic linking isn't supported
yet, but presumably front-ends should be working on some kind of
placeholder standard to avoid later incompatibility? And won't the Web
APIs need to have some sort of ABI?

I noticed the C and C++ page says pointers are 32-bits, but there will
also ultimately be a 64-bit variant. Is there any reason to not just
start off with 64-bit sized pointers and initially restrict usage to a
32-bit sized range?

In the post-MVP page it mentions 128-bit vectors; why not allow
arbitrary (fixed-size) vectors and then have these lowered in an
appropriate way on the target?

Thanks,
Stephen

Hi,

I've read the documentation (looks very interesting!) and have some
questions; apologies if these are answered elsewhere.

That's implementation dependent. Initially, a polyfill to JavaScript because
that's what exists today. We'll also implement a reference interpreter to
validate the spec. Each browser can do what it wants to have fast and secure
native support.

Is there likely to be a desire at some point to produce an LLVM
frontend (i.e. to convert from WebAssembly to LLVM IR), for example
for compiling WebAssembly to native binaries? Furthermore, does it
seem likely that any of the browsers would use LLVM on the client side
for optimisation or JIT execution? Based on the comparison against
LLVM IR, I assume the model is to use WebAssembly as a portable means
of communicating programs but LLVM IR (or some other IR) for
performing optimisations and other transformations (reminds me of this
discussion: http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043719.html
).

Yes, it is likely someone will have this desire at some point. Such an
effort would be an entirely separate project from this WebAssembly
backend project though. To put things in perspective though,
WebAssembly is lower-level than LLVM IR. Round-tripping from LLVM IR
down to WebAssembly and back to LLVM IR would be lossy.

I'm also wondering about the C ABI for WebAssembly. It sounds like
this currently isn't specified because dynamic linking isn't supported
yet, but presumably front-ends should be working on some kind of
placeholder standard to avoid later incompatibility? And won't the Web
APIs need to have some sort of ABI?

In the MVP, WebAssembly won't be able to talk to the Web APIs
directly, and applications will include a layer of JS which the
WebAssembly code can call into to do the work for it, similar to how
asm.js works today. In the future, the plan is for that layer to
become unnecessary. ABIs will be very important -- so important, that
we want to avoid getting stuck too early :-).

I noticed the C and C++ page says pointers are 32-bits, but there will
also ultimately be a 64-bit variant. Is there any reason to not just
start off with 64-bit sized pointers and initially restrict usage to a
32-bit sized range?

Requiring 64-bit pointers when only 32 bits of them are used would
waste a lot of memory, which is not a cheap commodity in some of the
environments WebAssembly is targeting.

In the post-MVP page it mentions 128-bit vectors; why not allow
arbitrary (fixed-size) vectors and then have these lowered in an
appropriate way on the target?

128 bits is a good place to start, and it leaves room for adding more
things in the future.

Dan

This looks like déjà vu with the current discussion on SPIR-V in another
thread. :slight_smile:

At some point any virtual IR (pick yours: SPIR, SPIR-V, PTX, AIR,
HSA...) could benefit from having some generic support for this in
LLVM...