[RFC] Building LLVM for WebAssembly

Overview

I would like to propose a patchset that allows LLVM to be built for WebAssembly/WASI (this includes WASIp1 and WASIp2). I am interested in this functionality for the YoWASP project, and it will be immediately useful for the Amaranth HDL as well as Spade HDL communities, with obvious other applications for projects such as Compiler Explorer. Through the use of wasm2c, it will also bring modern LLVM/Clang to older systems that no longer fit the minimum build requirements.

Previous efforts

Over the years there have been a few efforts to ship Clang built for Wasm. The most complete ones are:

Porting to WASI

WASI is a platform that presents a small subset of the POSIX API. Unlike many other platforms subsetting POSIX, I would hesitate to describe it as “*nix-like” because it misses some core *nix features:

  • No process management
  • No asynchronous signals
  • No filesystem permissions
  • In some configurations, no Berkeley sockets
  • No discretionary access control

In practical terms, WASI is closer to a bare metal RTOS with a POSIX-like API than to a *nix-like OS.

Goal of porting LLVM

The end result of the port I am working on will be a build of Clang and LLD that can be used to translate C++ code to WebAssembly in the browser or in a standalone Wasm engine such that the WebAssembly artifact can be executed on the same engine. This will enable applications such as C++-based simulation for the YoWASP toolchain VS Code extension.

I expect that other online development platforms like Arduino Cloud will pick up the port as well once the initial work is done, since shifting compilation to the client reduces infrastructure costs and addresses privacy concerns.

Porting strategy

I propose the following steps for porting LLVM to WASI:

  1. Conditionalize use of POSIX features missing in WASI. Some of these features are truly conditional, i.e. some WASI builds will include them and some will not, while other features are expected to be absent in all present day WASI builds. This wouldn’t add any defined(__wasm__) or defined(__wasi__) yet.
  2. Conditionalize remaining platform-specific code for WASI. These changes mostly relate to signals and subprocesses, with some minor filesystem-related ones. This would only add defined(__wasm__) or defined(__wasi__).
  3. (Optional) Fix build without -pthread. While WASI does support threads, these complicate deployment, and in many environments would not be desirable. Currently LLVM does not build if the STL lacks std::mutex and related types, even with -DLLVM_ENABLE_THREADS=OFF, seemingly due to bitrot.
  4. (Optional) Add -fintegrated-lld to the compiler driver. WASI does not have subprocesses, and while the compiler driver will work fine invoked as clang -c since it supports integrated cc1plus, it is not able to link the binary. If this is not implemented, the embedder of the WASI build will have to either patch LLVM or invoke the linker themselves, possibly by invoking clang -### and parsing its output.
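
As an illustration of the fallback in step (4), here is a minimal sketch of how an embedder might split one command line from clang -### output into arguments. It assumes each argument is printed in double quotes with backslash escapes; the function names and the check for a tool named wasm-ld are hypothetical and not part of the patchset.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one command line, as printed by `clang -###`, into its arguments.
// Assumption: every argument is wrapped in double quotes, and a backslash
// inside the quotes escapes the character that follows it.
std::vector<std::string> splitQuotedCommand(const std::string &line) {
  std::vector<std::string> args;
  std::string current;
  bool inQuotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (!inQuotes) {
      if (c == '"')
        inQuotes = true; // an opening quote starts a new argument
      continue;          // anything else outside quotes is ignored
    }
    if (c == '\\' && i + 1 < line.size()) {
      current.push_back(line[++i]); // keep the escaped character verbatim
    } else if (c == '"') {
      args.push_back(current); // a closing quote ends the argument
      current.clear();
      inQuotes = false;
    } else {
      current.push_back(c);
    }
  }
  return args;
}

// Hypothetical helper: guess whether a parsed command is the link step by
// looking at the basename of its first argument.
bool looksLikeWasmLinker(const std::vector<std::string> &args) {
  if (args.empty())
    return false;
  const std::string &tool = args.front();
  std::size_t slash = tool.find_last_of("/\\");
  std::string base =
      slash == std::string::npos ? tool : tool.substr(slash + 1);
  return base == "wasm-ld";
}
```

An embedder would run this over each line of the output, pick the command for which looksLikeWasmLinker returns true, and pass the remaining arguments to an in-process linker entry point of its own.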

Steps (1) and (2), which touch LLVMSupport almost exclusively, are enough to build a practically useful toolchain. Steps (3) and (4) on top of that are enough to build a toolchain that is similar in terms of usability to any other Clang/LLD build when combined with a suitable Wasm engine such as wasmtime.
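
For illustration, the two kinds of guard from steps (1) and (2) could look roughly like this. This is a sketch, not code from the patchset: SKETCH_HAVE_GETPID is a crude stand-in for a configure-time symbol check (a real build would take it from a generated config header), and the function names are made up.

```cpp
// Crude stand-in for a configure-time check. A real build would test whether
// the symbol actually compiles and links; this only looks for the header.
#if defined(__has_include)
#  if __has_include(<unistd.h>) && !defined(__wasi__)
#    define SKETCH_HAVE_GETPID 1
#  endif
#endif

#ifdef SKETCH_HAVE_GETPID
#include <unistd.h>
#endif

// Step (1) style: the code is guarded by a feature macro and never names
// the platform. Returns 0 where there is no process id to report.
unsigned long currentProcessId() {
#ifdef SKETCH_HAVE_GETPID
  return static_cast<unsigned long>(::getpid());
#else
  return 0;
#endif
}

// Step (2) style: the code is guarded by the platform macro, for behaviour
// that no feature check describes well.
bool canSpawnSubprocess() {
#if defined(__wasi__)
  return false; // WASI has no process management
#else
  return true;
#endif
}
```

The first form keeps working if a future WASI configuration gains the API, while the second form states outright that the platform is the reason for the difference.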

Prototype

The YoWASP/clang repository contains a harness based on the LLVM 18.1.2 release that builds LLVM, Clang, and LLD. It can compile and link binaries, both freestanding and those using the C library, although for some reason I can’t reuse the sysroot from wasi-sdk directly.

The main+wasm branch of the YoWASP/llvm-project repository contains the same patch set on top of main and split into commits for review.

Testing

Although a buildbot providing post-commit testing is the ideal solution, it may not be practical for me to run one; I already maintain an extensive amount of infrastructure in the public interest, and while I would (as will become clear) contribute maintenance effort, I’m not willing to pay for a buildbot runner. I expect that the two principal ways through which this port will stay functional will be:

  • YoWASP nightly builds that already cover other FPGA toolchain components like Yosys, nextpnr, and openFPGALoader, maintained by the YoWASP interest group (primarily myself);
  • wasi-sdk builds, maintained by the WebAssembly interest group.

Pull requests

Changelog

  • 2024-05-19T2135Z: added references to wasi-sdk discussion on shipping a self-hosted build and a section on testing
  • 2024-05-20T1716Z: LLVM/Clang/LLD build is verified to be fully functional

Questions?


CC @AaronBallman

Thank you for this RFC! I did have one question:

So the basic idea is that this functionality would land in-tree but the YoWASP/WebAssembly interest groups would be responsible for maintaining it rather than the LLVM community?


This is my proposal, yes. I do not want to burden the entire LLVM community with this somewhat unusual target; I believe that there is enough interest in a Wasm port of LLVM/Clang/LLD to keep them functional without requiring everyone else to learn the idiosyncrasies of WASI.

If this is a deal-breaker I can look into other options but from my perspective this solution provides the most benefit for all parties: (most) LLVM contributors don’t have to think about Wasm, and Wasm developers don’t have to maintain fragile patchsets.


Oh, I can’t edit the post after 24 hours? That’s unexpected. I can now!

In any case, I can confirm that the Clang/LLVM/LLD build produces a working toolchain and the toolchain produces working binaries that can in turn be run with Wasmtime. This includes the C standard library:

$ cat test.c
#include <stdio.h>

int main() {
    puts("Hello, world!");
}
$ # the commands to compile it are a little cursed due to sysroot nonsense
$ wasmtime run test.wasm
Hello, world!

Nope, not a deal-breaker at all! In fact, I think this strikes a really good balance between LLVM community resources and your needs. Thank you for verifying!

I’m in support of the RFC.


FYI I also had a shot at doing something similar (Add support for WASI builds by veluca93 · Pull Request #91051 · llvm/llvm-project · GitHub), although @whitequark’s proposal seems significantly more polished and I am happy to go with that one instead :slight_smile:

One note I wanted to mention is that, to my understanding, clangd seems to require thread support, and I’d expect the use cases for wasm-llvm to care about clangd (at least, mine does), so I’m unsure about the benefits of point 3 in the porting strategy…


I personally don’t care about clangd at all–I want strictly a batch compiler. This is because I want to support a workflow where languages like Verilog and Amaranth are compiled to C++ which is then executed to emulate the hardware design at a high clock rate. Building with threads requires support for SharedArrayBuffer, which requires adding CORS headers, so I try to avoid it; also some Wasm runtimes don’t have threads at all, or have bugs (e.g. Wasmtime seems to crash if LLD spawns a thread, in a way that can’t be easily debugged).


Great work and thank you so much @whitequark for writing up this proposal, and properly “upstreaming” the changes needed to get a WebAssembly build of LLVM/Clang :clap: The patch set looks much cleaner than what I had tried in WASI support by turbolent · Pull Request #304 · WebAssembly/wasi-sdk · GitHub and what others have tried before :ok_hand:

I can only second that having a WebAssembly build of LLVM/Clang would be very useful, as it allows bringing a modern compiler to older systems that are no longer supported, or were never supported in the first place. This can be achieved by compiling the resulting WebAssembly binary to portable C, using tools like wasm2c and w2c2 (which ships with an old WebAssembly build of clang as an example, see w2c2/examples/clang at main · turbolent/w2c2 · GitHub).

Regarding point 3: It would be great to have as much of LLVM/Clang be made available in a non-threading build, as both some WebAssembly runtimes and operating systems targetable by wasm2c/w2c2 might not have threading support. So it would be really nice to have a non-threaded clangd if possible, but it is not a deal-breaker. Having it would allow bringing modern IDE features to older systems :slight_smile:
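
To make point 3 concrete, a build without thread support could fall back to a no-op lock along these lines. This is only a sketch, assuming a hypothetical SKETCH_ENABLE_THREADS switch; it is not how LLVM's own mutex wrappers are written, and all names are invented for the example.

```cpp
// Hypothetical build switch; defaults to a single-threaded build.
#if !defined(SKETCH_ENABLE_THREADS)
#  define SKETCH_ENABLE_THREADS 0
#endif

#if SKETCH_ENABLE_THREADS
#include <mutex>
using SketchMutex = std::mutex;
#else
// No-op replacement with the same lock/try_lock/unlock shape, so that the
// <mutex> header is never included when threads are disabled.
class SketchMutex {
public:
  void lock() {}
  bool try_lock() { return true; }
  void unlock() {}
};
#endif

// Small RAII guard used instead of std::lock_guard, which lives in <mutex>.
template <typename MutexT> class SketchLockGuard {
  MutexT &Held;

public:
  explicit SketchLockGuard(MutexT &M) : Held(M) { Held.lock(); }
  ~SketchLockGuard() { Held.unlock(); }
  SketchLockGuard(const SketchLockGuard &) = delete;
  SketchLockGuard &operator=(const SketchLockGuard &) = delete;
};

// Example user: the same source compiles with or without threads.
class Counter {
  SketchMutex Lock;
  int Value = 0;

public:
  int increment() {
    SketchLockGuard<SketchMutex> Guard(Lock);
    return ++Value;
  }
};
```

The point of the design is that callers never mention std::mutex directly, so a standard library without threading support is enough to build them.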


Thanks for this RFC!

I would very much love a buildbot because otherwise it is impossible to guarantee it will work over time (and we will want to ensure all clang tests pass). But I understand that might be asking a lot.

I also want to understand the scope of the effort beyond clang: i.e., what about clang tools / clangd? libc++? Other runtimes/tools?

Otherwise, having looked at the patch set, it looks small enough that it seems reasonable to upstream it (I also got a couple of folks telling me privately they hope it will be upstreamed, so I guess there is definitely community interest in it).

I would very much love a buildbot because otherwise it is impossible to guarantee it will work over time (and we will want to ensure all clang tests pass). But I understand that might be asking a lot.

I don’t think it is realistic to run clang tests given that subprocesses simply do not exist on WASI; there is no way to spawn another executable that is broadly supported. See also point (4) that solves a part of this problem by providing a single-call executable (the compiler driver with integrated cc1plus/cc1as and lld).

As for ensuring that it will work over time, I think the dual efforts in the YoWASP and the wasi-sdk repositories should be enough. These are already the people most interested in keeping the port alive–there’s not much point in having a buildbot if nobody pays attention to it, so I’m not sure if having a buildbot will materially change much. I do understand that it’s not quite according to the community norm, but I will also not pay for more infrastructure as an individual.

I think most ancillary LLVM/Clang tools should work. I’m undecided on whether to ship them–it’s a bunch of effort to verify that they all work, and I suspect there will be cases where some don’t, so I want to do it on an as-needed basis. libc++abi/libc++ is already shipped by the wasi-sdk project, so no change needed there.

Regarding clangd, interactive tooling isn’t my area of expertise (I only work with batch compilers because of the domain I currently operate in) so I’ll leave that to somebody who will actually use it.

This seems just fine to me. That just means you have to fix issues when they are caught on your infrastructure, instead of having fast/automatic reverts of changes like when they break an upstream buildbot (there is no FreeBSD or OpenBSD bot anymore either, for example).

I’m completely okay with that; I spent two years maintaining the Yosys Wasm support in the same way and it wasn’t too much of a burden. I presume that it will be a simple matter of submitting a PR and in most cases merging it myself, which is something I’m ready to do for as long as I maintain my infrastructure.

In the end Yosys has added a Wasm build to the build matrix, and it’s possible that eventually the same will happen with LLVM (I think the fine folks at Bytecode Alliance might be able to provide one), but I’d really rather avoid gating landing the initial patchset on having a buildbot, since I can’t afford it at the moment.


I don’t think it is realistic to run clang tests given that subprocesses simply do not exist on WASI; there is no way to spawn another executable that is broadly supported.

Does the test runner need to run inside wasi? Can’t we just run lit on Linux, and make the lit substitution for %clang invoke a wasm runtime, or something like that?

That won’t work at least for compiler driver tests, since there clang will attempt to spawn another subprocess. But yes, tests should be done in this way; it’s just a decent chunk of complexity someone would have to work on.

Thanks for your effort improving WebAssembly builds.
I took a glance at your patch (Porting Strategy step 1).
While I appreciate the effort to increase LLVM portability and benefit WebAssembly communities, I’m concerned about the lack of support for some really basic functionalities:

pwd.h, sys/wait.h, alarm, gethostname, getpid, fchown, raise, setjmp, socket, umask.

Here’s a practical concern: Reviewing patches that impact these basic functionalities requires additional consideration for WebAssembly portability.
This adds complexity compared to reviewing patches using basic POSIX APIs, where broader compatibility across *NIX platforms can be more readily assumed.

On a separate note, there have been discussions about improving CMake invocation time (see: CMake compiler flag checks are really slow, ideas to speed them up for a recent one).
llvm/cmake/config-ix.cmake and llvm/include/llvm/Config/config.h.cmake contain symbol checks added over a decade ago.
Some checks should really go away.
Removing some checks for these basic functionalities might unintentionally cause build failures for WebAssembly builds.

An alternative approach might be for WebAssembly to explore providing stub implementations for these headers/symbols, focusing on functionality essential for your needs.

The lack of these APIs is a ground reality for the WASI platform: it very intentionally lacks discretionary access control, asynchronous signals, Unix process management, and a Unix password database, and that’s not changing in the foreseeable future (or in some cases, likely ever). Therefore this is something portable software must cope with.

For some of these APIs (notably setjmp and socket), a check cannot be removed, because you could conceivably want builds with or without those APIs enabled, targeting different embeddings and/or Wasm engines. For the rest of them, I can easily enough replace them with defined(__wasm__)—I do want feedback on the direction to take here.

Since I am not asking the LLVM community to gate landing a patch on said patch providing Wasm compatibility, how much of a problem is it? It’s OK if someone lands a patch that works on most *nix platforms but breaks Wasm builds; I will just fix it in a follow-up PR.

Where would these stubs live? There are some stubs in wasi-libc, but it would be difficult to add more stubs on short notice, as these are practically speaking tied to the wasi-sdk release cycle. That would impact the goal of this RFC, which is to make upstream LLVM buildable for Wasm/WASI.

In addition, this doesn’t even entirely solve the problem, because a patch adding the use of another unimplemented POSIX API would still require stub updates, which is basically the same effort, done by the same people, in the same kind of follow-up PR, as adding a #ifdef around it. At least assuming that the stubs live in the LLVM source tree (something that I’m not sure is upstreamable); if the stubs live in wasi-sdk then LLVM is just broken until the next wasi-sdk release and until someone implements them, which diminishes usefulness of the effort a lot.
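
To compare the two approaches side by side, here is a rough sketch. Everything in it is hypothetical (the stub name, the SKETCH_HAVE_GETHOSTNAME macro, the "localhost" fallback); it only shows the shape of each option.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>

#if defined(SKETCH_HAVE_GETHOSTNAME)
#include <unistd.h>
#endif

// Option A: a stub with the shape of POSIX gethostname that always fails.
// Someone has to write and ship one of these per missing API.
int sketch_gethostname_stub(char *name, std::size_t len) {
  (void)name;
  (void)len;
  errno = ENOSYS; // report "function not implemented"
  return -1;
}

// Caller for option A: no #ifdef, but the failure is handled at run time.
std::string hostNameViaStub() {
  char buf[256];
  if (sketch_gethostname_stub(buf, sizeof(buf)) != 0)
    return "localhost";
  buf[sizeof(buf) - 1] = '\0';
  return buf;
}

// Option B: the caller is guarded at compile time, so nothing needs to be
// stubbed out and the missing API is never referenced.
std::string hostNameViaGuard() {
#if defined(SKETCH_HAVE_GETHOSTNAME)
  char buf[256];
  if (::gethostname(buf, sizeof(buf)) != 0)
    return "localhost";
  buf[sizeof(buf) - 1] = '\0';
  return buf;
#else
  return "localhost";
#endif
}
```

Either way, a new use of an unimplemented API needs one small follow-up change; the difference is whether that change lands next to the caller or in a separate stub library.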


@whitequark : For your step (4), I have some ongoing work to generalize the mechanics of clang -fintegrated-cc1 by allowing LLVM « tools » to call other tools in-process when they are linked together into the llvm-driver [1][2]. See branch « inprocess » here [3]. This currently works on Windows but Linux needs a bit more work before I can send a PR.

[1] D109977 LLVM Driver Multicall tool
[2] https://www.youtube.com/watch?v=bbOFgpQ_QWA
[3] GitHub - aganea/llvm-project at in-process


That’s exciting! Is it possible to precisely pick which tools go into the binary? E.g. I want strictly clang+(cc1+cc1plus+cc1as)+lld and nothing else, as I’m quite size-conscious due to PyPI total upload quota.