[clang-repl] Add WebAssembly Support in clang-repl

vvassilev · March 21, 2023, 7:54am

Description

The Clang compiler is part of the LLVM compiler infrastructure and supports various languages such as C, C++, ObjC and ObjC++. The design of LLVM and Clang enables them to be used as libraries, and has led to the creation of an entire compiler-assisted ecosystem of tools. The relatively friendly codebase of Clang and advancements in the JIT infrastructure in LLVM further enable research into different methods for processing C++ by blurring the boundary between compile time and runtime. Challenges include incremental compilation and fitting compile/link time optimizations into a more dynamic environment.

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered to LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows the creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by interactive C++ in Jupyter via the xeus kernel protocol. Newer versions of the protocol allow in-browser execution, opening further possibilities for clang-repl and Jupyter.
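The chunk-by-chunk idea can be made concrete with a minimal toy model in C++. This is a hypothetical sketch, not Clang's API. The names `PartialTU` and `IncrementalTU` are illustrative and only loosely mirror the way clang-repl pairs each parsed fragment with its own `llvm::Module` (`clang::PartialTranslationUnit`). Modules are represented here by plain strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model, not Clang's API. Each input chunk becomes a
// "partial translation unit" that owns the module lowered from that chunk
// alone, while the translation unit as a whole keeps growing.
struct PartialTU {
    std::string source;  // the chunk the user typed
    std::string module;  // stand-in for the llvm::Module lowered from it
};

class IncrementalTU {
public:
    // Append one chunk to the ever-growing TU and "lower" only that chunk
    // into a fresh module, which would then be handed to the JIT.
    PartialTU parse(const std::string& chunk) {
        ptus_.push_back({chunk, "module_" + std::to_string(ptus_.size())});
        return ptus_.back();
    }

    // The whole TU seen so far: later chunks may refer to earlier ones.
    std::string wholeTU() const {
        std::string tu;
        for (const PartialTU& p : ptus_) tu += p.source + "\n";
        return tu;
    }

    std::size_t numChunks() const { return ptus_.size(); }

private:
    std::vector<PartialTU> ptus_;
};
```

The design point is that code generation happens per chunk, so each new input yields a new, small module instead of re-lowering the whole translation unit.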

Our group is working to incorporate, and possibly redesign, parts of Cling in mainline Clang through a new tool, clang-repl. This project aims to add WebAssembly support to clang-repl and adopt it in xeus-clang-repl to aid Jupyter-based C++.

Expected results

There are several foreseen tasks:

Investigate the feasibility of generating WebAssembly in a similar way to the new interactive CUDA support.
Enable generating WebAssembly in clang-repl.
Adopt the feature in xeus-clang-repl.
Prepare a blog post about clang-repl and possibly Jupyter.
Present the work at the relevant meetings and conferences.

Desirable skills

Good C++; understanding of Clang and the Clang API, and the LLVM JIT in particular

Project type

Large

Mentors

@vvassilev, Alexander Penev

Hi, I’m a newbie to Clang and LLVM. Please forgive my ignorance, but I have a small question about this project. I followed the links and read the files in clang/include/clang/Interpreter and clang/lib/Interpreter, and to my surprise, there is not much code (probably < 2000 LOC?). It seems clang-repl is not a very big or complicated tool, and as far as I know, LLVM already has some support for emitting WASM from LLVM IR. So why is this project considered Large? Where could the potential difficulty be?

vvassilev · March 23, 2023, 7:01am

@yhgu2000 good question! The short answer is that usually the devil is in the details.

The longer answer is that clang/Interpreter has to rethink what certain things mean for incremental compilation. That is, how CodeGen responds to multiple requests for adding more LLVM IR (fwiw, CodeGen is designed for a single llvm::Module). The same holds for WASM: I would really hope things are plug and play and everything can be done in a week. Even if we could get the patches ready that soon (which I doubt), we need to go through a usually slow review process. That being said, if you are worried that a person could complete the project in half the time and then stay idle, that’s not going to happen, as there are a lot of things in the area they will be doing.

However, even in a realistic scenario where almost everything works straight out of the box, I suspect we will have to touch multiple non-trivial places in Clang and LLVM. In addition, I expect a lot of time to go into adoption/integration, where we actually need to make this functional in a browser, for example.
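The point that CodeGen is designed for a single `llvm::Module` can be illustrated with a second toy model. This is a hypothetical sketch, not how Clang's CodeGen is implemented, and `ToyModule` and `ToyIncrementalCodeGen` are illustrative names. Each chunk is lowered into its own module, so a name defined by an earlier chunk may appear in the new module only as an external declaration, to be resolved against the earlier module once both are in the JIT. Symbols are plain strings, and a rejected chunk leaves no state behind.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy, not Clang/LLVM API.
struct ToyModule {
    std::set<std::string> definitions;   // symbols this module defines
    std::set<std::string> declarations;  // external symbols it only declares
};

class ToyIncrementalCodeGen {
public:
    // Lower one chunk into a fresh module. Names defined by earlier chunks
    // become external declarations instead of being emitted again.
    ToyModule lower(const std::vector<std::string>& defs,
                    const std::vector<std::string>& uses) {
        ToyModule m;
        m.definitions.insert(defs.begin(), defs.end());
        for (const std::string& d : m.definitions)
            if (definedIn_.count(d))
                throw std::runtime_error("redefinition of " + d);
        for (const std::string& u : uses) {
            if (m.definitions.count(u)) continue;  // defined in this chunk
            if (!definedIn_.count(u))
                throw std::runtime_error("use of undeclared identifier " + u);
            m.declarations.insert(u);  // defined by an earlier module
        }
        // Commit only after validation so a failed chunk leaves no trace.
        for (const std::string& d : m.definitions) definedIn_[d] = numModules_;
        ++numModules_;
        return m;
    }

    // Index of the module that provides a symbol, as a linker would find it.
    // Throws std::out_of_range for unknown symbols.
    int moduleDefining(const std::string& sym) const { return definedIn_.at(sym); }

private:
    std::map<std::string, int> definedIn_;
    int numModules_ = 0;
};
```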

Sahil · March 30, 2023, 11:34am

I am trying to understand the project. The goal of the project is to generate WASM, but I have a question: is there any support for lowering IR to WASM in the JIT? As I understand it, the JIT engine handles the compilation.

wu-s-john · March 31, 2023, 2:35am

@vvassilev

Wow, this is an amazing project. I would love to work on it. It’s a serious game changer, as you could run C/C++/ObjC code in a REPL in the browser. Other implications include hot reloading of C++ modules in the browser, which could be useful for easily updating serverless web workers.

I will try to write a proposal for this project over the weekend. I was just wondering what the motivation for targeting WebAssembly was. Was there a particular use case for which you wanted to compile code to WebAssembly? How does compiling to WebAssembly help with using Jupyter notebooks, besides enabling in-browser execution?

vvassilev · April 3, 2023, 2:33pm

jupyter-lite is the major use case for now.

Sahil · April 3, 2023, 3:54pm

hey @vvassilev,
For the time being, since the code base is large, could you kindly tell me how the JIT handles code generation from IR for other targets, and where the JIT’s code generation code is located?

vvassilev · April 3, 2023, 8:40pm

The JIT lowering infrastructure is located in the llvm repository, mostly under lib/ExecutionEngine/. @lhames is probably the person to go to about these things. There is also a JIT channel on the LLVM Discord where you can get more information.

wu-s-john · April 3, 2023, 10:06pm

Hi @vvassilev

I was looking into this problem more. Are there any key security issues to consider for this project, since WASM is a much more restrictive runtime than others? That would probably need more research as the project begins. Also, one challenge I can see in this project is that the LLVM JIT typically compiles inputs into native machine code and then executes that native machine code, whereas we want to compile IR to WASM. I was thinking about using a third-party runtime such as wasmtime to load and execute the WASM code. Do you have any other suggestions for executing WASM code?

jkshtj · April 4, 2023, 1:46pm

@Sahil LLVM exposes its JIT to end users through the ORCv2 APIs. This video by @lhames is a great watch for understanding the details of the ORCv2 APIs and how a JIT is composed using them.

That said, after watching the video you’ll see that a JIT built using ORCv2 is composed of layers, one of them being the compile layer, which performs any compilation tasks that the JIT needs to do. You can start from LLJIT (an in-tree JIT built using ORCv2) and dive into the internals of the compile layer from there.
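The layering described above can be sketched with a toy JIT. The classes below are hypothetical and only mimic the shape of ORCv2, where `llvm::orc::LLJIT` stacks an `IRCompileLayer` on top of an object linking layer (such as `RTDyldObjectLinkingLayer`); the real interfaces are much richer. Here "IR" is just a named constant, and "compiling" turns it into a callable.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy that only mimics the ORCv2 layering; not the ORC API.
struct ToyIR     { std::string name; int constant; };  // "IR": returns a constant
struct ToyObject { std::string name; std::function<int()> code; };

// Bottom layer: "links" objects by recording their symbols.
class ToyObjectLayer {
public:
    void add(ToyObject obj) { symbols_[obj.name] = std::move(obj.code); }
    std::function<int()> lookup(const std::string& name) const {
        auto it = symbols_.find(name);
        if (it == symbols_.end())
            throw std::out_of_range("symbol not found: " + name);
        return it->second;
    }
private:
    std::map<std::string, std::function<int()>> symbols_;
};

// Compile layer: turns IR into an object and hands it to the layer below.
class ToyCompileLayer {
public:
    explicit ToyCompileLayer(ToyObjectLayer& base) : base_(base) {}
    void add(const ToyIR& ir) {
        int value = ir.constant;  // "code generation"
        base_.add({ir.name, [value] { return value; }});
    }
private:
    ToyObjectLayer& base_;
};

// A tiny stand-in for LLJIT: owns the layer stack, exposes add/lookup.
class ToyJIT {
public:
    ToyJIT() : compile_(objects_) {}
    ToyJIT(const ToyJIT&) = delete;             // compile_ refers to objects_
    ToyJIT& operator=(const ToyJIT&) = delete;
    void addIRModule(const ToyIR& ir) { compile_.add(ir); }
    std::function<int()> lookup(const std::string& name) const {
        return objects_.lookup(name);
    }
private:
    ToyObjectLayer objects_;   // declared first, so constructed first
    ToyCompileLayer compile_;
};
```

Each layer only knows about the layer directly below it, which is what lets ORC swap or add layers (for example, an optimization layer above the compile layer) without touching the others.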

lhames · April 4, 2023, 5:33pm

I don’t think we want to use the LLVM JIT for this. I’m not a WASM expert, but I think WASM fits best as an alternative execution engine to the LLVM JIT: When running in a browser clang-repl would target WASM and use the WASM runtime’s APIs to load the modules, and when running on bare metal clang-repl would target the host architecture and platform and use the LLVM JIT to load the modules.
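The idea of WASM as an alternative execution engine, rather than a new LLVM JIT target, could be structured roughly as below. This is a hypothetical design sketch, not an existing clang-repl interface: `ReplBackend`, `WasmRuntimeBackend`, `LLVMJITBackend` and `makeBackend` are illustrative names. A real native build would more likely obtain the host triple from `llvm::sys::getProcessTriple()` than take it as a parameter.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical design sketch, not an existing clang-repl interface.
class ReplBackend {
public:
    virtual ~ReplBackend() = default;
    virtual std::string targetTriple() const = 0;  // what clang should compile for
    virtual std::string name() const = 0;          // who loads the modules
};

// Browser build: compile to wasm and let the JS/WASM runtime load modules.
class WasmRuntimeBackend : public ReplBackend {
public:
    std::string targetTriple() const override { return "wasm32-unknown-unknown"; }
    std::string name() const override { return "wasm-runtime"; }
};

// Native build: compile for the host and load modules with the LLVM JIT.
class LLVMJITBackend : public ReplBackend {
public:
    explicit LLVMJITBackend(std::string hostTriple) : host_(std::move(hostTriple)) {}
    std::string targetTriple() const override { return host_; }
    std::string name() const override { return "llvm-jit"; }
private:
    std::string host_;
};

// Pick the backend from where the REPL itself runs. __wasm__ is predefined
// by clang when compiling for a WebAssembly target.
std::unique_ptr<ReplBackend> makeBackend(const std::string& hostTriple) {
#if defined(__wasm__)
    (void)hostTriple;
    return std::make_unique<WasmRuntimeBackend>();
#else
    return std::make_unique<LLVMJITBackend>(hostTriple);
#endif
}
```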

For simple C++ programs (staying away from syscalls) it looks like clang-repl should be able to target wasm32-unknown-unknown for compilation, and use WASM’s JS API to load and run the modules.
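As an example of a "simple C++ program" in this sense, the function below avoids syscalls and libc entirely. The build command and JavaScript loader in the comments are a sketch assuming a clang with the WebAssembly target and `wasm-ld` installed; the file name `cell.cpp` and the function `sum_of_squares` are hypothetical. The same source also compiles natively, which is handy for testing.

```cpp
// A REPL "cell" that uses no syscalls and no libc, so it can be compiled
// for wasm32-unknown-unknown, e.g. with this single command (split here):
//
//   clang++ --target=wasm32-unknown-unknown -O2 -nostdlib
//           -Wl,--no-entry -Wl,--export-all cell.cpp -o cell.wasm
//
// and then loaded in the browser with the WebAssembly JS API:
//
//   const { instance } = await WebAssembly.instantiate(bytes);
//   instance.exports.sum_of_squares(3);  // 14
//
// extern "C" keeps the exported name unmangled.
extern "C" int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i * i;  // pure computation only
    return total;
}
```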

daivikbhatia · April 11, 2023, 12:50pm

Hi, I really like the idea and I would like to contribute to this project. Could you please guide me on how to get started?
