[clang] Out-of-process execution for clang-repl

Description of the project: The Clang compiler is part of the LLVM compiler infrastructure and supports various languages such as C, C++, ObjC and ObjC++. The design of LLVM and Clang enables them to be used as libraries, and has led to the creation of an entire compiler-assisted ecosystem of tools. The relatively friendly codebase of Clang and advancements in the JIT infrastructure in LLVM further enable research into different methods for processing C++ by blurring the boundary between compile time and runtime. Challenges include incremental compilation and fitting compile/link time optimizations into a more dynamic environment.

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. Clang-Repl is one example.

Clang-Repl uses the ORCv2 JIT infrastructure within the same process. That design is efficient and easy to implement; however, it suffers from two significant drawbacks. First, it cannot be used on devices which do not have sufficient resources to host the entire infrastructure, such as the Arduino Due (see this talk for more details). Second, a crash in user code brings down the entire process, hindering overall reliability and ease of use.

This project aims to move Clang-Repl to an out-of-process execution model in order to address both of these issues.
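
To make the second drawback concrete, here is a minimal sketch of how process isolation contains a crash. It is not Clang-Repl code: Python and plain POSIX `fork`/`waitpid` are used only to keep it short, the name `run_isolated` is hypothetical, and `os.abort` stands in for a crashing user statement. It assumes a POSIX system.

```python
import os
import signal


def run_isolated(statement):
    """Run `statement` (a zero-argument callable) in a forked child process.

    Returns ("ok", exit_code) if the child exited normally, or
    ("crashed", signal_number) if it was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the executor that runs user code.
        code = 1  # reported if the statement raises a Python exception
        try:
            statement()
            code = 0
        finally:
            # Exit immediately, skipping cleanup inherited from the parent.
            os._exit(code)
    # Parent: stand-in for the REPL front end; it only observes the outcome.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("crashed", os.WTERMSIG(status))
    return ("ok", os.WEXITSTATUS(status))


if __name__ == "__main__":
    print(run_isolated(lambda: None))  # well-behaved statement
    print(run_isolated(os.abort))      # statement that kills its process
    print("front end still alive")
```

The front end learns which signal killed the executor and can decide whether to restart it. Restoring the session state afterwards is a separate and harder problem.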

Expected result: Implement out-of-process execution of statements with Clang-Repl; demonstrate that Clang-Repl can support some of the ez-clang use-cases; research approaches to restart/continue the session upon crash; as a stretch goal, design a versatile reliability approach for crash recovery.

Project size: Either medium or large.

Difficulty: Medium

Confirmed Mentor: Vassil Vassilev, Stefan Gränitz, Lang Hames

The Swift people learned from their experience with libclang.

There is now a SourceKit daemon for code completion, i.e. it runs out-of-process.

@tschuett, thanks for the comment. I think I understand what you mean, but could you perhaps elaborate for the students who will be skimming through this thread trying to make sense of it?

Sorry for being terse. I wanted to hint that there is prior art in moving Clang/Swift execution into a separate process. People expect Clang/Swift to crash.

The hard question is the zero-cost restoring of the state. For example, if we use some checkpointing approach we could relatively reliably restore the state of the session before crashing, however, we would penalize the people who write correct code. There have been some outside-of-the-kernel approaches to restore the session in cheaper ways but that’s pretty much a research goal for this project. That is, figure out what’s out there and if/how we could use it and at what cost in terms of performance but also maintenance, etc…

In an ideal world, could we somehow have a scratch area and a stable area where we run the new portion of the code in the scratch area and move it into the stable area after it was run (possibly fixing instruction PLT, GOT and what not again)? This way we could restore cheaply by throwing away the scratch area and starting over…
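
One way to picture the scratch/stable idea is the following sketch. It swaps in a different mechanism from the one described above: instead of separate memory areas with relocation fix-ups, it uses `fork`, so that the child process acts as the scratch area and the parent keeps the stable state. The names `run_chunk`, `define_x` and `crash` are hypothetical, session state is modelled as a plain dictionary, and a POSIX system is assumed.

```python
import os
import pickle


def run_chunk(stable, chunk):
    """Run `chunk(state)` on a scratch copy of `stable`.

    Returns (new_state, True) if the chunk finished, or (stable, False)
    if the scratch process died and its changes were discarded.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child = scratch area: it works on its own copy of the state.
        os.close(read_fd)
        code = 1
        try:
            scratch = dict(stable)
            chunk(scratch)  # may crash the child
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(scratch, out)  # hand the result to the parent
            code = 0
        finally:
            os._exit(code)
    # Parent = stable area: commit only if the child exited cleanly.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as src:
        payload = src.read()
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
        return pickle.loads(payload), True
    return stable, False


def define_x(state):
    """A well-behaved chunk (hypothetical)."""
    state["x"] = 42


def crash(state):
    """A chunk that modifies state and then kills its process (hypothetical)."""
    state["y"] = 1
    os.abort()


if __name__ == "__main__":
    state, ok = run_chunk({}, define_x)
    print(state, ok)
    state, ok = run_chunk(state, crash)
    print(state, ok)
```

Discarding the scratch area is cheap here, but every chunk pays for a fork and for serializing the state back, so this sketch does not meet the zero-cost goal for people who write correct code.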

This project feels in some ways like it was made for me. A couple of years ago, when covid broke out, I decided to read SICP, which led me down the rabbit hole of interpreters, and more recently (only after accepting that I’m an addict), into the world of compilers with the Chez Scheme Nanopass framework. At that time, I was writing a philosophy dissertation on the internal languages of topoi in Badiou’s Logics of Worlds, which was gradually being pushed by my supervisor in a direction far flung from my genuine interests. That led me to recently decide to return to working in software, as my interest in interpreters & compilers started to outweigh my interest in the ontology of mathematics.

During this time, I’ve become particularly interested in the use of continuations and CPS to build big-step interpreter based REPLs that allow for stepping through code execution and editing the substitutions of various parts of the code on the fly using partial evaluation, following the works of Kenichi Asai, Kent Dybvig, and Nada Amin on this subject, and experimenting with various techniques that they demonstrate in various papers.

Before returning to academia, I worked as an interactive VFX designer for stage & events using C++ openFrameworks, something that I’ve continued for particularly big commissions throughout this time (most recently for a public art commission in downtown Singapore last summer). At the same time, I have no experience in C++ outside of openFrameworks & OpenGL, and I have no experience with LLVM, but I feel confident in my ability to immerse myself in it from now until the summer, and be able to offer solid contributions. I’m also trying to make a career transition into open source tools of some sort, and have never worked “in the industry”; I’ve only ever worked as a private contractor, primarily as a one-man show. I contribute to GNU Guix and Guile Scheme, but never in any kind of professional capacity.

I believe I meet the intern requirements of GSoC, and I feel confident in my ability to add valuable contributions to this effort, and hopefully reach the stretch goal. I believe the direct relation of this project to my current interests will motivate some great work if you’re still accepting interns for this project.

Happy hacking,
Blake

Here is a talk from past students of Vassil Vassilev concerning their work on Clang-REPL, if anyone is interested:

Hi @cybersyn, thanks for reaching out and your interest!

That project might indeed help broaden your expertise with C++, compilers and low-level infrastructure. I’d propose building clang-repl and starting to play with it to get a feeling for what the work would be like. Perhaps firing up a debugger and going deeper into the JIT can be helpful.

After these steps, if you are still interested, you could send me (in private) your CV. Then one needs to write a proposal, proactively gathering feedback from the mentors. It needs to include major work items, deliverables and a timeline. We generally recommend having a patch or two in LLVM fixing some “good first issue” from the LLVM bug tracker. That usually increases the chances of a successful application.

Hello @vvassilev, I have experience with C++ and am also interested in this project idea, but I am less experienced with compilers. Are there any good resources that will get me up to speed on the knowledge necessary? And around when do you recommend submitting a first proposal draft?

Hi @Daniel-Yang,

I would recommend starting from the LLVM documentation and the Kaleidoscope tutorials. You can find other resources on these topics in the presentations from past LLVM Dev meetings (videos should be available on YouTube).

It is good to start drafting a proposal early. We recommend building the project and starting to play with it. That would give you a hint about the type of work in front of you. If you like it, then the proposal should be ready by the deadlines announced by GSoC. However, you should probably ask the mentors for feedback a couple of weeks before the deadline.

@cybersyn and @Daniel-Yang, if you are still interested in pursuing this project, I’d like to let you know that you should start working on your proposals and get some feedback from the mentors to increase your chances.

Ok will do! I’ve been busy preparing for final exams but I’ll send over my proposal draft to you as soon as I am done.

Hi all, sorry for not having replied in this thread so far. It would be great to review your proposals. Please PM me if you want to share yours!

A few notes from my side: This project does involve some serious C++; if you have no experience with it, then it will be hard. It doesn’t necessarily require prior experience in working on compilers. It’s more about extending the RPC capabilities in LLVM ORC and wiring it up in JITed code for value printing. You will have to debug through (at least some) assembly code.

Last but not least: This is not about running the compiler in a separate process as SourceKit appears to do, but rather the JITed code itself.
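
As a rough picture of the split described above, the sketch below keeps a controller (the REPL front end) and an executor in separate processes and sends the printed form of each value back over a socket. It does not use or mirror the LLVM ORC RPC API: the length-prefixed framing and all function names are assumptions made for illustration. Python's `eval` stands in for running JITed code, which is why source text crosses the wire here; in the setup described above the compiler stays in the front end. A POSIX system is assumed.

```python
import os
import socket
import struct


def send_msg(sock, payload):
    """Send one frame: 4-byte big-endian length, then the payload bytes."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closed the socket."""
    buf = b""
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            return None
        buf += part
    return buf


def recv_msg(sock):
    """Receive one frame, or None on end of stream."""
    header = recv_exact(sock, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    return recv_exact(sock, length)


def executor_loop(sock):
    """Executor side: evaluate each request and reply with the printed value."""
    env = {}
    while True:
        request = recv_msg(sock)
        if request is None:
            break  # controller closed the connection
        value = eval(request.decode(), env)  # stand-in for running JITed code
        send_msg(sock, repr(value).encode())


def start_executor():
    """Fork an executor process; return (pid, controller_socket)."""
    controller, executor = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        controller.close()
        code = 1
        try:
            executor_loop(executor)
            code = 0
        finally:
            os._exit(code)
    executor.close()
    return pid, controller


def evaluate(sock, source):
    """Controller side: request evaluation; None means the executor is gone."""
    send_msg(sock, source.encode())
    reply = recv_msg(sock)
    return None if reply is None else reply.decode()


if __name__ == "__main__":
    pid, sock = start_executor()
    print(evaluate(sock, "6 * 7"))
    sock.close()  # end of stream tells the executor to exit
    os.waitpid(pid, 0)
```

If the executor dies while evaluating a request, the controller sees end of stream and `evaluate` returns `None` instead of the front end crashing.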

Hi all, so far we have not seen any proposal draft. If you are still interested in working on this project, we have a couple of days left to write a strong proposal.

Hi @vvassilev @weliveindetail, since this was not chosen as a GSoC project this year, I would like to take this up. I am currently trying out clang-repl, after which I shall look into the code and get back here with follow-up questions.

Hi, have there been any updates on this project? I noticed the comment on Any open source projects available? - #5 by vvassilev that the project is still open but wanted to check to be absolutely sure before I potentially sink too much time researching an already completed project. I also searched through the Discourse logs but haven’t seen any mention of the project since that post.

I participated in GSoC this summer (not with LLVM) and really enjoyed it, so I’m curious about continuing to work on open source outside of GSoC. If this isn’t possible, I totally understand!
cc: @vvassilev @weliveindetail @lhames
Thanks!

Hi @vvassilev @weliveindetail

I’m very much interested in working on this project. Is it still open?

PS: I have shared my CV and more details with both of you in Inbox, as you both were mentioned as mentors in the GSoC 2024 projects.

Thanks!