ClangIR is a new, MLIR-based intermediate representation of C and C++ code. It has been developed in an LLVM incubator project, but work is now underway to migrate the code from the incubator to the main LLVM repository. As the code is moved, it must be updated to align with LLVM coding standards and quality expectations. The goal for this project is to participate in the ClangIR upstreaming process and help improve both the code and the upstreaming process.
This will be an opportunity to gain hands-on experience with MLIR development with a focus on day-to-day software engineering discipline. Participants will work side-by-side with other LLVM contributors to achieve a common goal.
Expected results
Migrate ClangIR support for C and C++ language features into the main LLVM repository
Improve the quality of code as it is being migrated
Suggest ways to improve the migration process
Requirements
Skills: Proficiency with modern C++ programming and familiarity with basic compiler design concepts are required. Prior experience with LLVM IR, MLIR, Clang or ClangIR programming is a big plus, but since the goal of this project is to gain such experience, it is not a prerequisite.
Project size: Medium to Large – This project is open-ended and can be as large as you like, but at least a month of commitment will be necessary to be successful in the goals of the project.
Hey @andykaylor! I’m really interested in this project and wanted to get your thoughts on defining a reasonable goal for a large project.
From the RFC and current work, I see that lowering to MLIR is still in early stages, and that’s where the main focus is right now. Given that, I’d love to get an idea of what might come next after this stage and what directions would make sense for a proposal. I understand that the scope is flexible and that this is more of a general project where I’ll likely contribute to different areas, but are there specific areas (i.e., particular feature migrations) that would be especially valuable?
It’s hard to say where we’ll be at the start of summer. Right now, we’re working on basic C language features like accessing local variables(!). Once that’s working, we’ll add control flow and such. I expect we’ll be able to compile a simple program soon.
I’m not sure it’s feasible to anticipate a specific unimplemented feature to enable for a proposal that you’re writing now. I think it would be more reasonable to use more general terms if that’s acceptable. Basically, find a program that doesn’t compile at the start of the project because of some significant unimplemented feature and make it compile.
If I had to name something specific, I’d say C++ exception handling is probably something that won’t be done yet by summer and would be about the right size for a large project. I hope I’m not being too hasty in saying that this will be available – we may need to implement parts of it for other features – but I think it’s more likely than most other features to be left unfinished by summer.
The third bullet in the expected results section is also important, but difficult to anticipate in a concrete way. The process of bringing a feature over from the incubator involves several steps:
1. Identify the feature to be ported
2. Locate the parts of the incubator code that implement that feature
3. Understand the incubator code
4. Figure out how to isolate a portion of the code that is both testable and reviewable
5. Find the incubator tests which exercise the code being ported
6. Create a pull request in the upstream repo
Steps 1 and 6 are probably the way most of us imagine the work, but steps 2-5 are not trivial. Steps 3 and 4 are probably where most of the time is spent. I think things could be done to leverage tools like static analysis (step 4), code coverage (steps 2 and 5), and maybe AI (steps 3 and 6) to make this easier.
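As one illustration of how code coverage might help with locating tests, here is a minimal sketch that selects the tests touching the functions being ported. It assumes a per-test coverage summary is already available in memory; the `CoverageMap` type, the `testsExercising` helper, and the idea of keying coverage by function name are all hypothetical and are not part of any existing ClangIR tooling.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical coverage summary: test name -> functions that test executes.
// In practice this would have to be derived from a coverage tool's report;
// the in-memory format here is an assumption made for illustration.
using CoverageMap = std::map<std::string, std::set<std::string>>;

// Return the tests that execute at least one of the functions being ported.
// Results come back in the map's key order (sorted by test name).
std::vector<std::string>
testsExercising(const CoverageMap &coverage,
                const std::set<std::string> &portedFunctions) {
  std::vector<std::string> result;
  for (const auto &entry : coverage) {
    for (const auto &fn : portedFunctions) {
      if (entry.second.count(fn) != 0) {
        result.push_back(entry.first);
        break; // One hit is enough to keep this test.
      }
    }
  }
  return result;
}
```

Given the set of function names identified while locating the feature's code, the returned list is a starting point for deciding which incubator tests to bring along. It would still need manual review, since a test can execute a function without meaningfully checking it.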
Contributing to the actual open source work is the part that’s most valuable to us, and honestly it’s what would be most valuable to you as a learning experience as well, but the “process” part of it is where you are most likely to be able to propose novel work.
Hi @Joejiong. There is definitely a practical limit to how many people can simultaneously work on the upstreaming effort, just based on interdependencies in the code, but I hope that by summer we’ll have more independent paths open. There’s also a limit to how much time I will be able to dedicate to mentoring, and I want to be sure to give each mentee a solid amount of attention. I’m hesitant to commit to mentoring two people, but it might be possible to find another mentor to help out.
As for non-GSoC contributions, help is always welcome and I think we have a good community of contributors who are happy to answer questions and give advice, but there would be more expectation that you’d be willing and able to work independently in that situation.
Hi @andykaylor, I am quite interested in this project and want to contribute! I have experience with C++ and would like to start by working on some first issues to get familiar with ClangIR.
@yimdx It’s a good idea to work on some of the issues labeled “good first issue” in the ClangIR incubator project as a way to become familiar with ClangIR.
I am an undergraduate in computer science and I am very interested in this project as well as compiler design. I have familiarity with C++ programming and have begun working on some good first issues.
Hey @andykaylor, my name is Siya Chopra, and I am currently an undergraduate pursuing computer science. I have a fair knowledge of C++ and compiler design and would love to work on this project.
Hey @andykaylor, I am currently spending some time understanding MLIR and ClangIR in order to write a competitive proposal. You mentioned that this project is open-ended in terms of size. Is there a guideline or threshold for what you expect from a successful medium versus large project?
Hey @andykaylor, I’ve been collaborating with my lab students on building non-heuristic function inlining algorithms in LLVM for quite a while now, and I have worked at MathWorks in software engineering and infrastructure roles using advanced C++. Through these experiences I have gained deep knowledge of MLIR, LLVM IR (and its internals), compiler design (courtesy of many courses), and advanced C++.
I am very excited to contribute to this project. I have a few clarifying questions and would be grateful if you could address them:
For C++ exceptions, would the implementation require integrating with existing LLVM EH mechanisms like landingpad/resume, or designing new MLIR abstractions?
Are there toolchains for auto-generating migration checklists (e.g., dependencies, tests, coding standards) to improve the process?
Is there a specific email address or channel where I can send you or the team a draft of my proposal before submitting it to GSoC?
Hi @andykaylor, I am a developer with experience in C++ and LLVM IR. I am interested in contributing to the ClangIR migration effort and would like to understand the best way to get involved. Any guidance would be appreciated. Thank you.
@spausum I’m reluctant to specify success criteria in terms of lines of code or number of commits, because that will vary significantly depending on the complexity of what you end up working on. In general, I think 1-2 commits per 40 hours of work is probably a reasonable goal (assuming ~300-500 lines of code per commit), so you can scale that according to the commitment you’d be able to make. The rate of contribution will also likely be lower at the beginning of the project if you need time to ramp up on the code base.
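To make the scaling concrete, here is a rough sketch of the arithmetic. It uses only the figures quoted above (1-2 commits per 40 hours, ~300-500 lines per commit); the helper names are hypothetical, and the 175-hour and 350-hour totals used in the usage note are an assumption based on GSoC's nominal medium and large project sizes, which this thread does not state.

```cpp
// A closed range of estimates, low to high.
struct Range {
  double low;
  double high;
};

// Scale the "1-2 commits per 40 hours" rule of thumb to a total hour count.
Range expectedCommits(double totalHours) {
  const double blocks = totalHours / 40.0; // Number of 40-hour blocks.
  return {1.0 * blocks, 2.0 * blocks};
}

// Combine the commit range with the "~300-500 lines per commit" assumption.
Range expectedLines(double totalHours) {
  const Range commits = expectedCommits(totalHours);
  return {commits.low * 300.0, commits.high * 500.0};
}
```

Under those assumptions, `expectedCommits(175)` gives roughly 4 to 9 commits and `expectedCommits(350)` roughly 9 to 17. These are ballpark figures only, not success criteria, and ramp-up time would pull the early numbers down.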
@nipun0307 ClangIR uses a cir.try operation to model C++ exceptions, and that is lowered to the LLVM dialect operations corresponding to the LLVM IR EH mechanisms. You can see how that works in the incubator project here: https://godbolt.org/z/61oK486dT
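For anyone new to this area, here is a minimal sketch of the kind of C++ source that exercises exception handling: one function that throws and another with a try block and handler. The function names are hypothetical and chosen only for illustration. The CIR and LLVM IR produced for code like this are not shown here; the Compiler Explorer link above is the place to see the actual output.

```cpp
#include <stdexcept>

// Throws when the divisor is zero, so any call to this function may unwind.
int divide(int numerator, int denominator) {
  if (denominator == 0)
    throw std::invalid_argument("division by zero");
  return numerator / denominator;
}

// The try block and its handler are the constructs that an exception
// handling implementation has to represent and then lower.
int safeDivide(int numerator, int denominator, int fallback) {
  try {
    return divide(numerator, denominator);
  } catch (const std::invalid_argument &) {
    return fallback; // Reached only when divide() throws.
  }
}
```

A small program like this is also a convenient test input when comparing what the incubator and the upstream implementation can currently compile.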
I’m not aware of any tools being used for auto-generating migration checklists. Right now, most of the migration process is being done manually. Proposals for improving the process would be welcome, but I suggest that rather than proposing an improvement upfront it would be better to get involved in the upstreaming effort as a first stage of the project, then propose and implement whatever automation seems good based on the experiences from the first stage.
You can send me draft proposals at akaylor@nvidia.com. I’m already getting quite a lot of inquiries related to this project, so my response may be slow, but I’m trying to respond within a couple of days.
@arrawten A good way to get started is to build the ClangIR incubator project, use it to build the llvm-test-suite’s single source tests, and then see if there’s a test that isn’t compiling that you might be able to fix. There is information about how to do that here: Getting started · ClangIR. You could also look at the “good first issue” list in the incubator repository.
Fixing a couple of small issues in the incubator is a good way to get started because understanding the incubator code is fundamental to the upstreaming process, and a little hands-on experience will prove very helpful.