[RFC] Upstreaming ClangIR

Hey folks,

This RFC proposes upstreaming ClangIR: moving the llvm/clangir repository from LLVM’s incubator into the mainline llvm/llvm-project.


Background

A little over a year ago, an RFC introducing ClangIR, a new higher-level IR for C/C++, was published. ClangIR is an MLIR-based C/C++ dialect for Clang, generated from the Clang AST, that can be lowered to other IRs – check out that RFC for more background and motivation, and see the FAQ below for what has changed since. The ClangIR page also contains general information, documentation, and usage instructions.

A year of progress

Last October, the Evolution of ClangIR talk was presented at the LLVM Dev Meeting (the video should be available on LLVM’s YouTube channel soon). It explores some aspects of the design and some of the past year’s achievements. Given the progress and the community built around it, I believe that CIR is no longer ‘experimental’ in concept, and the working group (the MLIR C/C++ frontend folks) now believes that the dialect and architecture are headed in the right direction.

The project also grew from two contributors in January 2023 to a total of nine by the end of 2023, four of whom are currently active. I expect that number to increase with upstreaming (see the next section). Some of the achievements include:

  • A coroutines-aware C++ lifetime checker based on ClangIR. Deployment for this just started at Meta, where a C++ lifetime bug (which caused major churn in production) was retroactively caught in our codebase.

  • Progressive lowering of ClangIR.

    1. CIRGen: AST to CIR. Direct translation, avoiding early optimizations.
    2. Passes: Lifetime checker, cleanup pass, lowering prepare, idiom recognizer and library call optimizer.
    3. LoweringPrepare: Unwrap some abstractions and expand CIR into more basic CIR operations prior to lowering to LLVM IR. To be more concrete, this is where things like static initializers get their cxa_acquire/cxa_release and trivial constructors get expanded to CIR’s memcpy/memmove, etc. (see the sketch after this list).
  • Lowering to LLVM IR: more than half of the SingleSource tests (1000+) pass correctness checks. This is the default ClangIR pipeline.

  • Lowering to MLIR in-tree dialects: still in toy shape. Some lowering to memref, arith, func, scf and cf.

  • Community: monthly meetings with other C/C++ MLIR Frontend efforts. This has been extremely valuable in getting feedback, direction, building momentum and figuring out how all of our projects fit together.
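
To make the LoweringPrepare item above a bit more concrete, here is a hypothetical C++ snippet of the kind it handles; the exact CIR operations involved aren’t shown here, and this is only a sketch of the idea.

```cpp
// Hypothetical input for the LoweringPrepare step described above. CIRGen keeps the
// guarded static initialization as a higher-level operation; LoweringPrepare later
// expands it into the guard acquire/release plumbing that the LLVM IR lowering
// expects, instead of doing that expansion eagerly during AST-to-CIR translation.
struct Logger {
  Logger() {} // non-trivial constructor: the static below needs a one-time guarded init
};

Logger &getLogger() {
  static Logger instance; // expanded by LoweringPrepare, not during CIRGen
  return instance;
}
```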

Why upstream now?

Why is this a good moment in time to include ClangIR into llvm-project?

The project is currently getting contributions from new interested parties (e.g. see the recent OpenACC RFC), and it’s more convenient for everyone involved if ClangIR collaboration happens directly upstream. Examples include OpenACC bits shared between Clang and Flang, and the SYCL/MLIR effort already using a fork from intel/llvm. It is also more appealing for some of the entities involved to contribute directly to an upstream llvm-project rather than to a project under the incubator: incorporation into their processes becomes an easier step.

ClangIR is young enough to be actively redesigned. Its evolution so far has been driven by the lifetime checker and LLVM lowering, but there’s more to cover on C/C++ language extensions (GPUs, HPC, …), static analysis, debug info, sanitizers, etc. The project is also mature enough not to cause major breakage/churn to the rest of LLVM, and is of the quality one expects from the LLVM infrastructure.

Stakeholders

The conversation about upstreaming ClangIR has already started among interested parties; here’s a list of community members who would rather see ClangIR upstreamed sooner rather than later:

  • OpenACC / OpenMP at NVIDIA. OpenACC’s upstreaming RFC states their interest in a lowering story via ClangIR. They are already sending PRs to ClangIR and have expressed their intent to see this upstream – Erich Keane, David Olsen.
  • SYCL. Intel and Codeplay are looking into downstream strategies in order to adopt ClangIR for their SYCL-MLIR project presented at EuroLLVM 2023 (talk, poster). We provided them with a branch merged with another fork to start an experiment, but that isn’t ideal long term – Intel and Codeplay: Lukas Sommer, Julian Oppermann, Victor Lomüller, Victor Perez, Ettore Tiotto, Whitney Tsang.
  • HLSL. Interested in exploring the potential benefits of lowering HLSL to ClangIR to preserve structured control flow for the SPIR-V backend in the future, which may also allow sharing more common graphics legalization passes with DXIL. HLSL has complex legalization requirements that are onerous to implement solely on ASTs. Today DXC relies on legalization at the LLVM IR layer, which can result in poor-quality diagnostics issued late. ClangIR has the potential to significantly improve the accuracy and quality of this class of diagnostic. Having ClangIR included in llvm-project would make testing this alternate path easier, without needing to manage multiple merge branches for frequently updated projects – Google and Microsoft: Diego Novillo, Steven Perron, Natalie Chouinard, Nathan Gauër, Cassandra Beckley, David Neto, Chris Bieneman, Justin Bogner.
  • Polygeist. There’s agreement with the project owners, reached in community meetings, that Polygeist would benefit from lowering directly out of ClangIR instead of from the AST. Having ClangIR in tree will let them start that process – Google: Alex Zinenko.
  • VAST. Trail of Bits has expressed interest in using ClangIR to lower VAST IRs down to LLVM IR. As VAST targets high-level program analysis for C/C++, it would benefit everyone not to split the community across multiple representations and to allow interchangeable formats. ClangIR, being the representation that brings this to the table, can serve as a unifying standard, similar to how LLVM IR did for various tools in the past – Henrich Lauko, Lukáš Korenčik and Peter Goodman.
  • At NextSilicon, we recognize the significant value of integrating clang-mlir into our workflow. As an accelerated compute company, we primarily use MLIR as the driving force behind our chip optimization efforts. The adoption of ClangIR would introduce an additional optimization layer for our hardware, leveraging high-level abstractions like for loops and multi-dimensional arrays to enhance performance – Or Birenzwige, Johannes de Fine Licht, Christian Ulmann, and Tobias Gysi.

If you are reading this and I missed your project (or your support), please chime in!

Implementation Strategy

ClangIR’s development follows some guiding principles:

  • Follow the proven CodeGen skeleton: re-use the direct AST-to-LLVM codegen skeleton as much as possible as it has been proven to be a correct and safe baseline and is a convenient entry point for newcomers.
  • Produce hard errors on unimplemented language features: this prevents silently generating wrong or incomplete IR for unimplemented features, which could otherwise lead to issues that are tricky to track down later. In the near future the plan is to handle this more gracefully using some form of diagnostics.
  • Generate the same LLVM IR at baseline: LLVM IR lowered out of ClangIR should be as close as possible to what Clang currently generates. This helps eliminate canonicalization and phase-ordering issues when investigating codegen quality. If it makes sense, this guiding principle could change once ClangIR is mature.
  • Avoid early optimization and premature lowering within the AST-to-ClangIR transformation: traditional Clang codegen does a lot of this eagerly (e.g. replacing ctor calls with memcpy). ClangIR’s raison d’être is to prevent going “too low, too early” (see the sketch after this list).
  • ClangIR’s source code is mostly isolated and non-intrusive: no dependency on custom AST changes or ported patches.
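
As a rough illustration of the “avoid premature lowering” principle (a sketch of the idea under assumed behavior, not actual CIR output): traditional CodeGen may lower the trivial copy below straight to an llvm.memcpy, while CIRGen keeps it as a copy-construction operation and expands it only later in the pipeline.

```cpp
// Hypothetical example for the "avoid premature lowering" principle above.
// For the trivial copy below, classic CodeGen may eagerly emit an llvm.memcpy of
// the aggregate; CIRGen instead keeps a higher-level copy operation and defers
// that expansion to a later stage (e.g. LoweringPrepare).
struct Point {
  int x, y, z;
};

struct Path {
  Point start;
  explicit Path(const Point &p) : start(p) {} // trivial copy, kept high-level in CIR
};

Path makePath(const Point &p) { return Path(p); }
```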

Source code

Most of the new code is in clang/lib/CIR, clang/include/clang/{CIR,CIRFrontendAction} and clang/test/CIR. Additional changes in the codebase include:

  • A ClangIR-based clang-tidy infrastructure in clang-tools-extra/clang-tidy/cir (used to invoke the lifetime checker from clang-tidy)
  • Driver changes in tablegen files to add new flags to activate ClangIR-specific behavior.

Compiler Flags

From clangir.org:

By passing -fclangir-enable to the clang driver, the compilation pipeline is modified: CIR is emitted from the Clang AST and then lowered to LLVM IR, the backend, etc. To get CIR printed out of a compiler invocation, the -emit-cir flag can be used to tell the compiler to stop right after CIR is produced.

ClangIR codegen (CIRGen) and passes are hidden behind flags:

  • -fclangir-enable forces CIR to be enabled in the pipeline and used transparently (e.g. if one asks the compiler to output assembly then that’s the end result).

  • Miscellaneous -fclangir-* flags change CIRGen and pipeline behavior (adding passes, disabling verifiers, etc).

  • The -emit-cir flag, which is the moral equivalent of -emit-llvm for CIR (an example invocation is sketched below).

Prefixing flag names with clangir has been our way of marking behavior as experimental, though these flags could alternatively be prefixed with experimental, as was done by similarly experimental past projects, e.g. the new pass manager.
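
For concreteness, here is a minimal translation unit together with a sketch of how these flags compose on the command line; -fclangir-enable and -emit-cir are the flags described above, while the rest of the invocation is ordinary clang usage.

```cpp
// example.cpp - a minimal input to try the ClangIR pipeline with.
//
// Sketch of invocations using the flags described above:
//   clang -fclangir-enable -emit-cir example.cpp -o example.cir   # stop right after CIR is produced
//   clang -fclangir-enable -S example.cpp -o example.s            # CIR is used transparently;
//                                                                 # assembly is still the end result
int add(int a, int b) { return a + b; }
```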

Builds

Building ClangIR is optional and can be accomplished by setting the proper CMake flag: CLANG_ENABLE_CIR. It works very similarly to existing flags like CLANG_ENABLE_ARCMT or CLANG_ENABLE_STATIC_ANALYZER.

Note that CIR test execution is also tied to overall CMake enablement, e.g. ninja check-clang-cir only works if the proper CMake setup is done.

Git strategy & Timeline

This is probably a more involved discussion, and I’d prefer to focus on getting approval for the proposal before tackling it (maybe even in its own RFC). So unless this somehow becomes critical to the decision, it’s perhaps best to wait for a follow-up.

FAQ

Is there an easy way to play around with ClangIR?

Yes, Compiler Explorer to the rescue! See an example here: Compiler Explorer. Note that it’s still missing a proper setup with a more up-to-date C++ standard library version, needed to play with coroutines and other modern features.

To what extent has the current design of ClangIR changed since the initial RFC?

The initial design has changed based on community feedback since then. The top three changes in ClangIR are:

  • Operations are able to hold references back to the Clang AST (inspired by Swift’s SIL).

  • We have opted for a more cautious approach, staying closer to LLVM unless there are compelling reasons to raise operations early. This choice will help us reach the finish line faster, as opposed to pursuing a clean-room design with higher-level semantics and representation.

  • We have focused on lowering directly to the LLVM IR dialect rather than to the standard in-tree MLIR dialects. The in-tree dialects are still unstable, and for a project like Clang a moving-target IR wasn’t something we wanted to depend on. Work on the standard MLIR path is encouraged but hasn’t been the focus of the more active contributors.

How about the Kleckner criteria (build time footprint)?

Reid Kleckner (@rnk) raised some good questions regarding ClangIR’s compile-time footprint. For the “C/C++ → CIR → LLVM” path, we have only been able to gather compile-time numbers for the part of the SingleSource tests we’re able to build from the LLVM test suite, and those results are noisy. Unfortunately, it’s not a reliable performance comparison, as many of these tests are too small.

For the “C/C++ → CIR → C++ lifetime analysis” path there’s currently no good proxy to compare against, especially given CIR codegen is only done for source files being analyzed (no CIRGen for definitions from headers, only declarations are emitted).

The honest answer is that we don’t have reliable numbers to show just yet. Though it’s also worth mentioning that there are possible compile time benefits unique to MLIR around function pass level parallelism.

How much longer does Clang’s build and testing get?

Time to build: the ClangIR-specific code added was in the noise compared to a build that also builds both Clang and MLIR. However, the cost of building MLIR itself is pretty significant: the measured average overhead of adding MLIR to the LLVM_ENABLE_PROJECTS list was ~45% compared to just building Clang. (conf: 2x AMD, 166 cores, 224GB)

Time to run tests (assuming nothing else to build): ninja check-clang-cir reports in ~2s for release builds and ~6s for debug builds (~225 tests. conf: Apple M1 Max laptop, 64GB).

What’s the progress on static analysis?

The lifetime checker is the only current piece in that direction, and it does fairly simple analysis: it’s capable of catching low-hanging fruit in modern C++, mainly because the higher-level operations and the AST back-references really help the compiler understand C++. Over the past year, we (a subset of the MLIR C/C++ frontend folks) had many discussions with, and received guidance from, some of the experts in the community (such as Gabor Horvath, Dmytro Hrybenko and Artem Dergachev). Some open project ideas we’d like to see in the future include: teaching the dataflow analysis framework to use ClangIR, and implementing some of Clang’s CFG-based analyses (e.g. AnalysisBasedWarnings) as CIR passes (this would also be great for compile-time evaluation).
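
For a flavor of what such a checker targets, here is a hypothetical use-after-scope bug of the “low-hanging fruit” variety; it is not taken from the checker’s test suite, and the diagnostics it would produce are not shown.

```cpp
#include <string>
#include <string_view>

// Hypothetical example of a lifetime bug a ClangIR-based checker aims to flag:
// the returned string_view keeps pointing into a local std::string whose
// storage is freed when the function returns.
std::string_view danglingView() {
  std::string local = "hello world";
  std::string_view view = local; // view borrows local's buffer
  return view;                   // local is destroyed here; the returned view dangles
}

int main() {
  std::string_view v = danglingView();
  (void)v; // any read through v at this point would be undefined behavior
}
```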

Assuming there’s a large amount of code duplication between ClangIR generation (CIRGen) and traditional LLVM IR generation in CodeGen (IRGen), what are the expectations for maintainers (for example, if someone fixes a bug in IRGen, should they also fix it in CIRGen)?

No. CIRGen follows the general skeleton of IRGen, but there are no plans to merge the two code generators. One area of improvement is sharing the AST queries done by both: there are duplicated helpers that gather information from types and other AST properties, and those should be shared. We currently track a number of these and plan to send a specific RFC in the future to discuss proper mechanisms to address them.

On the expectations for maintainers: none. If the developers of IRGen want to be helpful, they can communicate any new gap, but nothing is required. We’ve been operating as a few people playing catch-up for years now, and we’re fine with that until the community decides it’s worth their time to keep up.

Acknowledgements

Thanks to everyone who contributed PRs, created issues and participated in the C/C++ MLIR frontend meetings. Special thanks to folks who contributed to the project in the past year: Nathan Lanza (@lanza), Vinicius Couto Espindola (@sitio-couto), Hongtao Hu (@htyu), David Olsen, Yury Gribov, Oleg Kamenkov, Henrich Lauko (@xheno), Jeremy Kun (@j2kun), Keyi Zhang, Sirui Mu (@Lancern), Roman Rusyaev (@rusyaev-roman), Zhou (@redbopo), Ivan Murashko (@ivanmurashko), Nikolas Klauser (@philnik) and Fabian Mara Cordero (@fabianmc).

:white_check_mark: RFC accepted in this message.

41 Likes

Hi Bruno, the progress is really impressive!

A major concern is the divergence between CIRGen and the existing CodeGen (although you mentioned it twice, I am not sure if I am just lost in reading). IIUC, after the proposal, there will be two paths from original C++ sources to LLVM IR:

  1. AST → CodeGen → LLVM IR
  2. AST → ClangIR → LLVM IR

(Do I understand correctly?)

Although you said you improved/mitigated it by reusing the general skeleton of IRGen, it is still unclear to me. Could you describe it in more detail, or just share links to the sources so we can get a feeling for it?


Another question is about serialization: what is the story for serialization in CIR? Since I am maintaining modules, I am curious to see how they can work together. Also, I am curious if we can implement a new modules format on top of ClangIR.

3 Likes

I’ve reviewed this offline, but as a part of doing so spent quite a while investigating the implementation as it sits today.

IMO, it has ‘proven’ everything it can while living in a separate repo. I consider the concept proven; there is plenty of engagement (something that will only improve once this is upstreamed), and plenty of interest.

Thanks for all you/your team’s work Bruno, I look forward to this!

That said, @ChuanqiXu brings up some excellent points regarding Modules, it would be nice to know how that looks in a CIR world.

6 Likes

We also reviewed this offline and believe that this is the right time for upstreaming ClangIR.

Having ClangIR upstream will facilitate integration and experimentation with other ongoing efforts, both upstream and downstream, further fostering collaboration. This would also help with the general visibility of the project and help it gain momentum.

We also strongly believe that connecting Clang to MLIR opens up interesting avenues, not only for SYCL and other C++-embedded programming models, but also for C++ itself (e.g., C++26 BLAS).

Thank you @bcardosolopes and @lanza for driving this and everyone contributing to the project, we are excited to see this happening!

Victor Lomüller, Victor Perez, Julian Oppermann and Lukas Sommer

8 Likes

@bcardosolopes I’m sure this is explained in detail in some of the documents you listed, but would you be able to give a few-sentence summary of how ClangIR will make Clang better? And not just how it will make Clang better today, but also what kinds of future enhancements will be possible with ClangIR.

2 Likes

We also support upstreaming ClangIR to the LLVM repository. Providing a way to generate MLIR directly in Clang will speed up development of ClangIR and allow people interested in creating MLIR-based optimizations for C-like languages (SYCL, OpenMP, C/C++, etc.) to use a robust approach supported by the LLVM/Clang community.

Ettore Tiotto, Whitney Tsang and James Brodman.

7 Likes

Are you proposing that CLANG_ENABLE_CIR is on or off by default? The proposal doesn’t quite make that clear. (See also LLVM Community Support Policy — LLVM 19.0.0git documentation .)

2 Likes

Are you proposing that CLANG_ENABLE_CIR is on or off by default? The proposal doesn’t quite make that clear. (See also LLVM Community Support Policy — LLVM 19.0.0git documentation .)

Sorry about the lack of clarity; we propose that it’s off by default. We aim to have zero impact on the LLVM community if they choose not to participate in the project.

3 Likes

Thanks for the questions @ChuanqiXu, happy to clarify!

A major concern is the divergence between CIRGen and the existing CodeGen,

From our perspective, it’s not really a major concern; it’s just a statement of the approach.

(although you mentioned it twice, I am not sure if I am just lost in reading). IIUC, after the proposal, there will be two paths from original C++ sources to LLVM IR:

AST → CodeGen → LLVM IR
AST → ClangIR → LLVM IR
(Do I understand correctly?)

Yes, there will be two different paths from the AST to LLVMIR for quite some time. As mentioned in the RFC, there are no expectations for any maintenance or support from the community until ClangIR is proven to be worth it. That’s not a component of this proposal.

Although you said you improved/mitigated it by reusing the general skeleton of IRGen, it is still unclear to me.

CIRGen is not “competing” with traditional CodeGen; keeping the skeleton around serves more as a guideline for comparing the approaches, while also making it easier to have a starting point when introducing a new feature in CIRGen. Our goal is not to improve or mitigate anything related to CodeGen. In the FAQ section we’re mostly pointing out what the expectations are for CodeGen developers.

Could you describe it in more detail, or just share links to the sources so we can get a feeling for it?

Sure.

For instance, take CIRGenFunction::buildCXXConstructExpr in clangir/clang/lib/CIR/CodeGen/CIRGenExprCXX.cpp at 4e069c6269dd51606a58773b0fb90089c90cc645 · llvm/clangir · GitHub

The equivalent one in CodeGen is CodeGenFunction::EmitCXXConstructExpr: clangir/clang/lib/CodeGen/CGExprCXX.cpp at 4e069c6269dd51606a58773b0fb90089c90cc645 · llvm/clangir · GitHub

Note how, even though the implementations differ, the skeleton is similar enough that, even when the comments aren’t enough, it allows us to go back to IRGen and reason about the differences.

Another question is about serialization: what is the story for serialization in CIR?

Great question. At some point we’d like to be able to serialize the AST to improve round trip testing with things that require the AST. So far we haven’t done any work in this direction though. My rough plan here would be to serialize the whole TU in a PCH-like approach, reusing the existing clang infra to do this job. I currently don’t think we’d need any extra work (besides plumbing the pieces and perhaps fixing bugs) to make it happen.

Since I am maintaining modules, I am curious to see how they can work together.

CIR is lower level than the AST and although it keeps the references around, it’s probably not a good fit for retaining the level of information needed for Modules.

Also, I am curious if we can implement a new modules format on top of ClangIR.

It’s possible that CIR would be useful for doing some of the reachability/visibility analysis, but since a lot of the Modules logic is needed at Sema time (e.g. merging definitions), we’d need to have CIR created during Sema, and I’m not really sure how that would play out. People have also asked in the past if CIR could be used for template instantiation (given MLIR’s handy tools for playing with types), and the answer is similar to this one: it’s possible, but so far we’re lower level and we know there are challenges if we started at Sema instead.

2 Likes

I’m not actively involved in clang development these days, but I’m very excited to see this progress! Congrats to everyone driving it forward!

6 Likes

Hi Bruno,

This is really exciting and congratulations on the awesome progress you guys have made!
I am excited about the possibilities ClangIR will unlock for static analysis in clang!

It is probably unsurprising that at Apple many of our workflows are sensitive to clang build time and compile time.

  • Is there any impact on clang build time with CLANG_ENABLE_CIR off?
  • We’d love to see more data on the latter (compile time).
3 Likes

Just to clarify: MLIR is only required to build clang when CLANG_ENABLE_CIR is enabled, and for now it will be off by default, correct?

Is there anything that can be done to mitigate the build time regression in the future, e.g. building only a subset of MLIR? 45% seems fairly substantial. I’d also be curious how this would impact clang CI builders if they’re not building MLIR already.

3 Likes

@bcardosolopes I’m sure this is explained in detail in some of the documents you listed, but would you be able to give a few-sentence summary of how ClangIR will make Clang better? And not just how it will make Clang better today, but also what kinds of future enhancements will be possible with ClangIR.

Sure! A few quick examples of things that are enabled by ClangIR and its usage of MLIR:

  • It’s an IR designed to work with C++ and C – a lot of features have been added to LLVM to embed the semantics of C++ into LLVM IR, where they no longer exist natively – e.g. TBAA & devirtualization. ClangIR’s purpose is to reliably represent those semantics in the IR as first-class operations.
  • Given that ClangIR is higher level, we will also support things like idiom recognition at a higher level, which can be useful for both optimization and static analysis. Examples:
    • LLVM will notice a loop that stores 0 to every element of an array and replace it with a memset. With ClangIR, we are working on similar features at a higher level (e.g. replacing std::find with memchr, or transforming some copies into moves). LLVM can do libc-level-aware replacements; ClangIR could do libc++-level-aware replacements (see the sketch after this list).
    • Lifetime checking on modern C++ features like coroutines: with structured control flow, first-class scope and await operations, it becomes easier to reason about higher-level constructs.
  • MLIR is multi-threaded at the function-pass level. This introduces new opportunities that would be unreasonable with single-threaded pipelines. E.g. a FullLTO-like scenario in ClangIR would be more tractable from a compile-time perspective, given that you can parallelize.
  • Better support for programming models such as OpenMP/OpenACC/SYCL. We’ve been working with parties from numerous companies on this front and they all intend to support these programming models in MLIR and need Clang to be a part of that pipeline.
  • The Clang CFG supports a level of static analysis higher than you can implement on the AST or on LLVM IR. Unfortunately, it’s a bit neglected, as it is not on the critical path for CodeGen, and many features were never implemented. ClangIR necessitates full feature support while maintaining the same set of semantics.
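
As a concrete, hypothetical illustration of the idiom-recognition bullet above (the exact set of replacements ClangIR performs may differ from this sketch):

```cpp
#include <algorithm>
#include <cstddef>

// LLVM's loop-idiom recognition can already turn this zero-storing loop into a
// memset at the LLVM IR level.
void zeroFill(char *buf, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    buf[i] = 0;
}

// A pass that understands the C++ standard library (not just libc) could
// recognize that std::find over raw bytes is equivalent to a memchr call.
const char *findByte(const char *first, const char *last, char c) {
  return std::find(first, last, c);
}
```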
3 Likes

Thanks for the questions @jankorous and @blangmuir. Since you asked similar ones, let me address them together.

It is probably unsurprising that at Apple many of our workflows are sensitive to clang build time and compile time. Is there any impact on clang build time with CLANG_ENABLE_CIR off?

Nope. Clang only (a) requires MLIR and (b) builds CIR-related sources if CLANG_ENABLE_CIR is ON. The minor exception is a few flags in tablegen and support in the driver, but those have a negligible effect on build time.

Is there anything that can be done to mitigate the build time regression in the future, e.g. building only a subset of MLIR? 45% seems fairly substantial. I’d also be curious how this would impact clang CI builders if they’re not building MLIR already.

Good point, 45% is indeed substantial. Theoretically, we could trim some features from MLIR that we don’t use, but that’s unexplored as of now. I’m also curious about the impact on CI builders, but we haven’t investigated further - it’s worth exploring soon and is definitely a prerequisite if/when we get to building it by default.

2 Likes

Hi @bcardosolopes, do you also have numbers about the binary size of clang? What is the cost of adding MLIR?

3 Likes

Just a quick test of my local llvm-project and clangir release builds yields 139MiB for clangir and 95MiB for upstream with -DCMAKE_BUILD_TYPE=Release and llvm-strip --strip-all

2 Likes

That is a lot more than I would have expected. There is definitely some optimization potential, which might tie in with the build time optimization.

3 Likes

That is a lot more than I would have expected. There is definitely some optimization potential, which might tie in with the build time optimization.

Yup, definitely. This is a starting, unoptimized scenario. We know we presently link in MLIR code that isn’t actually used, so a good chunk of that is dead. But it will certainly also grow in other ways as we flesh out more of the pipeline.

2 Likes

Hi Bruno,

Yes, there will be two different paths from the AST to LLVMIR for quite some time. As mentioned in the RFC, there are no expectations for any maintenance or support from the community until ClangIR is proven to be worth it. That’s not a component of this proposal.

Then the question may be: what is the plan for when ClangIR is proven? Just out of curiosity, not a blocking question.

For instance, take CIRGenFunction::buildCXXConstructExpr in clangir/clang/lib/CIR/CodeGen/CIRGenExprCXX.cpp at 4e069c6269dd51606a58773b0fb90089c90cc645 · llvm/clangir · GitHub

The equivalent one in CodeGen is CodeGenFunction::EmitCXXConstructExpr: clangir/clang/lib/CodeGen/CGExprCXX.cpp at 4e069c6269dd51606a58773b0fb90089c90cc645 · llvm/clangir · GitHub

Yeah, they are really similar. But similar code is the enemy of software engineering, in my mind.

Great question. At some point we’d like to be able to serialize the AST to improve round trip testing with things that require the AST. So far we haven’t done any work in this direction though. My rough plan here would be to serialize the whole TU in a PCH-like approach, reusing the existing clang infra to do this job. I currently don’t think we’d need any extra work (besides plumbing the pieces and perhaps fixing bugs) to make it happen.

I feel the current framework for serializing the AST may not be a good fit for reuse with ClangIR (or I don’t see how it could be). In my mind, serialization for ClangIR should be more like the serialization of LLVM IR. Or, if there is a serialization framework for MLIR, can we reuse that?

CIR is lower level than the AST and although it keeps the references around, it’s probably not a good fit for retaining the level of information needed for Modules.

It’s possible that CIR would be useful for doing some of the reachability/visibility analysis, but since a lot of the Modules logic is needed at Sema time (e.g. merging definitions), we’d need to have CIR created during Sema, and I’m not really sure how that would play out. People have also asked in the past if CIR could be used for template instantiation (given MLIR’s handy tools for playing with types), and the answer is similar to this one: it’s possible, but so far we’re lower level and we know there are challenges if we started at Sema instead.

Good insight. Maybe CIR needs some additional work to handle modules; otherwise the analysis may not be efficient. I mean, we probably don’t want CIR to analyze the code imported from other TUs. But these should be minor points.

1 Like

BTW, I like the idea of ClangIR due to my experience with coroutines in LLVM.

Some semantics of C++20 coroutines (like symmetric transfer, coroutine elision, and exception handling) are implemented in LLVM. This is not great: it somewhat breaks LLVM’s design goal of being a low-level compiler component. But with the introduction of ClangIR, maybe it is possible to move some of the C++20-coroutine-specific implementation into ClangIR, and leave only the general coroutine semantics in LLVM’s coroutine intrinsics.

2 Likes