Hi Chuanqi,
Then, the question may be: what is the plan once ClangIR is proven? Just out of curiosity, not a blocking question.
At that point one idea is to put a transition plan in place. This could imply changing policy to require changes to go to both pipelines until we could flip a switch. These things take a while to converge though (the new pass manager and GlobalISel are probably good references), so we won’t introduce additional burden in the meantime.
Yeah, they are really similar. But the similar codes are the enemy of SE in my mind.
I guess it depends on what you are trying to achieve. It has been helpful for tracking unimplemented features and for lowering the barrier to entry (from what I’ve seen by reviewing PRs). Above all, following CodeGen’s skeleton has guided us in handling corner cases that were carefully worked out over the past decade. Since there’s still a lot to be done in the project, I feel like we gain more by sticking to this. Note that this doesn’t preclude factoring some parts out, using more modern C++ features, etc. We do that to the extent that it makes sense.
I feel the current framework for serializing AST may not be good to be reused for ClangIR.
Let me clarify this one. In CIR, operations can hold references to AST nodes. Right now, when we dump CIR to disk (serialization or printing), we lack a way to restore AST information when we load CIR from disk (deserialization or parsing). Instead of reinventing the wheel, what I’m suggesting is that the AST is serialized in a separate file, just like a .pcm or .pch. The file containing CIR will contain descriptions of how to obtain these AST nodes back. The AST nodes are finally retrieved by using those descriptions and looking inside the “pch-like” file (mechanisms already available in clang).
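To make the idea a bit more concrete, here is a rough sketch of the lookup, not how it is or will be implemented. All names (AstNode, AstRef, AstFile, AstResolver) are hypothetical stand-ins invented for illustration; none of them are clang or CIR APIs. The CIR side only stores a small descriptor, and the node is recovered by looking that descriptor up in the separately serialized AST file.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an AST node; not a clang type.
struct AstNode {
    std::string kind;  // e.g. "FunctionDecl"
    std::string name;
};

// Hypothetical descriptor stored alongside a CIR operation: it names the
// "pch-like" file and an ID for the node inside that file.
struct AstRef {
    std::string astFile;
    std::uint64_t declId;
};

// Hypothetical stand-in for an already loaded "pch-like" file.
class AstFile {
public:
    void add(std::uint64_t id, AstNode node) {
        nodes_.emplace(id, std::move(node));
    }

    // Returns the node with the given ID, or nothing if it is absent.
    std::optional<AstNode> lookup(std::uint64_t id) const {
        auto it = nodes_.find(id);
        if (it == nodes_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::uint64_t, AstNode> nodes_;
};

// Resolves descriptors found in a CIR file against the loaded AST files.
class AstResolver {
public:
    void registerFile(const std::string& path, AstFile file) {
        files_.emplace(path, std::move(file));
    }

    // Returns nothing when either the file or the node ID is unknown.
    std::optional<AstNode> resolve(const AstRef& ref) const {
        auto it = files_.find(ref.astFile);
        if (it == files_.end()) return std::nullopt;
        return it->second.lookup(ref.declId);
    }

private:
    std::unordered_map<std::string, AstFile> files_;
};
```

In the real thing the lookup would presumably go through clang's existing AST deserialization rather than an in-memory map; the map is only there to keep the sketch self-contained.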
(Or I don’t know how can that be.) In my mind, the serialization for ClangIR should be more like the serialization of LLVM IR. Or if there is a serialization framework for MLIR, can we reuse that?
Sure, MLIR does have a bytecode format that you can read about here: MLIR Bytecode Format - MLIR.
Good insight. And maybe CIR needs some additional work to handle modules. Otherwise the analysis may be not efficient. I mean we probably don’t want the CIR to analyze the codes imported from other TUs. But this should be minor points.
Yes, some exploratory work would be necessary.
BTW, I like the idea of ClangIR due to some experience in coroutines in LLVM.
Some semantics of C++20 coroutines (like symmetric transfer, coroutine elision, exception handlings) are implemented in LLVM. This is not good. It breaks the design idea of LLVM to be a low level compiler component in some level. But with the introduction of ClangIR, maybe it is possible to move some of C++20 Coroutines specific implementations to ClangIR, and only leave the general coroutines semantics in LLVM Coroutines intrinsics.
Right on point, the way coroutines work is one of the biggest motivations. Having a CIR representation could also allow us to experiment with different LLVM lowerings (e.g. returned continuation) or apply optimizations at the CIR level (e.g. eliminate some co_await logic when init ready never suspends).