[RFC] An MLIR based Clang IR (CIR)

Hello Clang and MLIR folks. This RFC proposes CIR, a new IR for Clang.

TL;DR: we have been working on an MLIR-based IR for Clang, currently called CIR (ClangIR, C/C++ IR, name it what you will). It's been open source since inception, and we'd love to upstream it sooner rather than later. Our current (and initial) goal is to provide a framework for improved diagnostics for modern C++, meaning better support for coroutines and checks for idiomatic uses of known C++ libraries. The design has grown out of implementing a lifetime analysis/checker pass for CIR, based on the C++ lifetime safety paper. C++ high-level optimizations and lowering to LLVM IR are highly desirable but are a secondary goal for now - unless, of course, we get early traction and interested members of the community to help :).


Motivation

In general, Clang's AST is not an appropriate representation for dataflow analysis or for reasoning about control flow. On the other hand, LLVM IR is too low level: it exists at a point where we have already lost vital language information (e.g. scope information, loop forms and type hierarchies are invisible at the LLVM level), forcing a pass writer to attempt to reconstruct the original semantics. This leads to inaccurate results and inefficient analyses, not to mention the Sisyphean maintenance work given how fast LLVM changes. Clang's CFG is supposed to bridge this gap but isn't ideal either: it's a parallel lowering path for dataflow diagnostics that (a) is discarded after analysis, (b) has lots of known problems (check out Kristóf Umann's great survey regarding "dataflowness") and (c) has testing coverage for CFG pieces that isn't quite up to LLVM's standards.

We also have the prominent recent success stories of Swift's SIL and Rust's HIR and MIR. Both projects have leveraged high-level IRs to improve their performance and safety. We believe CIR could provide the same improvements for C++.

Case study: C++ Coroutines

Coroutines are a complex C++ feature and exemplify quite well the consequences of lacking a higher-level IR between Clang's AST and LLVM IR. The interesting code generation parts are done at the LLVM IR pass level, where the necessary correctness work occurs. This is non-ideal, but there's no better abstraction layer for this work yet! For a quick example, proper handling of symmetric transfer requires the presence of lifetime intrinsics, which might not be available in a given LLVM IR bitcode file. It's also common to hit subtle bugs, given that the coroutine passes in LLVM are intermixed with other transformations which do not necessarily understand coroutine intrinsics or properly consider frame allocation.

On the coroutine code analysis and diagnostics axis, we've attempted to combine clang-tidy's AST matchers and Clang's CFG to reason about control flow and lifetime. However, we ran into CFG liveness accuracy issues, limited interprocedural capabilities and aliasing problems. This raised the question of whether improving the current tools is enough to cover the problems we care about within our codebase. An example of C++ coroutine usage (using folly::coro) we'd like to diagnose:

folly::coro::Task<int> byRef(const std::string& s) {
  // do something with 's' ...
  co_return 0;
}

folly::coro::Task<void> sillycoro() {
  std::optional<folly::coro::Task<int>> task;

  {
    std::vector<std::string> v = {"foo", "bar", "baz"};
    task = byRef(v[0]);
  } // vector 'v' is destroyed here; references to it are dangling.

  // do something with 's' effectively runs here.
  folly::coro::blockingWait(std::move(task.value()));

  co_return;
}

Additionally, something like CIR has been mentioned multiple times in hallway chats, round tables, etc. The community interest in such an IR is a big motivator for us, and perhaps we can build something together. Two recent examples of such discussions are the Discourse threads on HLSL support and Polygeist incubation.


Goals

The situation described above prompted us to look into other solutions and revisit the existing limitations of our Clang-based tooling. This led to two main goals:

  • Enable better diagnostics for correctness, security and performance.
    • Security / Bugs: the Google Chrome team notes that around 70% of their high-severity security bugs are memory unsafety problems, half of which are use-after-free bugs. Using std::optional to illustrate, CIR could introduce instructions for optional dereferences (cir.std.optional.deref) and diagnose them as harmful if they are not dominated by a check on whether the object contains a value (cir.std.optional.has_value); see the sketch after this list.
    • Performance-driven diagnostics: expensive and potentially unintended C++ copies could be diagnosed using CIR by a compiler pass that consumes profile information and emits remarks on interesting copy-constructor usage.
    • Privacy: CIR could be used to check const-ness of selected code paths, and to provide rich dataflow information on data accesses.
  • Pave the way to CIR high-level transformations for optimizations.
    • Recognizing idiomatic C++ usage could allow tools to suggest more elaborate source code modifications, e.g. CIR-based code modification tools could suggest rewriting a range-based for into a loop form that better fits existing vectorizers. This is already a step towards using CIR transformations for optimization purposes.
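
To make the std::optional illustration above concrete: a CIR pass could rewrite recognized std::optional calls into dedicated operations and then check dominance between them. The operations and syntax below are hypothetical, purely an illustrative sketch; none of these ops exist in CIR today:

%opt_addr = cir.alloca !cir.std.optional<i32>, ... ["opt", cinit]
%has = cir.std.optional.has_value %opt_addr      // rewritten from opt.has_value()
cir.if %has {
  %v0 = cir.std.optional.deref %opt_addr : i32   // fine: dominated by the check
}
%v1 = cir.std.optional.deref %opt_addr : i32     // diagnosed: no dominating check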

How do we get there:

  • Provide a way to express a contract between libraries and the compiler. Compiler passes operating on CIR can rewrite parts of the code with more domain-specific CIR operations and types, allowing the compiler to naturally recognize idioms and apply C++-aware code analysis.
  • Cross translation unit (CTU) analysis. We've had simple bugs in production that could have been avoided by CTU analysis capabilities. Even though there are AST-based approaches, we believe CIR is a more natural place to move forward with this type of technology. It's feasible to imagine something like ThinLTO summaries for CIR, enabling the propagation of lifetime, initialization, etc. for both improved diagnostics and performance.

We are currently putting most of our effort into C/C++ (mostly C++) given our codebase demands. We plan to work on ObjC in the future, and would be happy to collaborate with contributors on this.


Related work

Several of the ideas presented in the goals section are not new, and some have even already been implemented, including MLIR-based IRs for Clang and similar projects. Not only have these projects moved the needle on tooling and overall compiler quality for C++, but they were also important in showing what tooling, features and bug mitigations the C++ community cares about. Let's go over some of them and explain why we still think we need CIR:

  • CIL: "a common MLIR dialect for Fortran/C/C++", presented at the 2020 LLVM Developers' Meeting and open sourced in early 2021. This project seems promising, but it's focused on optimizations (it's not clear how much it does for diagnostics). To the best of our knowledge, CIL is designed around unstructured control flow and lacks a git history and upstreaming efforts, which are non-starters for us. Unfortunately, when we tried to reach out for more information, we didn't hear back.
  • Polygeist: "a new C and C++ frontend and compilation flow that connects the MLIR compiler infrastructure to cutting edge polyhedral optimization tool". This project emits lower-level dialects as well as its own custom dialect, polygeist. ClangIR slots above this dialect in the lowering hierarchy, and we believe the two projects could complement each other. Polygeist is in the process of being incubated under the LLVM umbrella; perhaps CIR can be a bridge between the Clang AST and Polygeist goodness.
  • Clang dataflow framework: "a new static analysis framework for implementing Clang bug-finding and refactoring tools (including ClangTidy checks) based on a dataflow algorithm". We are interested in similar goals and checks (e.g. things along the lines of the std::optional example), but it's not a complete fit, since part of our goal is to use the same representation to apply transformations and eventually generate LLVM IR.
  • Clang’s Cross Translation Unit (CTU) Analysis is AST based (can be used with PCHs) and is a perfect fit for the current usage of analysis tools in Clang. As mentioned before, the AST representation limits the analysis potential.

C++ is hard. We are not going to solve all problems in the first year. But we do strongly believe that this is the way of the future.


Design decisions

Why MLIR?

MLIR provides a solid and tested framework for building custom IRs and running passes. Among several other capabilities, it's also already in-tree and used by Flang to build FIR.

High level language semantics

High-level C/C++ semantics are better represented with custom, specific operations. A lifetime checker for C++, based on the lifetime safety paper (P1179) by Herb Sutter, is a great example of an analysis that can greatly benefit from such richer operations. The current form of CIR is mostly designed around elements that make a lifetime checker easy to express in the compiler.

Two examples to illustrate how such operations help:

  • Scopes: cir.scope defines a new MLIR region in CIR, closely representing the opening of a new scope in C/C++. This means that:
    • New local (scope) variables (allocated by cir.alloca) are always found in the closest cir.scope region's entry block. The lifetime of these resources ends with the enclosing cir.scope region.
    • A points-to analysis on structured control flow can rely on these properties to reason about a resource's lifetime in a more natural way than a CFG with lifetime intrinsics. Further dialect lowering could unwrap cir.scopes if desirable (e.g. before emitting the LLVM IR dialect).
    • In the example below, note how x is declared inside the corresponding cir.scope. Check our implemented lifetime checker pass for more information.
int *may_explode() {
  int *p = nullptr;
  {
    int x = 0;
    p = &x;
    *p = 42;
  }
  *p = 42; // oops...
  ...
}

func @may_explode() -> !cir.ptr<i32> {
  %p_addr = cir.alloca !cir.ptr<i32>, cir.ptr <!cir.ptr<i32>>, ["p", cinit]
  %forty_two = cir.cst(42 : i32)
  ...
  cir.scope {
    // int x = 0;
    %x_addr = cir.alloca i32, cir.ptr <i32>, ["x", cinit]
    ...
    // p = &x;
    cir.store %x_addr, %p_addr : !cir.ptr<i32>, cir.ptr <!cir.ptr<i32>>
    ...
    // *p = 42;
    %p = cir.load deref %p_addr : cir.ptr <!cir.ptr<i32>>, !cir.ptr<i32>
    cir.store %forty_two, %p : i32, cir.ptr <i32>
    ...
  } // 'x' lifetime ends here; 'p' now dangles.

  // *p = 42; // oops...
  %dead_x_addr = cir.load deref %p_addr : cir.ptr <!cir.ptr<i32>>, !cir.ptr<i32>

  // attempt to store 42 through the dead address
  cir.store %forty_two, %dead_x_addr : i32, cir.ptr <i32>
}
  • Loops: cir.loop represents loops from C/C++. The loop form (for/while/do-while, and soon range-based for) must be explicitly provided to the operation. The operation also encompasses three regions (condition, step and body) and must be enclosed by a cir.scope, where all possible init-statement declarations hold their cir.allocas; a sketch follows this list. This representation has interesting effects on diagnostic quality:
    • Accuracy: the form dictates the order in which the regions are executed when implementing MLIR's RegionBranchOpInterface, allowing different MLIR-based passes to retrieve the appropriate order for the regions/blocks relevant to a particular loop. For example, the order in which regions are processed differs between a do-while and a while (body then condition versus condition then body).
    • Extra analysis capabilities: the same mechanism that allows better accuracy can be combined with SCCP information to find more constants and reduce the number of regions to be analyzed.
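
As a rough illustration of the three regions, here is how a for loop could be rendered. Note that this sketch guesses at the concrete cir.loop syntax, which may differ from what's implemented:

cir.scope {               // encloses the loop; holds init-statement allocas
  %i_addr = cir.alloca i32, cir.ptr <i32>, ["i", cinit]
  ...
  cir.loop for(
    cond : { ... },       // condition region
    step : { ... }        // step region (empty for while/do-while)
  ) {
    ...                   // body region
    cir.yield
  }
}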

Structured control flow

Current Clang’s codegen for CIR assumes a mostly structured control flow — gotos are only supported intra-scope right now. C/C++ return statements are represented with cir.return, while break and continue are special forms of cir.yield, an operation that represents returning control to the parent of a region. This has some advantages for code analysis (like lifetime) since it makes control-flow simple.

No inherent property of CIR prevents unstructured control flow from being used. It can be supported by implementing a pass that flattens the CFG, merging scopes and moving allocas back to the function entry block. This effectively makes all gotos intra-scope, since the function-level scope becomes the only one present. Any transformation or analysis that prefers to work on such a representation (like lowering to the LLVM IR dialect) can add this pass as a requirement. This is probably the route we will take when we get there.
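
For intuition, such a flattening could perform a rewrite along these lines (illustrative only; the block labels are made up):

// before: 'x' lives in a nested scope
cir.scope {
  %x_addr = cir.alloca i32, cir.ptr <i32>, ["x", cinit]
  ...
}

// after: the alloca is hoisted to the function entry block and the scope
// boundaries become plain branches between basic blocks
%x_addr = cir.alloca i32, cir.ptr <i32>, ["x", cinit]
cir.br ^scope_begin
^scope_begin:
  ...
  cir.br ^after_scope
^after_scope:
  ...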

Dialect transformations

One great aspect of using MLIR is the ability to easily write transformations. This allows CIR to be morphed into an even higher-level CIR (say, when recognizing idioms for C++ containers) or a lower-level CIR (when merging scopes for LLVM IR dialect generation). We mentioned in the Goals section the plan to improve diagnostics on top of C++ library usage, and CIR dialect transformations are the way to get there.

Clang’s codegen from AST to CIR is straightforward, using a subset of CIR, and it’s up to compiler options or extra tools to setup the necessary pass pipelines to achieve the desired CIR form for analysis. For instance, consider C++ lambdas: the naive codegen uses a method call to the appropriate internal struct callable. For lifetime analysis, a required transformation could inject a cir.lambda operation at the original definition location, making it easier for the lifetime check pass to reason on captures and scope.

Verifiers

MLIR's mechanisms for operation verification have been useful to codify the semantics of CIR. The implemented verifiers for CIR operations cover things like operand/result type matching and the placement of certain operations (e.g. breaks need to be dominated by a cir.loop or cir.switch). We have tests that exercise invalid constructs against the verifiers.
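
For instance, since break is a special form of cir.yield, a break outside of any loop or switch should be rejected by the verifier; something along these lines (the diagnostic text is illustrative):

func @bad_break() {
  cir.scope {
    cir.yield break  // verifier error: not dominated by a cir.loop or cir.switch
  }
  cir.return
}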


Status and Plans

CIR started towards the end of 2021; information about the GitHub repo, building instructions and more can be found at clangir.org.

The project is in its early stages, and we're currently working on (a) the heavy lifting required to generate CIR from the C++ sources in our codebase and (b) completing the lifetime checker, focusing on coroutine linting. Once we are ready to build full C++ projects, we also intend to measure compile time and memory usage, and to check how it compares to other existing tools.

Open source & Upstream

This RFC proposes incorporating CIR into the llvm-project as soon as possible, and we are ready to make any changes needed for that to happen. The early stages of development are especially attractive, since the project could benefit from multiple sets of eyes and encourage potentially interested parties to join the effort early. This is highly dependent on community buy-in, and we also understand if some level of maturity is desired first.

Project Layout

The layout so far is divided into a few main pieces:

  • Clang codegen bits in clang/lib/CIR: this mostly mimics the file/class layout of LLVM codegen. To leverage years of codegen improvements and fixes, we carefully track currently out-of-scope features with assertions, and with feature guarding whenever we don't have the data needed to assert. This has proven helpful when incrementally adding codegen pieces.
  • The CIR dialect in mlir/lib/Dialect/CIR. This is different from what FIR does (its dialect is part of Flang). We'd prefer CIR to live alongside the other dialects, but we can see pros/cons either way and are open to discussion.
  • Testing: all tests currently live under clang/test/CIR, divided into CodeGen, IR, IRGen and Transforms. There are several flavors right now: C++ to warnings, C++ to CIR, CIR to CIR and CIR load/store to memref.

Technical Debt

If the community thinks it's a good idea to incorporate this sooner rather than later, here is some known technical debt we need to tackle:

  • CMake: the project currently hardcodes linking MLIR into Clang. This should be behind an extra build flag so that CIR is compiled in optionally.
  • Dependencies: there's a cycle between Clang and CIR; it isn't really needed and is easy to break.
  • Testing: some tests should probably live inside mlir/test.

A list of other possible initial improvements:

  • AST helpers: many LLVM codegen helper functions rely on AST queries that do not depend on LLVM IR; these can be factored out into common helpers.
  • MLIR: the canonicalizer is a bit aggressive for code analysis usage, since it may remove operations we want to keep around for diagnostic purposes. We currently work around this by adding our own rewrites, but perhaps we can add a new GreedyRewriteConfig mode for it.
  • Support a CIR version of Clang's analysis-based warnings, such as -Wunreachable-code, -Wunused, etc.

Thank you for reading,

Bruno Cardoso Lopes <bruno.cardoso@gmail.com> (@bcardosolopes)
Nathan Lanza <nathanlanza@gmail.com> (@lanza)

Special thanks to Nadav Rotem, Eric Garcia and Shoaib Meenai for the support. And thank you for early feedback from Jez Ng, Nicholas Ormrod, Wenlei He, Puyan Lotfi, Ivan Murashko, Han Zhu, Matthias Braun and Yedidya Feldblum.

49 Likes

The general goal of a more rigorous replacement for the clang CFG that we can also use for code generation makes sense. And using MLIR to build it seems like the right approach.

I’m a little concerned that the “structured” control flow graph is going to become an obstacle to practical usage. Anything concerned about actual control flow or data flow needs to understand the control flow implied by return/break/continue/goto/destructors/exceptions/etc., and that’s basically impossible with the way you’ve defined it. The way you’ve described it, your lifetime checker really wants the actual control flow; otherwise, you’ll end up with a ton of incorrect and/or missing warnings.

Maybe it would make sense to come up with some sort of hybrid: unstructured control flow, but with each statement associated with some representation of the original scopes as written in the source code.

5 Likes

The general goal of a more rigorous replacement for the clang CFG that we can also use for code generation makes sense. And using MLIR to build it seems like the right approach.

Thanks for the feedback!

I’m a little concerned that the “structured” control flow graph is going to become an obstacle to practical usage. Anything concerned about actual control flow or data flow needs to understand the control flow implied by return/break/continue/goto/destructors/exceptions/etc., and that’s basically impossible with the way you’ve defined it. The way you’ve described it, your lifetime checker really wants the actual control flow; otherwise, you’ll end up with a ton of incorrect and/or missing warnings. Maybe it would make sense to come up with some sort of hybrid: unstructured control flow, but with each statement associated with some representation of the original scopes as written in the source code.

This is a very good point. We totally agree on the hybrid part, and we already do that to some extent: part of the codegen emits branches between blocks that are cleaned up into a structured form when possible. If a more refined lifetime analysis pass (say, phase two) requires unstructured CIR, it should run later in the pipeline, requiring some lowering pass as a dependency (which is part of our roadmap). In the particular lifetime case, there are many low-hanging fruits that don't require dataflow, and that's where we started. We should probably have phrased it better: we want structured control flow at the head of the pipeline, but not as the only representation.

1 Like

Hi!

This sounds like a great effort and I’d really love to see how it turns out. I definitely feel like the CFG in Clang has its problems, so I am glad to see these efforts to explore alternatives.

While the clang CFG definitely has its problems, I don't think this point does justice to the state of the art. First of all, the Clang CFG already has everything to calculate dominators, and in fact it is just reusing the algorithms from LLVM. See this test. Second, detecting unsafe optional dereferences is more complicated than checking for dominance. Consider:

// assume: bool b; bool hasVal = false; std::optional<int> opt;
if (b) {
  // more code
  if (opt)
    hasVal = true;
}
// more code
if (hasVal)
  *opt = 42;

In the snippet above, no emptiness check dominates the dereference of the optional, yet the code is safe. It is also possible to create a code snippet with the converse, where the check does dominate the dereference, but the code is unsafe.

Could you elaborate on this? Do you envision CIR to be self-contained, or would it have back-references to the AST? I feel a self-contained representation could lead to redundancy (do we represent all declarations twice?). With the latter option, we are back to having to solve the CTU problem with ASTs. Am I missing something?

This surprised me a bit. I participated in a lifetime analysis implementation for Herb's paper on top of the current CFG and had no problem dealing with lifetime end markers in the CFG. Could you elaborate on how cir.scope is more natural than what the CFG has to offer?

I also share some of Eli's concern about unstructured control flow. But overall, I am excited for this project and eager to learn more :)

3 Likes

Hi Bruno, the proposal looks spectacular. It must not be easy to introduce a new IR for C++ now.

Right now it looks like a replacement for CFG analysis, aimed at better diagnostic messages. So I am wondering whether it would affect Clang code generation. I mean, would the process be something like:

C++ Codes ---(Semantic Analysis)---> Clang AST ---> CIR --- More semantical analysis ---> CIR ---> LLVM IR

or

C++ Codes ---(Semantic Analysis helped by CIR)---> Clang AST ---> LLVM IR

Or in other words, do you plan to replace clang/lib/CodeGen with clang/lib/CIRCodeGen? If yes, I think ABI compatibility might be a big problem.


For a quick example, proper handling of symmetric transfer requires the presence of lifetime intrinsics, which might not be available in a given LLVM IR bitcode file.

Just out of interest, is there any bug report about this in Clang/LLVM? I don't remember meeting one. On the one hand, the lifetime intrinsics for coroutines are generated unconditionally (D99227 [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines). On the other hand, symmetric transfer nowadays is not a C++ standard feature; it is a compiler optimization instead. But it looks like CIR is intended for C++ semantics, so symmetric transfer is not a very good example to me. (This is not intended to block your proposal.)


An example of C++ coroutine usage (using folly::coro) we'd like to diagnose:

From the example, it looks like you'd love to do IPA in CIR, right? Otherwise you might not be able to emit the diagnostic. Or do you just want to look at the signature of 'byRef'? If you want to do IPA, I feel like compile time might be a big problem. And in the latter case, I feel like there might be some false negatives.


  • Provide a way to express a contract between libraries and the compiler. Compiler passes operating on CIR can rewrite parts of the code with more domain-specific CIR operations and types, allowing the compiler to naturally recognize idioms and apply C++-aware code analysis.

I am not sure I understand this. Do you mean that in the ideal future libraries would be provided in CIR form instead of the current .a/.so form? The ecosystem might be a problem. But it might be OK since we're talking about the future.

From my understanding, the long-term direction of C++ seems to be to define a BMI (Binary Module Interface) that is compatible across different compilers (GCC, Clang and MSVC) and across versions (the current implementation of modules ties the BMI to a specific clang, even down to the commit id!). At that point, distributed C++ libraries would be in BMI form.


  • Cross translation unit (CTU) analysis. We've had simple bugs in production that could have been avoided by CTU analysis capabilities. Even though there are AST-based approaches, we believe CIR is a more natural place to move forward with this type of technology. It's feasible to imagine something like ThinLTO summaries for CIR, enabling the propagation of lifetime, initialization, etc. for both improved diagnostics and performance.

From my point of view, the ability to do CTU analysis requires support from build tools. The past philosophy of toolchains has been that the compiler and the build systems should be independent of each other. But again, it looks like the trend of C++20 Modules is to couple the language with the build systems.

So it looks like what you want to do here might be similar to the picture of C++20 Modules. A significant difference might be that CIR focuses on the clang compiler while C++20 Modules focus on the C++ language. I am not sure if I've stated my point clearly.


One great aspect of using MLIR is the ability to easily write transformations. This allows CIR to be morphed into an even higher-level CIR (say, when recognizing idioms for C++ containers) or a lower-level CIR (when merging scopes for LLVM IR dialect generation). We mentioned in the Goals section the plan to improve diagnostics on top of C++ library usage, and CIR dialect transformations are the way to get there.

One of the requirements I hear most for C++ coroutines is "Is it possible to do deadlock detection, just like the many research efforts around goroutines?" My answer is always something like "No, since C++ coroutines are not at the same level as goroutines. C++ coroutines are a low-level component; the Task or the generator in C++ is at the same level as a goroutine. And the compiler shouldn't do analysis for library-specific things."

But if it is really easy enough to write a dialect analysis (I'm not sure how easy it is), it would be possible to add some user-defined analyses for libraries. I guess this might be a possible usage of CIR, if it is really easy enough.

Thanks,
Chuanqi

1 Like

I took a quick look at the source code and overall this looks fantastic. But I have one major concern with the design of both CIR and CIL. Today clang is so much more than C/C++. Many variations of C-like languages may introduce higher-level abstractions: think CUDA textures, OpenCL images or SYCL accessors. These new data types (and sometimes control flow constructs) may carry vital information for diagnostics or performant code generation.

The classic LLVM codegen always seemed to me hard to scale and modify. It's a giant visitor pattern spread across multiple files, and it's hard to add something there without modifying clang source code. And if you go that way, you either have to submit your patch to the community (when it may be only useful in your particular case) or maintain a fork of clang (which makes it hard to keep up to date with the community).

Instead, let's pretend the AST is yet another MLIR dialect, and we need to lower it to CIR. We'd leverage the dialect conversion framework, create a bunch of patterns for different AST nodes, and for every language clang supports we'd choose the right patterns. And if somebody were creating a C++-like DSL, they'd simply write a few new patterns and load them as a plugin (or link clang into their own compiler). Now, the AST is not an MLIR dialect, but it still seems possible to create a similar engine for lowering the AST to a combination of MLIR dialects. To me this looks a bit more scalable. What do you think?

3 Likes

Hi Bruno and Nathan,

This is super exciting work, thank you so much for pushing this forward! A few high level thoughts and comments on this:

  1. I agree with you that Clang and the C/C++/derived community would benefit tremendously from improvements in this area!
  2. There is no existing (known-to-me, at least) work in progress that is a credible way of tackling these issues; I agree with your general assessment of the other related projects in flight.
  3. I agree that the Clang CFG is limited in completeness, perhaps not up to the general LLVM quality standards, and doesn't seem to have a path to rectify that.
  4. I also agree with Alex’s comment above that the broader derived-from-c++ languages community could benefit a tremendous amount from this.

So anyway, I'm a superfan of the work :). It seems like this post is trying to do a few different things, and I'd recommend you separate them into different efforts:

  1. On the one hand you’re launching a new project and raising awareness of it. A+

  2. On the other hand, you're looking for design feedback. I think that Eli's comments upthread are important: it would be really unfortunate to replace the Clang CFG with another tech that is limited and can't be on a path to subsume the codegen path (which, of course, will likely take years to happen). I think a dedicated design review will be important to collect feedback and iterate on the design. My experience building these sorts of things is that MLIR makes it very easy to get things started, but it doesn't provide a lot of help on the nuances of IR design; this takes iteration, experimentation, and discussion. You've made a ton of decisions that may be right, but they should be reviewed; a couple of random examples are "why is the op named get_global" and "why have one loop operation instead of maintaining more syntactic forms". You're likely to end up with a progressive lowering path, and I don't understand the goal here. It seems that one or more design docs are needed :)

  3. You’re proposing upstreaming the code to LLVM and it becoming “the thing”. As a community we need to be convinced that this can fully subsume (at least) clang’s CFG representation, and I think we would also want to believe that this is on a path to intersect with codegen. This is a big hurdle that will naturally take time - both for socialization and for the technical implementation progress. Just to set expectation - your goal of “as soon as progress” is probably 6-9 months at the very least, maybe more like 18 months depending on how the design work goes.

All that said, I again am a superfan of this work and it makes perfect sense that you want to pull together likeminded people and make this serious effort intersect clang.

Have you considered making this a formal llvm-incubator project? That would allow you to make it be "the effort" in LLVM, pull together likeminded people, do the design iteration, and make more progress so you can build towards replacing the CFG implementation completely. I think this could be a low-burden (on your side) and low-risk (on the Clang side) way to stage this in.

WDYT?

-Chris

3 Likes

Having participated in some of the hallway discussions about MLIR-based clang, I am really excited to see this happen and presented to the community at an early stage!

Introducing new dialects into in-tree MLIR can be tough. At the very least, there's an argument that if CIR is to live under mlir/lib, so should FIR, and potentially any other language-specific IR. MLIR makes it relatively easy to live out-of-tree, except for some, increasingly less frequent, API changes. The same is probably true for Clang, although my experience with it isn't as extensive. So a separate project under the LLVM umbrella, whether in the incubator or not, may be a good option to start.

I second @clattner’s comment on design review, it sounds like this can be a great community-based exercise, even if it takes longer initially. Contrary to many of the comments above, my interests are in optimization and code transformation rather than in analyses for diagnostics, and preserving structured control flow and some type information is paramount in my case. We are likely to end up with some hybrid form and progressive lowering anyway, just because it’s natural in MLIR-based flows but we’d better avoid premature lowering.

@alexbatashev also has a great point about language extensions. It would be ideal if the end state of the compiler made it easy to bring even more domain knowledge to the compiler than just C++. This can range from OpenMP-like pragmas to modeling some library types (e.g. folly::Task) and calls as first-class concepts in MLIR that some passes understand and can transform (e.g. BLAS calls that get codegened as TOSA or something in a way that allows for fusion). We put this bit in the Polygeist charter, but haven’t yet worked on the design beyond the initial brainstorming. I am happy to join design discussions and reviews as well as share our experience from both Polygeist and earlier MLIR design.

2 Likes

I am mostly looking forward to domain-specific optimisations: OpenMP, STL, …

Will cross-TU analysis and optimisation at the CIL, FIR, or TOSA level bring a significant benefit over ThinLTO?

2 Likes

This sounds great, thanks for doing this! I WILL say, though: if this is intended to replace Clang IR-Gen/CodeGen, my biggest concern is the amount of ABI-related work that happens in there, as well as things like ifuncs/multiversioning/etc. A proper replacement would need to ensure we don't end up with any significant ABI breaks.

1 Like

ABI is a really interesting topic. Currently, the frontend seems to be responsible. Rust had similar issues:
https://github.com/rust-lang/rust/issues/97463

1 Like

Thanks for working on this, I think there has been and will continue to be interest in an MLIR lowering path out of Clang for all the good reasons you mentioned.

I have some very “back to basics” concerns about this project. You touched on this, but I’ll say more:

Once we are ready to build full C++ projects, we also intend to measure compile time and memory usage, and to check how it compares to other existing tools.

If and when MLIR is an on-by-default part of an LLVM C++ toolchain, we should consider the following:

  1. Binary size. If we link MLIR into clang, what is the new binary size? Is it acceptable? MLIR includes lots of generated code. Generated code is often large. Can we make it smaller? I don’t want us to get to the end of the project and say, “oh, we doubled the toolchain size, oh well, disk space is cheap, the cost is worth the benefit, ship it!” You mention that you haven’t written the CMake logic to make the new component optional, and I think that’s probably a requirement to merge this into llvm-project.

  2. Compile time. You mention that Rust and Swift use high-level IRs effectively, but I am told that both language implementations have issues with compile time. Many of us have put a lot of effort into reducing compile time, and introducing an entire representational shift could have big costs. We should probably set and agree on an upfront target of what is an acceptable increase, like +15-25%, before making this part of the default pipeline.

  3. Memory usage: Similar to the above concerns.

  4. Performance regressions. If you run any optimizations at all at the CIR level, you will need to have an inliner. C++ benchmarks are super sensitive to inlining decisions, so there will be regressions. A great deal of effort has been spent, and will be spent before this lands, tuning the pass manager and inliner. How can we ensure that that tuning effort is not wasted? I don’t have great answers here. At the very least, you’ll need to have transitional flags to temporarily get the old behavior.

4 Likes

Disclaimer: Work on Polygeist, as well as other fun LLVM projects.

This is really exciting stuff (and I am very excited by having a proper MLIR backend for clang). As others have mentioned, I don't think this is likely to get upstreamed to Clang in its current form, both because of the need for design review and for some technical reasons (e.g. making Clang depend on MLIR).

I wonder if a good intermediate step is to combine efforts with Polygeist [1], which is already an incubator project in the LLVM organization (https://github.com/llvm/Polygeist)?

As you mention, Polygeist directly lowers to MLIR rather than having an intermediate C-level dialect (with a couple of special ops for ensuring ABI compatibility like type size, etc).

I could imagine that an integration would involve slowly replacing the Polygeist AST lowering to use your CIR lowering and moving that existing Polygeist lowering of Clang AST to “former standard dialects” into a transformation pass within MLIR.

I also think it’s definitely wise for us to have a longer conversation regarding design. For example, to support break/continue while maintaining nice MLIR semantics we added flags that specify whether to continue a computation. I think that it would be nice to discuss the different approaches together, and perhaps we can give a design overview update at an MLIR ODM?

Also I’m hopeful that combining our efforts will let us get to that fully functional MLIR backend sooner :slight_smile:

[1] Note that Polygeist, in spite of the name, works on generic C/C++ code (and can correctly and ABI-compatibly compile the entire PyTorch binary arithmetic library; notably, this generates ~1 million lines of MLIR!). It also supports compiling CUDA code to the GPU dialect, OpenMP to the SCF/OpenMP dialects, etc. We're right now considering renaming it to cgeist to clarify this (as well as being shorter to run on the command line).

5 Likes

What is lacking in the existing MLIR dialects (especially the “formerly standard dialects”) for common control flow and loop constructs that requires CIR to have its own versions of if, loop, switch, br, and brcond? For example, I see that cir.br and cir.brcond are very similar to cf.br and cf.cond_br. I’d imagine that you could specify that certain scf and cf ops are dynamically legal for use with the cir dialect based on your scope dominance rules so that you could reuse scf and cf ops while keeping your rules about scoping.

I can see the purpose for having some special loop ops in cir for something like a range-based for loop or to maintain some information about the original form of the loop. However, I don’t fully grasp why CIR should duplicate effort from the “standard” MLIR dialects.

What’s holding the standard-ish MLIR dialects back from being useful for CIR? It sounds like Polygeist uses pre-existing MLIR with just a few extra custom ops (about 10 by my count), so it seems like it could be possible.

Maybe you could write passes that rewrite the cir ops to their equivalent scf or cf ops, with the needed changes to accommodate semantic differences in the ops.

I think combining efforts with Polygeist would be a very good idea. Anything to get CIR into the LLVM incubator space would be helpful.

1 Like

I’m excited to see this. I’ve said for awhile that I think Clang really ought to be doing a two-stage lowering, similar to how Swift does it, although surely there are lessons to be learned there as well. A couple quick thoughts:

  • I do think CIR should be designed from the beginning with the idea of inserting into the compilation pipeline. One of the things that has always held Clang’s CFG back is the fact that it simply doesn’t get the same attention as the main compilation path. People don’t add support to the CFG for various little language extensions, and then static analyses are just disabled (or worse, broken) in those functions. We really need this IR to not end up being yet another thing in that box, especially since not only would it suffer the same fate, but it would distract effort from the CFG and make things even worse on that side of the world. The only way to ensure that this IR gets at least that basic level of effort put into it from everybody working on new Clang features is to put it on the compilation path.

  • The scope representation is interesting. I suspect you’ll probably want multiple stages, one where destruction is implicit this way and another where destruction has been made concrete in the IR. And by “destruction” I mean not just destructors / ObjC ARC releases and so on, but also probably deallocation of local variables — it’s a pain to maintain in a lot of ways, but SIL ultimately really benefits from having a properly-constrained allocation stack. A nice early goal here would be to replace JumpDiagnostics.

  • I’m a little worried about over-promising about coroutines. Unless you’re seriously thinking about doing lowering of coroutines on your new IR, you’re not really going to be eliminating LLVM’s need to understand them. And there are at least some good reasons to do lowering at the LLVM level rather than at the high level; among other things, the lowering can take advantage of LLVM-level optimization having happened, which means it gets the benefits of cross-language optimization. (I don’t know whether it will be possible to link CIR objects across language settings, but cross-frontend optimization matters, too.) And unlike Swift’s coroutines, the use pattern of C++'s coroutines is basically entirely up to the library, so things like coroutine inlining will not just fall out naturally.

  • If you preserve the C type system in CIR, which I think needs to be the goal rather than lowering to some sort of hybrid C+LLVM type system, then ABI lowering should largely just fall out. You’ll turn a MemberExpr into something that projects a class member from a class, referring to decls the whole way, and then the IRGen pass which converts CIR to LLVM IR will figure out how best to turn that into GEPs. Similarly, functions will just take and return C types, and IRGen will be the point which understands the translation from that into LLVM IR.

8 Likes

Could the ABI infrastructure be shared between several frontends, i.e., Clang and Flang?
The joke is that Swift and Rust will migrate to MLIR and would also benefit from it.

3 Likes

Thanks everyone for the feedback and comments, this is great. We are going over each of those as the time allows, thanks for your patience.

3 Likes

This sounds like a great effort and I’d really love to see how it turns out. I definitely feel like the CFG in Clang has its problems, so I am glad to see these efforts to explore alternatives.

Thanks for taking the time to read through!

While the clang CFG definitely has its problems, I don’t think this point does justice to the state of the art.

This is a fair point, but the previously mentioned concerns (e.g. the CFG not being used for codegen) are still valid.

First of all, the Clang CFG already has everything to calculate dominators, and in fact it is just reusing the algorithms from LLVM. See this test.

Nice test!

Second, detecting unsafe optional dereferences is more complicated than checking for dominance. Consider:

Dominance is just an example of how CIR could do this on top of higher-level operations in the cir namespace. At this point in the RFC, we're not really talking about what the Clang CFG lacks, nor saying a simple dominance check is all that's needed; it's an illustrative point.

In the snippet above, no emptiness check dominates the dereference of the optional, yet the code is safe. It is also possible to create a code snippet with the converse, where the check does dominate the dereference, but the code is unsafe.

The goal of the example is simpler than what you’re pointing out. We’re not trying to statically prove that the optional has a value, the dominance check is intentional. We’re motivated by, for example, Swift and Rust’s handling of

let a: Int
if true { a = 5; }
print(a)

This is a fatal error in both Rust and Swift even though a clearly always has a value. We consider this a reasonable clang-tidy-like warning from CIR.

Could you elaborate on this? Do you envision CIR to be self-contained, or would it have back-references to the AST? I feel a self-contained representation could lead to redundancy (do we represent all declarations twice?). With the latter option, we are back to having to solve the CTU problem with ASTs. Am I missing something?

We envision it to be self-contained, without back-references, but we're happy to have a deeper discussion when this becomes a more concrete goal. The project is still new enough that we haven't excluded ourselves from taking either option. It looks like Swift's SIL does maintain back-references to the AST, but we haven't dug in enough to understand all the trade-offs; maybe experts from that community could shed some light here!

This surprised me a bit. I participated in a lifetime analysis implementation for Herb's paper on top of the current CFG and had no problem dealing with lifetime end markers in the CFG.

We failed to get reliable results when trying to use it in the context of analyzing coroutines; it's possible that some coroutine bits were missing or had bugs in the CFG representation. Note that the CFG isn't the only motivation for CIR, as stated elsewhere in the RFC.

There’s also a mistake in the text, we should be saying “CFG or lifetime intrinsics” instead of “CFG with lifetime intrinsics”, since the point here is to mention both CFG or analysis at LLVM IR level.

Could you elaborate on how cir.scope is more natural than what the CFG has to offer?

It feels more natural to me given that the lifetime of allocas in a scope ends with the cir.scope region (no need for extra markers), and that we can leverage this to explicitly encode more high-level semantics in CIR. One illustrative example: cir.scope could "yield" a value to model the lifetime of temporaries in full-expressions; see the sketch below. This would have potential usage outside the CFG realm of analysis, as it could, for instance, give later passes a direction for where to place lifetime-oriented sanitizer calls.
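
A rough sketch of that idea, with invented syntax (!T stands in for the temporary's type):

// 'int r = g(make_temp());' as a full-expression: the temporary lives only
// inside the scope region, which yields the call's result outward
%r = cir.scope {
  %tmp_addr = cir.alloca !T, cir.ptr <!T>, ["tmp", cinit]
  ...
  %v = cir.call @g(%tmp_addr) : i32
  cir.yield %v : i32
} : i32
// 'tmp' ends at the closing brace; a later pass could use this boundary to
// place lifetime-oriented sanitizer calls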

4 Likes

Hi Chuanqi, thanks for the feedback!

Or in other words, do you plan to replace clang/lib/CodeGen with clang/lib/CIRCodeGen?

ClangIR lives entirely separate from CodeGen and replaces it in the LLVM generation pipeline, e.g. AST → CIR → LLVM. The replacement of CodeGen is a long-term goal (e.g. 2+ years out).

If yes, I think ABI compatibility might be a big problem.

Agreed, correct ABI lowering certainly won't be trivial. We are trying to make much of the logic from CodeGen "reusable", e.g. there is code that looks at an integer type and decides how to represent it to agree with the ABI. Since the result is an LLVM IR type, that logic should be amenable to our ultimately wanting to lower to LLVM IR types as well.

Just out of interest, is there any bug report about this in Clang/LLVM? I don't remember meeting one. On the one hand, the lifetime intrinsics for coroutines are generated unconditionally (D99227).

We are actually talking about that very problem/fix: it's an example of how coroutine correctness fits weirdly into the LLVM IR optimization pipeline. We should have phrased that better, though.

On the other hand, symmetric transfer nowadays is not a C++ standard feature; it is a compiler optimization instead. But it looks like CIR is intended for C++ semantics, so symmetric transfer is not a very good example to me. (This is not intended to block your proposal.)

Standard or not, our users care about symmetric transfer, therefore it’s important for us. We are focusing on CIR for diagnostics in the short-term, but as mentioned above, at some point we want optimizations too.

From the example, it looks like you'd love to do IPA in CIR, right? Otherwise you might not be able to emit the diagnostic. Or do you just want to look at the signature of 'byRef'? If you want to do IPA, I feel like compile time might be a big problem. And in the latter case, I feel like there might be some false negatives.

Yes, compile time could be a big problem; accuracy is another one. There are interesting challenges here. By using standalone tools to apply such checks, we plan to keep this compile time out of the main compilation pipeline, where we can incrementally check and track it. Another possibility is to experiment with using profile information to decide which parts of the code deserve the more expensive checks.

In general, there’s more discussion to be had about this topic. @rnk just raised a great point out that we’ll need to make sure we’re not regressing compile time by too much. We might end up wanting to bundle different levels of optimization effort to try to satisfy these goals.

I am not sure I understand this. Do you mean that in the ideal future libraries would be provided in CIR form instead of the current .a/.so form? The ecosystem might be a problem. But it might be OK since we're talking about the future.

Nope, the model we have in mind is for CIR passes to recognize functions, classes, etc. from libraries and treat them specially for the sake of analysis/transformations.

From my point of view, the ability to do CTU analysis requires support from build tools. The past philosophy of toolchains has been that the compiler and the build systems should be independent of each other. But again, it looks like the trend of C++20 Modules is to couple the language with the build systems.

Cross-TU analysis/optimization wouldn't be a hard requirement for CIR to function. We plan on supporting CIRGen and CIR-to-LLVM lowering transparently on single TUs as part of a standard clang++ invocation. We envision cross-TU support being a feature closer to using -flto than to modules.

One of the requirements I hear most for C++ coroutines is "Is it possible to do deadlock detection, just like the many research efforts around goroutines?" My answer is always something like "No, since C++ coroutines are not at the same level as goroutines. C++ coroutines are a low-level component; the Task or the generator in C++ is at the same level as a goroutine. And the compiler shouldn't do analysis for library-specific things."

But if it is really easy enough to write a dialect analysis (I'm not sure how easy it is), it would be possible to add some user-defined analyses for libraries. I guess this might be a possible usage of CIR, if it is really easy enough.

One of our primary motivations for this project is providing better analysis of code that extensively uses coroutines (in particular folly::coro::Task and unifex::task). So yes, we are explicitly targeting the ability to analyze library features.

We’re very excited to share with the community, thanks for the feedback and for breaking it down into different efforts!

On the other hand, you’re looking for design feedback. I think that Eli’s comments up-stream are important: it would be really unfortunate to replace ClangCFG with another tech that is limited and can’t be on a path to subsume the codegen path (which of course, will likely take years to happen). I think a dedicated design review will be important to collect feedback and iterate on the design. My experience building these sorts of things is that MLIR makes it very easy to get things started, but it doesn’t provide a lot of help on the nuances of the IR design - this takes iteration, experimentation, and discussion. You’re made a ton of decisions that may be write, but should be reviewed, e.g. a couple random examples is "why is the op named “get_global” and “why have one loop operator instead of maintaining more syntactic forms” etc. You’re likely to end up with a progressive lowering path, and I don’t understand the goal here. It seems that one or more design docs are needed

We largely agree with all of this. We'd be glad to have open design discussions and to defer to others' expertise where it's valuable. One concern, however, is that we don't want to digress into perfect-is-the-enemy-of-good territory. We're still just two engineers for the time being and need to make sure that discussion doesn't dominate implementation.

You’re proposing upstreaming the code to LLVM and it becoming “the thing”. As a community we need to be convinced that this can fully subsume (at least) clang’s CFG representation, and I think we would also want to believe that this is on a path to intersect with codegen. This is a big hurdle that will naturally take time - both for socialization and for the technical implementation progress. Just to set expectation - your goal of “as soon as progress” is probably 6-9 months at the very least, maybe more like 18 months depending on how the design work goes.

No contestation here. Your second sentence accurately summarizes our strategy and your timeline is in strong agreement with ours as well.

Have you considered making this a formal llvm-incubator project? That would allow you to make it be "the effort" in LLVM, pull together likeminded people, do the design iteration, and make more progress so you can build towards replacing the CFG implementation completely. I think this could be a low-burden (on your side) and low-risk (on the Clang side) way to stage this in.

Yes, a formal incubator project would certainly work. Awareness and community involvement is paramount for us here.

4 Likes