[RFC] An MLIR-based Clang IR (CIR)

Yes, stay tuned for details. But with abstraction raising here, it is mostly to convert certain emulator-style operations such as “get target pc”, “get target gpr”, “target branch”, etc. into dedicated instructions. The rest of the dialect is actually mostly lower-level than LLVM and based more on Generic Machine IR.

Given all the folks wanting to explore a C/C++ MLIR frontend, and in order to start design review and avoid unnecessary duplication of effort, I propose we schedule a call right away to get everyone up to speed and talking together.

Here’s a when2meet (C/C++ MLIR Frontend Design - When2meet) for 9am ET - 9pm ET. Fill in your availability and add your email to MLIR C/C++ Frontend Working Group - Google Docs, where we’ll also keep minutes.

Edit: and now the permissions on the doc are fixed


Hi Bruno and Nathan,

Thanks for working on this!

I have followed the exchange so far, and I have several questions, mostly from the interactive C++ side, where we use Clang to produce IR and run it with the LLVM JIT.

IIUC, in a couple of years from now, Clang will have a new lowering pipeline: AST → CIR → LLVM IR → MC. Do you have an estimate of how much overhead that would introduce in terms of compile time and memory use? Do you see any way to shorten that pipeline, by keeping something like AST → IR → MC, or does that path go away completely?

Many thanks,
Vassil

Right. Swift was originally written with a unified IRGen like Clang has today, and the process of introducing SIL was largely one of teasing apart those two aspects of IRGen, rather than being a rewrite from scratch. The parts of IRGen that involved recursive walks on the AST, scope/cleanup management, and other things made explicit in SIL generally became part of SILGen, and IRGen was left mostly with focused code about implementing the ABI for the high-level operations exposed by SIL.

The main mistake we made there in Swift is that we probably ought to have made a third library for expressing some of those ABI details (like struct layout) abstractly, independent of the act of generating LLVM IR. Clang has the advantage over us there because those kinds of details are often exposed at the language level in C, and so e.g. RecordLayout is necessarily an AST-level facility. But we’re moving towards having such a library anyway.


Note that MLIR has an LLVM dialect, which is translated to LLVM IR more or less automatically.

Clang has to generate CIR; through progressive lowering you eventually reach the LLVM dialect.
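A minimal sketch of that final hop, assuming current upstream API names (these occasionally move between MLIR versions):

  // Once a module is entirely in the LLVM dialect, one library call emits
  // the llvm::Module. Header paths/names are from current upstream MLIR.
  #include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
  #include "mlir/Target/LLVMIR/Export.h"
  #include "llvm/IR/Module.h"

  std::unique_ptr<llvm::Module> lower(mlir::ModuleOp module,
                                      llvm::LLVMContext &llvmCtx) {
    // Registers how LLVM-dialect ops map onto LLVM IR instructions.
    mlir::registerLLVMDialectTranslation(*module.getContext());
    return mlir::translateModuleToLLVMIR(module, llvmCtx);
  }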


My point is that it is very important to get the design right. If it is too rigid from the start (e.g., fixed two-phase AST->CIR->IR mindset rather than a more fluid MLIR-esque mix-of-dialects), it will be extremely hard to add other abstractions in the future.

I’ll allow myself to interpret @alexbatashev’s comment here because it sounds like it echoes my concerns. There is a conceptual difference between how Clang codegen is implemented and how MLIR “codegen” (lowering conversions) is implemented. The former is more or less a giant switch / visitor. Things cannot be left unhandled. If someone needs to emit custom IR constructs, they likely need to thread them through the entire flow. The latter is a set of rewrite patterns (think InstCombine on a lot of steroids) that can be applied in multiple passes if necessary. It’s perfectly fine to leave pieces of the input unhandled as long as somebody downstream in the pass pipeline converts them. Third parties can just take the bag of already-available patterns and throw in their own to handle language extensions, or even override the defaults, to emit custom constructs; all without modifying the existing code.
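To make the style concrete, here is a trivial pattern of that shape, written against the upstream Arithmetic dialect (header paths move around between MLIR versions, so treat this as a sketch):

  // A trivial pattern in that style: rewrite `arith.addi %x, 0` to `%x`.
  // The mechanism, not the fold, is the point: returning failure() just
  // means "not mine", and other (possibly third-party) patterns get a turn.
  #include "mlir/Dialect/Arith/IR/Arith.h" // path varies across versions
  #include "mlir/IR/Matchers.h"
  #include "mlir/IR/PatternMatch.h"

  struct FoldAddZero : mlir::OpRewritePattern<mlir::arith::AddIOp> {
    using OpRewritePattern::OpRewritePattern;
    mlir::LogicalResult
    matchAndRewrite(mlir::arith::AddIOp op,
                    mlir::PatternRewriter &rewriter) const override {
      mlir::IntegerAttr cst;
      if (!mlir::matchPattern(op.getRhs(), mlir::m_Constant(&cst)) ||
          cst.getValue() != 0)
        return mlir::failure();            // leave it for someone else
      rewriter.replaceOp(op, op.getLhs()); // %x + 0 ==> %x
      return mlir::success();
    }
  };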

The idea is to steer the design towards more MLIR-style extensibility. I appreciate this is challenging. FWIW, the dialect conversion infrastructure was started in 2018 and is still missing some useful features. So it may well be out of scope to have the AST as a dialect and use dialect conversions to get to CIR, but the design should have the same level of extensibility as the final goal. Otherwise, it will essentially defeat the purpose of using MLIR as a more easily customizable IR.

We have put a lot of effort into making MLIR usable and extensible out-of-tree. There are several quite large and reasonably successful projects doing that (IREE, CIRCT). There shouldn’t be any technical reason to put IR definitions in the MLIR tree; that expectation is actually one of the main sources of pushback against any new dialect. OTOH, it seems to have become common wisdom that one needs to fork LLVM in order to extend it, which is sometimes erroneously transposed to MLIR.

The layering problem is that MLIR “core” doesn’t want the dependency on Clang. If Clang can take an (optional) dependency on MLIR, that’s good for me with my MLIR hat on, save for the extra work when refactoring. If it cannot, the remaining option for this is to be a separate top-level project.


Attributes simply attached to existing operations can and will be discarded by many “core” passes, so this is a non-starter.

There is another important aspect somewhere between design and maintenance: possibility of independent evolution. scf.for shouldn’t necessarily be constrained by the needs of the C++ (or Fortran, or Rust) frontend and vice versa. This is the same reason why MLIR’s Arithmetic dialect mostly duplicates the LLVM dialect arithmetic operations, which themselves closely mirror LLVM IR instructions. We can make different decisions than LLVM IR in the former, but not in the latter. This may lead to bloat, but it seems to have been manageable so far.
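Concretely, the same 32-bit addition at both levels:

  // Same addition, two dialects: `arith` can evolve on its own terms,
  // while `llvm.add` must track LLVM IR's semantics exactly.
  %0 = arith.addi %a, %b : i32
  %1 = llvm.add %a, %b : i32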

(I am an MLIR expert) You don’t want to stay that far behind. Doing a bump every week or two, with additional on-demand bumps if you need a recent upstream feature, is probably the sanest.

Out of curiosity, how much does this frustration drive the desire to put the code in the monorepo and thus shift the maintenance burden onto MLIR developers? :slight_smile:

I love the enthusiasm, thanks for putting this together. I was wondering if we should wait a bit longer to harvest any remaining feedback (maybe until the middle of next week?), then summarize the discussion and build an agenda, which we were already planning to do anyway. That could perhaps lead to a more productive first meeting? Wdyt?


Thanks for working on this! The functionality would be very useful for C/C++ dialects in Clang, which constantly lack native representation in LLVM. For OpenCL in particular, it would be great if the compilation time through the new flow matched what we currently observe, as many toolchains use Clang to compile online (during application execution), which sometimes happens on constrained devices that can be sensitive to long compilation latency.

Overall, I think it’s pretty clearly been for the best. It hasn’t been completely uncontroversial, though, and I think it sometimes makes testing a little trickier. For example, we have a textual representation of SIL, but it isn’t clear how to represent some kinds of AST reference in it, like a reference to a local type. But cloning AST types into a totally separate type system would create a lot of its own problems without necessarily removing any of the earlier ones.

Swift does a lot of dynamically-sized allocations with scoped lifetime, so having the ability to rigorously reason about their correctness is really important to us. That’s very difficult to do with LLVM IR’s representation, in which local lifetimes can theoretically overlap in arbitrary ways with each other; you end up having to fall back on worst-case assumptions all the time. It also means we have a general, reliable stack discipline system into which we can integrate new features that want to use dynamic local allocation; for example, we recently introduced a scoped temporary VLA feature, and that was easy to fit in despite how much we already use dynamic stack allocation because we simply had to make the builtins for it participate in the standard stack discipline.

Now, in SIL we separately terminate everything, so that you have e.g.

  %0 = alloc_stack ...
  %1 = alloc_stack ...
  ...
  dealloc_stack %1
  dealloc_stack %0

I don’t think there’s anything wrong per se about having a common scope terminator, e.g.

  %s = begin_scope
  %0 = alloc_stack ..., %s   // or even just make %s implicit
  %1 = alloc_stack ..., %s
  ...
  end_scope %s

In fact, that might actually be really nice to have coming out of the initial generation pass. But if you want to be able to optimize these things later, I think it’s nice to be able to separate them so that e.g. you can merge specific allocations as a high-level optimization without worrying about messing everything else up. Maybe that would be less important as an optimization for C than it is for Swift, though.

Well, that’s the thing. People often cite complexity as a reason to not use Clang’s type system, but a lot of that complexity affects code generation, at the very least for debug info but quite possibly also for layout, calling conventions, and whatever else. So my central concern about not using Clang’s type system is that it’s really easy to make an attractive toy that works in common cases where people aren’t doing anything complex and then doesn’t work if you throw any of the complexity at it, because it’s almost intentionally not preserving that. Like, if I’m generating debug info for a variable whose type is a local class, I need to know that! I can’t just hoist this thing out and pretend it’s declared at global scope, because that’s not how it really works in the language, and the debugger will present misleading information to the user and everyone will be confused. And so I think we do want the high-level IR to stay expressed in the high-level terms of the source language, and the process of eliminating that is a real lowering step.

What we do in SIL tests is we allow the SIL file to start with arbitrary source declarations, and then we have SIL declarations that can refer to those declarations. So it’s quite easy to write a compact test case directly in SIL for most things. There are basically two places that isn’t true:

  • It doesn’t handle multi-file problems very well. Swift modules aren’t semantically split into independent translation units, and sometimes we have multi-file processing bugs where we’re not handling uses of declarations in other files quite right. That’s specific to the Swift processing model and shouldn’t be an issue in Clang.
  • You can’t write tests involving function-local declarations because you can’t refer to them. Sometimes you do have tests that are specific to function-local types or whatever. Fixing this is just a matter of creating some way to refer to these declarations from the global scope, presumably just in this test mode.

SIL is not a stable representation, and I can’t imagine CIR would be, either. You really don’t want it to be a stable representation, I think; you very frequently realize with these high-level representations that there’s some invariant that needs to be a structural restriction on the IR. It’s usually not really possible to upgrade existing IR to make that restriction hold.


Hi Vassil,

I have followed the exchange so far, and I have several questions, mostly from the interactive C++ side, where we use Clang to produce IR and run it with the LLVM JIT. IIUC, in a couple of years from now, Clang will have a new lowering pipeline: AST → CIR → LLVM IR → MC. Do you have an estimate of how much overhead that would introduce in terms of compile time and memory use?

This is a very good question, similar to the concerns brought up by @rnk. We currently have no estimates, but we know that when the time comes to make such pipeline changes, having those numbers will be paramount to making an informed decision. We plan to start tracking these metrics once we can build bigger code bases. Note that a lot of the goodness we want from CIR isn’t necessarily required for LLVM IR codegen, and we can come up with reduced pipelines, as @lanza mentioned:

w.r.t your next question:

Do you see any way to shorten that pipeline, by keeping something like AST → IR → MC, or does that path go away completely?

Keeping AST → IR → MC around in a future where CIR is part of the pipeline might lead to issues similar to those we currently face with the Clang CFG. John has a good remark on this:


Hi Anastasia,

Thanks for working on this! The functionality would be very useful for C/C++ dialects in Clang, which constantly lack native representation in LLVM.

Sure, we’re looking forward to CIR being a good solution for C/C++ dialects in Clang as well.

For OpenCL in particular, it would be great if the compilation time through the new flow matched what we currently observe, as many toolchains use Clang to compile online (during application execution), which sometimes happens on constrained devices that can be sensitive to long compilation latency.

Thanks for sharing. Compile time is also a big deal for us; please check the replies to @vvassilev and @rnk for some of our current thoughts there. For instance, a future OpenCL lowering pipeline could opt in to using only passes that fit a given compile-time budget.


Thanks! I see. The flexibility to set up language-specific pass pipelines would be awesome. However, we typically use most of the standard LLVM optimisations. Btw, in relation to custom pass pipelines, I imagine we would get the flexibility to skip lowering to LLVM IR, which is desirable for multiple dialects that compile into portable binary formats. For example, something like:
Clang AST → CIR Dialect → … other Dialects … → SPIR-V Dialect → SPIR-V binary

would make a lot of sense. One practical issue we see at the moment is that in this flow we wouldn’t be able to leverage LLVM optimization passes at all, yet they are valuable. Not sure this is the right forum to discuss this topic, but I wonder whether it would make sense to move or duplicate some optimisations from LLVM IR into MLIR, and whether in the future the guidelines for new optimizations would be that passes common to SPIR-V, LLVM, and other similar IRs are implemented in some common dialect?
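To illustrate, a very rough sketch of the tail of such a flow; the CIR-to-SPIR-V conversion pass is purely hypothetical, only the serializer entry point is a real upstream API:

  // Hypothetical pipeline tail: CIR -> SPIR-V dialect -> SPIR-V binary,
  // skipping LLVM IR entirely.
  #include "mlir/Dialect/SPIRV/IR/SPIRVOps.h"
  #include "mlir/Pass/Pass.h"
  #include "mlir/Pass/PassManager.h"
  #include "mlir/Target/SPIRV/Serialization.h"

  std::unique_ptr<mlir::Pass> createConvertCIRToSPIRVPass(); // hypothetical

  mlir::LogicalResult emitSPIRV(mlir::MLIRContext &ctx, mlir::ModuleOp m,
                                llvm::SmallVectorImpl<uint32_t> &binary) {
    mlir::PassManager pm(&ctx);
    pm.addPass(createConvertCIRToSPIRVPass()); // hypothetical
    if (mlir::failed(pm.run(m)))
      return mlir::failure();
    // Assumes the conversion produced exactly one spirv.module.
    auto spvModule = *m.getOps<mlir::spirv::ModuleOp>().begin();
    return mlir::spirv::serialize(spvModule, binary);
  }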

CIR as I imagine it (and it sounds like Bruno is in accord with this) would be only minimally more portable than LLVM IR, mostly because C/C++ expose an enormous amount of detail at the language level. CIR wouldn’t partially encode calling-convention lowering the way that LLVM IR does, and some of the details of things like bit-field layout and virtual calls would remain abstract, but pretty much everything else is locked in.

Hi Alex,

My point is that it is very important to get the design right. If it is too rigid from the start (e.g., fixed two-phase AST→CIR→IR mindset rather than a more fluid MLIR-esque mix-of-dialects), it will be extremely hard to add other abstractions in the future.

We don’t plan for anything rigid; the idea of progressive lowering is key to success here. Assuming an analysis pipeline for improved diagnostics, our naive (non-MLIR-expert) plan is something like:

Clang AST → CIR_1 → Analysis_1 → CIR_2 → Analysis_2 → … → CIR_N → OneOrManyMLIRDialects

Breaking down the pieces of the pipeline above:

  • CIR_1 is a subset of operations in CIR, coming directly from the Clang AST generation. This needs to be fast and shouldn’t be trying to codegen anything expensive to recognize.
  • CIR_X is any CIR variation after some idiom recognition, usually required to perform an Analysis_X.
  • Until CIR_N we only want CIR operations. After CIR_N, the pipeline is free to use a fluid MLIR-esque mix-of-dialects (which I’m calling pangea from here on).
    • If we allow the pangea before CIR_N, we are afraid that CIR passes could become susceptible to issues due to changes in the other dialects: e.g. a change in the semantics of an operation can ruin an analysis. Also, why should the MLIR community bother to maintain things at this level? (We agree with your comment later in the reply.)

Assuming this toy pipeline, can you elaborate on what kinds of abstractions would be hard to add?
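To make the toy pipeline a bit more concrete, a rough sketch of how we imagine assembling it; every createCIR*Pass name below is a hypothetical placeholder:

  // Rough sketch of the staged pipeline described above.
  #include "mlir/Pass/Pass.h"
  #include "mlir/Pass/PassManager.h"

  // Hypothetical pass constructors for the stages sketched above.
  std::unique_ptr<mlir::Pass> createCIRIdiomRecognitionPass(); // CIR_1 -> CIR_2
  std::unique_ptr<mlir::Pass> createCIRLifetimeAnalysisPass(); // an Analysis_X
  std::unique_ptr<mlir::Pass> createCIRLoweringPass();         // CIR_N -> dialects

  mlir::LogicalResult runPipeline(mlir::MLIRContext &ctx, mlir::ModuleOp m) {
    mlir::PassManager pm(&ctx);
    pm.addPass(createCIRIdiomRecognitionPass());
    pm.addPass(createCIRLifetimeAnalysisPass());
    // ... more recognition/analysis rounds, staying within CIR until CIR_N ...
    pm.addPass(createCIRLoweringPass());
    return pm.run(m);
  }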

I’ll allow myself to interpret @alexbatashev’s comment here because it sounds like it echoes my concerns. There is a conceptual difference between how Clang codegen is implemented and how MLIR “codegen” (lowering conversions) is implemented. The former is more or less a giant switch / visitor. Things cannot be left unhandled. If someone needs to emit custom IR constructs, they likely need to thread them through the entire flow. The latter is a set of rewrite patterns (think InstCombine on a lot of steroids) that can be applied in multiple passes if necessary. It’s perfectly fine to leave pieces of the input unhandled as long as somebody downstream in the pass pipeline converts them. Third parties can just take the bag of already-available patterns and throw in their own to handle language extensions, or even override the defaults, to emit custom constructs; all without modifying the existing code.

The idea is to steer the design towards more MLIR-style extensibility. I appreciate this is challenging. FWIW, the dialect conversion infrastructure was started in 2018 and is still missing some useful features. So it may well be out of scope to have the AST as a dialect and use dialect conversions to get to CIR, but the design should have the same level of extensibility as the final goal. Otherwise, it will essentially defeat the purpose of using MLIR as a more easily customizable IR.

We have put a lot of effort into making MLIR usable and extensible out-of-tree. There are several quite large and reasonably successful projects doing that (IREE, CIRCT). There shouldn’t be any technical reason to put IR definitions in the MLIR tree; that expectation is actually one of the main sources of pushback against any new dialect. OTOH, it seems to have become common wisdom that one needs to fork LLVM in order to extend it, which is sometimes erroneously transposed to MLIR.

The layering problem is that MLIR “core” doesn’t want the dependency on Clang. If Clang can take an (optional) dependency on MLIR, that’s good for me with my MLIR hat on, save for the extra work when refactoring. If it cannot, the remaining option for this is to be a separate top-level project.

Interesting, this clarifies a lot; thanks for the detailed explanation. Double-checking: can we assume that if CIR lives out-of-tree (in Clang) rather than in-tree (under MLIR), the MLIR-style extensibility problem you are talking about goes away? To be more specific, from an MLIR design point of view, a pangea-free CIR_N should be fine if it’s out of the MLIR tree? My current understanding (after your reply to John) is that this should be fine.

Attributes simply attached to existing operations can and will be discarded by many “core” passes, so this is a non-starter.

Sounds like we agree on that.

There is another important aspect somewhere between design and maintenance: possibility of independent evolution. scf.for shouldn’t necessarily be constrained by the needs of the C++ (or Fortran, or Rust) frontend and vice versa. This is the same reason why MLIR’s Arithmetic dialect mostly duplicates the LLVM dialect arithmetic operations, which themselves closely mirror LLVM IR instructions. We can make different decisions than LLVM IR in the former, but not in the latter. This may lead to bloat, but it seems to have been manageable so far.

If I’m writing a source-code modification tool that rewrites a range-based for loop into an index-based for loop from 0 to N-1, are you saying that it is best to transform an scf.for with a set of attributes A into one with a set of attributes B, and then use a rewriter that understands specific attributes on scf.for to emit C++ code? My impression is that those things are far less brittle if the different C++ loop forms can be represented in CIR; I can then convert between them and have only one CIR-to-C++ rewriter. Can you help me understand why scf.for would be better in this case?
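To make my question concrete, the two encodings I have in mind look roughly like this (cir.for_range is of course made up, and this is schematic IR, not verifier-clean syntax):

  // Option A: a generic scf.for carrying C++ provenance as attributes;
  // nothing stops a generic pass from dropping or ignoring them.
  scf.for %i = %c0 to %n step %c1 {
    ...
  } {cir.loop_kind = "range_based"}

  // Option B: a dedicated (made-up) CIR op that keeps the source form:
  cir.for_range %elt in %container {
    ...
  }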

(I am an MLIR expert) You don’t want to stay that far behind. Doing a bump every week or two, with additional on-demand bumps if you need a recent upstream feature, is probably the sanest.

Agreed. That’s actually one of our first priorities: to get a single monolithic rebase done and then enable some internal infra to do daily rebases.

Out of curiosity, how much does this frustration drive the desire to put the code in the monorepo and thus shift the maintenance burden onto MLIR developers?

Well, we don’t want to create a burden for anyone; we expect to be more involved with MLIR as part of this work and hopefully to share some of that burden ourselves.

Responding to several different things in one post:

FWIW, I agree with this point from John. This work should be part of the Clang module in the LLVM mono repo, not part of the MLIR module (or as an incubator that depends on both MLIR and Clang in the intermediate timeframe).

I think this is important to design in (and yes, MLIR makes this straightforward because you can have dialect-defined locations that refer to the AST). The use case for this is dataflow diagnostics. You inevitably want to do “some amount” of lowering when you convert from the AST to the IR (and generally, the deeper in the pipeline, the more lowering you get), but when you want to emit a diagnostic, you want to do so in a way that has high QoI. The best way to do this in Swift is to map back to the AST, grub around a bit with heuristics, and then generate an error.
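As a sketch of what that can look like, MLIR’s OpaqueLoc can carry an AST pointer with a plain location as fallback (the helper name and the Expr payload are just illustrative):

  // Sketch: wrap an AST pointer in an OpaqueLoc. Passes that know nothing
  // about Clang see only the fallback; diagnostic clients can recover the
  // Expr* for high-QoI messages. makeCIRLoc is an invented helper name.
  #include "clang/AST/Expr.h"
  #include "mlir/IR/Location.h"

  mlir::Location makeCIRLoc(clang::Expr *e, mlir::Location fallback) {
    return mlir::OpaqueLoc::get(e, fallback);
  }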

The way Swift works, though, this is optional: there is always a correct fallback that gives a generic error if the location doesn’t carry an AST location. This is important because the textual SIL representation doesn’t carry the entire AST. The consequence of this is that you get better QoI from using an integrated compiler than from breaking it into parts.

This is already par for the course in Clang, though, which already has super-primitive backreferences from /LLVM IR/ to Clang’s AST (specifically, its source locations); check out the integer field on inline asm nodes in LLVM IR. This experience from Clang is what informed the design in SIL, and it has worked out well for Swift IMO. I expect it to work out well for CIL/MLIR in Clang too, and to be extremely important for diagnostic and tooling clients.
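For reference, that backreference looks roughly like this in LLVM IR (the width of the integer cookie has varied across LLVM versions):

  ; Clang attaches a !srcloc cookie to inline asm so backend diagnostics
  ; can be mapped back to the original source location.
  define void @f() {
    call void asm sideeffect "nop", ""(), !srcloc !0
    ret void
  }
  !0 = !{i32 1234567}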

I would expect that CIR as described would use literally the Clang type system, so it wouldn’t need to expose anything about the ABI. Rationale: you don’t want to expose the ABI to diagnostic and tooling clients, and MLIR doesn’t force you to do this (MLIR types can directly reference Clang AST types), so why would you?
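A minimal sketch of what “directly reference clang AST types” could mean at the MLIR storage level; all the names here are invented, and dialect registration is omitted:

  // Sketch: an MLIR type whose uniqued storage is just a pointer into
  // Clang's type system. No duplicate type hierarchy; layout, ABI, and
  // debug-info queries keep going through the AST.
  #include "clang/AST/Type.h"
  #include "mlir/IR/TypeSupport.h"
  #include "mlir/IR/Types.h"

  struct ASTTypeStorage : mlir::TypeStorage {
    using KeyTy = const clang::Type *;
    ASTTypeStorage(KeyTy t) : astType(t) {}
    bool operator==(const KeyTy &key) const { return key == astType; }
    static ASTTypeStorage *construct(mlir::TypeStorageAllocator &alloc,
                                     const KeyTy &key) {
      return new (alloc.allocate<ASTTypeStorage>()) ASTTypeStorage(key);
    }
    const clang::Type *astType;
  };

  class ASTType
      : public mlir::Type::TypeBase<ASTType, mlir::Type, ASTTypeStorage> {
  public:
    using Base::Base;
    static ASTType get(mlir::MLIRContext *ctx, const clang::Type *t) {
      return Base::get(ctx, t);
    }
    const clang::Type *getClangType() const { return getImpl()->astType; }
  };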


I see many comments re: Polygeist and other projects that are related to this. I can’t comment on out-of-tree projects that aren’t public, but MHO is that Polygeist is “research quality” and not the right baseline to build into something that is right for Clang (note that I’m not saying that this proposal is either!). The reason is that the motivation was to get code up and running through the compiler, so it covered a lot of the important cases, but it doesn’t handle a lot of the important corner cases. I suspect Polygeist has evolved since the last time I looked at it (6 months or so ago), but when I last did, it was lacking core abstractions that can handle the full generality of the C language.

I would also be concerned that starting from Polygeist would have an unfortunate anchoring effect: we want design freedom to change the design to be right for Clang long term, even if it means breaking the existing clients that Polygeist has now.


I also don’t think it is correct to think of this as “2-stage lowering” like Swift has with SIL. This is a fairly broken understanding even of Swift, which has multiple phases of SIL, both explicitly with “raw” and “canonical” SIL, but also implicitly as the pass pipeline enforces various invariants etc.

The Clang ecosystem is far more rich and varied than Swift’s, given the many different clients and use-cases that Clang serves. I think that a progressive lowering approach (which MLIR is far better equipped to handle than SIL is) is the right way to go.

That said, we currently have a monolithic lowering from ASTs → LLVM. I think that starting with a two-phase approach of AST → CIL → {Diagnostics + LLVM} is a great way to anchor the project. Just beware that the name “CIL” is completely wrong :slight_smile: because we’ll eventually have many levels of abstraction and many dialects involved over time; e.g. OpenMP is orthogonal to the language, as the Flang people have already shown. ABI lowering is another one that MLIR would allow to be cleanly broken into reusable components that would be useful far beyond the C ecosystem, but I’d think that that redesign/refactoring should come after this “CIL” phase is complete and all codegen is switched over.


FWIW, I think one of the biggest risks to this project is that lots of people will jump on it and want to push and pull it in many different directions. I think that /anchoring/ it will be key to its success. It is ok to say “not yet” and specifically aim for landing one high value design point (an intermediary level of abstraction that subsumes the clang CFG and supports IR generation). I think that until those things are landed and the old paths are removed, we should resist the urge to fix ABI lowering and many other things.

Rationale: doing the first 50% of 10 different projects for 10 different clients will make for good demos, but won’t be landable in tree.

-Chris


I just mean in all the usual ways that C programs aren’t portable — concrete choices for the portable typedefs, concrete values for sizeof and offsetof, and so on. I agree that nothing about the ABI which isn’t already exposed in one of those ways needs to be exposed just by virtue of the production of CIL.
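For instance, even a trivial program already has those target-specific answers baked in; on a typical LP64 target this prints 8 and 8, on a typical 32-bit target 4 and 4:

  /* The same C source pins down target-specific constants the moment it
     is compiled; nothing about the ABI beyond this needs to be exposed. */
  #include <stddef.h>
  #include <stdio.h>

  struct S { char c; long l; };

  int main(void) {
    printf("sizeof(long)   = %zu\n", sizeof(long));
    printf("offsetof(S, l) = %zu\n", offsetof(struct S, l));
    return 0;
  }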


The Clang AST is an abstract syntax tree. For correctness, you have to walk it exactly once and generate CIL. Maybe the new CILGen can support plugins with callbacks. Language extensions and whatever else can then insert dialect ops. If in 5 years you take an interest in the analysis and optimisation of std::vector_next, you can add a new plugin and generate dialect-specific ops.


There are two aspects here.

  1. How does this intend to connect to, e.g., Open{MP,Acc,CL} and other C or C++ language extensions? Do these get included into CIR? Do they live as separate dialects that are understood by CIR-level passes? There needs to be a way to add such things. The ideal end state, on a ±5-year time horizon, is to make it easy for me to create an AlexDSL embedded into C++ and have the Clang/MLIR flow emit my custom dialect for it.
  2. Avoiding premature lowering. If CIR_N only has branch-based control flow and GEP as opposed to structured control flow and rich types, it becomes too low-level to apply any sort of interesting optimization, and it is a known hard problem to recover these higher-level abstractions.

I understand that most of these sound like stretch goals and that we need to start somewhere. A closed dialect+pass ecosystem is easy to bootstrap and is the reasonable thing to start with. But keep in mind the usual MLIR duck-typing trick: most passes don’t care that an operation is specifically a C++-semantics unsigned integer addition; they can work on any arithmetic operation that is commutative, applies to two’s-complement integers of power-of-two bitwidth with wrapping overflow, etc. Similarly, I expect that even seemingly C++-specific passes can be generalized, even if slightly, to handle an operation that “has all the relevant properties of a C++ ‘for’ loop” without necessarily being the exact C++ ‘for’ loop operation but, say, some new parallel loop construct.
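As a sketch of that trick: the trait query below is real upstream API, while the helper (and any pass built on it) is imagined:

  // A check keyed on properties rather than on one specific op name. A pass
  // written this way handles cir.add, arith.addi, or anyone's future
  // dialect alike, as long as the op declares the right traits.
  #include "mlir/IR/OpDefinition.h"
  #include "mlir/IR/Operation.h"

  bool looksLikeWrappingIntAdd(mlir::Operation *op) {
    return op->hasTrait<mlir::OpTrait::IsCommutative>() &&
           op->getNumOperands() == 2 && op->getNumResults() == 1 &&
           op->getResult(0).getType().isSignlessInteger();
  }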

Yes, this should be fine.

This is not really related to the extensibility problem, though. The extensibility I am referring to is that of CIR internals. CIR does not have to mix with the pangea right now, or ever, but it would be better if it eventually embraced some of the ecosystem openness MLIR provides and encourages, like allowing CIR to interoperate with foreign dialects that abide by whatever requirements CIR has.

I never implied that scf.for is better, my argument is the opposite: scf.for shouldn’t care about any C++-related semantics so C++ should not use it.


While I agree that Polygeist is not necessarily the best base for something right for Clang, I am wary of fragmenting the development effort in the short term, and of users coming to rely on specific quirks of both projects in the longer term, making the projects irreconcilable. On the other hand, the benefit of “research quality” is being able to change course dramatically if needed. It’s not like there are many people relying on this in production (please let me know if somebody does!).

Not arguing that this is the wrong thing to do, but why would redesigning after the codegen switch (which presumably creates a lot more constraints and drag) not be a problem for clients, while redesigning the current CIR/Polygeist/DARPA project would be?

+1 on this in general. A slightly different take on this that I’ve seen several times in MLIR is “premature overdesign”: because of the IR’s generality, we spend a lot of time thinking about and supporting hypothetical use cases that have not yet materialized. It would be nice to avoid this, and also to avoid people just running off with a completely parallel effort.
