RFC: Polygeist LLVM Incubator Proposal

The MLIR ecosystem is growing rapidly while the core infrastructure matures. However, it is missing a working C and C++ frontend. Clang certainly does an amazing job targeting LLVM IR, but many more opportunities open up by targeting MLIR from C++ before going down to LLVM IR. For example, MLIR code generation and optimization flows can be exercised on well-known benchmarks. Higher-level abstractions available in MLIR, such as tensor algebra, can be more easily connected to C inputs. At the same time, C and C++ compilers could benefit from the representational capability offered by the MLIR framework for supplementary transformations at the language and library levels.

We have built Polygeist [1,2,3], a tool that aims to provide this through two core mechanisms:

  • A frontend that directly lowers the Clang AST to existing MLIR dialects (e.g. scf, arith, control flow, gpu, affine, etc.). This preserves high-level information such as structured control flow and OpenMP/GPU parallelism for direct use in MLIR, and also enables lowering C/C++ constructs to user-defined custom operations (partially functioning at present).

  • Transformation passes that detect and raise high-level structure (e.g. affine loops) within existing code. This enables C/C++ (and other) code to use high-level transformations within MLIR.

As Polygeist fulfills some core needs in both a C/C++ frontend and a raising framework, we’ve already seen a fair amount of use of, and interest in, it within the community, including as a CIRCT frontend [4], a SYCL frontend [5], use of its memory optimizations [6], and use of its raising within a Julia MLIR frontend [7].

We’d really like to upstream Polygeist, but at this point the code [2] isn’t in a form that would be acceptable to mainline. Moreover, there are some larger questions about whether Polygeist should be its own project in LLVM, a subproject of Clang, a subproject of MLIR, or something distinct.

We presently propose making Polygeist an LLVM incubator project to allow the community more visibility into the code to answer these questions and prepare for hopeful upstreaming.

A version of this message and a longer charter are available here: Polygeist Incubator Email / Proposal - Google Docs

Sincerely,

William (Billy) Moses wmoses@mit.edu @wsmoses
Alex Zinenko zinenko@google.com @ftynse
Ruizhe Zhao rz3515@ic.ac.uk @kumasento
Lorenzo Chelini l.chelini@icloud.com @lorenzo_chelini
Valentin Churavy vchuravy@mit.edu @vchuravy

[1] https://polygeist.mit.edu/
[2] https://github.com/llvm/Polygeist (C/C++ frontend for MLIR; also features polyhedral optimizations, parallel optimizations, and more)
[3] Polygeist: Raising C to Polyhedral MLIR | IEEE Conference Publication | IEEE Xplore
[4] https://github.com/circt-hls/circt-hls (CIRCT-based HLS compilation flows, debugging, and cosimulation tools)
[5] https://github.com/InteonCo/Polygeist
[6] [RFC?] Store to load forwarding - #6 by ftynse
[7] https://github.com/JuliaLabs/brutus


I am definitely +1 on having this capability upstream at some point, and I’ve been lurking in the code you’ve been writing for a while hoping it goes somewhere. Having clang work in this way would help with many things in the ML space that have previously had elaborate solutions implemented and could be simplified a lot with a good C++ to MLIR path.

I have looked a bit at the code and do agree that some better alignment with where this is landing, and getting it out of “PoC quality”, is necessary. If an incubator repo would enable collaboration to this end, strong +1 from me.

I’m excited to see this! Having a path where Polygeist could eventually be upstreamed will really help these dependent projects. I see becoming an official incubator project as a natural stepping stone towards this. Perhaps you could comment further on what you think needs to happen for Polygeist to be upstreamed, beyond “Where should it live?”

At this point, the biggest remaining issue is code cleanup. Over the past year or so we’ve been actively trying to upstream various extractable components (support for memref of memref, data layout/alignment, canonicalizations, etc.), though there are a few more we can still do.

I think the second biggest question is how much support we need from the frontend before it’s ready for mainline. Right now it works decently well on most C/C++ code (templates, constructors, constexpr, classes, etc.), though it has some specific feature gaps (no support for virtual methods or calling destructors, for example). The other question, which I think is worth posing to the community, is one of ABI.

Presently Polygeist supports two ABIs for pointer types, which can be selected with a flag:

  • memref (this is nice for permitting composability with much of the rest of MLIR)
  • LLVM.Pointer (this mimics the ABI used by Clang, which is good for compatibility outside the MLIR ecosystem). Note that LLVM.Pointer is also used as a fallback when it is illegal to use a memref.
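
To make the difference concrete, here is a rough sketch (hypothetical signatures for illustration only; the exact types and attributes Polygeist emits may differ) of how a simple C declaration such as

void scale(float *x, int n);

might look under each ABI:

  // memref ABI: composable with much of the rest of MLIR
  func @scale(%arg0: memref<?xf32>, %arg1: i32)

  // LLVM pointer ABI: mirrors what clang would produce
  func @scale(%arg0: !llvm.ptr<f32>, %arg1: i32)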

There wouldn’t need to be dual ABIs if it were possible to specify that a memref lowers to a bare pointer (not just in the function signature), with constraints that operations not use the other descriptor fields.

The memref ABI question doesn’t need to be resolved before upstreaming (we’ll just keep both, as now), but if it can be resolved it would be a nontrivial simplification.

I’m supportive of Polygeist becoming an LLVM incubator, but I think there are major challenges to upstreaming Polygeist into Clang itself.

While I think that Polygeist is very cool and useful, I think that building proper clang/mlir support may force different design tradeoffs, and Polygeist may or may not be the right starting point for that (I don’t know).

-Chris


+1 here, an incubator is “cheap” to add, and a good step to find a possible path forward to land this in the LLVM project.

I also share Chris’ observations above though.

That likely requires some more explanation: my understanding was that Polygeist isn’t introducing MLIR support to model all of C++ (or even C), but is limited to “polyhedral friendly” sections of the code. In particular, Polygeist does not attempt to provide a new dialect (other than a handful of helper ops) to model C/C++.
So this claim here is somewhat confusing to me, and could mislead people who were interested in a Clang MLIR frontend in the past (there was even a talk at the LLVM dev meeting ~1 year ago, I think).

I share the same views as Chris and Mehdi above.

Also, is there a better category that this can be moved to? It would be nice to get a wider audience than just MLIR (given the connection to clang, among other things), and I assume/expect that not everyone who is interested here subscribes to the MLIR section.

– River

@mehdi_amini I’ve revised the statement to say that we directly lower the Clang AST to arith/scf/control flow/etc., as opposed to just “directly lowering the Clang AST”; does that clarify things?

Regarding your polyhedral comment, Polygeist supports far more than just the polyhedral parts of the program, aiming to lower all of the C/C++ code (which is notably incomplete in various places at the moment, such as virtual methods, destructors, and ABI mismatches). However, if something can be represented in affine (i.e. polyhedral code), Polygeist will raise it to that; if not, it will remain as scf, for example. To date the largest program we have tested is the PyTorch binary operator file, which, as it also includes aten/libstdc++/others, results in ~850K lines of MLIR after optimization (here: pytorchbin.mlir.txt - Google Drive if you’re curious; it would certainly make a good compile-time test).

Some snippets of general non-polyhedral code and the corresponding Polygeist output:

int fib(int n) {
  if (n <= 2) return 1;
  return fib(n-1) + fib(n-2);
}
  func @fib(%arg0: i32) -> i32 attributes {llvm.linkage = #llvm.linkage<external>} {
    %c2_i32 = arith.constant 2 : i32
    %c1_i32 = arith.constant 1 : i32
    %c-1_i32 = arith.constant -1 : i32
    %c-2_i32 = arith.constant -2 : i32
    %0 = arith.cmpi sle, %arg0, %c2_i32 : i32
    %1 = scf.if %0 -> (i32) {
      scf.yield %c1_i32 : i32
    } else {
      %2 = arith.addi %arg0, %c-1_i32 : i32
      %3 = call @fib(%2) : (i32) -> i32
      %4 = arith.addi %arg0, %c-2_i32 : i32
      %5 = call @fib(%4) : (i32) -> i32
      %6 = arith.addi %3, %5 : i32
      scf.yield %6 : i32
    }
    return %1 : i32
  }
int gcd(int m, int n) {
  while (n > 0) {
    int r = m % n;
    m = n;
    n = r;
  }
  return m;
}
  func @gcd(%arg0: i32, %arg1: i32) -> i32 {
    %c0_i32 = arith.constant 0 : i32
    %0:2 = scf.while (%arg2 = %arg1, %arg3 = %arg0) : (i32, i32) -> (i32, i32) {
      %1 = arith.cmpi sgt, %arg2, %c0_i32 : i32
      scf.condition(%1) %arg3, %arg2 : i32, i32
    } do {
    ^bb0(%arg2: i32, %arg3: i32):  // no predecessors
      %1 = arith.remsi %arg2, %arg3 : i32
      scf.yield %1, %arg3 : i32, i32
    }
    return %0#0 : i32
  }

Structs (https://github.com/wsmoses/Polygeist/blob/main/tools/mlir-clang/Test/Verification/classrefmem.cpp)
Inheritance (https://github.com/wsmoses/Polygeist/blob/main/tools/mlir-clang/Test/Verification/virt.cpp)
Advanced OpenMP (Polygeist/omp2.c at 7c26e71fab863552e4a3890dd4cc5eda91fa5b3c · llvm/Polygeist · GitHub)
Templates (https://github.com/wsmoses/Polygeist/blob/main/tools/mlir-clang/Test/Verification/twotemplatevardecls.cpp)
Templated Classes (Polygeist/caff.cpp at 7c26e71fab863552e4a3890dd4cc5eda91fa5b3c · llvm/Polygeist · GitHub)
Lambda functions (https://github.com/wsmoses/Polygeist/blob/main/tools/mlir-clang/Test/Verification/capture.cpp)

@clattner There are certainly design choices we made to do this, which we hope becoming an incubator project will better allow us to discuss as a community. It is unclear whether the decisions we made were the best ones, but hopefully having an existing end-to-end flow as an incubator gives anyone who wants to explore a different design a relatively easy way to try it out.

As one example, in C/C++ it is possible to return from any point in the program. However, structured control flow does not support an “immediately exit all ancestor scopes” operation (and for good reason), so we chose to create a variable that denotes whether a return has occurred and to guard all statements with an scf.if that checks that flag. We also have similar flags to enable break/continue in loops. Relatedly, this is why I have been sending numerous scf.if and similar canonicalization patches for you to review @mehdi_amini @rriddle :stuck_out_tongue:

Here’s the unoptimized IR as immediately emitted by the frontend, and the optimized IR, for an example of the return case:

void compute(int x) {
  A();
  if (x == 0) {
    B();
    return;
  }
  C();
}
  func @compute(%arg0: i32) attributes {llvm.linkage = #llvm.linkage<external>} {
    %0 = memref.alloca() : memref<1xi32>
    %1 = memref.cast %0 : memref<1xi32> to memref<?xi32>
    %c0 = arith.constant 0 : index
    %2 = llvm.mlir.undef : i32
    memref.store %2, %1[%c0] : memref<?xi32>
    %c0_0 = arith.constant 0 : index
    memref.store %arg0, %1[%c0_0] : memref<?xi32>
    %true = arith.constant true
    %3 = memref.alloca() : memref<i1>
    %4 = memref.alloca() : memref<i1>
    memref.store %true, %4[] : memref<i1>
    memref.store %true, %3[] : memref<i1>
    %5 = memref.load %3[] : memref<i1>
    scf.if %5 {
      scf.execute_region {
        call @A() : () -> ()
        scf.yield
      }
    }
    %6 = memref.load %3[] : memref<i1>
    scf.if %6 {
      scf.execute_region {
        %8 = memref.load %3[] : memref<i1>
        scf.if %8 {
          scf.execute_region {
            %c0_1 = arith.constant 0 : index
            %9 = memref.load %1[%c0_1] : memref<?xi32>
            %c0_i32 = arith.constant 0 : i32
            %10 = arith.cmpi eq, %9, %c0_i32 : i32
            %11 = arith.extsi %10 : i1 to i32
            %c0_i32_2 = arith.constant 0 : i32
            %12 = arith.cmpi ne, %11, %c0_i32_2 : i32
            scf.if %12 {
              %13 = memref.load %3[] : memref<i1>
              scf.if %13 {
                scf.execute_region {
                  call @B() : () -> ()
                  scf.yield
                }
              }
              %14 = memref.load %3[] : memref<i1>
              scf.if %14 {
                scf.execute_region {
                  %15 = memref.load %3[] : memref<i1>
                  scf.if %15 {
                    scf.execute_region {
                      %false = arith.constant false
                      memref.store %false, %3[] : memref<i1>
                      memref.store %false, %4[] : memref<i1>
                      scf.yield
                    }
                  }
                  scf.yield
                }
              }
            }
            scf.yield
          }
        }
        scf.yield
      }
    }
    %7 = memref.load %3[] : memref<i1>
    scf.if %7 {
      scf.execute_region {
        call @C() : () -> ()
        scf.yield
      }
    }
    return
  }

After optimization:

  func @compute(%arg0: i32) attributes {llvm.linkage = #llvm.linkage<external>} {
    %c0_i32 = arith.constant 0 : i32
    call @A() : () -> ()
    %0 = arith.cmpi eq, %arg0, %c0_i32 : i32
    scf.if %0 {
      call @B() : () -> ()
    } else {
      call @C() : () -> ()
    }
    return
  }

@River707 I agree with trying to get this into a category with broader visibility, but I’m not exactly sure how/where to do so on Discourse. I was seemingly only allowed to select one primary category (MLIR), and I saw there was a secondary clang tag I could add (which I added), but that’s not the same as the clang category, it seems. I also considered putting this in the incubator category, but that also felt like it wouldn’t get appropriate visibility. Happy to move/add things, and please forgive my relative lack of Discourse skills.


Can you expand on how fundamentally different the Polygeist approach is compared to, for example, raising from LLVM IR into the LLVM dialect? It seems that you’re emitting MLIR at a comparable level here.

The main difference is that we try to preserve as much of the semantics as we can. So, for example, if we compare Polygeist with Polly, Polly needs to delinearize the array references. In contrast, we don’t necessarily need delinearization, as we can preserve dimensionality by targeting memref. We can further expand on that.
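
To illustrate with a simplified, hypothetical sketch rather than exact Polygeist output: for a two-dimensional access like

void set(int n, int m, double A[n][m], int i, int j, double v) {
  A[i][j] = v;
}

LLVM IR only sees a flattened address computation equivalent to A[i*m + j], which Polly has to delinearize back into two dimensions, whereas by targeting memref the dimensionality can be kept directly, roughly:

  // sketch: indices shown already converted to index type
  func @set(%A: memref<?x?xf64>, %i: index, %j: index, %v: f64) {
    memref.store %v, %A[%i, %j] : memref<?x?xf64>
    return
  }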

The IR is currently emitted at a slightly higher level than LLVM IR (structured control flow is emitted as such, multi-index memrefs may be used), but not significantly so. There is also some further raising to, e.g., find “while” loops that can be expressed as “for” instead or to detect some affine properties.

The overall idea is, however, to go up the abstraction stack and to define and start emitting higher-level constructs when necessary. So far, Polygeist users have only needed structured (+affine) control flow and various parallelism-related constructs, so that is what was implemented. Arguably, one could just raise loops from LLVM IR, but this would be significantly harder for the OpenMP and CUDA constructs that Polygeist translates to the OpenMP and GPU dialects, respectively. The same can happen for other higher-level constructs as the need arises. This is likely most useful for language extensions (OpenMP, OpenACC) or restrictions (SystemC).
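
As a rough illustration (a hedged sketch; the exact output may differ, and as noted above the OpenMP dialect itself is the target for OpenMP constructs), a loop such as

#pragma omp parallel for
for (int i = 0; i < n; i++)
  a[i] = b[i] + 1.0f;

arrives in MLIR as an explicitly parallel structured construct rather than having to be reconstructed from lowered runtime calls, e.g. something like:

  // sketch: %a and %b are memref<?xf32>, %c0/%c1/%n are index values, %cst is 1.0 : f32
  scf.parallel (%i) = (%c0) to (%n) step (%c1) {
    %0 = memref.load %b[%i] : memref<?xf32>
    %1 = arith.addf %0, %cst : f32
    memref.store %1, %a[%i] : memref<?xf32>
  }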

Sticking to LLVM IR emission and trying to raise everything from it may be feasible, but I don’t think we all have a clear idea of how to do such comprehensive raising (consider, for example, recovering template instantiations in CUDA code from LLVM IR). On the other hand, switching from emitting low-level MLIR to higher-level MLIR will require one to write a lowering or partial code generation, which we arguably understand much better. There will also be an example of what lowered MLIR should look like, and the usual benefits of MLIR’s progressive lowering will make the process easier. So targeting even low-level MLIR puts us in a better position for longer-term evolution, IMO.

One other thing I’ll add:

In addition to directly lowering to higher-level MLIR (e.g. scf, affine, gpu, openmp) as @ftynse says, Polygeist also directly lowers to MLIR types. This means that if you had a custom type that directly mapped to an MLIR type, you could preserve it directly, and any other higher-level types (structs, classes, templates, etc.) built out of it would contain the higher-level MLIR type.

For example, as @lorenzo_chelini says, we lower pointers and multidimensional arrays to memrefs, thereby preserving the dimension information. The SYCL folks use this to lower SYCL C++ types to MLIR SYCL dialect types, etc.

Currently these C++ type and AST-to-MLIR matchers are implemented within Polygeist itself, but we have some experimental support for recognizing an attribute (or similar) on a function/struct that specifies a custom MLIR lowering, which we hope to expand on in the future.

Incidentally, the “abstraction raising” part of Polygeist is also worth mentioning when comparing against an LLVM IR to LLVM dialect pipeline. Obviously, per the above, directly lowering to higher-level types and ops in the frontend provides a lot more structure & information, but Polygeist’s “identify CFG loops and make SCF loops”, “while to for / loop simplification”, “raise SCF to affine”, and similar passes would also be extremely helpful in such an approach.
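
As a concrete, though simplified and hypothetical, illustration of that raising: a counted loop that arrives as scf.for over memrefs

  scf.for %i = %c0 to %n step %c1 {
    %0 = memref.load %b[%i] : memref<?xf32>
    memref.store %0, %a[%i] : memref<?xf32>
  }

can be raised into the affine dialect, where polyhedral analyses and transformations become applicable:

  // sketch: %n is an index value usable as an affine symbol; %a and %b are memref<?xf32>
  affine.for %i = 0 to %n {
    %0 = affine.load %b[%i] : memref<?xf32>
    affine.store %0, %a[%i] : memref<?xf32>
  }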

To be clear: I am not suggesting that we do this in any way (I’m very happy to see Polygeist getting more mature and I welcome this addition to the ecosystem). I’m just trying to clarify the current state with a comparative discussion, to better characterize the current positioning, in particular because the proposal does not include a “clang dialect” or “CIL” (like Swift has “SIL”). I hope it makes sense :slight_smile:


Yeah, we do not have a clang dialect as part of this. That said, if that’s a desirable design choice, the existing “Clang AST to MLIR core dialects” frontend could keep the same behavior and just become a “Clang AST dialect to MLIR core dialects” conversion pass. We could then ideally use TableGen to generate the AST dialect from the AST.

That said, I’m not sure that going to an AST dialect directly is the best choice, but I’m happy to think it through (and if someone wants to try it, they can make a TableGen PR on a hopefully soon-to-be-accepted incubator repo :stuck_out_tongue: ).

Note that CIL/SIL isn’t an “AST dialect” as far as I can tell, maybe you’re using “AST dialect” differently?

Yeah, I just really meant to say, more generally, that if one wanted to design or include any higher-level ops as part of lowering, the existing infrastructure within Polygeist would make a good stepping stone as an existing end-to-end pipeline, and the existing lowering (from the AST, etc.) could end up being part of a lowering pass for said higher-level ops.

To be clear, I’m unsure whether (and which) such higher-level ops would be useful, but if having them is something the community would like, we would have room to ponder and explore with relative ease.

This is also why making an incubator project is timely, so the entire community can participate in a discussion about what would be good as part of the design.

In terms of phasing MLIR into Clang, there are two ways to go: front to back (e.g. move the diagnostics / CFG stuff before doing the full end-to-end) or back to front (move the existing “CodeGen” stuff to generate the LLVM dialect in MLIR, then push to higher and higher level abstractions).

I see this approach as closer to the latter, and I think that separating OpenMP lowering is one very useful step along the way of detangling “CodeGen”.


Sure, I tried to write a justification in a form that could be directly included into some rationale doc.

I would say that a hypothetical CIL would be in scope of the project :slight_smile:

Exactly.

This shouldn’t preclude front-to-back exploration and prototyping though.


I too support this and it will be great to see this happen. I think the project has several reusable pieces useful for frontends to lower to MLIR, or more importantly, to get to the right abstractions in MLIR.


I would also like to support this proposal.

If the long-term plan is to convert the entire C/C++ language, would you consider renaming Polygeist? As of now, the name (to me) suggests that it is doing something specific to polyhedral transformations. Alternatively, if the plan is to focus on such transformations, could this be moved into the llvm tree inside the polly sub-directory?