EmitC - Generating C/C++ from MLIR

Hey all, with this post I want to give a short summary of the discussion we had during last week's ODM. Thanks again to all! The question that remains after last week's ODM is whether EmitC should be integrated into core. I hope this can serve as a basis that allows us to continue the discussion.

It was discussed whether a dialect or a printer API should be used to emit code. It was noted that a printer (normally) provides a 1:1 representation, but that there is no canonical binding to the code that gets emitted with EmitC. With a dialect, emitting code is realized via the same infrastructure that is used for every other conversion. This has been helpful and fits well with the rest of the tools in use. Composability is one of the key aspects: when emitting code from, for example, the std dialect to specific APIs like CUDA or Metal, entirely different printers would be necessary. In this context, it was asked how one could add a custom printer, like it is possible to add one's own dialect, or how one could emit code that calls custom libraries. Going with a printer, one needs to deal with losing the integration with the conversion infrastructure, and it is unclear how to avoid this. Using a dialect, by contrast, allows reusing that infrastructure in terms of patterns and pattern drivers. As an example, one can think of using PDL to dynamically inject conversions to EmitC; with this, PDL could be used for emitting library calls.
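To illustrate the composability argument, a hypothetical before/after of such an injected conversion could look as follows. This is only a sketch: the library name `mylib_add` is made up, and the assembly syntax is approximate.

```mlir
// Input: an ordinary std-dialect op in a function body.
%sum = addf %a, %b : f32

// After applying a (possibly PDL-driven) conversion pattern that
// targets a custom library, the op becomes a plain emitc.call,
// which the Cpp emitter would print roughly as `mylib_add(a, b)`:
%sum = emitc.call "mylib_add"(%a, %b) : (f32, f32) -> f32
```

The point is that this rewrite is just another conversion pattern, so it composes with all existing passes and pattern drivers instead of requiring a dedicated printer.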

The main concern regarding EmitC is that it might compete with Clang IR. There is general agreement that Clang IR is desirable and that nobody wants a competition between the two dialects.
So far, it seems that no clear vision exists of how Clang IR would look. It is still unanswered whether it would resemble the Clang AST or model the language constructs found in the C++ reference manual. It was mentioned that there might be meta-constructs along the way to Clang IR, like those provided by EmitC. The handling of calls and functions in EmitC, as well as the handling of the std ops, does not seem to block an evolving Clang IR. In addition, one can think of lowering EmitC to Clang IR. If Clang IR ends up at a level similar to LLVM IR, it might be more convenient to write transforms targeting EmitC, which is afterwards lowered to Clang IR.
Nonetheless, the dialect approach needs a clear vision and direction of how it fits into the Clang world. A completely organic and undirected growth needs to be avoided. What should EmitC model at what level and when should an op be added to the dialect?

Looking forward to continuing the discussion!


As a consequence of last week's discussion, and as we agree that we need to sharpen EmitC's scope, we dropped the emitc.getaddressof op in favor of a more general emitc.apply op.
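To sketch the change (the exact assembly syntax here is from memory and may differ from what landed):

```mlir
// Before: a dedicated op hard-wired to a single C operator.
%ptr = "emitc.getaddressof"(%val) : (i32) -> !emitc.opaque<"int32_t*">

// After: emitc.apply takes the C operator to apply as an attribute,
// so one op covers "&", "*" and similar unary operators.
%ptr = emitc.apply "&"(%val) : (i32) -> !emitc.opaque<"int32_t*">
```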


In addition, I removed emitc.for, emitc.if, and emitc.yield. Instead, the corresponding SCF ops are now supported in the Cpp target. With this, EmitC only introduces the following operations:

  • emitc.apply
  • emitc.call
  • emitc.const

and defines an opaque type. Limited to this, EmitC will probably not compete with a potential C/C++ dialect. However, to me it is still unclear how such a C/C++ dialect could look. This makes it difficult to draw a clear line. @jpienaar already raised the question whether a dialect would more likely model the Clang AST level or rather the C++ language semantics. Any thoughts on this?
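For reference, a small sketch of the three remaining ops and the opaque type together. The callee name `process` is made up, and the assembly format shown is approximate:

```mlir
func @example(%arg0: i32) {
  // emitc.const materializes a constant from an attribute.
  %c = "emitc.const"() {value = 42 : i32} : () -> i32
  // emitc.apply applies a C operator given as an attribute,
  // producing a value of the opaque C type `int32_t*`.
  %p = emitc.apply "&"(%c) : (i32) -> !emitc.opaque<"int32_t*">
  // emitc.call becomes a plain C/C++ call in the Cpp target,
  // printed roughly as `int32_t v = process(arg0, p);`.
  %v = emitc.call "process"(%arg0, %p) : (i32, !emitc.opaque<"int32_t*">) -> i32
  return
}
```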

I don’t think we need to focus on a C/C++ dialect at the moment; emitc is likely scoped enough as is, and the efforts towards “minimalism” you’re pushing for are also taming concerns about scope creep here, I think.

+1 to what Mehdi says. I think the scoping you’ve done has mostly assuaged my concerns about feature creep. With a constrained initial scope, some guidelines about evolution and scope creep, and the prospect of using patterns to mix and match generation from different dialects (the most intriguing part of having a dialect, IMO), I am mildly +1 on the idea in general.

I don’t think we should focus on a C/C++ dialect at the moment, with the scope you already have defined (and some guidelines) I think it becomes less of an issue.

– River

This SGTM too, with a constrained scope to start with. Thanks!

+1 from me as well


Thanks for the positive feedback!

I am currently cleaning up EmitC and would like to send a patch for review via Phabricator afterwards. The scope we have with EmitC is fine for now; I am only considering adding an IncludeOp in the near future. Beyond that, it is not planned (and should also not be necessary from our perspective) to extend EmitC further.
With a view on upstreaming, I can either send one big patch or split it into two separate ones. Going with the latter option, I would split it into an EmitC core patch and a TOSA-to-EmitC patch. The TOSA-to-EmitC patch would also include our reference header-only implementation. MHLO to EmitC could still live in our out-of-tree repository, or we could even upstream it to tensorflow/mlir-hlo.
Anyway, I am open to suggestions and would appreciate it if you shared your opinions.

The exact number of patches is open, but I would definitely prefer keeping it contained and tested without a specific use (e.g., I see the TOSA lowering as a use, but the infra should be tested and usable without it).

That would be useful (the PR would be on the main TF repo, as that one is only a view).

Let me know if I can help here :slight_smile:

Thank you for the offer :slight_smile: I nearly have the patch series finished, but wasn’t able to finalize it before my holidays (I will be back at work on Wednesday).
The last blocker is that we rely on a global initializer for registering command line arguments at TranslationFlags.cpp#L17-L27. Maybe @River707 has a hint on how we can get rid of those. I think we can’t use something like registerAsmPrinterCLOptions(), as this is explicitly called in Translation.cpp.

FYI, I just sent D103969 [mlir] Add EmitC dialect for review.