[RFC] The Natural Intersection of MLIR and C

Diana · February 23, 2023, 9:29pm

We propose that a suitable subset of MLIR dialects can be directly represented in C99, and that a core translator will produce this representation where desired.
Our aim is to retain high-level semantics native to MLIR which have a natural correspondence to C. This requires elevating C to represent memrefs with C structs, analogous to those generated when converting memrefs down to the LLVM-IR dialect, and to those provided for OpenCL C by TTL.
Such a native translator should arguably be self-contained and emit all the C code needed to implement the core constructs without depending on externally provided function definitions.
Moreover, the translator would be extensible to support variants of C and external function definitions if desired. One example is OpenCL C, whose async_work_group_copy() builtin function can represent MLIR’s memref.dma_start and memref.dma_wait operations with memref structs, but which cannot represent func.call_indirect. Another is C augmented with OpenMP pragmas, which can represent omp dialect operations.

Code example demonstrating the proposal:

Input MLIR code:


  func.func @add_or_mul(%arg0: memref<?x4x8xf32>, %arg1: memref<?x4x8xf32>, %arg2: memref<?x4x8xf32>, %arg3: memref<?x4x8xi32>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c4 = arith.constant 4 : index
    %c8 = arith.constant 8 : index
    %0 = memref.dim %arg0, %c0 : memref<?x4x8xf32>
    scf.for %arg5 = %c0 to %0 step %c1 {
      scf.for %arg6 = %c0 to %c4 step %c1 {
        scf.for %arg7 = %c0 to %c8 step %c1 {
          %1 = memref.load %arg3[%arg5, %arg6, %arg7] : memref<?x4x8xi32>
          %2 = arith.index_cast %c0 : index to i32
          %3 = arith.cmpi slt, %1, %2 : i32
          %4 = memref.load %arg1[%arg5, %arg6, %arg7] : memref<?x4x8xf32>
          %5 = memref.load %arg2[%arg5, %arg6, %arg7] : memref<?x4x8xf32>
          scf.if %3 {
            %6 = arith.addf %4, %5 : f32
            memref.store %6, %arg0[%arg5, %arg6, %arg7] : memref<?x4x8xf32>
          } else {
            %7 = arith.mulf %4, %5 : f32
            memref.store %7, %arg0[%arg5, %arg6, %arg7] : memref<?x4x8xf32>
          }
        }
      }
    }
    return
  }

Output C code:

#include "memrefs.h"
void add_or_mul(Memref_float_3D v0, Memref_float_3D v1, Memref_float_3D v2, Memref_int32_t_3D v3){
  unsigned int v5 = memref_dim_float_3D(v0/*memref*/, 0/*dim*/);
  for(uint32_t v6 = 0; v6 < v5; v6 += 1) {
    for(uint32_t v7 = 0; v7 < 4; v7 += 1) {
      for(uint32_t v8 = 0; v8 < 8; v8 += 1) {
        int32_t v9 = memref_load_int32_t_3D(v3/*memref*/, {v6, v7, v8}/*indexes*/);
        int32_t v10 = (int32_t)0;
        int8_t v11 = v9 < v10;
        float v12 = memref_load_float_3D(v1/*memref*/, {v6, v7, v8}/*indexes*/);
        float v13 = memref_load_float_3D(v2/*memref*/, {v6, v7, v8}/*indexes*/);
        if(v11) {
          float v14 = v12 + v13;
          memref_store_float_3D(v0/*memref*/, {v6, v7, v8}/*indexes*/, v14/*value*/);
        } else {
          float v15 = v12 * v13;
          memref_store_float_3D(v0/*memref*/, {v6, v7, v8}/*indexes*/, v15/*value*/);
        }
      }
    }
  }
  return;
}

Supported MLIR dialects subset:

The main idea is to identify the subset of MLIR dialects along with their operations, types, and attributes, which have natural corresponding elements in C, augmented to support memrefs. These include the following MLIR dialects which we refer to as Core-C MLIR in what follows:

builtin
arith
math
func
scf
memref

More details on supported dialects can be found below.

EmitC:

The related EmitC project already facilitates generating C from MLIR. It does so by first lowering to an EmitC dialect, thereby supporting general constructs including opaque types and calls to arbitrary functions. In contrast, our aim is to retain high-level semantics native to MLIR including memrefs, which have a natural correspondence to C - by elevating C to capture memref semantics rather than lowering MLIR. Other distinctions include support for C++ and dependence on external function definitions.
It should however be possible to extend the proposed core C translator to support EmitC; see the optional integration with the EmitC translator in the appendix below.

Translating MLIR to C versus LLVM-IR

Core-C MLIR dialects represent semantics that are higher than those of LLVM-IR. Translating Core-C out to LLVM-IR thus first lowers to LLVM-IR and CF dialects. There are several reasons why it would be beneficial to translate out from Core-C MLIR dialects directly to C rather than lowering it to LLVM-IR:

- Any C compiler can then be used to compile down to the desired target, not necessarily Clang. See, e.g., DaCe and the poster presented at C4ML’20.
- C is more stable in terms of versioning than LLVM-IR.
- Lowering to LLVM-IR dialect generally obfuscates semantic information and duplicates a process taken care of by C front-ends. OTOH, translating from Core-C MLIR dialects out to C potentially facilitates round-tripping.
- Translating core MLIR semantics directly to C would provide a more human readable artifact than LLVM-IR (to some of us ;-), which could facilitate diagnostics, debugging and manual interception.
- Translating to C99 could naturally extend to target related extensions including OpenCL, OpenMP, vector types supported by GCC, address spaces supported by Clang.

On the other hand, it may be preferable to translate to LLVM-IR rather than C in order to save compile-time or integrate more tightly with an LLVM-based middle-end.

CFamilyTranslator

CFamilyTranslator is a modular extendable framework to support language extensions to C that have a natural mapping to MLIR dialects, and is built on top of the core translator.

CFamilyTranslator design approach

CFamilyTranslator is composed of a core, which can be extended by target plugins.

C family translator core:
1. Supports translating the MLIR dialects listed below to C99.
2. Is a generic framework extendable by user-provided plugins.
Plugins: CFamilyTranslator provides a way to add plugins that target C variants. A plugin can:
- Add support for dialect types, attributes and operations not supported by the core.
- Override types, attributes and operations that are supported by the core.

Plugins examples:

OpenCL translator plugin - translation to OpenCL.
Requires special treatment of, for example, memref.dma_start, memref.dma_wait and address spaces, while preventing use of indirect function calls.
OpenMP translator plugin - translation to OpenMP can be added to support the omp dialect, for example generating: #pragma omp parallel for{…
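
To make the OpenMP plugin idea concrete, here is a rough sketch of the kind of C such a plugin might emit for a parallel loop. It is not output of any actual plugin; the function name scale_in_place and its parameters are made up for illustration. A compiler without OpenMP enabled ignores the pragma and runs the loop sequentially.

```c
#include <stdint.h>

/* Hypothetical emitted function: multiply n floats by a factor.
   The iterations are independent, so they can be shared among threads. */
void scale_in_place(float *data, uint32_t n, float factor) {
  #pragma omp parallel for
  for (uint32_t i = 0; i < n; i += 1) {
    data[i] = data[i] * factor;
  }
}
```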

CFamilyTranslator architecture description

CFamilyTranslator Core:

There are abstract classes, each with a corresponding pure virtual method ‘process’:
- AbstractOperationTranslator
- AbstractTypeTranslator
- AbstractAttributeTranslator

For each supported MLIR op, type and attribute there is a dedicated class which translates it.

Each translator class:
- Derives from appropriate abstract class.
- Registers at core on construction.
- Implements ‘process’ method, where it actually performs the translation.

Custom plugin:

To add a custom target plugin, implement the following:

Register an appropriate entry in the translator, for example ‘generate-opencl-code’ or ‘generate-openmp-code’.
Let’s call it the translation mode.

As the registration callback, provide the cft::translateToTarget function and pass it the translation mode.

Implement a class per custom op/type/attribute, following the rules listed under ‘Each translator class’ above.
If several translation modes are registered and they have translations for the same op/type/attribute,
the core provides support for registration and selection of the correct translator class.
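
The selection rule can be illustrated with a small lookup table. The RFC describes C++ classes; the C sketch below only models the rule "an entry registered for the active translation mode wins, otherwise fall back to the core entry". All names (TranslatorEntry, register_translator, select_translator, and the example emit functions) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a translator's 'process' method. */
typedef const char *(*ProcessFn)(void);

typedef struct {
  const char *mode;    /* translation mode, or "core" */
  const char *op_name; /* e.g. "scf.for" */
  ProcessFn process;
} TranslatorEntry;

#define MAX_TRANSLATORS 64
static TranslatorEntry registry[MAX_TRANSLATORS];
static size_t registry_size = 0;

/* Register a translator; returns 0 on success, -1 if the table is full. */
int register_translator(const char *mode, const char *op_name, ProcessFn fn) {
  if (registry_size == MAX_TRANSLATORS) return -1;
  registry[registry_size].mode = mode;
  registry[registry_size].op_name = op_name;
  registry[registry_size].process = fn;
  registry_size += 1;
  return 0;
}

/* Prefer the entry for the active mode; otherwise use the core entry.
   Returns NULL when the op is not supported at all. */
ProcessFn select_translator(const char *mode, const char *op_name) {
  ProcessFn fallback = NULL;
  for (size_t i = 0; i < registry_size; i += 1) {
    if (strcmp(registry[i].op_name, op_name) != 0) continue;
    if (strcmp(registry[i].mode, mode) == 0) return registry[i].process;
    if (strcmp(registry[i].mode, "core") == 0) fallback = registry[i].process;
  }
  return fallback;
}

/* Example translators, for illustration only. */
static const char *emit_for_core(void) { return "for(...)"; }
static const char *emit_for_opencl(void) { return "for(...) /* OpenCL */"; }
static const char *emit_dma_start_opencl(void) { return "async_work_group_copy(...)"; }
```

With this shape, a plugin both extends the core (an op only it registers) and overrides it (an op registered under both the core and the plugin's mode).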

Supported dialects details:

BuiltIn Dialect

Types - natively supported in C99, including:

Float32Type as float

Float64Type as double

signed-integer-type of width 8/16/32/64 as intN_t

unsigned-integer-type of width 8/16/32/64 as uintN_t
Items 3 and 4 above follow ISO/IEC 9899:TC3.
A signless IntegerType is emitted as intN_t.

MemRefType as a struct with a predefined dimension and element type, as in the example for float with 3 dimensions below.
The struct scheme is similar to how a memref is lowered from MLIR to LLVM-IR.
 typedef struct Memref_float_3D {
  float* allocated;
  float* aligned;
  int offset;
  int sizes[3];
  int strides[3];
 } Memref_float_3D;
Data types of aligned and allocated pointers:

float

double

intN_t according to ISO/IEC 9899:TC3
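
As a sketch of how the generated helpers for such a struct could look, the code below implements dim, load and store for a strided 3D float memref. The signatures are assumptions: the RFC does not show memrefs.h, and here the indexes are passed as a 3-element array rather than as a brace list.

```c
#include <stdint.h>

/* Strided 3D memref of float, mirroring the struct shown above. */
typedef struct {
  float *allocated;
  float *aligned;
  int offset;
  int sizes[3];
  int strides[3];
} Memref_float_3D;

/* Linear element position: offset + sum(idx[d] * strides[d]). */
static int linear_index_float_3D(Memref_float_3D m, const uint32_t idx[3]) {
  int pos = m.offset;
  for (int d = 0; d < 3; d += 1) {
    pos += (int)idx[d] * m.strides[d];
  }
  return pos;
}

/* Size of one dimension. */
int memref_dim_float_3D(Memref_float_3D m, int dim) {
  return m.sizes[dim];
}

/* Read one element through the aligned pointer. */
float memref_load_float_3D(Memref_float_3D m, const uint32_t idx[3]) {
  return m.aligned[linear_index_float_3D(m, idx)];
}

/* Write one element through the aligned pointer. */
void memref_store_float_3D(Memref_float_3D m, const uint32_t idx[3], float value) {
  m.aligned[linear_index_float_3D(m, idx)] = value;
}
```

A call site could pass the indexes as a C99 compound literal, for example memref_load_float_3D(v1, (const uint32_t[3]){v6, v7, v8}).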

Attributes - natively supported in C99, including:

DenseArrayAttr as raw C array, single dimension

DenseIntOrFPElementsAttr as raw C array, multiple dimensions

DenseStringElementsAttr as const char *

FloatAttr as float/double

IntegerAttr as intN_t, uintN_t

StringAttr as const char *

Operations

ModuleOp

Is treated as single compilation unit

Has specific restrictions, for example nested modules aren’t allowed.

Can be used to add global custom information in attributes.

Arith dialect
Types:

signed/unsigned integer types

Float32/64Type

NOT supported: vectors and tensors

Operations:
Supported operations, which can be represented in C using operators and casting, including:

AddI/FOp as operator ‘+’

MulI/FOp as operator ‘*’

DivFOp as (float)operand1/operand2

SubFOp/SubIOp as operator ‘-’

SIToFPOp as (float)operand1

SelectOp as condition ? true_value : false_value

ConstantOp: constants are propagated into their uses, for the following reasons:

Code readability

Aligned with LLVM approach

Operations which can NOT be represented in C using just operators and casting are not supported in the core CFamilyTranslator, including:

CeilDivSIOp

MulUIExtendedOp

AddUIExtendedOp
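
These ops need more than a single operator, which is why they fall outside the core. For illustration only, a plugin could emit small helpers along the following lines. The function names and the use of out-parameters for the second result are assumptions, not part of the proposal.

```c
#include <stdint.h>

/* arith.ceildivsi: signed division rounding toward +infinity.
   Like C's '/', this is undefined for b == 0 and for INT32_MIN / -1. */
int32_t arith_ceildivsi_i32(int32_t a, int32_t b) {
  int32_t q = a / b; /* C99 truncates toward zero */
  int32_t r = a % b; /* C99 remainder has the sign of a */
  /* Inexact division with a positive exact quotient: round up. */
  if (r != 0 && ((r < 0) == (b < 0))) {
    q += 1;
  }
  return q;
}

/* arith.addui_extended: wrapped sum plus a carry-out flag. */
uint32_t arith_addui_extended_i32(uint32_t a, uint32_t b, uint8_t *overflow) {
  uint32_t sum = a + b; /* unsigned arithmetic wraps modulo 2^32 */
  *overflow = (uint8_t)(sum < a);
  return sum;
}

/* arith.mului_extended: low and high halves of the full 64-bit product. */
uint32_t arith_mului_extended_i32(uint32_t a, uint32_t b, uint32_t *high) {
  uint64_t product = (uint64_t)a * (uint64_t)b;
  *high = (uint32_t)(product >> 32);
  return (uint32_t)product;
}
```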

Math dialect
Operations which have an identical function in math.h are supported. They are translated as direct calls to functions from math.h, like:

FloorOp as floor(operand)

AbsFOp as fabs(operand)

TanOp as tan(operand)

Func dialect

All operations are supported when there is no return value or a single return value.
A memref is passed as a struct by value.
A private MLIR func is generated as static.

SCF dialect
Supported operations, which can be natively translated to C99, including:

ForOp as for loop

IfOp as if

IndexSwitchOp as switch case

WhileOp as while loop and do while loop
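
The following sketch shows the shape such translations could take. The MLIR ops named in the comments and the C function names are hypothetical examples, not translator output.

```c
#include <stdint.h>

/* scf.for carrying one iter_arg (a running sum) becomes a for loop
   with a variable that is updated where scf.yield would be. */
float sum_first_n(const float *data, uint32_t n) {
  float acc = 0.0f; /* initial value of the iter_arg */
  for (uint32_t i = 0; i < n; i += 1) {
    float next = acc + data[i]; /* loop body */
    acc = next;                 /* scf.yield */
  }
  return acc; /* result of the scf.for */
}

/* scf.while whose "before" region only evaluates a condition
   becomes a plain while loop. */
uint32_t count_halvings(uint32_t x) {
  uint32_t steps = 0;
  while (x > 1) {
    x = x / 2; /* "after" region */
    steps += 1;
  }
  return steps;
}

/* scf.index_switch yielding one value becomes a switch statement. */
int32_t select_by_index(uint32_t which) {
  int32_t result;
  switch (which) {
    case 0: result = 10; break;
    case 1: result = 20; break;
    default: result = -1; break;
  }
  return result;
}
```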

Not supported: all parallel-related operations, including:

ForallOp

ParallelOp

ReduceOp

Memref dialect support:
Operations that can be natively translated to C99 are supported.

For each supported op an appropriate function call is generated. The function name keeps semantic information: the MLIR op name, and the element type and dimension of the memref.
For example: memref_get_global_float, memref_expand_shape_float_4D_to_5D, memref_load_float_5D etc.
Encoding the semantics in the function name makes the originating MLIR op clear and enables round-tripping.
The C function declarations can be generated in memrefs.h and their implementations in memrefs.c.

A few examples of memref op translations:

GetGlobalOp
mlir

memref.global "private" constant @__constant_1x3x2xf32 :
        memref<1x3x2xf32> = dense<[[[1.200000e+01, 1.600000e+01],
                                    [1.900000e+01, 3.600000e+01],
                                    [4.000000e+01, 2.800000e+01]]]>
%0 = memref.get_global @__constant_1x3x2xf32 : memref<1x3x2xf32>

C

static float __constant_1x3x2xf32[1][3][2] =
    {{{1.200000e+01, 1.600000e+01}, {1.900000e+01, 3.600000e+01}, {4.000000e+01, 2.800000e+01}}};
Memref_float_3D v8 = memref_get_global_float_3D(0/*offset*/, {1, 3, 2}/*sizes*/, {6, 2, 1}/*strides*/, __constant_1x3x2xf32/*array*/);
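
The strides {6, 2, 1} in the call above are the row-major strides for sizes {1, 3, 2}: each stride is the product of all sizes to its right. A minimal sketch of that computation, with a hypothetical helper name:

```c
#include <stddef.h>

/* Fill strides[] with the row-major (identity layout) strides for sizes[]. */
void row_major_strides(const int *sizes, int *strides, size_t rank) {
  int running = 1; /* product of the sizes already visited */
  for (size_t d = rank; d > 0; d -= 1) {
    strides[d - 1] = running;
    running *= sizes[d - 1];
  }
}
```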

ExpandShapeOp
mlir

%2 = memref.expand_shape %arg0 [[0], [1,2]] :
      memref<1x128xf32> into memref<1x8x16xf32>

C

Memref_float_3D v9 = memref_expand_shape_float_2D_to_3D(0/*offset*/, {1, 8, 16}/*sizes*/, {128, 16, 1}/*strides*/,v0/*src_memref*/);

SubViewOp
mlir

%3 = memref.subview %2[0, 0, 0] [1, 8, 8] [1, 1, 1] :
      memref<1x8x16xf32> to memref<1x8x8xf32, strided<[128, 16, 1]>>

C

Memref_float_3D v10 = memref_sub_view_float_3D_to_3D(0/*offset*/, {1, 8, 8}/*sizes*/, {128, 16, 1}/*strides*/,v9/*src_memref*/);
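
For reference, a subview keeps addressing the source buffer, so its offset and strides are derived from the source's strides rather than from its own sizes. The sketch below shows that rule for the strided case. The helper name and parameter list are hypothetical and differ from the generated call, which receives precomputed values.

```c
typedef struct {
  float *allocated;
  float *aligned;
  int offset;
  int sizes[3];
  int strides[3];
} Memref_float_3D;

/* Hypothetical helper: compute a strided subview of src.
   offsets/sizes/steps correspond to the three bracketed lists
   of memref.subview. */
Memref_float_3D subview_float_3D(Memref_float_3D src, const int offsets[3],
                                 const int sizes[3], const int steps[3]) {
  Memref_float_3D dst = src; /* shares the same buffers */
  for (int d = 0; d < 3; d += 1) {
    dst.offset += offsets[d] * src.strides[d];  /* skip to the first element */
    dst.sizes[d] = sizes[d];
    dst.strides[d] = steps[d] * src.strides[d]; /* step in source elements */
  }
  return dst;
}
```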

ViewOp
mlir

%4 = memref.view %arg5[%c0][] : memref<256xi8> to memref<1x8x16xf32>

C

Memref_float_3D v11 = memref_view_to_float_3D(0/*offset*/, {1, 8, 16}/*sizes*/, {128, 16, 1}/*strides*/,v1/*src_memref*/);
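
One possible implementation of such a view helper is sketched below. It reinterprets a 1D byte memref as a contiguous 3D float memref starting at a byte shift. The helper name, the Memref_int8_t_1D struct and the parameter list are assumptions for illustration. The sketch also assumes the shifted address is suitably aligned for float and that the underlying storage may legally be accessed as float.

```c
#include <stdint.h>

typedef struct {
  float *allocated;
  float *aligned;
  int offset;
  int sizes[3];
  int strides[3];
} Memref_float_3D;

/* Hypothetical 1D byte memref, following the same struct scheme. */
typedef struct {
  int8_t *allocated;
  int8_t *aligned;
  int offset;
  int sizes[1];
  int strides[1];
} Memref_int8_t_1D;

/* Hypothetical helper: view the bytes of src, starting byte_shift bytes in,
   as a contiguous row-major 3D float memref with the given sizes. */
Memref_float_3D view_to_float_3D(Memref_int8_t_1D src, int byte_shift,
                                 const int sizes[3]) {
  Memref_float_3D dst;
  dst.allocated = (float *)(void *)src.allocated;
  dst.aligned = (float *)(void *)(src.aligned + src.offset + byte_shift);
  dst.offset = 0; /* the shift is folded into the aligned pointer */
  int running = 1;
  for (int d = 2; d >= 0; d -= 1) {
    dst.sizes[d] = sizes[d];
    dst.strides[d] = running; /* row-major strides */
    running *= sizes[d];
  }
  return dst;
}
```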

mlir::memref address space:

C99 doesn’t have support for address spaces. As a result, the core CFamilyTranslator doesn’t support address spaces either. The CFamilyTranslator framework supports plugins (OpenCL, ClangC, accelerator specific…) that can add address space support for memrefs.

Appendix

Memrefs representation

By MLIR definition memref is a pointer and an affine map, which can be any function defining index mapping.
We suggest starting with support for strided memrefs.
This is what lowering to LLVM-IR supports today, so it should suffice for a first version.
It could be extended in the future to support any affine map following similar support once added to LLVM-IR translator.

TTL

Augmenting C and OpenCL C with structs reminiscent of “memref”s was recently introduced in TTL public url

C to MLIR projects

There are several projects dealing with going from C/C++ to MLIR, including CIR project, Polygeist and SYCLops. The possible interaction with such projects is TBD.

Optional integration with EmitC Translator

Note:
The differences between the approaches of CFamilyTranslator and EmitC were discussed above.
Here we review the technical alternatives.

There are a few operations which are supported both in TranslateToCpp (the translator used for the EmitC dialect) and in CFamilyTranslator, such as scf::IfOp, scf::ForOp and func::CallOp.

The following options are available to reuse common functionality:

Option 1: Convert TranslateToCpp to CFamilyTranslator C++ plugin

EmitC dialect ops supported in TranslateToCpp become part of CFamilyTranslator C++ plugin.
Common code is integrated with implementation inside CFamilyTranslator Core.
C++ specific code generation will override CFamilyTranslator Core implementation.

Pros:

Single holistic, modular and scalable solution for all C family.
Full reuse, no duplication.

Cons:

More complicated and risky approach from support point of view.

Option 2:

Extract common code into utility and reuse from TranslateToCpp and from CFamilyTranslator

Pros:

Simple and easy separation and reuse.

Cons:

Users might be confused by the duplication and find it hard to understand the differences between the two implementations.

This RFC is proposed by: Diana Dubov, Gil Rapaport, Ayal Zaks.

mehdi_amini · February 23, 2023, 9:46pm

Nice proposal!

Something I’m not sure I perceive exactly: why is a new framework necessary instead of reusing EmitC? That is: EmitC is just an emission infrastructure; the actual mapping of the high-level to what is emitted is a matter of “dialect conversion”. It seems like all of your examples could be implemented with some dialect conversion to EmitC to end up with exactly what you intend here. What am I missing?

Diana · February 23, 2023, 11:06pm

Thank you Mehdi,

There are a few aspects to this RFC. First, we’d like to raise a discussion about the natural intersection of MLIR and C. We are suggesting that the dialects mentioned in the RFC are good candidates for this intersection. If we successfully define such a subset of dialects, we could standardize it. Once there is a standard, it becomes possible to provide strong, scalable solutions.

You can use EmitC to end up with function calls. You will then have to provide an external library with implementations for those functions.
We are proposing that once there is a standard, there is a place for a translator which actually generates the implementation for the chosen subset, so that the generated code is self-contained.
Please note that the translator generates an operator for each supported arith op: subi becomes ‘-’, muli becomes ‘*’ and so on. For memrefs, the translator likewise generates an implementation for each supported op with the types and dimensions required in the current invocation.

kuhar · February 24, 2023, 3:57pm

Hi @Diana,

Thanks for a thorough proposal! I have a few high level questions:

What is the intended usage or consumer of the generated C? For example, why is it important not to emit function calls?
Is the proposal to have this translator upstream? What’s the benefit over having the translator as an external project that consumes .mlir/.mlirbc inputs?
If the translator gets upstreamed, does it mean that we essentially tie semantics of those ops/types you listed to their C representation, including undefined behavior? For example, we would want for ~most arith. ops to produce poison on invalid inputs while the alternative in C would be considered immediate UB. This means that we would either have to make the MLIR ops less defined or introduce some guardrails during translation, breaking the proposed 1:1 mapping, e.g., clamping shift values, casting to unsigned before doing * or +.

Diana · February 26, 2023, 12:59pm

Hey @kuhar,

Thank you for taking time to go through this RFC and raising important questions.

There is nothing wrong with emitting function calls.
EmitC gives the freedom to call any function name and pass it arguments of opaque types.
OTOH this freedom compromises on safety, loses semantics and pushes everyone to reinvent the wheel.
This is why we are suggesting an extendable framework with plugins.
EmitC is one of the suggested plugins, so whoever wants to use it can, while benefiting from all the other dialects the CFamilyTranslator core supports.

Inside the core CFamilyTranslator we’d like to have legal, well-defined, safe and structured members and behavior.
The benefits of this are sharing, reuse and interoperability.
For example, once we have a memref structure defined in C, developers can start implementing projects in C, using CMemrefStructure, with a well-defined API, and share those projects.
Others can translate their input MLIR to C and know for sure that the generated C code can be integrated with the above projects.

Another point we find useful is that having a translator that generates code directly offers an alternative development workflow.
Let’s review the following use case:
There is a need to add support for some custom op, which doesn’t have much logic and is straightforward.
a. Function call approach:
i. Write a pass which lowers to an emitc function call.
ii. Provide a hand-written implementation of that function for all data types.
If the API needs to change, the developer might have to update both the conversion pass and the hand-written implementation.

b. CFamilyTranslator approach:
i. Add a plugin which generates the required code directly.
In case of a change there is a single place to update.
Since there is a standard and a framework, this plugin can be upstreamed and reused by others.

In case we agree on a standard we need to make sure it grows together with MLIR.

Keeping MLIR semantics when translating the arith dialect to C is a good point. We can generate code that follows the arith semantics once they are precisely defined. What do you think?
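
As a sketch of what such guardrails might look like (the function names are made up, and the value returned for an out-of-range shift is an arbitrary choice for this example):

```c
#include <stdint.h>

/* Signed '+' overflow is undefined behavior in C. Doing the addition in
   unsigned arithmetic wraps instead. Converting the result back to int32_t
   is implementation-defined in C99 and wraps on two's complement targets. */
int32_t addi_wrapping_i32(int32_t a, int32_t b) {
  return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Shifting by the bit width or more is undefined behavior in C.
   This guard returns 0 in that case, which is only one possible choice. */
uint32_t shli_guarded_i32(uint32_t value, uint32_t amount) {
  return amount < 32 ? value << amount : 0u;
}
```

Guards like these cost extra instructions and move away from a 1:1 operator mapping, which is the trade-off raised above.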

xuexingtu · June 28, 2023, 7:51am

Do you have a link to the code? Can you email me the code? Thank you. 1249430176@qq.com

Diana · July 2, 2023, 6:21am

@xuexingtu thank you for showing an interest in this RFC. We are currently working on uploading our code for review by the community. For now there is only our proprietary implementation, which we can’t share at this point.

pag · July 9, 2023, 7:18pm

You might find these two projects of interest:

Rellic: decompiles LLVM modules into goto-free Clang ASTs, which can be pretty-printed to C source code.

VAST: Converts C and C++ code into high-level MLIR dialects, then progressively lowers them to LLVM IR. VAST’s high-level dialect has a lot in common with the Clang AST, though it also brings in control-flow and data flow, which are lacking in the Clang AST. I have briefly experimented with converting VAST’s high level MLIR dialect back into Clang ASTs and it seemed to be promising.
