We propose that a suitable subset of MLIR dialects can be represented directly in C99, and that a core translator will produce this representation where desired.
Our aim is to retain high-level MLIR semantics that have a natural correspondence to C. This requires elevating C to represent memrefs with C structs, analogous to those generated when lowering memrefs to the LLVM-IR dialect, and to those provided for OpenCL C by TTL.
Such a native translator should arguably be self-contained and emit all the C code needed to implement the core constructs, without depending on externally provided function definitions.
Moreover, the translator would be extensible to support variants of C and external function definitions if desired. One example is OpenCL C, whose async_work_group_copy() builtin function can represent MLIR’s memref.dma_start and memref.dma_wait operations on memref structs, but which cannot represent func.call_indirect. Another is C augmented with OpenMP pragmas, which can represent omp dialect operations.
Code example demonstrating the proposal:
Input MLIR code:
func.func @add_or_mul(%arg0: memref<?x4x8xf32>, %arg1: memref<?x4x8xf32>, %arg2: memref<?x4x8xf32>, %arg3: memref<?x4x8xi32>) {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
%c8 = arith.constant 8 : index
%0 = memref.dim %arg0, %c0 : memref<?x4x8xf32>
scf.for %arg5 = %c0 to %0 step %c1 {
scf.for %arg6 = %c0 to %c4 step %c1 {
scf.for %arg7 = %c0 to %c8 step %c1 {
%1 = memref.load %arg3[%arg5, %arg6, %arg7] : memref<?x4x8xi32>
%2 = arith.index_cast %c0 : index to i32
%3 = arith.cmpi slt, %1, %2 : i32
%4 = memref.load %arg1[%arg5, %arg6, %arg7] : memref<?x4x8xf32>
%5 = memref.load %arg2[%arg5, %arg6, %arg7] : memref<?x4x8xf32>
scf.if %3 {
%6 = arith.addf %4, %5 : f32
memref.store %6, %arg0[%arg5, %arg6, %arg7] : memref<?x4x8xf32>
} else {
%7 = arith.mulf %4, %5 : f32
memref.store %7, %arg0[%arg5, %arg6, %arg7] : memref<?x4x8xf32>
}
}
}
}
return
}
Output C code:
#include "memrefs.h"
void add_or_mul(Memref_float_3D v0, Memref_float_3D v1, Memref_float_3D v2, Memref_int32_t_3D v3){
unsigned int v5 = memref_dim_float_3D(v0/*memref*/, 0/*dim*/);
for(uint32_t v6 = 0; v6 < v5; v6 += 1) {
for(uint32_t v7 = 0; v7 < 4; v7 += 1) {
for(uint32_t v8 = 0; v8 < 8; v8 += 1) {
int32_t v9 = memref_load_int32_t_3D(v3/*memref*/, {v6, v7, v8}/*indexes*/);
int32_t v10 = (int32_t)0;
int8_t v11 = v9 < v10;
float v12 = memref_load_float_3D(v1/*memref*/, {v6, v7, v8}/*indexes*/);
float v13 = memref_load_float_3D(v2/*memref*/, {v6, v7, v8}/*indexes*/);
if(v11) {
float v14 = v12 + v13;
memref_store_float_3D(v0/*memref*/, {v6, v7, v8}/*indexes*/, v14/*value*/);
} else {
float v15 = v12 * v13;
memref_store_float_3D(v0/*memref*/, {v6, v7, v8}/*indexes*/, v15/*value*/);
}
}
}
}
return;
}
Supported MLIR dialects subset:
The main idea is to identify the subset of MLIR dialects, along with their operations, types, and attributes, that have natural counterparts in C augmented with memref support. This subset, which we refer to as Core-C MLIR in what follows, includes the following dialects:
- builtin
- arith
- math
- func
- scf
- memref
More details on supported dialects can be found below.
EmitC:
The related EmitC project already facilitates generating C from MLIR. It does so by first lowering to an EmitC dialect, thereby supporting general constructs including opaque types and calls to arbitrary functions. In contrast, our aim is to retain high-level semantics native to MLIR, including memrefs, that have a natural correspondence to C, by elevating C to capture memref semantics rather than lowering MLIR. Other distinctions are EmitC's support for C++ and its dependence on external function definitions.
It should, however, be possible to extend the proposed core C translator to support EmitC; see the optional integration with the EmitC translator in the appendix below.
Translating MLIR to C versus LLVM-IR
Core-C MLIR dialects represent higher-level semantics than LLVM-IR. Translating Core-C MLIR out to LLVM-IR therefore first requires lowering to the LLVM-IR and CF dialects. There are several reasons why translating from Core-C MLIR dialects directly to C can be preferable to lowering to LLVM-IR:
- Any C compiler can then be used to compile down to the desired target, not necessarily Clang. See, e.g., DaCe and the poster presented at C4ML’20.
- C is more stable in terms of versioning than LLVM-IR.
- Lowering to the LLVM-IR dialect generally obfuscates semantic information and duplicates a process already taken care of by C front-ends. By contrast, translating from Core-C MLIR dialects out to C potentially facilitates round-tripping.
- Translating core MLIR semantics directly to C would provide a more human readable artifact than LLVM-IR (to some of us ;-), which could facilitate diagnostics, debugging and manual interception.
- Translating to C99 could naturally extend to target related extensions including OpenCL, OpenMP, vector types supported by GCC, address spaces supported by Clang.
On the other hand, it may be preferable to translate to LLVM-IR rather than C in order to save compile-time or integrate more tightly with an LLVM-based middle-end.
CFamilyTranslator
CFamilyTranslator is a modular, extensible framework that supports language extensions to C with a natural mapping to MLIR dialects. It is built on top of the core translator.
CFamilyTranslator design approach
CFamilyTranslator is composed of a core, which can be extended by target plugins.
- C family translator core:
- Supports translating the MLIR dialects listed below to C99.
- Is a generic framework extendable by user-provided plugins.
- Plugins: CFamilyTranslator provides a way to add plugins that target C variants. A plugin can:
- Add support for dialect types, attributes and operations not supported by the core.
- Override types, attributes and operations that are supported by the core.
Plugin examples:
- OpenCL translator plugin - translation to OpenCL.
Requires special treatment of, for example, memref.dma_start, memref.dma_wait and address spaces, while preventing the use of indirect function calls.
- OpenMP translator plugin - translation to OpenMP can be added to support the omp dialect, for example generating: #pragma omp parallel for {…
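To make the OpenMP case concrete, here is a minimal sketch of the kind of C an OpenMP plugin might emit for a parallel region wrapping a worksharing loop. The function name, signature and loop body are hypothetical and not taken from the proposal. Only the pragma reflects the mapping described above.

```c
/* Hypothetical plugin output: an omp dialect parallel worksharing loop
   that scales a buffer in place. Names are illustrative only. */
void scale_in_place(float *data, int n, float factor) {
    /* Without OpenMP enabled the pragma is ignored and the loop runs
       sequentially, producing the same result. */
    #pragma omp parallel for
    for (int i = 0; i < n; i += 1) {
        data[i] = data[i] * factor;
    }
}
```

Because the pragma only annotates an ordinary C loop, the core translation of the loop body can be reused unchanged by such a plugin.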
CFamilyTranslator architecture description
CFamilyTranslator Core:
- There are abstract classes, each with a pure virtual method ‘process’:
- AbstractOperationTranslator
- AbstractTypeTranslator
- AbstractAttributeTranslator
- For each supported MLIR op, type and attribute there is a dedicated class that translates it.
- Each translator class:
- Derives from the appropriate abstract class.
- Registers itself with the core on construction.
- Implements the ‘process’ method, which performs the actual translation.
Custom plugin:
To add a custom target plugin, implement the following:
- Register an appropriate entry in the translator, for example ‘generate-opencl-code’ or ‘generate-openmp-code’.
Let’s call it the translation mode.
- As the registration callback, provide the cft::translateToTarget function and pass it the translation mode.
- Implement a class per custom op/type/attribute, following the rules for translator classes in bullet 3 above.
If several translation modes are registered and they provide translations for the same op/type/attribute, the core supports registration and selection of the correct translator class.
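The registration and selection logic can be illustrated with a small sketch. The proposal describes C++ classes with a virtual ‘process’ method. The version below is a simplified stand-in written in C with function pointers, and every name in it (TranslationMode, register_translator, select_translator, the emit_* placeholders) is hypothetical. It assumes one plausible selection rule: prefer the translator registered for the active mode, otherwise fall back to the core one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical modes; the core plus two plugin modes. */
typedef enum { MODE_CORE, MODE_OPENCL, MODE_OPENMP } TranslationMode;

/* Stand-in for the 'process' method: returns placeholder output text. */
typedef const char *(*ProcessFn)(void);

typedef struct {
    const char *op_name;
    TranslationMode mode;
    ProcessFn process;
} TranslatorEntry;

enum { MAX_TRANSLATORS = 64 };
static TranslatorEntry registry[MAX_TRANSLATORS];
static size_t registry_size = 0;

/* Registration: returns 0 on success, -1 if the registry is full. */
int register_translator(const char *op_name, TranslationMode mode, ProcessFn process) {
    if (registry_size == MAX_TRANSLATORS) return -1;
    registry[registry_size].op_name = op_name;
    registry[registry_size].mode = mode;
    registry[registry_size].process = process;
    registry_size += 1;
    return 0;
}

/* Selection: a translator registered for the active mode overrides the
   core one; NULL means the op is unsupported in this mode. */
ProcessFn select_translator(const char *op_name, TranslationMode active) {
    ProcessFn fallback = NULL;
    for (size_t i = 0; i < registry_size; ++i) {
        if (strcmp(registry[i].op_name, op_name) != 0) continue;
        if (registry[i].mode == active) return registry[i].process;
        if (registry[i].mode == MODE_CORE) fallback = registry[i].process;
    }
    return fallback;
}

/* Placeholder translators; real ones would emit C for a concrete op. */
const char *emit_load_core(void) { return "core translation of memref.load"; }
const char *emit_load_opencl(void) { return "OpenCL translation of memref.load"; }
const char *emit_dma_start_opencl(void) { return "OpenCL translation of memref.dma_start"; }
```

With these entries registered, selecting memref.load in OpenCL mode would return the plugin override, selecting it in OpenMP mode would fall back to the core translator, and selecting memref.dma_start in core mode would report the op as unsupported.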
Supported dialects details:
BuiltIn Dialect
Types - natively supported in C99, including:
- Float32Type as float
- Float64Type as double
- signed-integer-type of width 8/16/32/64 as intN_t
- unsigned-integer-type of width 8/16/32/64 as uintN_t
Items 3 and 4 above follow ISO/IEC 9899:TC3.
For a signless IntegerType, intN_t will be generated.
- MemRefType as a struct with a predefined rank and element type, as in the 3-dimensional float example below.
The struct scheme is similar to how a memref is lowered from MLIR to LLVM-IR.
struct Memref_float_3D {
  float* allocated;
  float* aligned;
  int offset;
  int sizes[3];
  int strides[3];
};
Data types of aligned and allocated pointers:
- float
- double
- intN_t according to ISO/IEC 9899:TC3
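As an illustration of how such a struct could back the helper calls used in the add_or_mul example, here is a minimal sketch of part of memrefs.h. The struct layout follows the one above. The typedef, the helper bodies, the memref_linear_index_3D helper and the choice to pass indexes as an int array are all assumptions made for this sketch. The RFC does not define them.

```c
#include <stddef.h>

/* Strided 3-D memref of float, mirroring the struct shown above. */
typedef struct {
    float *allocated;  /* pointer returned by the allocator */
    float *aligned;    /* aligned pointer used for element access */
    int offset;        /* element offset from 'aligned' */
    int sizes[3];      /* extent of each dimension */
    int strides[3];    /* element stride of each dimension */
} Memref_float_3D;

/* Linear element position: offset + sum over k of idx[k] * strides[k]. */
static inline int memref_linear_index_3D(int offset, const int strides[3],
                                         const int idx[3]) {
    return offset + idx[0] * strides[0] + idx[1] * strides[1] + idx[2] * strides[2];
}

/* memref.dim: extent of one dimension. */
static inline int memref_dim_float_3D(Memref_float_3D m, int dim) {
    return m.sizes[dim];
}

/* memref.load: read one element through the strided layout. */
static inline float memref_load_float_3D(Memref_float_3D m, const int idx[3]) {
    return m.aligned[memref_linear_index_3D(m.offset, m.strides, idx)];
}

/* memref.store: write one element; the struct is passed by value but
   still aliases the same underlying buffer. */
static inline void memref_store_float_3D(Memref_float_3D m, const int idx[3],
                                         float value) {
    m.aligned[memref_linear_index_3D(m.offset, m.strides, idx)] = value;
}
```

In strict C99 the `{v6, v7, v8}` shorthand from the example would be written as a compound literal, for instance `memref_load_float_3D(v1, (const int[3]){v6, v7, v8})`.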
Attributes - natively supported in C99, including:
- DenseArrayAttr as raw C array, single dimension
- DenseIntOrFPElementsAttr as raw C array, multiple dimensions
- DenseStringElementsAttr as const char *
- FloatAttr as float/double
- IntegerAttr as intN_t, uintN_t
- StringAttr as const char *
Operations
- ModuleOp
- Is treated as a single compilation unit.
- Has specific restrictions, for example nested modules aren’t allowed.
- Can be used to add global custom information in attributes.
Arith dialect
Types:
- signed/unsigned integer types
- Float32/64Type
- NOT supported: vectors and tensors
Operations:
Supported operations are those that can be represented in C using operators and casting, including:
- AddI/FOp as operator ‘+’
- MulI/FOp as operator ‘*’
- DivFOp as (float)operand1/operand2
- SubFOp/SubIOp as operator ‘-’
- SIToFPOp as (float)operand1
- SelectOp as condition ? true_value : false_value
- ConstantOp: constants are propagated into their uses rather than emitted as separate variables, for the following reasons:
- Code readability
- Aligned with LLVM approach
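A short sketch of how a few of these mappings might look once emitted. The function and value names are hypothetical, and the comparison line follows the cmpi translation shown in the add_or_mul example.

```c
#include <stdint.h>

/* Hypothetical translator output for a chain of arith ops:
   addi -> '+', sitofp -> cast, divf -> '/', cmpi slt -> '<',
   select -> ternary operator. */
float arith_example(int32_t v0, int32_t v1, float v2) {
    int32_t v3 = v0 + v1;      /* arith.addi */
    float v4 = (float)v3;      /* arith.sitofp */
    float v5 = v4 / v2;        /* arith.divf */
    int8_t v6 = v0 < v1;       /* arith.cmpi slt */
    float v7 = v6 ? v5 : v2;   /* arith.select */
    return v7;
}
```

One caveat for any such mapping: signed overflow is undefined behavior in C, so a plain `+` on intN_t matches the MLIR operation only when the addition does not overflow.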
Operations that cannot be represented in C using just operators and casting are not supported in the core CFamilyTranslator, including:
- CeilDivSIOp
- MulUIExtendedOp
- AddUIExtendedOp
Math dialect
Supported operations are those with an identical function in math.h. They are translated as direct calls to math.h functions, for example:
- FloorOp as floor(operand)
- AbsFOp as fabs(operand)
- TanOp as tan(operand)
Func dialect
- All operations are supported, provided the function has zero or one return value.
A memref argument is passed as a struct by value.
A private MLIR func is generated as a static function.
SCF dialect
Supported operations are those that can be natively translated to C99, including:
- ForOp as for loop
- IfOp as if
- IndexSwitchOp as switch case
- WhileOp as while loop and do while loop
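The following sketch shows what the emitted C for the switch and loop forms might look like. The functions, value names and constants are hypothetical, and index values are rendered as uint32_t as in the add_or_mul example. The assumption here is that a WhileOp whose condition is evaluated before the body maps to `while`, and one whose body must run before the first test maps to `do ... while`.

```c
#include <stdint.h>

/* scf.index_switch with cases 0 and 1 plus a default region,
   each region yielding one i32 result. */
int32_t index_switch_example(uint32_t v0) {
    int32_t v1;
    switch (v0) {
    case 0:
        v1 = 10;
        break;
    case 1:
        v1 = 20;
        break;
    default:
        v1 = -1;
        break;
    }
    return v1;
}

/* scf.while with the condition tested first: counts halvings
   until the loop-carried value is at most 1. */
uint32_t while_example(uint32_t v0) {
    uint32_t v1 = v0; /* loop-carried value */
    uint32_t v2 = 0;  /* loop-carried counter */
    while (v1 > 1) {
        v1 = v1 / 2;
        v2 = v2 + 1;
    }
    return v2;
}

/* Same computation in do-while form: the body runs once before
   the condition is tested. */
uint32_t do_while_example(uint32_t v0) {
    uint32_t v1 = v0;
    uint32_t v2 = 0;
    do {
        v1 = v1 / 2;
        v2 = v2 + 1;
    } while (v1 > 1);
    return v2;
}
```

The two loop forms differ only for inputs where the condition is false on entry, which is why the translator needs to distinguish them.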
Parallel-related operations are not supported, including:
- ForallOp
- ParallelOp
- ReduceOp
Memref dialect support:
Operations that can be natively translated to C99 are supported.
For each supported op, an appropriate function call is generated. The function name carries semantic information: the MLIR op name and the element type and rank of the memref.
For example: memref_get_global_float, memref_expand_shape_float_4D_to_5D, memref_load_float_5D etc.
Encoding the originating MLIR op in the function name makes the output clearer and enables round-tripping.
The C function declarations can be generated in memrefs.h and their implementations in memrefs.c.
A few examples of memref op translations:
- GetGlobalOp
MLIR:
memref.global "private" constant @__constant_1x3x2xf32 : memref<1x3x2xf32> = dense<[[[1.200000e+01, 1.600000e+01], [1.900000e+01, 3.600000e+01], [4.000000e+01, 2.800000e+01]]]>
%0 = memref.get_global @__constant_1x3x2xf32 : memref<1x3x2xf32>
C:
static float __constant_1x3x2xf32[1][3][2] = {{{1.200000e+01, 1.600000e+01}, {1.900000e+01, 3.600000e+01}, {4.000000e+01, 2.800000e+01}}};
Memref_float_3D v8 = memref_get_global_float_3D(0/*offset*/, {1, 3, 2}/*sizes*/, {6, 2, 1}/*strides*/, __constant_1x3x2xf32/*array*/);
- ExpandShapeOp
MLIR:
%2 = memref.expand_shape %arg0 [[0], [1,2]] : memref<1x128xf32> into memref<1x8x16xf32>
C:
Memref_float_3D v9 = memref_expand_shape_float_2D_to_3D(0/*offset*/, {1, 8, 16}/*sizes*/, {128, 16, 1}/*strides*/, v0/*src_memref*/);
- SubViewOp
MLIR:
%3 = memref.subview %2[0, 0, 0] [1, 8, 8] [1, 1, 1] : memref<1x8x16xf32> to memref<1x8x8xf32, strided<[128, 16, 1]>>
C:
Memref_float_3D v10 = memref_sub_view_float_3D_to_3D(0/*offset*/, {1, 8, 8}/*sizes*/, {128, 16, 1}/*strides*/, v9/*src_memref*/);
- ViewOp
MLIR:
%4 = memref.view %arg5[%c0][] : memref<256xi8> to memref<1x8x16xf32>
C:
Memref_float_3D v11 = memref_view_to_float_3D(0/*offset*/, {1, 8, 16}/*sizes*/, {128, 16, 1}/*strides*/, v1/*src_memref*/);
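The view-like ops above only build a new descriptor over existing storage, and a sketch of two of the helpers makes that explicit. The parameter order follows the calls shown in the examples. The 2-D struct, the typedefs, the make_view_float_3D helper and the function bodies are assumptions for illustration, including the assumption that the translator passes in the already computed offset, sizes and strides of the result.

```c
/* Descriptors following the Memref_float_3D scheme described earlier;
   the 2-D variant is assumed to use the same layout with rank 2. */
typedef struct {
    float *allocated;
    float *aligned;
    int offset;
    int sizes[2];
    int strides[2];
} Memref_float_2D;

typedef struct {
    float *allocated;
    float *aligned;
    int offset;
    int sizes[3];
    int strides[3];
} Memref_float_3D;

/* Build a 3-D descriptor that aliases existing storage; no data is copied. */
static Memref_float_3D make_view_float_3D(float *allocated, float *aligned, int offset,
                                          const int sizes[3], const int strides[3]) {
    Memref_float_3D view;
    view.allocated = allocated;
    view.aligned = aligned;
    view.offset = offset;
    for (int k = 0; k < 3; ++k) {
        view.sizes[k] = sizes[k];
        view.strides[k] = strides[k];
    }
    return view;
}

/* memref.expand_shape: same buffer, new shape and strides. */
Memref_float_3D memref_expand_shape_float_2D_to_3D(int offset, const int sizes[3],
                                                   const int strides[3],
                                                   Memref_float_2D src) {
    return make_view_float_3D(src.allocated, src.aligned, offset, sizes, strides);
}

/* memref.subview: same buffer, a window described by offset, sizes, strides. */
Memref_float_3D memref_sub_view_float_3D_to_3D(int offset, const int sizes[3],
                                               const int strides[3],
                                               Memref_float_3D src) {
    return make_view_float_3D(src.allocated, src.aligned, offset, sizes, strides);
}
```

Since both helpers return the struct by value and share the source pointers, a store through the resulting view is visible through the source memref as well.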
mlir::memref address space:
C99 has no support for address spaces, so the core CFamilyTranslator does not support them either. The CFamilyTranslator framework lets plugins (OpenCL, Clang C, accelerator-specific, …) add address space support for memrefs.
Appendix
Memrefs representation
By MLIR definition, a memref is a pointer plus an affine map, which can be any function defining the index mapping.
We suggest starting by supporting strided memrefs.
This is what lowering to LLVM-IR supports today, so it should suffice for a first version.
It could be extended in the future to support arbitrary affine maps, following similar support once it is added to the LLVM-IR translator.
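For a strided memref the index mapping reduces to an offset plus a dot product of indexes and strides. The sketch below is rank-generic and uses hypothetical helper names. It also shows how the default row-major strides seen in the examples, such as {128, 16, 1} for a 1x8x16 memref, follow from the sizes.

```c
#include <stddef.h>

/* Row-major (identity layout) strides:
   strides[k] = product of sizes[k+1 .. rank-1]. */
static void identity_strides(size_t rank, const int *sizes, int *strides) {
    int running = 1;
    for (size_t k = rank; k-- > 0; ) {
        strides[k] = running;
        running *= sizes[k];
    }
}

/* Strided layout map: offset + sum over k of idx[k] * strides[k]. */
static int strided_offset(size_t rank, int offset, const int *strides,
                          const int *idx) {
    int linear = offset;
    for (size_t k = 0; k < rank; ++k) {
        linear += idx[k] * strides[k];
    }
    return linear;
}
```

Layouts that cannot be written as an offset plus per-dimension strides fall outside this formula, which is the restriction the first version would accept.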
TTL
Augmenting C and OpenCL C with structs reminiscent of “memref”s was recently introduced in TTL.
C to MLIR projects
There are several projects dealing with going from C/C++ to MLIR, including CIR project, Polygeist and SYCLops. The possible interaction with such projects is TBD.
Optional integration with EmitC Translator
Note:
The differences in approach between CFamilyTranslator and EmitC were discussed above.
Here we review the technical alternatives.
A few operations are supported by both TranslateToCpp (the translator used for the EmitC dialect) and CFamilyTranslator, such as scf::IfOp, scf::ForOp and func::CallOp.
The following options are available for reusing common functionality:
Option 1: Convert TranslateToCpp to CFamilyTranslator C++ plugin
- EmitC dialect ops supported in TranslateToCpp become part of CFamilyTranslator C++ plugin.
- Common code is integrated with implementation inside CFamilyTranslator Core.
- C++ specific code generation will override CFamilyTranslator Core implementation.
Pros:
- Single holistic, modular and scalable solution for all C family.
- Full reuse, no duplication.
Cons:
- A more complicated and riskier approach from a support point of view.
Option 2: Extract common code into a shared utility
- The common code is extracted into a utility and reused by both TranslateToCpp and CFamilyTranslator.
Pros:
- Simple and easy separation and reuse.
Cons:
- Users might be confused by having two implementations and find it hard to understand the differences between them.
This RFC is proposed by: Diana Dubov, Gil Rapaport, Ayal Zaks.