Background: Mixed value/memory semantics in EmitC
The EmitC dialect currently identifies MLIR SSA values with C variables: The
translator is assumed to generate a C variable for each SSA value, and the
dialect supports taking that C variable’s address via its emitc.apply op.
Consider the following example:
func.func @take_address(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %addr = emitc.apply "&"(%val) : (i32) -> !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  return %val : i32
}
where opaque function zero is:
void zero(int32_t *x) { *x = 0; }
What does take_address return? By MLIR SSA value semantics, it should
return v1 * v2. By EmitC semantics, however, it actually returns zero, as can be
seen when @take_address is translated to C:
int32_t take_address(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t* v4 = &v3;
  zero(v4);
  return v3;
}
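The translated function can be compiled as-is to confirm the mismatch. The sketch below pairs it with the opaque zero function shown above; the argument values used when trying it out are arbitrary.

```c
#include <stdint.h>

/* The opaque function from the example: clears the pointee. */
void zero(int32_t *x) { *x = 0; }

/* C emitted for @take_address: the SSA value %val became the mutable
   variable v3, so the call through &v3 overwrites it. */
int32_t take_address(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t *v4 = &v3;
  zero(v4);
  return v3; /* returns 0, not v1 * v2 */
}
```

For example, take_address(6, 7) evaluates to 0 rather than 42, even though the MLIR function returns %val unchanged.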
In addition to redefining MLIR’s SSA value semantics, which is confusing at
best, identifying values with mutable C variables implies a memory model which
isn’t expressed in MLIR’s memory and side-effects interfaces and traits,
making various standard analyses and transforms unusable for the dialect.
EmitC partly addresses this by providing the emitc.variable op, which should be
used for defining mutable values, like so:
func.func @variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %variable = "emitc.variable"() {value = #emitc.opaque<"">} : () -> i32
  emitc.assign %val : i32 to %variable : i32
  %addr = emitc.apply "&"(%variable) : (i32) -> !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  return %variable : i32
}
which is translated into:
int32_t variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t v4;
  v4 = v3;
  int32_t* v5 = &v4;
  zero(v5);
  return v4;
}
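To see that this convention keeps the SSA value intact, the sketch below is a hypothetical instrumented copy of the emitted variables function: the extra val_out parameter is not part of the translator's output and only exists to report v3 (the C counterpart of %val) alongside the returned v4 (the C counterpart of %variable).

```c
#include <stdint.h>

void zero(int32_t *x) { *x = 0; }

/* Hypothetical instrumented copy of the emitted `variables` function.
   val_out is an illustration-only addition that exposes v3. */
int32_t variables_instrumented(int32_t v1, int32_t v2, int32_t *val_out) {
  int32_t v3 = v1 * v2; /* %val: its address is never taken */
  int32_t v4;           /* %variable: the mutable C variable */
  v4 = v3;              /* emitc.assign */
  int32_t *v5 = &v4;
  zero(v5);             /* clears v4 only */
  *val_out = v3;        /* still v1 * v2 */
  return v4;            /* 0 */
}
```

Only the variable is zeroed; the value computed by emitc.mul keeps its SSA meaning.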
Note, however, that this convention doesn’t prevent emitc.apply from being
misused as in take_address above. In addition, there are currently two
exceptions to the values-as-C-variables semantics:
- The address of the SSA value defined by emitc.constant cannot be taken via
  emitc.apply.
- The SSA value defined by the emitc.literal op has no counterpart C variable,
  since the translator always inlines its value.
These semantics will be further challenged by the pending proposal for
modeling C expressions, which adds support for emitting complex C expressions
where only the final result may be associated with a C variable.
Proposal: Model C variables as memory allocations
C variables are statically scoped, named storage locations. Local C variables
have automatic storage duration: they live until the end of the block that
declares them. This aspect of variable definition can be modeled using MLIR’s
automatic allocation traits, following the memref dialect: an emitc.automatic
operation would allocate automatically-scoped memory, similar to memref.alloca.
Since C variables are defined within syntactic blocks, fully modeling C’s
allocation scopes would require the dialect to provide automatic allocation
scopes, similar to memref.alloca_scope, within the syntactic constructs it
currently supports: functions, for-loops and if-then-else.
- Functions are currently still supported using func.func, which is an
  automatic allocation scope.
- For-loops (emitc.for): EmitC currently supports limited init-cond-iter
  clauses, so a single allocation scope for the loop’s body would currently
  suffice. However, future extension of this op may benefit from an additional
  scope enclosing the init-cond-iter clauses.
- If-then-else (emitc.if): This construct models two syntactic blocks, so
  defining it as a (single) allocation scope would not suffice.
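The C scopes these constructs introduce can be seen in one small function. This is an illustrative sketch (scopes_demo is a hypothetical name); the comments mark which construct from the list above each automatic variable would belong to.

```c
#include <stdint.h>

/* Each brace-delimited block is a separate C scope: automatic variables
   declared in it live only until its closing brace. */
int32_t scopes_demo(int32_t n, int cond) {
  int32_t acc = 0;                  /* function body (func.func) */
  for (int32_t i = 0; i < n; ++i) { /* i lives in a scope enclosing the
                                       body: the init-cond-iter clauses */
    int32_t sq = i * i;             /* loop body (emitc.for region) */
    acc += sq;
  }
  if (cond) {
    int32_t t = acc + 1;            /* "then" block */
    acc = t;
  } else {
    int32_t t = acc - 1;            /* a distinct "else" block, so the name
                                       t can be reused without conflict */
    acc = t;
  }
  return acc;
}
```

The two blocks under the if statement are why a single allocation scope on emitc.if would not match C.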
As a unified solution, the dialect can instead be augmented with an
emitc.block operation that would model the syntactic block construct {...}.
This op would become the only valid operation within the body of a
function/for/then/else region. For example (in generic form):
"emitc.if"(%arg0) ({
  "emitc.block"() ({
    %3 = "emitc.call_opaque"(%arg1) <{callee = "f"}> : (f32) -> i32
    "emitc.yield"() : () -> ()
  }) : () -> ()
  "emitc.yield"() : () -> ()
}, {
  "emitc.block"() ({
    %3 = "emitc.call_opaque"(%arg1) <{callee = "f"}> : (f32) -> i32
    "emitc.yield"() : () -> ()
  }) : () -> ()
  "emitc.yield"() : () -> ()
}) : (i1) -> ()
"emitc.for"(%0, %1, %2) ({
^bb0(%arg0: index):
  "emitc.block"() ({
    %9 = "emitc.call_opaque"(%7, %arg0) <{callee = "f"}> : (i32, index) -> i32
    "emitc.yield"() : () -> ()
  }) : () -> ()
  "emitc.yield"() : () -> ()
}) : (index, index, index) -> ()
Such an emitc.block op would not affect the code currently emitted for these
ops, as the translator already emits curly braces for their bodies. When used
independently, an emitc.block would be emitted as a syntactic { ... } block.
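For reference, a standalone block in C looks like the following sketch (standalone_block is a hypothetical name). The temporary declared inside the braces is visible and alive only within them, which is the scoping a standalone emitc.block would carry.

```c
#include <stdint.h>

/* A standalone `{ ... }` block inside a function body. */
int32_t standalone_block(int32_t x) {
  int32_t result = x;
  {
    int32_t tmp = x * 2; /* scoped to this block only */
    result += tmp;
  }
  /* tmp is out of scope here; referring to it would not compile. */
  return result;
}
```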
Global variables could again be modeled after the memref dialect by defining an
emitc.global operation, analogous to memref.global. This operation would
define a symbol in the module’s symbol table.
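A sketch of the kind of C such an op might translate into, assuming a hypothetical emitc.global named g_counter: a file-scope variable with static storage duration. Unlike automatic variables it keeps its value across calls, and every function in the translation unit refers to the same object. The helper functions are illustrative only.

```c
#include <stdint.h>

/* File-scope variable, as a hypothetical emitc.global @g_counter might
   be emitted. */
int32_t g_counter = 0;

/* Two functions sharing the same global object. */
void increment(void) { g_counter += 1; }

int32_t read_counter(void) { return g_counter; }
```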
Possible semantics for the emitc.automatic operation could then be:
Alternative 1: Using pointers
Follow the example of memref.alloca, i.e. let emitc.automatic return
a value of type !emitc.ptr<T> for a given type T. The operation would then
be translated into:
T v1;
T* v2 = &v1;
where v2 would be the value returned by the operation. In this alternative
there is no need to take the address of the variable, as it’s already available
as a value. The emitc.apply "*" op can then be used to dereference the
variable into an rvalue, as done today, whereas a new emitc.store operation would
replace the existing emitc.assign operation, allowing any !emitc.ptr<T> to be
used as an lvalue. The variables example could then be expressed as:
func.func @variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %variable = emitc.automatic : !emitc.ptr<i32>
  emitc.store %val : i32 to %variable : !emitc.ptr<i32>
  emitc.call_opaque "zero" (%variable) : (!emitc.ptr<i32>) -> ()
  %updated_val = emitc.apply "*"(%variable) : (!emitc.ptr<i32>) -> i32
  return %updated_val : i32
}
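One possible C translation of this example is sketched below. The exact per-op mapping (in particular that emitc.store becomes an assignment through the pointer) is an assumption of this sketch, not a specified translation. Note that v3, the counterpart of %val, is never written after its definition, so SSA semantics are preserved while the returned value still observes the update made by zero.

```c
#include <stdint.h>

void zero(int32_t *x) { *x = 0; }

/* Possible translation of the Alternative 1 @variables example. */
int32_t variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2; /* emitc.mul */
  int32_t v4;           /* emitc.automatic: the storage ... */
  int32_t *v5 = &v4;    /* ... and the !emitc.ptr<i32> it returns */
  *v5 = v3;             /* emitc.store */
  zero(v5);             /* emitc.call_opaque "zero" */
  int32_t v6 = *v5;     /* emitc.apply "*": reads the updated value */
  return v6;
}
```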
Note that in this alternative, if global variables are indeed modeled as symbols,
they would require an operation analogous to memref.get_global to obtain a
pointer to their allocated memory.
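In C terms, such a get_global-like operation would amount to taking the address of the global once and then working only through the resulting pointer, as in the sketch below. The global g_value and the function clear_global are hypothetical names used for illustration.

```c
#include <stdint.h>

void zero(int32_t *x) { *x = 0; }

/* A global variable, as a hypothetical emitc.global @g_value might be
   emitted. */
int32_t g_value = 42;

/* Functions would access the global only via a materialized pointer. */
int32_t clear_global(void) {
  int32_t *v1 = &g_value; /* get_global-like op */
  zero(v1);               /* emitc.call_opaque "zero" */
  int32_t v2 = *v1;       /* emitc.apply "*" */
  return v2;
}
```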
Alternative 2: As symbols
Let emitc.automatic define a symbol rather than returning any value, similar to
emitc.global, thus using a unified model of C variables as symbols. Provide
operations for reading, writing and taking the address of variables. The
variables example could then be expressed as:
func.func @symbol_variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  emitc.automatic @variable : i32
  emitc.write %val : i32 into @variable
  %addr = emitc.address_of @variable : !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  %updated_val = emitc.read @variable : i32
  return %updated_val : i32
}
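Under this alternative the symbol name can map directly onto a C variable name. The sketch below shows one possible translation; using the symbol name variable as the C identifier, and the per-op comments, are assumptions of this sketch rather than a specified translation.

```c
#include <stdint.h>

void zero(int32_t *x) { *x = 0; }

/* Possible translation of @symbol_variables: reads, writes and address-of
   refer to the variable by name. */
int32_t symbol_variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;    /* emitc.mul */
  int32_t variable;        /* emitc.automatic @variable */
  variable = v3;           /* emitc.write */
  int32_t *v4 = &variable; /* emitc.address_of */
  zero(v4);                /* emitc.call_opaque "zero" */
  int32_t v5 = variable;   /* emitc.read */
  return v5;
}
```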
Since EmitC defines nested operations, and MLIR requires symbols to be defined
directly within an operation that is a symbol table, in this alternative the
emitc.block operation would need to provide symbol tables for these ops in
addition to providing their automatic allocation scopes.
Note that there are several differences between C’s blocks and
MLIR’s symbol tables:
- MLIR symbols are public by default, which makes them visible from outside
  the symbol table they are defined in. Symbols modeling C variables would
  therefore have to be defined private.
- C variables are visible in the block they are declared in and in blocks nested
  within it. MLIR symbols are resolved with respect to the closest parent
  operation that defines a symbol table. To properly model C variable scopes,
  each nested emitc.block would need to declare all variables declared or
  defined in any of its containing blocks.
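The C visibility rule in the second point can be illustrated with a short sketch (nested_scopes is a hypothetical name). An inner block sees the outer x without redeclaring it and may also shadow it; in the symbol-table model, each nested emitc.block would instead have to redeclare x for references to it to resolve.

```c
#include <stdint.h>

/* Inner blocks see variables of enclosing blocks and may shadow them. */
int32_t nested_scopes(void) {
  int32_t x = 1;
  int32_t sum = 0;
  {
    sum += x;         /* outer x, visible in the nested block: +1 */
    {
      int32_t x = 10; /* shadows the outer x in this block only */
      sum += x;       /* +10 */
    }
    sum += x;         /* outer x again: +1 */
  }
  return sum;         /* 12 */
}
```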