Background: Mixed value/memory semantics in EmitC
The EmitC dialect currently identifies MLIR SSA values with C variables: The
translator is assumed to generate a C variable for each SSA value, and the
dialect supports taking that C variable’s address via its emitc.apply op.
Consider the following example:
func.func @take_address(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %addr = emitc.apply "&"(%val) : (i32) -> !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  return %val : i32
}
where the opaque function zero is:
void zero(int32_t *x) { *x = 0; }
What does take_address return? By MLIR SSA value semantics, it should return
v1 * v2; by EmitC semantics, it actually returns zero, as can be seen when
@take_address is translated to C:
int32_t take_address(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t* v4 = &v3;
  zero(v4);
  return v3;
}
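As a quick sanity check, the sketch below places the translated take_address in one translation unit with the definition of zero given above. The #include lines and the comments are our additions and are not translator output. Any pair of arguments yields 0, because zero writes through the pointer to v3, which is the C variable standing in for %val.

```c
#include <assert.h>
#include <stdint.h>

/* The opaque function from the example: clears the pointee. */
void zero(int32_t *x) { *x = 0; }

/* Translator output for @take_address, with added comments. */
int32_t take_address(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2; /* C variable for the SSA value %val */
  int32_t* v4 = &v3;    /* emitc.apply "&" exposes %val's storage */
  zero(v4);             /* overwrites what %val was supposed to hold */
  return v3;            /* 0, not v1 * v2 */
}
```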
In addition to redefining MLIR’s SSA value semantics, which is confusing at
best, identifying values with mutable C variables implies a memory model which
isn’t expressed in MLIR’s memory and side-effects interfaces and traits,
making various standard analyses and transforms unusable for the dialect.
EmitC partly addresses this by providing the emitc.variable op, which should be
used for defining mutable values, like so:
func.func @variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %variable = "emitc.variable"() {value = #emitc.opaque<"">} : () -> i32
  emitc.assign %val : i32 to %variable : i32
  %addr = emitc.apply "&"(%variable) : (i32) -> !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  return %variable : i32
}
which is translated into:
int32_t variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t v4;
  v4 = v3;
  int32_t* v5 = &v4;
  zero(v5);
  return v4;
}
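The following sketch compiles the variables translation together with zero. The assert on v3 is an added check and is not emitted by the translator. It shows that the storage being cleared belongs to the emitc.variable (v4), while v3, which stands for %val, keeps the value v1 * v2. The zero result is therefore consistent with the IR in this case.

```c
#include <assert.h>
#include <stdint.h>

/* The opaque function from the example: clears the pointee. */
void zero(int32_t *x) { *x = 0; }

int32_t variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;  /* %val */
  int32_t v4;            /* emitc.variable */
  v4 = v3;               /* emitc.assign */
  int32_t* v5 = &v4;     /* address of the variable, not of %val */
  zero(v5);
  assert(v3 == v1 * v2); /* added check: %val is left untouched */
  return v4;             /* 0: the variable, not %val, was cleared */
}
```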
Note, however, that this convention doesn’t prevent emitc.apply from being
misused, as in take_address above. In addition, there are currently two
exceptions to the values-as-C-variables semantics:
- The address of the SSA value defined by emitc.constant cannot be taken via
  emitc.apply.
- The SSA value defined by the emitc.literal op has no counterpart C variable,
  since the translator always inlines its value.
These semantics will be further challenged by the pending suggestion for
modeling C expressions, which adds support for emitting complex C expressions
where only the final result may be associated with a C variable.
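To illustrate, consider a computation of a * b + c. The two hypothetical functions below contrast today's one-variable-per-value output with the kind of output that expression support might produce. Neither the function names nor the exact output format come from the suggestion. Both functions compute the same result. In the second, however, the product a * b is an rvalue with no storage, so there is no C variable whose address emitc.apply "&" could take: &(a * b) is not valid C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name. One C variable per SSA value, as emitted today. */
int32_t mul_add_per_value(int32_t a, int32_t b, int32_t c) {
  int32_t v1 = a * b;
  int32_t v2 = v1 + c;
  return v2;
}

/* Hypothetical name. Expression-style output: only the final result is named. */
int32_t mul_add_expression(int32_t a, int32_t b, int32_t c) {
  int32_t v2 = a * b + c; /* a * b is an rvalue with no C variable */
  return v2;
}
```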
Proposal: Model C variables as memory allocations
C variables are statically scoped, named storage locations. Local C variables
have automatic storage duration, i.e. their lifetime ends with their enclosing
block. This aspect of variable definition can be modeled using MLIR’s automatic
allocation traits, following the memref dialect: an emitc.automatic operation
would allocate automatically-scoped memory, similar to memref.alloca.
Since C variables are defined within syntactic blocks, fully modeling C’s
allocation scopes would require the dialect to provide automatic allocation
scopes, similar to memref.alloca_scope, within the syntactic constructs it
currently supports: functions, for-loops and if-then-else.
- Functions are currently still supported using func.func, which is an
  automatic allocation scope.
- For-loops (emitc.for): EmitC currently supports limited init-cond-iter
  clauses, so a single allocation scope for the loop’s body would currently
  suffice. However, future extensions of this op may benefit from an additional
  scope enclosing the init-cond-iter clauses.
- If-then-else (emitc.if): This construct models two syntactic blocks, so
  defining it as a (single) allocation scope would not suffice.
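The following C sketch, which uses the hypothetical names scopes, acc and tmp, shows the blocks these constructs introduce. The function body, the loop body and each branch of the if-then-else are separate blocks. Each tmp below is therefore a distinct automatic variable whose lifetime ends at its closing brace. The loop counter i, declared in the init clause, lives in a scope that encloses the loop body. The then and else branches each declare their own tmp, which is why a single scope for emitc.if would not be enough.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function illustrating C's block-scoped automatic variables. */
int32_t scopes(int32_t n, int32_t flag) {
  int32_t acc = 0;                      /* function-body scope */
  for (int32_t i = 0; i < n; i += 1) {  /* i: scope of the init clause */
    int32_t tmp = i * 2;                /* loop-body scope, fresh each iteration */
    acc += tmp;
  }
  if (flag) {
    int32_t tmp = 100;                  /* then-block scope */
    acc += tmp;
  } else {
    int32_t tmp = -100;                 /* else-block scope, a different tmp */
    acc += tmp;
  }
  return acc;
}
```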
As a unified solution, the dialect can instead be augmented with an emitc.block
operation that would model the syntactic block construct {...}. This op would
become the only valid operation, apart from the region’s terminator, within the
body of a function/for/then/else region. For example (in generic form):
"emitc.if"(%arg0) ({
"emitc.block"() ({
%3 = "emitc.call_opaque"(%arg1) <{callee = "f"}> : (f32) -> i32
"emitc.yield"() : () -> ()
}) : () -> ()
"emitc.yield"() : () -> ()
}, {
"emitc.block"() ({
%3 = "emitc.call_opaque"(%arg1) <{callee = "f"}> : (f32) -> i32
"emitc.yield"() : () -> ()
}) : () -> ()
"emitc.yield"() : () -> ()
}) : (i1) -> ()
"emitc.for"(%0, %1, %2) ({
^bb0(%arg0: index):
"emitc.block"() ({
%9 = "emitc.call_opaque"(%7, %arg0) <{callee = "f"}> : (i32, index) -> i32
"emitc.yield"() : () -> ()
}) : () -> ()
"emitc.yield"() : () -> ()
}) : (index, index, index) -> ()
Such an emitc.block op would not affect the code currently emitted for these
ops, as the translator already emits curly braces for their bodies. When used
independently, an emitc.block would be emitted as a syntactic { ... } block.
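For example, a standalone emitc.block nested in a function body might be emitted as in the sketch below. The function name standalone_block and the exact output are assumptions. v3 is allocated only for the duration of the braces, which is the automatic allocation scope the op would model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function containing a standalone block. */
int32_t standalone_block(int32_t v1) {
  int32_t v2 = v1 + 1;
  { /* emitted for a standalone emitc.block */
    int32_t v3 = v2 * 2; /* automatic: storage ends at the closing brace */
    v2 = v3;
  }
  /* v3 is out of scope here */
  return v2;
}
```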
Modeling global variables could again follow the memref dialect by defining an
emitc.global operation, analogous to memref.global. This operation would define
a symbol in the module’s symbol table.
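One plausible C rendering of such an op is a file-scope definition. The sketch below assumes that a hypothetical private emitc.global @counter : i32 would be emitted as a static variable. This mapping, and the names counter and bump, are our assumptions and are not part of the proposal. Unlike an automatic variable, the global's storage persists across calls.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed emission of a hypothetical private `emitc.global @counter : i32`. */
static int32_t counter = 0;

/* Hypothetical function reading and writing the global. */
int32_t bump(void) {
  counter = counter + 1; /* static storage: the value survives between calls */
  return counter;
}
```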
The emitc.automatic operation could then have one of the following semantics:
Alternative 1: Using pointers
Follow the example of memref.alloca, i.e. let emitc.automatic return a value of
type !emitc.ptr<T> for a given type T. The operation would then be translated
into:
T v1;
T* v2 = &v1;
where v2 would be the value returned by the operation. In this alternative
there is no need to take the address of the variable, as it’s already available
as a value. The emitc.apply "*" op can then be used to dereference the variable
into an rvalue as done today, whereas a new emitc.store operation would replace
the existing emitc.assign operation, allowing any !emitc.ptr<T> to be used as an
lvalue. The variables example could then be expressed as:
func.func @variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %variable = emitc.automatic : !emitc.ptr<i32>
  emitc.store %val : i32 to %variable : !emitc.ptr<i32>
  emitc.call_opaque "zero" (%variable) : (!emitc.ptr<i32>) -> ()
  %updated_val = emitc.apply "*"(%variable) : (!emitc.ptr<i32>) -> i32
  return %updated_val : i32
}
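If the translation of emitc.automatic given above is combined with the existing translations of the other ops, the function might be emitted roughly as follows. The exact output, including the vN names, is an assumption. zero is defined here so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* The opaque function from the example: clears the pointee. */
void zero(int32_t *x) { *x = 0; }

/* Assumed translation of @variables under Alternative 1. */
int32_t variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2; /* emitc.mul */
  int32_t v4;           /* emitc.automatic: the storage... */
  int32_t* v5 = &v4;    /* ...and the pointer value the op returns */
  *v5 = v3;             /* emitc.store */
  zero(v5);             /* pointer passed directly, no emitc.apply "&" */
  int32_t v6 = *v5;     /* emitc.apply "*": load the updated value */
  return v6;            /* 0 */
}
```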
Note that in this alternative, if global variables are indeed modeled as
symbols, they would require an operation analogous to memref.get_global for
getting a pointer to their allocated memory.
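As an illustration, the sketch below assumes a hypothetical emitc.get_global op that yields a pointer to a hypothetical global counter. The op is modeled by analogy with memref.get_global, and no such EmitC op exists today. Once the pointer has been obtained, the global is read and written exactly like an emitc.automatic allocation. The function name increment_global is also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed emission of a hypothetical `emitc.global @counter : i32`. */
static int32_t counter = 0;

/* Hypothetical function accessing the global through a pointer. */
int32_t increment_global(int32_t v1) {
  int32_t* v2 = &counter; /* hypothetical emitc.get_global @counter */
  int32_t v3 = *v2;       /* emitc.apply "*" */
  int32_t v4 = v3 + v1;
  *v2 = v4;               /* emitc.store */
  return v4;
}
```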
Alternative 2: As symbols
Let emitc.automatic define a symbol rather than return a value, similar to
emitc.global, thus using a unified model of C variables as symbols. Provide
operations for reading, writing and taking the address of variables. The
variables example could then be expressed as:
func.func @symbol_variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  emitc.automatic @variable : i32
  emitc.write %val : i32 into @variable
  %addr = emitc.address_of @variable : !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  %updated_val = emitc.read @variable : i32
  return %updated_val : i32
}
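Here the symbol maps directly onto a named C variable, so the function might be emitted roughly as follows. This is an assumed output, and zero is included so that the sketch compiles on its own. Reads and writes refer to the variable by name. A pointer only appears where emitc.address_of is used.

```c
#include <assert.h>
#include <stdint.h>

/* The opaque function from the example: clears the pointee. */
void zero(int32_t *x) { *x = 0; }

/* Assumed translation of @symbol_variables under Alternative 2. */
int32_t symbol_variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;    /* emitc.mul */
  int32_t variable;        /* emitc.automatic @variable */
  variable = v3;           /* emitc.write */
  int32_t* v4 = &variable; /* emitc.address_of */
  zero(v4);
  int32_t v5 = variable;   /* emitc.read */
  return v5;               /* 0 */
}
```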
Since EmitC defines nested operations, and MLIR requires symbols to be defined
at symbol table level, in this alternative the emitc.block operation would need
to provide symbol tables for the constructs it models, in addition to providing
their automatic allocation scopes.
Note that there are several differences between C blocks and MLIR symbol
tables:
- MLIR symbols are public by default, which makes them visible from outside
  the symbol table they are defined in. Symbols modeling C variables would
  therefore have to be defined private.
- C variables are visible in the block they are declared in and in blocks nested
  within it. MLIR symbols are resolved with respect to the closest parent
  operation that defines a symbol table. To properly model C variable scopes,
  each nested emitc.block would need to declare all variables declared or
  defined in any of its containing blocks.
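The C scoping rules in the last point can be illustrated with the sketch below, which uses the hypothetical names nested_scopes, x and y. The inner block uses x without redeclaring it. Its declaration of y shadows the outer y only until the closing brace. Under MLIR symbol resolution, by contrast, a reference to @x inside a nested emitc.block would be resolved against that block's own symbol table, which is why the variable would have to be redeclared there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function illustrating C's nested-block visibility. */
int32_t nested_scopes(int32_t v1) {
  int32_t x = v1;   /* declared in the outer block */
  int32_t y = 1;
  {
    int32_t y = 10; /* shadows the outer y within this block only */
    x = x + y;      /* outer x is visible here without redeclaration */
  }
  return x + y;     /* the outer y (1) is visible again */
}
```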