[RFC] Separate variables from SSA values in EmitC

Background: Mixed value/memory semantics in EmitC

The EmitC dialect currently identifies MLIR SSA values with C variables: The
translator is assumed to generate a C variable for each SSA value, and the
dialect supports taking that C variable’s address via its emitc.apply op.
Consider the following example:

func.func @take_address(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %addr = emitc.apply "&"(%val) : (i32) -> !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  return %val : i32
}

where the opaque function zero is:

void zero(int32_t *x) { *x = 0; }

What does take_address return? By MLIR SSA value semantics, it should
return v1 * v2; By EmitC semantics, it actually returns zero, as can be
seen when @take_address is translated to C:

int32_t take_address(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t* v4 = &v3;
  zero(v4);
  return v3;
}

In addition to redefining MLIR’s SSA value semantics, which is confusing at
best, identifying values with mutable C variables implies a memory model which
isn’t expressed in MLIR’s memory and side-effects interfaces and traits,
making various standard analyses and transforms unusable for the dialect.

EmitC partly addresses this by providing the emitc.variable op which should be
used for defining mutable values, like so:

func.func @variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %variable = "emitc.variable"() {value = #emitc.opaque<"">} : () -> i32
  emitc.assign %val : i32 to %variable : i32
  %addr = emitc.apply "&"(%variable) : (i32) -> !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  return %variable : i32
}

which is translated into:

int32_t variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t v4;
  v4 = v3;
  int32_t* v5 = &v4;
  zero(v5);
  return v4;
}

Note however that this convention doesn’t prevent emitc.apply from being
misused as done in take_address above. In addition, there are currently two
exceptions to the values-as-C-variables semantics:

  • The address of the SSA value defined by emitc.constant cannot be taken via
    emitc.apply.

  • The SSA value defined by the emitc.literal op has no counterpart C variable
    since the translator always inlines its value.

These semantics will be further challenged by the pending suggestion for
modeling C expressions, which
adds support for emitting complex C expressions where only the final result may
be associated with a C variable.

Proposal: Model C variables as memory allocations

C variables are statically scoped named locations; local C variables have
automatic storage duration. This aspect of variable definition can be modeled
using MLIR’s automatic allocation traits, following the memref dialect: an
emitc.automatic operation, analogous to memref.alloca, would allocate
automatically-scoped memory.
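
For reference, automatic allocation in the memref dialect looks like:

%buf = memref.alloca() : memref<1xi32>

The proposed emitc.automatic would play the analogous role for C variables.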

Since C variables are defined within syntactic blocks, to fully model C’s
allocation scopes the dialect would need to provide automatic allocation scopes
similar to memref.alloca_scope within the syntactic constructs it currently
supports: Functions, for-loops and if-then-else.
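
For reference, memref delimits such scopes with memref.alloca_scope, whose region bounds the lifetime of allocations made inside it:

memref.alloca_scope {
  %tmp = memref.alloca() : memref<4xf32>
}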

  • Functions are currently still supported using func.func, which is an
    automatic allocation scope.

  • For-loops (emitc.for): EmitC currently supports limited init-cond-iter
    clauses, so a single allocation scope for the loop’s body would currently
    suffice. However, future extension of this op may benefit from an additional
    scope enclosing the init-cond-iter clauses.

  • If-then-else (emitc.if): This construct models two syntactic blocks, so
    defining it as a (single) allocation scope would not suffice.

As a unified solution, the dialect can instead be augmented with an
emitc.block operation that would model the syntactic block construct {...}.
This op would become the only valid operation within the body of a
function/for/then/else region. For example (in generic form):

    "emitc.if"(%arg0) ({
      "emitc.block"() ({
        %3 = "emitc.call_opaque"(%arg1) <{callee = "f"}> : (f32) -> i32
        "emitc.yield"() : () -> ()
      }) : () -> ()
      "emitc.yield"() : () -> ()
    }, {
      "emitc.block"() ({
        %3 = "emitc.call_opaque"(%arg1) <{callee = "f"}> : (f32) -> i32
        "emitc.yield"() : () -> ()
      }) : () -> ()
      "emitc.yield"() : () -> ()
    }) : (i1) -> ()

    "emitc.for"(%0, %1, %2) ({
    ^bb0(%arg0: index):
      "emitc.block"() ({
        %9 = "emitc.call_opaque"(%7, %arg0) <{callee = "f"}> : (i32, index) -> i32
        "emitc.yield"() : () -> ()
      }) : () -> ()
      "emitc.yield"() : () -> ()
    }) : (index, index, index) -> ()

Such an emitc.block op would not affect the code currently emitted for these
ops, as the translator already emits curly braces for their bodies. When used
independently, an emitc.block would be emitted as a syntactic { ... } block.
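
For illustration, an independent block in generic form (the op is part of this proposal, not existing upstream):

"emitc.block"() ({
  %0 = "emitc.call_opaque"() <{callee = "f"}> : () -> i32
  "emitc.yield"() : () -> ()
}) : () -> ()

could be emitted as:

{
  int32_t v1 = f();
}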

Modeling global variables could again follow the memref dialect by defining an
emitc.global operation, analogous to memref.global. This operation would
define a symbol residing on the module’s symbol table.
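
For reference, the memref analogue defines the global as a module-level symbol:

memref.global "private" @gv : memref<1xi32> = uninitialized

A hypothetical emitc.global @gv : i32 (syntax illustrative) would analogously be emitted as a file-scope C variable such as int32_t gv;.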

Possible semantics for the emitc.automatic operation could then be:

Alternative 1: Using pointers

Follow the example of memref.alloca, i.e. let emitc.automatic return
a value of type !emitc.ptr<T> for a given type T. The operation would then
be translated into:

T v1;
T* v2 = &v1;

where v2 would be the value returned by the operation. In this alternative
there is no need to take the address of the variable as it’s already available
as a value. The emitc.apply "*" op can then be used to dereference the
variable into an rvalue as done today, whereas a new emitc.store operation would
replace the existing emitc.assign operation, allowing any !emitc.ptr<T> to be
used as an lvalue. The variables example could then be expressed as:

func.func @variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %variable = emitc.automatic : !emitc.ptr<i32>
  emitc.store %val : i32 to %variable : !emitc.ptr<i32>
  emitc.call_opaque "zero" (%variable) : (!emitc.ptr<i32>) -> ()
  %updated_val = emitc.apply "*"(%variable) : (!emitc.ptr<i32>) -> i32
  return %updated_val : i32
}
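
A possible translation of this form (variable names illustrative, following the values-as-variables convention of the translator) would be:

int32_t variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t v4;
  int32_t* v5 = &v4;
  *v5 = v3;
  zero(v5);
  int32_t v6 = *v5;
  return v6;
}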

Note that in this alternative, if global variables are indeed modeled as symbols
they would require an operation analogous to memref.get_global for getting a
pointer for their allocated memory.
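
For reference, the memref op looks like:

%0 = memref.get_global @gv : memref<1xi32>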

Alternative 2: As symbols

Let emitc.automatic define a symbol rather than returning any value, similar to
emitc.global, thus using a unified model of C variables as symbols. Provide
operations for reading, writing and taking the address of variables. The
variables example could then be expressed as:

func.func @symbol_variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  emitc.automatic @variable : i32
  emitc.write %val : i32 into @variable
  %addr = emitc.address_of @variable : !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  %updated_val = emitc.read @variable : i32
  return %updated_val : i32
}
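
A possible translation (assuming the symbol name is reused as the C variable name):

int32_t symbol_variables(int32_t v1, int32_t v2) {
  int32_t v3 = v1 * v2;
  int32_t variable;
  variable = v3;
  int32_t* v4 = &variable;
  zero(v4);
  int32_t v5 = variable;
  return v5;
}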

Since EmitC defines nested operations, and MLIR requires symbols to be defined
at symbol-table level, in this alternative the emitc.block operation would
need to provide these ops’ symbol tables in addition to providing their automatic
allocation scopes.

Note that there are several differences between C’s blocks and
MLIR’s symbol tables:

  • MLIR symbols are by default public, which makes them visible from outside
    the symbol table they are defined in. Symbols modeling C variables would
    therefore have to be defined private.

  • C variables are visible in the block they are declared in and in blocks nested
    within it. MLIR symbols are resolved with respect to the closest parent
    operation that defines a symbol table. To properly model C variable scopes,
    each nested emitc.block would need to declare all variables declared or
    defined in any of its containing blocks.

Hi Gil, Thank you for the detailed RFC.

I thought about a variation of your second suggestion, that may work without having to work with symbols all the time. We could make lvalues part of the type system by introducing an !emitc.lvalue type. The emitc.automatic operation or a repurposed emitc.variable would produce a value which is wrapped in this type. The emitc.address_of op would be restricted to this type as well as the destination operand of the emitc.assign op. To make the usage of the value explicit in the IR I could imagine an emitc.lvalue_to_rvalue operation (similar to your emitc.read on symbols I guess) that represents lvalue conversion with a read memory effect.

Either way I am not quite sure how this would be handled in the emitter. Should the emitter skip the op and print the lvalue/symbol instead for every use? This would mean that the memory read is postponed to the uses in the generated code, leading to a semantic mismatch between the IR and the generated code. Making all EmitC ops lvalue agnostic may be an option; we’d need to have at least custom func, call and return ops for this.

Example:

func.func @lvalue_variables(%v1: i32, %v2: i32) -> i32 {
  %val = emitc.mul %v1, %v2 : (i32, i32) -> i32
  %variable = emitc.variable : !emitc.lvalue<i32> // alloc effect
  emitc.assign %val : i32 to %variable : !emitc.lvalue<i32> // write effect
  %addr = emitc.address_of %variable : !emitc.lvalue<i32> -> !emitc.ptr<i32>
  emitc.call_opaque "zero" (%addr) : (!emitc.ptr<i32>) -> ()
  %updated_val = emitc.lvalue_to_rvalue %variable : i32 // read effect, (noop in emitter?)
  %neg_one = arith.constant -1 : i32
  emitc.assign %neg_one : i32 to %variable : !emitc.lvalue<i32> // invalidates %updated_val
  return %updated_val : i32
  // dealloc effect through automatic allocation scope
}

Simon

Hi Simon, Sorry for the late response.

Sounds good to me! Explicitly modeling the l-value-to-r-value memory access would allow most ops to remain side-effect free. I guess the dereference op (either apply or a new one) would also be generating l-values, right? We may need to work a bit harder when verifying ops, but that sounds reasonable.
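
For illustration, dereferencing could then look like this (the lvalue-typed result of apply "*" and the conversion op follow the sketch above and are not upstream syntax):

%lv = emitc.apply "*"(%ptr) : (!emitc.ptr<i32>) -> !emitc.lvalue<i32>
%rv = emitc.lvalue_to_rvalue %lv : i32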

Modelling function, call and return ops is something I’ve been wanting to do anyway in order to handle multiple return values at the dialect level rather than in the emitter, so that can perhaps be done separately in advance.

I vote for keeping the emitter 1-1 with the dialect. Moving loads/stores requires memory analysis which doesn’t seem like something a translator should be doing. Also, the form-expressions pass already folds SSA C variables into emitc.expression ops which the translator then emits compactly. The pass currently avoids handling side effects, but can arguably be extended to handle them as well where possible.
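
For reference, an emitc.expression looks roughly like:

%r = emitc.expression : i32 {
  %0 = emitc.add %a, %b : (i32, i32) -> i32
  %1 = emitc.mul %0, %c : (i32, i32) -> i32
  emitc.yield %1 : i32
}

which the translator can emit as a single C expression such as (a + b) * c, either inlined at the use or assigned to one variable.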

I’ve added those on Feb. 4th, see [mlir][EmitC] Add func, call and return operations and conversions by marbre · Pull Request #79612 · llvm/llvm-project · GitHub. In contrast to the ops from the Func dialect, these only support a single return value.
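
For reference, with those ops a function, call and return look roughly as follows:

emitc.func @add(%a: i32, %b: i32) -> i32 {
  %0 = emitc.add %a, %b : (i32, i32) -> i32
  emitc.return %0 : i32
}

%sum = emitc.call @add(%x, %y) : (i32, i32) -> i32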

I’ve started to prototype this in a downstream project. I’m currently hitting a few errors where some lvalue-to-rvalue conversion ops are missing.

Good to hear! Would be happy to collaborate on it upstream.

Here is the current state of the prototype to discuss one of the alternatives proposed above: EmitC lvalue. (I haven’t put too much thought into the changes in the emitter or the validation of the ops; I just added code wherever needed to make most of the tests pass.)

To summarize the changes: ops return results or take operands with lvalue type where appropriate. Explicit lvalue-to-rvalue ops are used in places where lvalue types are not expected in the IR. During emission these ops don’t produce any output; instead, the mapping from SSA values to variable names explicitly checks if the defining op is an LValueToRValueOp and then recursively looks up the variable name of its operand.

A few things to consider:

  • If a conversion replaces a value with an op producing an lvalue type, LValueToRValueOps need to be inserted for every use of the result value. I think a source materialization can be used for this, but I haven’t tested this yet. An example for this upstream would be the SCFToEmitCPass, where scf.if results are replaced by emitc.variable ops which get assigned to in both regions of the corresponding emitc.if op (see the sketch after this list).
  • Many ops in the dialect should maybe be restricted to not take and produce lvalues (arithmetic operators for example).
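
A rough sketch of that SCFToEmitC case under the lvalue-type approach (syntax illustrative, not upstream):

%res = "emitc.variable"() {value = #emitc.opaque<"">} : () -> !emitc.lvalue<i32>
emitc.if %cond {
  %0 = emitc.call_opaque "f"() : () -> i32
  emitc.assign %0 : i32 to %res : !emitc.lvalue<i32>
} else {
  %1 = emitc.call_opaque "g"() : () -> i32
  emitc.assign %1 : i32 to %res : !emitc.lvalue<i32>
}
%use = emitc.lvalue_to_rvalue %res : i32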

For visibility: @marbre @aniragil @mgehre-amd

Thanks for pushing through with this Simon!

During emission these ops don’t produce any output. Instead the mapping from SSA values to variable names explicitly checks if the defining op is an LValueToRValueOp and then recursively looks up the variable name of its operand.

Not sure I follow: Won’t this effectively move the load operation to the use rather than keeping it at the location designated by the lvalue_to_rvalue op? We could have an EmitC optimization pass that tries to move/rematerialize lvalue_to_rvalue ops right before their uses, leaving it to the translator to eliminate trivially-redundant “SSA” C variables.

Hi all, I started working on the LValue type again.
I got all upstream tests passing except for one that uses nested for ops. I think this has to do with how I update users of the iter_args values. I will continue with this next week and will keep you updated.
