Hi all,
I recently ran into an issue with memref-only dialects and easy discovery of constant memrefs. After some offline discussion with a few folks, I have the following RFC that allows defining constant or non-constant global variables, including memrefs. Can folks please review and provide feedback?
Thanks
Rahul
Representing global variables in MLIR
Motivation
MLIR does not have a standard way of representing global (module scoped) variables. Dialects need to define dialect specific operations to represent global variables, e.g., tf_saved_model.global_tensor. The RFC below proposes adding operations in the MLIR standard dialect to help define and reference global variables.
In addition, for memrefs, we propose additional variants that can help define a module scoped memref that is backed by buffers allocated in the static data section by the compiler. The motivation for these comes from the desire to represent constant valued memrefs in MLIR. MLIR dialects that use memrefs as opposed to tensors may need to determine if some of their inputs are compile time constants. This information may be needed for codegen optimizations or for other reasons. As an example, the LMHLO dialect represents operation inputs and outputs as memref operands. Code generation or conversion to other dialects from LMHLO could benefit from knowing if certain inputs are constants. With memrefs involved, doing this analysis may be expensive as we need to track the last write to a given memref, while also taking into account any aliasing that might be happening due to memref casting and views. With the operations being proposed by this RFC, this analysis can be simplified if the memref in question is a global constant memref.
Proposal
Since MLIR does not define a standard pointer type, we propose 3 related operations for defining and using global variables:
- An operation to declare/define a named global variable. This operation should always be module scoped (its parent should be a ModuleOp, similar to FuncOp).
- A pair of getter and setter operations that can be used to get or set the value of a global variable by name.
- A specialized variant for memref that semantically allocates the backing buffer for the memref as well.
The sections below discuss these operations in more detail.
Defining or declaring a global variable
The operation std.global will declare or define a named global variable. The operation will support the SymbolOpInterface, which supports declaring the visibility of the global variable (public, nested, private).
If the operation is defining a global variable, it will need an initializer that defines how the global variable is initialized. We have a choice of representing this initializer as either an attribute attached to the operation or as a region that computes the initial value. Since attributes in general may not be able to represent all possible initial values (for example, initial values that depend on other global variables), we choose to represent the initializer as a single-block region that computes the initial value of the variable being defined. If the initializer region is empty, then the operation is assumed to declare the global variable but not define it.
An additional unit attribute can also mark the declared or defined variable as immutable (TODO: Would another parallel operation, std.constant, be preferred? An std.get_constant getter would then be needed as well).
Below is an example of this operation using generic MLIR syntax.
std.global { sym_name = "foo", type = i32, sym_visibility = "private" } ( {
  %0 = std.constant { value = 10 : i32 } : i32
  std.yield %0
})
This defines a global variable foo of type i32 with an initial value of 10. Custom assembly format could look like:
global @foo : i32 { sym_visibility = "private", is_constant } {
  %0 = std.constant { value = 10 : i32 } : i32
  std.yield %0
}
Accessing global variables
We will define getter and setter operations to read and write values of global variables defined using std.global. These operations refer to the variables by name.
func @main(...) {
  %0 = std.get_global { name = @foo } : i32
  %1 = ...
  std.set_global(%1) { name = @foo } : i32
}
Other names could be std.read_global and std.write_global. If the variable is marked constant, std.set_global will lead to a verification failure.
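The verification rule for constant globals can be modeled outside MLIR. What follows is a minimal Python sketch, not MLIR's verifier API: `GlobalOp`, `SetGlobalOp`, and `verify_set_global` are hypothetical names that only mirror the proposed std.global and std.set_global operations, and the check for an unknown symbol is an assumption that the RFC does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalOp:
    """Hypothetical stand-in for a std.global definition."""
    sym_name: str
    type: str
    is_constant: bool = False


@dataclass(frozen=True)
class SetGlobalOp:
    """Hypothetical stand-in for std.set_global; refers to a global by name."""
    name: str


def verify_set_global(op, symbol_table):
    """Return a list of diagnostics; an empty list means the op verifies."""
    target = symbol_table.get(op.name)
    if target is None:
        # Assumption: referring to an undefined symbol is also a verifier error.
        return [f"'{op.name}' does not reference a global variable"]
    if target.is_constant:
        return [f"cannot set constant global '{op.name}'"]
    return []


symbols = {
    "foo": GlobalOp("foo", "i32", is_constant=True),
    "bar": GlobalOp("bar", "i32"),
}
print(verify_set_global(SetGlobalOp("bar"), symbols))  # mutable: no diagnostics
print(verify_set_global(SetGlobalOp("foo"), symbols))  # constant: one diagnostic
```

Because the check needs only the symbol table and the attribute on the definition, it can run as part of op verification without any dataflow analysis.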
Specialized variants for memrefs
We would like to use a similar mechanism to define module scoped memrefs. A memref can be thought of as 2 distinct things: the pointer, and the backing buffer. The std.global operation can be used to define just the pointer portion of the memref, and its initializer can potentially initialize it using alloc:
global @buffer : memref<4x4xf32> { sym_visibility = "private" } {
  %0 = alloc() : memref<4x4xf32>
  std.yield %0 : memref<4x4xf32>
}
This entails that we potentially also need a de-initializer or destructor attached to that global variable. Something like:
global @buffer : memref<4x4xf32> { sym_visibility = private } {
  %0 = alloc() : memref<4x4xf32>
  %1 = std.constant ... : tensor<4x4xf32>
  tensor_store %1, %0 : memref<4x4xf32>
  std.yield %0 : memref<4x4xf32>
}, {
  %0 = std.get_global { name = @buffer } : memref<4x4xf32>
  dealloc %0 : memref<4x4xf32>
}
This will likely benefit from a custom assembly format that can look like:
global @buffer : memref<4x4xf32> { sym_visibility = private }
init {
  %0 = alloc() : memref<4x4xf32>
  %1 = std.constant ... : tensor<4x4xf32>
  tensor_store %1, %0 : memref<4x4xf32>
  std.yield %0 : memref<4x4xf32>
} de-init {
  %0 = std.get_global { name = @buffer } : memref<4x4xf32>
  dealloc %0 : memref<4x4xf32>
}
However, we also need another variant that will define a memref and allocate the backing memory using some sort of statically allocated memory (i.e., without using alloc). Such an operation (called static_memref) can look like:
static_memref @buffer : memref<4x4xf32> { sym_visibility = private } {
^bb0(%this: memref<4x4xf32>):
  // initializer, which assumes the memref itself is allocated.
  %1 = std.constant ... : tensor<4x4xf32>
  tensor_store %1, %this : memref<4x4xf32>
}
The semantics of this operation are that the operation itself defines the memref and also allocates the backing buffer. The initializer then is just a single-argument region that initializes the contents of the memref. The memref being defined here is implicitly constant (you cannot set_global the memref value defined by static_memref).
In addition to this, both std.global and std.static_memref can also declare that the contents of the backing buffer are immutable using an additional attribute. This will allow transformations to look at the defining operation for that memref and infer that its contents are compile time constant values. Note that such memrefs themselves will also need to be constant. Say the unit attribute used to represent this is is_backing_buffer_immutable. Then the operation that defines a heap allocated immutable memref will look like:
global @heap_buffer : memref<4x4xf32>
  { sym_visibility = private, is_constant, is_backing_buffer_immutable }
init {
  %0 = alloc() : memref<4x4xf32>
  %1 = std.constant ... : tensor<4x4xf32>
  tensor_store %1, %0 : memref<4x4xf32>
  std.yield %0 : memref<4x4xf32>
} de-init {
  %0 = std.get_global { name = @heap_buffer } : memref<4x4xf32>
  dealloc %0 : memref<4x4xf32>
}
and one using statically allocated memory:
static_memref @buffer : memref<4x4xf32> { sym_visibility = private, is_backing_buffer_immutable } {
^bb0(%this: memref<4x4xf32>):
  // initializer, which assumes the memref itself is allocated.
  %1 = std.constant ... : tensor<4x4xf32>
  tensor_store %1, %this : memref<4x4xf32>
}
With these operations, given a memref, transformations can trace back the definition of the memref and, if its backing buffer is immutable, infer the value of the memref contents and use that for improved codegen or other purposes. If the program writes to such a memref, the results are undefined.
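The trace-back step can be sketched as a walk through aliasing operations to the defining global. This is a minimal Python model under stated assumptions, not an MLIR API: the `Op` class, the `ALIASING_KINDS` set, the op kind strings, and `find_immutable_global` are all hypothetical, and the model assumes that casts and views forward the buffer of their first operand.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical, simplified op: a kind, its operand ops, and attributes."""
    kind: str
    operands: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)


# Assumption: these ops alias the buffer of their first operand.
ALIASING_KINDS = {"memref_cast", "view", "subview"}


def find_immutable_global(op, globals_by_name):
    """Return the defining global if the memref's backing buffer is immutable.

    Returns None when the memref does not come from a global or when the
    global is not marked is_backing_buffer_immutable.
    """
    while op.kind in ALIASING_KINDS:
        op = op.operands[0]
    if op.kind != "get_global":
        return None  # e.g. an alloc: nothing can be inferred this way
    definition = globals_by_name.get(op.attrs["name"])
    if definition is None or not definition.get("is_backing_buffer_immutable", False):
        return None
    return definition


globals_by_name = {
    "weights": {"is_backing_buffer_immutable": True, "initial_value": [1.0, 2.0]},
    "scratch": {"is_backing_buffer_immutable": False},
}
weights_view = Op("memref_cast", [Op("get_global", attrs={"name": "weights"})])
scratch = Op("get_global", attrs={"name": "scratch"})
local = Op("alloc")

print(find_immutable_global(weights_view, globals_by_name) is not None)
print(find_immutable_global(scratch, globals_by_name))
```

The walk never has to search for the last write to the buffer, which is the expensive part of the analysis described in the motivation.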
Interactions with MemoryEffects
The std.global operation by itself does not do any memory allocation, so it will not need any memory effects trait (it will be marked as NoSideEffect).
The static_memref operation does allocate the backing buffers, so it would need a MemAlloc<DefaultResource> trait to describe that behavior. However, unlike other such operations like std.alloc, the allocated memref is not returned as a result, so there is no result that also gets tagged with that memory effect. (TODO: Is using DefaultResource the right thing? Or do we need to add a new resource type to represent static allocations?)
The std.get_global operation will not get any memory effects as it is not allocating a new memref, just getting a handle to an existing one. It will be marked with the NoSideEffect trait.
Interactions with buffer placement (for global memref variables)
We expect that these operations, being module scoped, will not participate in buffer placement. This will happen naturally because buffer placement is a function pass that works only on operations within functions and when they define a result.
Buffer alias analysis has been extracted into a separate analysis. It will likely need to be updated to handle module scoped buffers if needed. Current use for buffer placement will not need this though.
General transformations
We can envision several passes that transform global variables in specific ways.
- Transform a global variable into an SSA value when possible. This will be possible if a global variable is live only within a group of functions and not live outside these functions. We need to make sure semantics are preserved: for example, if the initializer has side effects, we cannot do this, or at least cannot remove the global variable.
- Convert non-constant global variables to constant.
- Constant fold global variables that are marked as constant (folder).
- store → load forwarding (set followed by get).
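As an illustration of the second transformation in this list, here is a minimal Python sketch of promoting never-written globals to constant. The data layout and the name `promote_to_constant` are hypothetical, and restricting the promotion to private globals is an assumption: for other visibilities, writes from outside the module cannot be ruled out.

```python
def promote_to_constant(globals_by_name, accesses):
    """Mark private globals that are never written as constant.

    globals_by_name: name -> dict with 'visibility' and 'is_constant' keys.
    accesses: iterable of (kind, global_name) pairs, one per get/set in the module.
    Returns the sorted names of the globals that were promoted.
    """
    written = {name for kind, name in accesses if kind == "set_global"}
    promoted = []
    for name, info in globals_by_name.items():
        if info["is_constant"] or name in written:
            continue
        if info["visibility"] != "private":
            continue  # assumption: external code may still write to it
        info["is_constant"] = True
        promoted.append(name)
    return sorted(promoted)


module_globals = {
    "table": {"visibility": "private", "is_constant": False},
    "counter": {"visibility": "private", "is_constant": False},
    "exported": {"visibility": "public", "is_constant": False},
}
module_accesses = [
    ("get_global", "table"),
    ("get_global", "counter"),
    ("set_global", "counter"),
    ("get_global", "exported"),
]
print(promote_to_constant(module_globals, module_accesses))
```

Once a global has been promoted, the constant folding transformation from the same list becomes applicable to its getters.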
For memrefs in particular, there are some additional transformations possible:
- Transform module scoped global memrefs to heap allocated memrefs when possible. This is similar to transforming a non-memref global variable to an SSA value. This transformation can help in cases where say a global memref is used only within a single function or groups of functions and not live outside that group. Such transformation can happen prior to buffer placement so that the transformed memrefs can then participate in buffer placement.
- Convert non-immutable global memrefs to immutable if there are no writes.
- Convert alloc/alloca’d memrefs with constant initializers to global memrefs when possible so that the same global constant memref can be shared across multiple functions.
- Eliminate global memrefs:
  - For immutable global memrefs, we can materialize them once in each function they are referenced in (allocate a new memref, initialize with the initial value, and RAUW the allocated memrefs within that function).
  - For non-immutable ones, if we know a set of entry functions, we can allocate and initialize them in those functions and pass additional memref arguments, one for each non-constant memref.
  - If we don’t know a specific set of entry points, we may not be able to eliminate them completely and later codegen/conversion will need to handle them. This is also true even if we know the set of entry functions but some of these global memrefs are used in functions that are public.
Interaction with SymbolDCE
Global variables that are not used anymore should be eliminated by SymbolDCE when possible. The std.global and std.static_memref ops will implement SymbolOpInterface to allow this.
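Because an initializer may refer to other global variables, deciding which globals are unused is a reachability problem rather than a simple use count. The following Python sketch illustrates the idea only and is not the SymbolDCE implementation: `live_globals` and its inputs are hypothetical, every non-private global is treated as a root, and all function bodies are assumed to be live.

```python
def live_globals(visibility, init_refs, function_refs):
    """Compute the set of globals that must be kept.

    visibility: name -> 'public' | 'nested' | 'private'
    init_refs: name -> set of globals referenced by that global's initializer
    function_refs: set of globals referenced from function bodies
    """
    # Roots: anything visible outside the module, plus anything functions use.
    worklist = [name for name, vis in visibility.items() if vis != "private"]
    worklist += [name for name in function_refs if name in visibility]
    live = set()
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        # A live global keeps the globals used by its initializer alive too.
        worklist.extend(init_refs.get(name, ()))
    return live


visibility = {"a": "public", "b": "private", "c": "private", "d": "private"}
init_refs = {"a": {"b"}}   # the initializer of @a reads @b
function_refs = {"c"}      # some function reads @c
live = live_globals(visibility, init_refs, function_refs)
print(sorted(set(visibility) - live))  # globals that could be erased
```

In this example only the private global that nothing reaches, directly or through an initializer, is a candidate for removal.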
Code generation choices
Code generation for global memrefs can fall into 2 categories:
- No code generation needed because the variable is constant and all of its users have folded the initial value into the code generation of the user. This represents the case where none of the users need that variable.
- Otherwise, code generation can generate an LLVM GlobalVariable for each of these global variables. The get and set will map to load and store for that GlobalVariable.
  - Code generation for a global of type memref will be a GV that itself is a pointer type.
  - For static_memref, essentially code generation can just generate a GV that corresponds to the underlying buffer allocation and short circuit the GV that is a pointer to this allocation.
  - The initializers for these global variables can map to llvm.global_ctors, and de-initializers can map to llvm.global_dtors.