RFC: Global Variables in MLIR

Hi all,

I recently ran into an issue with memref-only dialects and easy discovery of constant memrefs. After some offline discussion with a few folks, I have the following RFC that allows defining constant or non-constant global variables, including memrefs. Can folks please review and provide feedback?

Thanks
Rahul

Representing global variables in MLIR

Motivation

MLIR does not have a standard way of representing global (module scoped) variables. Dialects need to define dialect specific operations to represent global variables, e.g., tf_saved_model.global_tensor. The RFC below proposes adding operations in the MLIR standard dialect to help define and reference global variables.

In addition, for memrefs, we propose additional variants that can help define a module scoped memref that is backed by buffers allocated in the static data section by the compiler. The motivation for these comes from the desire to represent constant valued memrefs in MLIR. MLIR dialects that use memrefs as opposed to tensors may need to determine if some of their inputs are compile time constants. This information may be needed for codegen optimizations or for other reasons. As an example, the LMHLO dialect represents operation inputs and outputs as memref operands. Code generation or conversion to other dialects from LMHLO could benefit from knowing if certain inputs are constants. With memrefs involved, doing this analysis may be expensive as we need to track the last write to a given memref, while also taking into account any aliasing that might be happening due to memref casting and views. With the operations being proposed by this RFC, this analysis can be simplified if the memref in question is a global constant memref.

Proposal

Since MLIR does not define a standard pointer type, we propose 3 related operations for defining and using global variables:

  • An operation to declare/define a named global variable. This operation should always be module scoped (parent should be a ModuleOp, similar to FuncOp).
  • A pair of getter and setter operations that can be used to get or set the value of a global variable by name.
  • A specialized variant for memref that semantically allocates the backing buffer for the memref as well.

The sections below discuss these operations in more detail.

Defining or declaring a global variable

The operation std.global will declare or define a named global variable. The operation will support the SymbolOpInterface that supports declaring the visibility of the global variable (public, nested, private).

If the operation is defining a global variable, it will need an initializer that defines how the global variable is initialized. We have a choice of representing this initializer as either an attribute attached to the operation or as a region that computes the initial value. Since attributes in general may not be able to represent all possible initial values (for example, initial values that depend on other global variables), we choose to represent the initializer as a single-block region that computes the initial value of the variable being defined. If the initializer region is empty, then the operation is assumed to declare the global variable but not define it.

An additional unit attribute can also mark the declared or defined variable as immutable (TODO: Would another parallel operation, std.constant be preferred? And then an std.get_constant getter would be needed as well).

Below is an example of this operation using generic MLIR syntax.

std.global { sym_name = "foo", type = i32, sym_visibility = "private" } ( {
  %0 = std.constant { value = 10 : i32 } : i32;
  std.yield %0;
})

This defines a global variable foo of type i32 with an initial value of 10. Custom assembly format could look like:

global @foo : i32 { sym_visibility = "private", is_constant } {
  %0 = std.constant { value = 10 : i32 } : i32;
  std.yield %0;
}

Accessing global variables

We will define getter and setter operations to read and write values of global variables defined using std.global. These operations refer to the variables by name.

func @main(...) {
  %0 = std.get_global { name = @foo } : i32
  %1 = ...
  std.set_global(%1) { name = @foo } : i32
}

Other names could be std.read_global and std.write_global. If the variable is marked constant, std.set_global will lead to a verification failure.
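
The verification rule above can be sketched as a small Python model. This is only an illustration, not MLIR code: `verify_accesses` and its inputs are hypothetical names, and rejecting accesses to undeclared globals is an extra assumption that the RFC does not spell out.

```python
from typing import Dict, Iterable, List, Tuple


def verify_accesses(
    is_constant: Dict[str, bool], accesses: Iterable[Tuple[str, str]]
) -> List[str]:
    """Return one error string per access that would fail verification.

    is_constant maps each declared global name to its constant flag.
    accesses is a sequence of (op name, global name) pairs.
    """
    errors = []
    for op, name in accesses:
        if name not in is_constant:
            # Assumption: referring to an undeclared global is an error.
            errors.append(f"{op}: unknown global @{name}")
        elif op == "set_global" and is_constant[name]:
            # Rule from the RFC: constants cannot be written.
            errors.append(f"{op}: @{name} is constant")
    return errors
```

Reads of a constant global pass, while a write to the same global is reported.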

Specialized variants for memrefs

We would like to use a similar mechanism to define module scoped memrefs. A memref can be thought of as 2 distinct things: the pointer, and the backing buffer. The std.global operation can be used to define just the pointer portion of the memref, and its initializer can potentially initialize it using alloc:

global @buffer : memref<4x4xf32> { sym_visibility = "private" } {
  %0 = alloc() : memref<4x4xf32>
  std.yield %0 : memref<4x4xf32>
}

This entails that we potentially also need a de-initializer or destructor attached to that global variable. Something like:

global @buffer : memref<4x4xf32> { sym_visibility = private} {
  %0 = alloc() : memref<4x4xf32>
  %1 = std.constant ... : tensor<4x4xf32>
  tensor_store %1, %0 : memref<4x4xf32>
  std.yield %0 : memref<4x4xf32>
}, {
  %0 = std.get_global { name = @buffer } : memref<4x4xf32>
  dealloc %0 : memref<4x4xf32>
}

This will likely benefit from a custom assembly format that can look like:

global @buffer : memref<4x4xf32> { sym_visibility = private}
init {
  %0 = alloc() : memref<4x4xf32>
  %1 = std.constant ... : tensor<4x4xf32>
  tensor_store %1, %0 : memref<4x4xf32>
  std.yield %0 : memref<4x4xf32>
} de-init {
  %0 = std.get_global { name = @buffer } : memref<4x4xf32>
  dealloc %0 : memref<4x4xf32>
}

However, we also need another variant that will define a memref and allocate the backing memory using some sort of statically allocated memory (i.e., without using alloc). Such an operation (called static_memref) can look like:

static_memref @buffer : memref<4x4xf32> { sym_visibility = private } {
  ^bb0(%this: memref<4x4xf32>):
    // initializer, which assumes the memref itself is allocated.
    %1 = std.constant ... : tensor<4x4xf32>
    tensor_store %1, %this : memref<4x4xf32>
}

The semantics of this operation are that the operation itself defines the memref and also allocates the backing buffer. The initializer then is just a single argument region that initializes the contents of the memref. The memref being defined here is implicitly constant (you cannot set_global the memref value defined by static_memref).

In addition to this, both std.global and std.static_memref can also declare that the contents of the backing buffer are immutable using an additional attribute. This will allow transformations to look at the defining operation for that memref and infer that its contents are compile time constant values. Note that such memrefs themselves will also need to be constant. Say the unit attribute used to represent this is is_backing_buffer_immutable. Then the operation that defines a heap allocated immutable memref will look like:

global @heap_buffer : memref<4x4xf32>
  {sym_visibility = private, is_constant, is_backing_buffer_immutable}
init {
  %0 = alloc() : memref<4x4xf32>
  %1 = std.constant ... : tensor<4x4xf32>
  tensor_store %1, %0 : memref<4x4xf32>
  std.yield %0 : memref<4x4xf32>
} de-init {
  %0 = std.get_global { name = @heap_buffer } : memref<4x4xf32>
  dealloc %0 : memref<4x4xf32>
}

and one using statically allocated memory:

static_memref @buffer : memref<4x4xf32> { sym_visibility = private, is_backing_buffer_immutable } {
  ^bb0(%this: memref<4x4xf32>):
    // initializer, which assumes the memref itself is allocated.
    %1 = std.constant ... : tensor<4x4xf32>
    tensor_store %1, %this : memref<4x4xf32>
}

With these operations, given a memref, transformations can trace back the definition of the memref and, if its backing buffer is immutable, infer the value of the memref contents and use that for improved codegen or other purposes. If the program writes to such a memref, the results are undefined.

Interactions with MemoryEffects

The std.global operation by itself does not do any memory allocation etc, so it will not need any memory effects trait (it will be marked as NoSideEffect).

The static_memref operation does allocate the backing buffers, so would need a MemAlloc<DefaultResource> trait to describe that behavior. However, unlike other such operations like std.alloc, the allocated memref is not returned as a result, so there is no result that also gets tagged with that memory effect. (TODO: Is using DefaultResource the right thing? Or do we need to add a new resource type to represent static allocations?)

The std.get_global operation will not get any memory effects as it's not allocating a new memref, just getting a handle to an existing one. It will be marked with the NoSideEffect trait.

Interactions with buffer placement (for global memref variables)

We expect that these operations, being module scoped, will not participate in buffer placement. This will happen naturally because buffer placement is a function pass that works only on operations within functions and when they define a result.

Buffer alias analysis has been extracted into a separate analysis. It will likely need to be updated to handle module scoped buffers if needed. Current use for buffer placement will not need this though.

General transformations

We can envision several passes that transform global variables in specific ways.

  • Transform a global variable into an SSA value when possible. This will be possible if a global variable is live only within a group of functions and not live outside these functions. We need to make sure semantics are preserved: for example, if the initializer has side effects, we cannot do this, or at least cannot remove the global variable.
  • Convert non-constant global variables to constant.
  • Constant fold global variables that are marked as constant (folder).
  • store → load forwarding (set followed by get).

For memrefs in particular, there are some additional transformations possible:

  • Transform module scoped global memrefs to heap allocated memrefs when possible. This is similar to transforming a non-memref global variable to an SSA value. This transformation can help in cases where say a global memref is used only within a single function or groups of functions and not live outside that group. Such transformation can happen prior to buffer placement so that the transformed memrefs can then participate in buffer placement.
  • Convert non immutable global memrefs to immutable if there are no writes.
  • Convert alloc/alloca’d memrefs with constant initializers to global memrefs when possible so that the same global constant memref can be shared across multiple functions.
  • Eliminate global memrefs:
    • For immutable global memrefs, we can materialize them once in each function they are referenced in (allocate a new memref, initialize with the initial value, and RAUW the allocated memrefs within that function).
    • For non-immutable ones, if we know a set of entry functions, we can allocate and initialize them in those functions and pass additional memref arguments, one for each non-constant memref.
    • If we don’t know a specific set of entry points, we may not be able to eliminate them completely and later codegen/conversion will need to handle them. This is also true even if we know the set of entry functions but some of these global memrefs are used in functions that are public.

Interaction with SymbolDCE

Global variables that are not used anymore should be eliminated by SymbolDCE when possible. The std.global and std.static_memref ops will implement SymbolOpInterface to allow this.

Code generation choices

Code generation for global memrefs can fall into 2 categories:

  • No code generation needed because the variable is constant and all of its users have folded the initial value into the code generation of the user. This represents the case where none of the users need that variable.
  • Otherwise, code generation can generate LLVM GlobalVariable for each of these global variables. The get and set will map to load and store for that GlobalVariable.
  • Code generation for a global of type memref will be a GV that itself is a pointer type.
  • For static_memref, code generation can essentially just generate a GV that corresponds to the underlying buffer allocation and short circuit the GV that is a pointer to this allocation.
  • The initializers for these global variables can map to llvm.global_ctors, and de-initializers can map to llvm.global_dtors.

Cool, it is interesting to look at this. Overall, I’d start with the simplest possible thing first, then build the model out over time.

Some more detailed thoughts:

I would recommend separating the dynamic init case from the static init case. The static init case should be defined with an attribute, the dynamic case with a region. Representing the static case with a region will be fragile and difficult to work with.

It isn’t clear to me that you need to build dynamic init/destroy logic into the std dialect like this. It would be just as reasonable to model it orthogonally in a different operation. For example, destructors can be (and often are) modeled by having an “atexit” call in the initializer. Different languages have different initialization semantics – exposing it on the std.global doesn’t seem necessary.

It’s true that MLIR doesn’t have a general pointer type, but memrefs are pointer-like. I think that a “get memref” operation on the global should just take the global symbol as an attribute and produce a memref.

Thanks for the proposal Rahul!

Haven’t had time to digest the entire proposal, but this part stood out to me. std.get_global as you have described is reading from some memory, which would mean that it is side effecting. Otherwise, you would incorrectly reorder gets/sets. Are you referring to just the static_memref case here?

River’s comment hints at one thing that is a bit irregular in this proposal when the global defines a memref: because you bundle the pointer and the buffer at the same time, it seems we can’t easily reason about the memory effects around this.

I am not sure I have a great suggestion around this. The first thing that would come to my mind would be to forbid the non-memref case: a single i32 scalar global is just a memref<i32>.
That would at least simplify the mental model here. We could also then consider that these only represent “constant addresses”: i.e. the data can change (unless marked as immutable) but the buffer won’t be reallocated.
That way, get_global would be like llvm.address_of and really side-effect free; you need separate load/store to access the data.

Thanks for this initial feedback. The idea of defining get_global to be similar to get_address_of seems interesting and might make the memref cases simple/redundant. Starting with supporting static initialization with attributes and keeping dynamic initialization cases separate seems fine as well.

std.global can define a global variable (optionally immutable) of a given type, and std.get_address_of will return the address as a memref of the appropriate type. And then reads and writes to that memref can happen. It looks though that MLIR does not allow defining memrefs with an arbitrary element type, just int, float, index, complex and vector types. So if we go with the memref model, it will only support global variables of that type. So for GV of one of these types, std.get_address_of will return a rank 0 memref.

Generalizing, since memref types hold more information than say the corresponding tensor type, we can restrict std.global to define just memref types. The semantics will be that the memref type will describe the logical dimensions and layout in memory (the affine maps) backing the GV, and the initializer attribute will be a tensor which will define the initial value (optional) with the same semantics as if doing a tensor_store on the backing buffer (or equivalent for vector element type). This will be a restricted form, because only memref compatible types will be supported. Scalar values will need to be rank 0 memrefs.

To extend this to a general type, it seems we either need to extend memref types to support arbitrary element types, or add a new pointer type (and GEP-like, load, store and other operations), or leave that to other dialects.

Yes, restricting std.global to only define memref directly was what I had in mind, you would not do:

global @foo : i32
func @main(...) {
  %0 = std.get_global @foo : i32

But:

global @foo : memref<1xi32>
func @main(...) {
  %0 = std.get_global @foo : memref<1xi32>
  %cst0 = std.constant 0 : index
  %value = std.load %0[%cst0] : memref<1xi32>

Thanks. I think this is similar to an earlier proposal I had (did not post here). Let me revive that and send a new RFC.

Below is a new, simpler proposal that does not attempt to bite off too much:

Global Variables in MLIR

This document proposes adding new operations to the MLIR standard dialect to represent global variables with optional static initialization. Such global variables can optionally be marked immutable (constant), which enables representing constant memrefs.


Motivation

MLIR does not have a standard way of representing global (module scoped) variables. Dialects need to define dialect specific operations to represent global variables, e.g., tf_saved_model.global_tensor. The RFC below proposes adding operations in the MLIR standard dialect to help define and reference global variables. The model for such standard global variables proposed below is in some ways similar to LLVM GlobalVariables. We can define a global variable at module scope and then access it through a pointer in the code. Since MLIR does not have a generic pointer type, we will piggyback on using memref as our pointer type. The global variables defined can be optionally marked constant.

Proposal

We propose adding two new operations to MLIR standard dialect:

  • An operation to declare or define a named global variable. This operation should always be module scoped (parent should be a ModuleOp, similar to FuncOp)
  • An operation to get the “pointer” to the global variable (as a memref).

Define/Declare global variable

The operation std.global will declare or define a named global variable. In addition to the name, this operation will specify its type using a memref type, which will describe the layout of the underlying statically allocated buffer backing the global variable. An optional attribute will define the initial value (including undef/uninitialized) and another optional attribute will mark the global variable constant.

We propose the following attributes on this operation:

  • Attributes from SymbolOpInterface: sym_name and sym_visibility.
  • type: A memref type that defines the backing buffer that will be statically allocated for this global variable.
  • init_value: An optional ElementsAttr that specifies the initial value of the global variable. It should be compatible with the type of the global variable.
  • init_value_undef: An optional unit attribute that specifies that the initial value of the variable is undefined. If the operation has both init_value and init_value_undef attributes, verification will fail.
  • is_constant: An optional unit attribute that specifies that the value of the variable is constant in this module.

The std.global operation will be assumed to define the global variable if the initial value is specified (using either init_value or init_value_undef). Otherwise the std.global will be assumed to declare a global variable which is defined in another module. When the std.global defines a global variable, its type needs to be a statically shaped memref (rank and all dimensions known). When declaring a global variable, the type could be more relaxed (TODO: is this the right choice here? We could be more strict now and relax when needed).
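
A minimal sketch of these rules as a plain Python model, not MLIR code. The `GlobalOp` class and the `verify` function are hypothetical names; only the attribute names come from the proposal. Dynamic dimensions are modeled as `None`, and "compatible with the type" is assumed here to mean a matching element count, which the proposal does not define precisely.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class GlobalOp:
    """Hypothetical stand-in for a std.global operation."""
    sym_name: str
    shape: Tuple[Optional[int], ...]  # None models a dynamic ('?') dimension
    init_value: Optional[Sequence[float]] = None
    init_value_undef: bool = False
    is_constant: bool = False

    def is_definition(self) -> bool:
        # Either form of initial value makes this a definition.
        return self.init_value is not None or self.init_value_undef


def verify(op: GlobalOp) -> None:
    """Raise ValueError if the op violates the rules in the proposal."""
    if op.init_value is not None and op.init_value_undef:
        raise ValueError("init_value and init_value_undef are mutually exclusive")
    if op.is_definition() and any(d is None for d in op.shape):
        raise ValueError("a definition needs a statically shaped memref type")
    if op.init_value is not None:
        # Assumption: compatibility means the element counts match.
        num_elements = 1
        for d in op.shape:
            num_elements *= d
        if len(op.init_value) != num_elements:
            raise ValueError("init_value is not compatible with the type")
```

An op with neither initial-value attribute passes verification as a declaration, even with a dynamic shape, which reflects the relaxed option mentioned in the TODO.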

As an example, the following operation defines a 1D array of f32 elements:

std.global() : { sym_name = "my_array",
                 sym_visibility = "private",
                 init_value = dense<[0.0, 1.0, 2.0, 3.0]> : tensor<4xf32>,
                 type = memref<4xf32> }

Since it has an initial value, it will be considered a definition. Since it’s not a constant, code will be allowed to modify its value. A more readable custom assembly syntax could look like:

global @my_array { sym_visibility = private, init_value = ... } : memref<4xf32>

Accessing global variables

The operation std.get_address_of will get the memref for a global variable by name. It will have a single attribute name which will identify the name of the global variable. The return value will be a memref of the same type used to declare/define the named global variable.

func @foo(...) {
  %0 = "std.get_address_of"() { name = "foo" } : memref<4xf32>
  // read or write to %0.
}

If the named global variable is marked constant, writing to the memref obtained using std.get_address_of will be undefined. Custom assembly syntax could look like:

%9 = std.get_address_of @foo : memref<4xf32>

Limitations

This proposal has two limitations:

  • MLIR memref types support only a restricted set of element types. Since this proposal relies on memref, it cannot represent global variables that cannot be represented as memrefs. As an example, custom types defined by dialects cannot be used as an element type in memrefs. Dialects can choose to represent such variables using operations defined in the dialect, though they could still use std.get_address_of to get the address of such custom global variables. Verification could be supported by having a method in SymbolOpInterface that returns the type of the pointer to the symbol.

  • It does not support dynamic initialization. Dynamic initialization could potentially be represented using a region attached to the std.global. However, there are additional issues that need to be addressed there, such as whether we need a constructor/destructor style of representation (2 regions) or something else. Also, even in the presence of dynamic initialization support, having a simpler form to support static initialization will still be useful. So this proposal does not handle dynamic initialization. In anticipation of future support, we could consider renaming std.global to std.static_global.

Interactions with other parts of MLIR

Symbol and effects interfaces

  • The std.global will implement SymbolOpInterface since it defines a symbol that needs to go in the ModuleOp symbol table.
  • The std.get_address_of will implement the SymbolUserOpInterface.
  • Since std.global defines a statically allocated global variable, we will likely need it to support the MemoryEffects{MemAlloc<DefaultResource>} trait. There are two issues to resolve here:
    • Is DefaultResource the right resource to use here, or do we need to introduce a new StaticAllocationScopeResource?
    • How to distinguish between definition (which allocates memory) vs declaration (which does not)? Can MemoryEffects::getEffects() be dynamic?

Buffer placement

In general, we do not expect statically allocated global variables to participate in buffer placement. This will happen naturally as the std.global operations will be outside any function and buffer placement is a function pass, so it will not see these global symbol operations.

Possible transformations for global variables

Several transformations related to static global variables could be implemented.

  • Transform module scoped global definitions to heap-allocated ones. This will require several conditions like:
    • The global variable defined should be private or nested.
    • It is either a constant, or its lifetime is such that it is not live across some set of functions. The transformation will allocate it in the heap and then pass around pointers to it.
    • If certain entry functions are known, then heap allocation can be placed in these entry functions and then additional memref’s, one per global, passed around (while checking other inter-procedural transformation constraints).
  • Dead global variables should be deleted by SymbolDCE. This should happen without any specific changes to the pass (due to SymbolOpInterface).
  • Transformation of non-constant global to constant globals if no writes seen.
  • Simple folders for reads from memrefs for constant globals.
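
As one illustration, the "no writes seen" transformation in the list above could start from an analysis like the following Python sketch. It is a simplified model under stated assumptions: every use of a global's memref has already been classified as a load, a store, or an escape (for example, being passed to an unknown callee), only private globals are considered, and `globals_to_mark_constant` is a hypothetical name.

```python
from typing import Dict, Iterable, Set, Tuple

# Each use is (global name, kind); kind is "load", "store", or "escape".
Use = Tuple[str, str]


def globals_to_mark_constant(
    visibility: Dict[str, str], uses: Iterable[Use]
) -> Set[str]:
    """Return private globals that are never stored to and never escape."""
    # Anything other than a plain load is conservatively treated as a write.
    written = {name for name, kind in uses if kind != "load"}
    return {
        name
        for name, vis in visibility.items()
        if vis == "private" and name not in written
    }
```

Treating escapes as writes keeps the sketch conservative in the presence of aliasing through memref casts and views, at the cost of missing some opportunities.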

Code generation for global variables

Code generation for statically initialized global variables should be straightforward as they would map directly to LLVM GlobalVariables.

  • For constant globals, there will be cases when code generation is not needed at all because the users can subsume the constant values in code generation.
  • For other cases where these global variables need to be allocated static memory, these can map to LLVM GlobalVariables. The std.global will generate a GV and std.get_address_of will codegen to a memref type SSA value with the GV pointer embedded within it.
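
To make "a memref type SSA value with the GV pointer embedded within it" concrete: in MLIR's existing lowering to LLVM, a ranked memref becomes a descriptor holding an allocated pointer, an aligned pointer, an offset, and per-dimension sizes and strides. The Python sketch below models what that descriptor might contain for a statically shaped global. The class and function names are hypothetical, and an identity (row-major) layout with zero offset is assumed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemRefDescriptor:
    """Model of the fields of a lowered ranked memref."""
    allocated_ptr: str
    aligned_ptr: str
    offset: int
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]


def descriptor_for_global(symbol: str, shape: Tuple[int, ...]) -> MemRefDescriptor:
    """Build the descriptor for a statically shaped, row-major global."""
    strides = []
    running = 1
    # Row-major: the innermost dimension has stride 1.
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    address = "@" + symbol  # stands in for the address of the LLVM GlobalVariable
    return MemRefDescriptor(
        address, address, 0, tuple(shape), tuple(reversed(strides))
    )
```

Because every field other than the pointer is a compile-time constant here, the descriptor can be materialized at each use without storing it in memory.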

This is looking really nice.

Since this is admittedly just for memref, let’s call it global_memref and get_global_memref_descriptor for clarity, and to make evolution to some future more general std.global nicer (it’s not obvious when that will happen or how different it will be from global_memref, so let’s avoid giving a false impression of generality to readers).

Also, I don’t think that marking std.global_memref as MemoryEffects{MemAlloc} makes sense. The std.global_memref doesn’t actually “allocate” anything really; it’s more like a declaration of existence of a memory block (such as in a .data or .rodata section) that “just exists” as far as the program is concerned. So I support adding a StaticAllocationScopeResource (or some equivalent notion) to model this accurately.

Also, is init_value_undef needed? Couldn’t absence of init_value be used to signify that?

Looks good overall! I have a few nits, and mostly we need to think about the memory resource.

I don’t think we should restrict the parent of the global: it seems to me that this is a question of lowering pipeline. I can imagine having my accelerator.module as parent of a global for example.

Do we need an explicit attribute? Can’t we just take the absence of init_value for this?
Unless the absence of init_value is marking external global? (you didn’t describe how we differentiate external global I believe).

This would impact our lowering: if you accept external definitions with unknown shapes, you need to always emit a complex descriptor for public globals, because a user in another module needs to be able to refer to this descriptor. With a static type you can lower directly to an LLVM global containing only the data, I think.

When do we need to differentiate?

Since this is admittedly just for memref, let’s call it global_memref and get_global_memref_descriptor for clarity, and to make evolution to some future more general std.global nicer (it’s not obvious when that will happen or how different it will be from global_memref, so let’s avoid giving a false impression of generality to readers).

That seems ok. I’d still like to call the getter std.get_global_memref since what it returns is a memref.

Also, I don’t think that marking std.global_memref as MemoryEffects{MemAlloc} makes sense. The std.global_memref doesn’t actually “allocate” anything really; it’s more like a declaration of existence of a memory block (such as in a .data or .rodata section) that “just exists” as far as the program is concerned. So I support adding a StaticAllocationScopeResource (or some equivalent notion) to model this accurately.

I agree that a static memory is not allocated at runtime. If we do not want to model these as allocating memory, we don’t need the StaticAllocationScopeResource as I believe that’s tied to MemoryEffects.

Also, is init_value_undef needed? Couldn’t absence of init_value be used to signify that?

This is to distinguish between declaration vs definition. Initial value (of either form) will signify definition. The 2 forms of initialization will support initialized vs uninitialized static data.


SGTM. The SSA value that is a memref is actually just the “memref descriptor” (not the data), so I thought that it could add some clarity that no data is being moved. Another option could be get_descriptor_for (rather than get_address_of). But I don’t feel strongly about this (fine with whatever you choose).

Ok. I’d assume we do not want these inside FuncOp or other function-like operations. Maybe we can constrain it that way (no parent op can have the FunctionLike trait?)

Yes, that’s the intent:

I suspect something like that, hence the question. We can start with restricting external globals with static shapes as well.

To distinguish between declarations and definitions. This was assuming we want to model definitions as having MemAlloc effect.

Good observation. One interesting thing that I had to deal with for tf_saved_model.global_tensor is that there is a distinction between the “type” of the global and the type of the “init value”, which is needed to model TensorFlow semantics.

  • The init_value is an attribute, so inherently statically shaped.
  • The “type” of the global could be something like tensor<?xf32> or tensor<128x?x?xf32>, meaning that any tensor of a compatible shape could be assigned to the global.

In that case, the semantics are more of a “pointer to tensor” because the underlying tensor/descriptor can be reassigned. That’s not the case for global_memref as we have agreed upon it thus far, but thought I might share that use case.

To support that use case later, we would need to introduce a mutable_global_descriptor or some such with a set_global_descriptor op that can be used to actually set the descriptor. In fact, I don’t think memrefs are even sufficient to be used for that use case – some sort of refcounting is needed to handle the lifetimes properly I think.

global_mutable_descriptor @global : memref<f32>
func @callee(%arg0: memref<f32>) {
  set_descriptor @global = %local
}
func @f() {
  %local = alloc()
  if %cond {
    call @callee(%local)
  }
  // should buffer-deallocation insert a dealloc() here for %local?
  // It's undecidable for arbitrarily complex @callee.
  // Or even impossible in case of externally defined @callee.
}

For that use case, MLIR will need to grow a runtime construct that supports dynamic lifetime management.

FYI @jurahul, the BufferizeConstantOp pattern that is being moved in this patch would be the one that would really benefit from using std.global_memref: https://reviews.llvm.org/D89916

Sounds good. Any more comments here before I revise the proposal to address some of the comments above?

As for the allocation modeling, do folks think we need to model this as an allocation of a new resource type, or not model this as an allocation at all?

Same comment as Mehdi, we shouldn’t overly restrict the scope like this. FuncOp has no such restrictions. The only stipulation there (though I don’t even think it is codified) is that the function is within an operation defining a SymbolTable.

Could you change the representation to better support this instead? For example, it seems like if we don’t restrict the initializer to be ElementsAttr we could just use something like a UnitAttr initializer to represent the undef_initializer use case.

Yes, I’ll relax this part.

Do you mean that if the initializer is either ElementsAttr or UnitAttr, then the same init_value attribute can represent either a known or undef init value? I agree that’s cleaner than 2 attributes.

Yeah, that is what I meant. We could sugar it such that a user would never have to know the difference, but either way I’d say it’d be fine.
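
A rough sketch of how the single-attribute scheme discussed here could be interpreted, as a Python model rather than MLIR code. The `UnitAttr` class is a hypothetical marker standing in for MLIR's UnitAttr, a plain sequence stands in for an ElementsAttr, and an absent attribute is assumed to mean a declaration, as in the earlier part of the thread.

```python
from typing import Sequence, Union


class UnitAttr:
    """Hypothetical marker standing in for an MLIR UnitAttr."""


InitValue = Union[Sequence[float], UnitAttr, None]


def classify_global(init_value: InitValue) -> str:
    """Interpret a single init_value attribute."""
    if init_value is None:
        # No attribute at all: a declaration defined elsewhere.
        return "declaration"
    if isinstance(init_value, UnitAttr):
        # Unit attribute: a definition with undefined initial contents.
        return "definition, uninitialized"
    # Elements attribute: a definition with known initial contents.
    return "definition, initialized"
```

With this encoding, the case where both init_value and init_value_undef are present can no longer be expressed, so the corresponding verifier check goes away.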