Motivation
A data layout description in MLIR is long overdue: there exist types that represent data in memory, but no specification of how exactly it is stored. At the LLVM dialect level, we reuse DataLayout from LLVM IR, which may lead to surprising behavior if transformations on the higher-level representations do not account for the DataLayout that will only be introduced later.
Several decisions related to the target and layout are currently encoded as pass options when converting to the LLVM dialect and are therefore invisible in the IR, for example the bit width of the index type.
A data layout description will help clarify the address computation model in memref, enable support for custom element types in memref, and open the door to generic modeling of custom types with memory-reference semantics. (Tangentially to data layout, being able to identify types that can represent memory references is also important for alias analysis. Containers that require elements to have a data layout are likely memory references.)
Similarly, this can help built-in container types whose semantics depend on size or other layout information, e.g., vectors, support dialect-specific element types.
Requirements
- The data layout mechanism should support MLIR’s open type system: we don’t know in advance which types will want to use it and how their layout can be parameterized (e.g., having size/abi-alignment/preferred-alignment tuple is likely not enough).
- The data layout should be controllable at different levels of granularity, for example nested modules may have different layouts.
- The data layout should be optional, i.e. types should have a well-specified default “natural” layout that can be used in absence of a layout descriptor.
Observations
MLIR does not allow one to express a type class in the IR, i.e., the set of all possible instantiations of a given parametric IR type such as IntegerType or FloatType. It is not desirable to list all instantiations of a type class as their number may be huge (e.g., we support up to i16777215). At the same time, the relation between different instances of a type class when it comes to layout is specific to the type (e.g., integers may want to round to the closest power-of-two bits, structures may want to pad elements, etc.).
Data layout can contain entries that are not specific to any type, such as endianness.
Proposal
Operations Defining the Data Layout
The proposed mechanism is based on existing MLIR concepts: attributes, operation interfaces and type interfaces. Operations willing to support a concrete data layout implement a DataLayoutOperationInterface interface, which allows one to opaquely obtain the data layout to use for regions (transitively) nested in the operation. The layout parameters are described by an array attribute containing DataLayoutEntryAttr instances. Each instance is essentially a pair of PointerUnion<Identifier, Type> and Attribute. The first element of the pair identifies the entry, and the second element is an arbitrary attribute that describes alignment parameters in a type-specific way. Data layout entries specific to a type or type class use a Type as the first element of the pair; generic entries use an Identifier.
For example, ModuleOp will likely support the data layout attribute and may resemble the following in textual IR format:
module attributes { datalayout.spec = [
  #datalayout.entry<"endianness", "big">,
  #datalayout.entry<i8, {size = 8, abi = 8, preferred = 32}>,
  #datalayout.entry<i32, {size = 32, abi = 32, preferred = 32}>,
  #datalayout.entry<memref<f32>, {model = "bare"}>
]} {
  // ops
}
Types Subject To Data Layout
Types willing to use a layout must implement the DataLayoutTypeInterface by implementing the following functions:
static LogicalResult verifyLayoutEntries(ArrayRef<DataLayoutEntryAttr>);
size_t getSizeInBits(ArrayRef<DataLayoutEntryAttr>) const;
size_t getRequiredAlignment(ArrayRef<DataLayoutEntryAttr>) const;
and may additionally implement:
size_t getPreferredAlignment(ArrayRef<DataLayoutEntryAttr>) const;
The verification function is used to ensure the well-formedness of the list of relevant entries, e.g., the absence of duplicate entries or the use of the expected attribute kind to describe the type-specific layout properties. All the other functions are expected to return the corresponding value in bits. Their argument is an unordered list of DataLayoutEntryAttrs whose first element either belongs to the same type class (e.g., IntegerType will receive entries for i8, i32, i64, etc. when present) or is generic (i.e., all types receive all generic entries). Therefore, a type cannot be affected by the layout properties of other types other than by querying those properties through the interface. The list may be empty if the layout is not specified; the functions are still expected to return meaningful values in that case, e.g., the natural alignment of the type, and verification must not fail. Additional methods can be added later to this interface.
Each type class implements, and must document, an algorithm to compute layout-related properties. This algorithm is fixed and can use as parameters the parameters of the type instance (e.g., the integer bit width) and the data layout entries. The mechanism of interpreting the data layout entries is specified by the type class and is opaque to MLIR’s general mechanism.
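To make the idea of a fixed, per-type-class algorithm more concrete, here is a minimal standalone sketch for an integer type class. It does not use MLIR: the `IntEntry` struct is a hypothetical stand-in for a `DataLayoutEntryAttr` keyed by an integer type, and the function names are invented for illustration. The rule assumed here is that an entry for the exact bit width wins, and that in the absence of such an entry the size and alignment fall back to the bit width rounded up to a power of two (at least 8), which is one plausible reading of a "natural" layout.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a DataLayoutEntryAttr keyed by an integer type:
// `width` identifies the instance (i8, i32, ...), the rest is its layout.
struct IntEntry {
  unsigned width;
  std::size_t sizeInBits;
  std::size_t abiAlignInBits;
};

// Rounds `bits` up to the next power of two, with a minimum of 8.
std::size_t roundUpToPowerOfTwo(std::size_t bits) {
  std::size_t result = 8;
  while (result < bits)
    result *= 2;
  return result;
}

// Mirrors the role of verifyLayoutEntries: rejects duplicate keys and
// entries whose size is smaller than the bit width they describe.
bool verifyIntegerEntries(const std::vector<IntEntry> &entries) {
  std::set<unsigned> seen;
  for (const IntEntry &e : entries) {
    if (!seen.insert(e.width).second)
      return false; // duplicate entry for the same type
    if (e.sizeInBits < e.width)
      return false; // storage cannot be smaller than the value
  }
  return true;
}

// Assumed rule: an entry for the exact width wins; otherwise use the
// natural layout. An empty list is valid and yields the natural layout.
std::size_t integerSizeInBits(unsigned width,
                              const std::vector<IntEntry> &entries) {
  assert(width > 0 && "integer types have a positive bit width");
  for (const IntEntry &e : entries)
    if (e.width == width)
      return e.sizeInBits;
  return roundUpToPowerOfTwo(width);
}

std::size_t integerRequiredAlignment(unsigned width,
                                     const std::vector<IntEntry> &entries) {
  assert(width > 0 && "integer types have a positive bit width");
  for (const IntEntry &e : entries)
    if (e.width == width)
      return e.abiAlignInBits;
  return roundUpToPowerOfTwo(width);
}
```

Note that the functions only see entries of their own type class, which matches the isolation property described above: an integer cannot observe layout entries of, say, a float type. A real implementation could choose a different fallback, for instance the closest entry with a larger width.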
Querying Data Layout Properties
The DataLayoutTypeInterface defines final methods that can be used to query layout properties of a type:
size_t getSizeInBits(Region &scope) const;
size_t getRequiredAlignment(Region &scope) const;
size_t getPreferredAlignment(Region &scope) const;
// Potentially provide the default implementation.
size_t getSizeInBytes(Region &scope) const {
  return ceil_div(getSizeInBits(scope), 8);
}
Note that these functions do not accept a list of data layout entries. Instead, they accept a region in which the request is scoped (different regions may belong to, e.g., different modules with different data layouts) and identify the relevant data layout entries using the following procedure:
- find the first ancestor operation of scope that implements the DataLayoutOperationInterface interface;
- obtain the layout attribute from this op;
- continue looking for further ancestors and extract layout attributes from those ops;
- combine the attributes; if there are two entries with the same key, the innermost in the region nesting sequence is chosen and the rest are discarded.
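The lookup procedure above can be sketched outside MLIR as a walk up a parent chain. In this sketch, `Op`, `hasLayoutInterface` and `collectLayout` are hypothetical names, entry keys and values are plain strings rather than `PointerUnion<Identifier, Type>` and `Attribute`, and the starting point is the operation that owns the scope region. The only behavior it aims to show is "innermost entry wins".

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified layout entry: key and value are both strings in this sketch.
using Entry = std::pair<std::string, std::string>;

// Hypothetical model of an operation in the region nesting sequence.
struct Op {
  const Op *parent = nullptr;      // enclosing operation, null at the top
  bool hasLayoutInterface = false; // models DataLayoutOperationInterface
  std::vector<Entry> layout;       // entries attached to this op, if any
};

// Walks from the op owning the scope region outwards and merges entries.
// Inner ops are visited first, so an existing key is never overwritten.
std::map<std::string, std::string> collectLayout(const Op *scopeOwner) {
  std::map<std::string, std::string> combined;
  for (const Op *op = scopeOwner; op != nullptr; op = op->parent) {
    if (!op->hasLayoutInterface)
      continue; // ops without the interface are skipped
    for (const Entry &e : op->layout)
      combined.emplace(e.first, e.second); // no-op if the key is present
  }
  return combined;
}
```

For example, with an outer module that sets `endianness` to `big` and `index` to `64`, and a nested module that sets only `index` to `32`, a query scoped inside the nested module would see `index = 32` and still inherit `endianness = big`.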
We may also consider additional type- and dialect-specific mechanisms for combining nested data layout specifications, but this is excluded from this proposal for simplicity.
Corollary and Example: MemRef of MemRef
Enabling memref-of-memref is a frequent request that has been blocked by the lack of a clear mechanism to allocate such objects and index them correctly, because the size of a single value of a memref type is unknown. Depending on the lowering convention, memref is treated either as a descriptor containing dynamic shape and stride information or as a bare pointer to the first element; the size of a pointer may also be unspecified at levels higher than the LLVM dialect. Memref-of-memref can be achieved by relaxing MemRefType to accept as element type any type that implements DataLayoutTypeInterface, making MemRefType itself implement this interface, and defining the size of the index type (assuming it is equal to the pointer size) and the lowering convention in the data layout.
The size computation algorithm, fixed for MemRefType as required by the mechanism, is as follows:
- The data layout is expected to contain at most one entry, with a dictionary attribute containing a key “model” associated with a string attribute whose value is either “bare” or “descriptor”.
- In the absence of the entry, the “descriptor” model is assumed.
- If the model is “bare”, the size of the memref type is equal to that of the index type (query the mechanism recursively, IndexType is assumed to implement DataLayoutTypeInterface).
- If the model is “descriptor”, the size of the memref type is equal to (3 + 2 * memref-rank) * size of the index type (recursive query + using parameters of the type).
- The required alignment is always equal to that of the index type.
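The rules above reduce to a small amount of arithmetic, sketched below as a standalone function. The names `memrefSizeInBits` and `memrefRequiredAlignment` are hypothetical, the recursive query for the index type is replaced by an explicit `indexSizeInBits` parameter, and an empty `model` string is used here to represent a missing entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Size in bits of a single memref value under the rules listed above.
// An empty `model` stands for "no entry" and selects the descriptor model.
std::size_t memrefSizeInBits(const std::string &model, unsigned rank,
                             std::size_t indexSizeInBits) {
  assert((model.empty() || model == "bare" || model == "descriptor") &&
         "the model must be either \"bare\" or \"descriptor\"");
  if (model == "bare")
    return indexSizeInBits; // a bare pointer, assumed index-sized
  // Descriptor model: (3 + 2 * rank) index-sized fields.
  return (3 + 2 * static_cast<std::size_t>(rank)) * indexSizeInBits;
}

// The required alignment never depends on the model or the rank.
std::size_t memrefRequiredAlignment(std::size_t indexAlignmentInBits) {
  return indexAlignmentInBits;
}
```

As a worked case, with a 64-bit index and the descriptor model, a rank-2 memref occupies (3 + 2 * 2) * 64 = 448 bits, i.e., 56 bytes. That is the element size an allocation of a memref of rank-2 memrefs would need to use, which is the information that was previously unavailable.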
Please note that this is an example illustrating the proposal. Objections specifically to modeling memrefs should not preclude the infrastructural proposal from being implemented (without the model for memrefs).
Corollary for Type Casting
The semantics of bitcast / reinterpret_cast can now be clearly defined for types with data layout as interpreting the bit representation of the type as another type.
In addition, the data layout mechanism can be used to (dis)allow certain casts in dialect conversions.