[RFC] Data Layout Modeling

I’m not familiar with this at all, but it sounds like we have a modeling problem, or perhaps just an expedient hack afoot :wink:

If this is about calling convention lowering, then it really is a different kettle of fish that is similar-but-different to layout. If it is really about “there are different types that can be passed between functions that are memref-like” then I think it should be modeled explicitly, either as a different type or as a bit on MemRefType.

-Chris

Thanks everyone for joining the live discussion. As a quick summary, we raised questions of generalizing this mechanism to other target-related information, rebalancing the separation of concerns to give ops more control, and efficiency/scalability issues of having to go through dispatch on types for repeated queries.

I am considering the following direction to address the last two points. The data layout query functions can be placed in the operation interface rather than the type interface. Specific ops can adapt their implementation to consider additional, domain-specific factors in computing the answer to the query before or after dispatching the query to individual types. The extent of this flexibility (e.g., whether the op can completely override the result from the type) remains to be defined. Furthermore, routing the queries through the scoping op allows us to create a cache indexed by type for the results and remove repeated dispatches through type interfaces. To avoid the dispatch on the operation interface, we may consider a wrapper mechanism similar to SymbolTable for cases where no op-specific behavior is necessary.

It stops being just the calling convention the moment we want to allocate a memref<2 x memref<f32>>, because we need sizeof(memref<f32>). This is why the layout discussion blocks memref-of-memref…

We can already use !llvm.ptr with built-in types, and adding extract-ptr/make-memref ops, even out-of-tree, looks straightforward. Calling functions that expect bare pointers is simple: generate a wrapper that takes a memref in whatever format is available, extracts the pointer, and forwards it to the function. I have had code that does this on LLVM IR for over a year. So it is really only a matter of aliasing info before we are ready to reconsider this. But that’s a different discussion.

I’m just asking a question, and please keep in mind I am quite unfamiliar with this topic and still catching up. But is this layout modeling supposed to capture special swizzled layouts, interleaving, and alignment like that of textures? Or are those kinds of memory allocations out of scope here?

Layouts could be

RRRR
BBBB
GGGG
AAAA

or

RGBA
RGBA
RGBA

or spatial swizzling and twiddling like

0 1 4 5
2 3 6 7

This isn’t the kind of data layout that is targeted here; it is closer to the LLVM notion of data layout described in the LLVM Language Reference Manual.

What you’re describing here isn’t a property of the target but a property of each individual value in isolation, which makes me think about a type annotation or an op attribute.

That might work better yes. Sounds good, thanks.

Revised RFC

Requirements

  • The data layout mechanism should support MLIR’s open type system: we don’t know in advance which types will want to use it and how their layout can be parameterized (e.g., having size/abi-alignment/preferred-alignment tuple is likely not enough).
  • The data layout should be controllable at different levels of granularity, for example nested modules may have different layouts.
  • The data layout should be optional, i.e. types should have a well-specified default “natural” layout that can be used in the absence of a layout descriptor.
  • [New in this revision] The mechanism should be controllable at the scope (op/region) level without changing the types; in particular, in some domains, there is need to impose additional restrictions on built-in types without modifying them upstream.
  • [New in this revision] Efficiency of the implementation: layout queries should be cached when possible; it is preferable to avoid interface-based dispatch for built-in types.

Proposal

The proposal is based on existing MLIR concepts: operation and type interfaces, and attributes. For additional flexibility, and to support type-independent target properties, it uses double dispatch: first at the operation level, then, if necessary, at the type level. This is complemented by a caching mechanism to minimize the overhead of interface calls.

For the purposes of this proposal, the data layout API consists of three properties: type size, ABI alignment requirement, and preferred alignment. Since all of these provide one value for a given type, for the sake of brevity the text below discusses only the type size property, assuming the remaining properties can be implemented similarly.

Operations Defining the Data Layout

Operations willing to support a data layout implement a DataLayoutOperationInterface interface, which allows one to opaquely obtain the data layout to use for regions (transitively) nested in the operation. The layout parameters are described by an array attribute containing DataLayoutEntryAttr instances. Each instance is essentially a pair of PointerUnion<Identifier, Type> and Attribute. The first element in the pair identifies the entry, and the second element is an arbitrary attribute that describes the layout parameters in an operation- and type-specific way. Data layout entries specific to a type or type class use a Type as the first element of the pair; generic entries use an Identifier.

For example, ModuleOp will likely support the data layout attribute and may resemble the following in textual IR format:

module attributes { target.dl_spec = [
  #target.dl_entry<"target.endianness", "big">,
  #target.dl_entry<i8, {size = 8, abi = 8, preferred = 32}>,
  #target.dl_entry<i32, {size = 32, abi = 32, preferred = 32}>,
  #target.dl_entry<memref<f32>, {model = "bare"}>
]} {
  // ops
}

The attributes belong to the new dialect, target, provisioned to include other target-specific attributes when they become necessary.

The interface does not define functions for querying the data layout, only hooks for handling those queries. For queries, users need to construct a DataLayout object (see below).

The interface provides static overridable functions that serve as hooks for implementing op-specific query behavior.

/*static*/ unsigned getTypeSize(Type t, const DataLayout &dl,
                                ArrayRef<DataLayoutEntryAttr> params);
/*static*/ unsigned getABIAlignment(Type t, const DataLayout &dl,
                                    ArrayRef<DataLayoutEntryAttr> params);
/*static*/ unsigned getPreferredAlignment(Type t, const DataLayout &dl,
                                          ArrayRef<DataLayoutEntryAttr> params);
/* ... */

These functions accept the type for which the query is performed, a reference to the DataLayout object that can be used for recursive queries, and a potentially empty list of DataLayoutEntryAttrs relevant to this type. They are required to always provide a reasonable default response to the query, even in the absence of parameters, and should not rely on any information other than that provided as function arguments. (This is partially enforced by the functions being static and thus not having access to the raw operation or its attributes.) These functions are used by DataLayout in handling the data layout queries and should not be called directly. Providing custom implementations of these functions in specific operations allows those operations to control the data layout without needing to change the types.

Default Handling of Data Layout Queries in Operations

The static interface methods listed above have the following default implementation. For built-in types, the implementation derives the size and alignment requirements directly from type properties such as bitwidth. (For non-scalar types, the type itself can provide the implementation.) Other types are expected to implement the DataLayoutTypeInterface, described below, to which the query is dispatched.
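To make the default behavior concrete, here is a standalone sketch (the BuiltinType struct and function names are illustrative assumptions, not the actual MLIR API) of deriving size and natural alignment from a scalar type's bitwidth:

```cpp
#include <cassert>

// Simplified stand-in for a built-in scalar type such as i8, i32, or f32.
struct BuiltinType {
  unsigned bitwidth; // e.g. 8 for i8, 32 for i32/f32
};

// Default size query: round the bitwidth up to a whole number of bytes,
// so that e.g. i1 still occupies one byte.
unsigned defaultTypeSize(BuiltinType t) {
  return (t.bitwidth + 7) / 8;
}

// In this sketch, the natural ABI alignment of a scalar equals its size.
unsigned defaultABIAlignment(BuiltinType t) {
  return defaultTypeSize(t);
}
```

A data layout entry such as `#target.dl_entry<i8, {...}>` would override these computed defaults when present.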

Types Subject to Data Layout

Custom types willing to opt into the data layout mechanism must implement the DataLayoutTypeInterface with the following (instance) methods:

unsigned getTypeSize(const DataLayout &dl, ArrayRef<DataLayoutEntryAttr> params);
unsigned getABIAlignment(const DataLayout &dl,
                         ArrayRef<DataLayoutEntryAttr> params);
unsigned getPreferredAlignment(const DataLayout &dl,
                               ArrayRef<DataLayoutEntryAttr> params);
/* ... */
LogicalResult verifyEntries(ArrayRef<DataLayoutEntryAttr> params);

The verification function is used to ensure the well-formedness of the list of relevant entries, e.g. the use of the expected attribute kind to describe the type-specific layout properties. All the other functions are expected to return the corresponding value for this type. Their first argument is a DataLayout object that can be used for recursive queries. Their second argument is an unordered list of DataLayoutEntryAttrs whose first element either belongs to the same type class (e.g., IntegerType will receive entries for i8, i32, i64, etc. when present) or is generic (i.e., all types receive all generic entries). Therefore, types cannot be affected by layout properties of other types other than by querying those properties through the interface. The list may be empty if the layout is not specified, and the functions are still expected to return meaningful values, e.g. the natural alignment of the type, without failing the verification. Additional methods can be added to this interface later.
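As an illustration of a type-owned layout algorithm, here is a standalone sketch (the Element struct and function names are hypothetical, not the MLIR interface) of how a struct-like type could answer the size query by laying out its elements with ABI-alignment padding, the way such a type might implement getTypeSize using recursive queries for its element types:

```cpp
#include <cassert>
#include <vector>

// Size and ABI alignment of one element, as a recursive query would return.
struct Element {
  unsigned size, abiAlign;
};

// Round an offset up to the next multiple of an alignment.
unsigned alignTo(unsigned offset, unsigned align) {
  return (offset + align - 1) / align * align;
}

// Lay out the elements in order, padding each to its ABI alignment, then
// pad the total size so that arrays of this struct stay aligned.
unsigned structSize(const std::vector<Element> &elems) {
  unsigned offset = 0, maxAlign = 1;
  for (const Element &e : elems) {
    offset = alignTo(offset, e.abiAlign);
    offset += e.size;
    if (e.abiAlign > maxAlign)
      maxAlign = e.abiAlign;
  }
  return alignTo(offset, maxAlign);
}
```

In the real interface, the element sizes and alignments would come from recursive calls through the DataLayout object rather than being passed in directly.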

Each type class implements, and must document, an algorithm to compute layout-related properties. This algorithm is fixed and can use the parameters of the type instance (e.g., the integer bitwidth) and the data layout entries as inputs. The mechanism for interpreting the data layout entries is specified by the type class and is opaque to MLIR’s general mechanism.

Queries on a DataLayout object and caching

The DataLayout object is the central place for data layout queries. It provides both isolation at the type level, i.e. hooks handling layout queries for a specific type only see attributes related to that type, and caching. It points back to the operation for which it was constructed and to the original attribute, in order to check whether the cache is still valid. The implementation can resemble the following.

class DataLayout {
public:
  explicit DataLayout(DataLayoutOperationInterface op)
      : originalLayout(op ? op.getOperation()->getAttribute("target.dl_spec")
                          : nullptr),
        scope(op) {}

  unsigned getTypeSize(Type t) const {
    // Check that the cache is still valid.
    assert(!scope || (mixWithAncestors(originalLayout) ==
                      scope.getOperation()->getAttribute("target.dl_spec")));
    auto it = sizes.find(t);
    if (it != sizes.end())
      return it->second;
    unsigned size =
        scope ? scope.getTypeSize(t, *this, extractParams(t, scope))
              : t.cast<DataLayoutTypeInterface>().getTypeSize(*this, {});
    sizes[t] = size;
    return size;
  }

  // ...
private:
  const Attribute originalLayout;
  DataLayoutOperationInterface scope;
  // Caches for individual queries.
  mutable DenseMap<Type, unsigned> sizes;
  mutable DenseMap<Type, unsigned> alignmentABI;
  mutable DenseMap<Type, unsigned> alignmentPreferred;
};

A similar mechanism is used for each query. In debug mode, it asserts that the cached layout information is still correct with respect to the layout attribute. The query is first checked in the relevant cache and, if not present, is dispatched to the operation through its interface and cached. DataLayout::mixWithAncestors takes the data layout spec attribute on the current op and combines it with the spec attributes of all ancestor ops implementing DataLayoutOperationInterface, using the most nested entry when two entries have the same key and concatenating the lists of entries otherwise. This mechanism can be extended in the future to be more type- and operation-specific, as well as to use an attribute combination mechanism if/when MLIR provides one. DataLayout::extractParams takes this combined form and extracts the entries relevant for the given type.
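The combination rule can be sketched in standalone form (the Spec alias and string keys are illustrative simplifications of the attribute-based entries): walk the specs from the outermost scope to the innermost, letting an inner entry with the same key override the outer one, and appending entries with new keys.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified model of a data layout spec: key (type name or generic
// identifier) mapped to its layout parameter.
using Spec = std::map<std::string, std::string>;

// Combine specs from outermost to innermost scope: the innermost entry for
// a given key wins; keys unique to one spec are simply carried over.
Spec mixWithAncestors(const std::vector<Spec> &outermostToInnermost) {
  Spec combined;
  for (const Spec &spec : outermostToInnermost)
    for (const auto &entry : spec)
      combined[entry.first] = entry.second;
  return combined;
}
```

The real mechanism operates on lists of DataLayoutEntryAttr with Type or Identifier keys, but the precedence logic is the same.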

Verification

The verification happens in multiple steps and can be customized by hooks in the operation interface and type interface. At the top level, the attribute verifier of the target.dl_spec ensures the absence of duplicate entries, the attribute being attached to an op that implements DataLayoutOperationInterface, and may additionally verify refinement correctness (see below) of nested layouts. The rest of the verification happens in the DataLayoutOperationInterface verification hook (triggered by the operation verifier), which dispatches to

LogicalResult verifyEntries(ArrayRef<DataLayoutEntryAttr> params);

after extracting the attribute values. This is an interface static method with a default implementation that dispatches groups of params associated with a subclass T of Type implementing DataLayoutTypeInterface to T::verifyEntries, and non-type entries prefixed with the target dialect namespace (target.) to TargetDialect::verifyDLEntries. Entries with other prefixes, and types not implementing DataLayoutTypeInterface, fail verification by default, but can be accepted by custom operations that support them.

As a specific example, a custom op may support additional parameters in the layout specification attribute that are not supported by the relevant type. In this case, the operation must redefine its verifyEntries to verify the well-formedness of these additional parameters and remove them from the list before delegating the rest of the verification to the type. Similarly, it must reimplement the query methods in the operation interface to handle these additional parameters.
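This strip-and-delegate pattern can be sketched as follows (the Entry struct, the "model" parameter, and both verifier functions are hypothetical stand-ins, not the MLIR API): the op validates and removes the entries only it understands, then hands the remainder to the type's verifier.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for DataLayoutEntryAttr: a key/value pair.
struct Entry {
  std::string key, value;
};

// Hypothetical type-level verifier: this type only understands "size"
// and "abi" parameters.
bool typeVerifyEntries(const std::vector<Entry> &entries) {
  for (const Entry &e : entries)
    if (e.key != "size" && e.key != "abi")
      return false;
  return true;
}

// Op-level verifier supporting an additional, op-specific "model"
// parameter: verify it, remove it, then delegate the rest to the type.
bool opVerifyEntries(std::vector<Entry> entries) {
  for (auto it = entries.begin(); it != entries.end();) {
    if (it->key == "model") {
      if (it->value != "bare" && it->value != "descriptor")
        return false; // malformed op-specific parameter
      it = entries.erase(it);
    } else {
      ++it;
    }
  }
  return typeVerifyEntries(entries);
}
```

The same filtering must happen symmetrically in the op's query methods, so that types never see parameters they do not understand.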

Refinement

For additional verification, we introduce the notion of layout refinement. A layout newLayout is a refinement of oldLayout if the new layout respects all constraints of the old layout. For example, the new layout may introduce smaller ABI alignment requirements for some types. Note that two layouts may be mutual refinements of each other if they apply to different subsets of types. Types can specify whether a set of data layout entries is a refinement of another set by redefining the static interface method

/*static*/ bool isRefinement(ArrayRef<DataLayoutEntryAttr> oldLayout,
                             ArrayRef<DataLayoutEntryAttr> newLayout);

which returns true by default.

Refinement is relevant in two cases. First, in the case of nested data layout specifications, the op-level (or attribute-level) verifier ensures that nested specifications refine those of their ancestors so that the overall layout still makes sense. Second, in the case of transformations that change the data layout, the transformation must ensure that the new layout is a refinement of the old one. There is no automated mechanism for the latter since MLIR provides various IR mutation mechanisms, including low-level attribute manipulation; it is entirely the responsibility of the user to perform the check by calling TargetDialect::isDLRefinement(ArrayRef<DataLayoutEntry>, ArrayRef<DataLayoutEntry>, function_ref<bool(DataLayoutEntry)> nonTypeRefinementCheck), which dispatches to individual types.
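A minimal sketch of such a refinement check for ABI alignment entries (the AlignSpec alias and string keys are illustrative, not the real attribute-based signature): a new layout refines the old one if it never raises an alignment the old layout already constrains, while types only the new layout mentions are unconstrained and always acceptable.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified layout spec: type name mapped to its ABI alignment in bytes.
using AlignSpec = std::map<std::string, unsigned>;

// newLayout refines oldLayout if every type constrained by both has an
// equal or smaller ABI alignment in the new layout.
bool isRefinement(const AlignSpec &oldLayout, const AlignSpec &newLayout) {
  for (const auto &entry : newLayout) {
    auto it = oldLayout.find(entry.first);
    if (it != oldLayout.end() && entry.second > it->second)
      return false; // the new layout relaxes an existing constraint
  }
  return true;
}
```

Note how two specs over disjoint sets of types are mutual refinements under this check, matching the observation above.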

Alternatives Considered

I considered providing query methods directly on the operation interface. This makes it hard to reuse cached results and to restrict the set of parameters forwarded down to specific calls because op interfaces have access to the raw operation and thus raw attribute list.

I considered providing DataLayout getDataLayout() { return DataLayout(*this); } in the op interface, but it looks like it would favor the iface.getDataLayout().getTypeSize(type) pattern that ignores caching. Having to create a separate object will force the caller to consider keeping it around for further queries, and there is little readability loss.

I considered implementing the cache in the interface class itself rather than in a separate object. This would have been correct, because cache information is transient, so there is no correctness problem with storing it in the *Op class even if it gets discarded when abstracting to Operation *. However, dropping the cache in that case is a performance issue, as is passing *Op instances by value when they carry large caches.