[RFC] Data Layout Modeling

It would be a built-in if I don’t modify the example. I think discussing where this specific property should live is slightly beside the point of the current RFC. It does not propose this property, only the mechanism, which I agree needs more detail on prefixing in general.

The way I see it, this data layout problem can basically be described, in terms of mechanism, as “module-scoped parameterization of interfaces [that can be persisted in IR]”. The current proposal is applying this to a specific problem of determining bit size / alignment of types after lowering, but I don’t think that’s essential. We could use the same mechanism for parameterizing lowering to LLVM.

For example, instead of #datalayout.entry<memref<f32>, {model = "bare"}>, it feels more like we should have:

module attributes {
  datalayout.spec = #datalayout.spec_table<[#datalayout.entry...]>,
  llvm.spec = #llvm.spec_table<[
    #llvm.type_lowering_entry<memref<f32>, {model = "bare"}>
  ]>
}

Then, when you query for the bit size of memref, one of the strategies we have for resolving that is:

  • query LowerToLLVMTypeInterface, which requires passing in a #llvm.spec_table attribute.
  • In this case, memref would implement that interface and say “yes, I know how to lower myself to LLVM with {model = "bare"}”.
  • then use BitSizeAndAlignmentTypeInterface (analogous to LLVM’s datalayout) to resolve the bit size and alignment of the new type
    • All LLVM types would implement BitSizeAndAlignmentTypeInterface, and that interface would take a #datalayout.spec_table attribute as an argument to configure themselves.
  • using the information above, compute the final bit size of memref<f32>
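A minimal sketch of the resolution strategy in the list above. It makes several assumptions: types are plain strings, the #llvm.spec_table is a map from a type to the LLVM types it lowers to, and the #datalayout.spec_table is a map from LLVM types to bit sizes. All names are hypothetical, and padding between lowered fields is ignored.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for #llvm.spec_table: type -> lowered field types.
using LLVMSpec = std::map<std::string, std::vector<std::string>>;
// Hypothetical stand-in for #datalayout.spec_table: LLVM type -> bit size.
using DataLayoutSpec = std::map<std::string, unsigned>;

// Plays the role of LowerToLLVMTypeInterface: a type without a lowering
// entry is assumed to already be an LLVM type and lowers to itself.
std::vector<std::string> lowerToLLVM(const std::string &type,
                                     const LLVMSpec &llvmSpec) {
  auto it = llvmSpec.find(type);
  if (it != llvmSpec.end())
    return it->second;
  return {type};
}

// Plays the role of BitSizeAndAlignmentTypeInterface on LLVM types.
unsigned llvmBitSize(const std::string &llvmType, const DataLayoutSpec &dlSpec) {
  return dlSpec.at(llvmType);
}

// Lower first, then add up the sizes of the lowered fields (no padding).
unsigned bitSizeOf(const std::string &type, const LLVMSpec &llvmSpec,
                   const DataLayoutSpec &dlSpec) {
  unsigned total = 0;
  for (const std::string &field : lowerToLLVM(type, llvmSpec))
    total += llvmBitSize(field, dlSpec);
  return total;
}
```

With 64-bit pointers and i64, the bare lowering of memref<f32> (a single pointer) gives 64 bits. The rank-0 descriptor lowering (allocated pointer, aligned pointer, offset) gives 192 bits.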

I don’t expect memref itself to implement BitSizeAndAlignmentTypeInterface. And LowerToLLVMTypeInterface is independently reusable for lowering like Chris wants. (The question of how memref will implement LowerToLLVMTypeInterface without polluting builtin with a dependency on the llvm dialect is a separate question, but an important one…)

Ultimately, what we want is pretty simple, just interfaces that are efficiently parameterized by some sort of persistent IR annotation on the module (or perhaps a scoped set of modules). In this case, the #foo.spec_table custom attributes create efficient C++ data structures, which are mandatory arguments of the methods on the corresponding interfaces.

We don’t need to couple BitSizeAndAlignmentTypeInterface with LowerToLLVMTypeInterface or make them part of some overarching target abstraction.

I think the important thing here is that each interface is likely to have a separate set of ways in which it is parameterized, so module attributes { target.abi = "linux-x86_64-somethingorother" } is insufficient, or at least needs to be layered on top of the individual attributes that configure each interface / dialect we are going to be using in the pipeline.

It seems to me that this is showing a coupling between the memref type properties inside the system (the bit size) and the lowering strategy, which is hard-coded here.
In such a case, I would rather have the data layout encode the bit size directly, and have the frontend or whoever sets up the pipeline populate the data layout accordingly.
For example if you build your compiler pipeline with a bare lowering, you could populate the datalayout from the LowerToLLVMTypeInterface.
The point is that this is all resolved ahead of time, and not on the fly by the client analyses/transformations which really shouldn’t have to know about all the possible lowering interfaces.

I think we need on-the-fly resolution. For example, in the non-bare convention, a memref can have many possible ranks and element types, so you can’t just resolve all possible ones ahead of time. I can’t see how, without a callback into the memref type, one could compactly encapsulate knowledge like: memref<?xf32> has the same size as memref<4xf32> and memref<?xi32>, but a different size from memref<?x?xf32>.
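To illustrate why a callback into the type helps, here is a sketch that assumes the default (non-bare) descriptor layout: two pointers, an offset, and one size and one stride per dimension. Under that assumption the size depends only on the rank, which a finite table of per-type entries cannot enumerate. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed default memref descriptor: allocated pointer, aligned pointer,
// offset, then `rank` sizes and `rank` strides (all index-typed).
// Neither the element type nor the static shape affects the result.
uint64_t memrefDescriptorBits(unsigned rank, unsigned pointerBits,
                              unsigned indexBits) {
  return 2ull * pointerBits + (1ull + 2ull * rank) * indexBits;
}
```

On a 64-bit target, memref<?xf32>, memref<4xf32> and memref<?xi32> (all rank 1) get 320 bits. memref<?x?xf32> (rank 2) gets 448.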

What I was saying about having BitSizeAndAlignmentTypeInterface query LowerToLLVMTypeInterface implies no coupling between them. We just don’t currently have a way to do it without coupling them. It’s conceptually simple: a pipeline author configures BitSizeAndAlignmentTypeInterface with a “fallback” or “extension” (for lack of a better word) that is just a class with some virtual methods that can be consulted as fallbacks if a type doesn’t implement the interface itself.
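A minimal sketch of the "fallback" idea. It uses plain C++ virtual classes as a stand-in, not MLIR's interface machinery, and every class name is hypothetical. A type that implements the interface answers directly. Otherwise, fallbacks registered by the pipeline author are consulted in order, and the type itself never changes.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type object; `name` identifies its type class.
struct TypeBase {
  explicit TypeBase(std::string name) : name(std::move(name)) {}
  virtual ~TypeBase() = default;
  std::string name;
};

// The interface a type may implement itself.
struct BitSizeInterface {
  virtual ~BitSizeInterface() = default;
  virtual unsigned getBitSize() const = 0;
};

// A fallback registered for types that do not implement the interface.
struct BitSizeFallback {
  virtual ~BitSizeFallback() = default;
  virtual std::optional<unsigned> getBitSize(const TypeBase &type) const = 0;
};

class BitSizeQuery {
public:
  void registerFallback(std::unique_ptr<BitSizeFallback> fallback) {
    fallbacks.push_back(std::move(fallback));
  }
  std::optional<unsigned> query(const TypeBase &type) const {
    // Prefer the type's own implementation, if any.
    if (auto *iface = dynamic_cast<const BitSizeInterface *>(&type))
      return iface->getBitSize();
    // Otherwise consult the registered fallbacks in order.
    for (const auto &fallback : fallbacks)
      if (auto result = fallback->getBitSize(type))
        return result;
    return std::nullopt;
  }

private:
  std::vector<std::unique_ptr<BitSizeFallback>> fallbacks;
};

// A type implementing the interface directly.
struct CustomType : TypeBase, BitSizeInterface {
  CustomType() : TypeBase("custom") {}
  unsigned getBitSize() const override { return 24; }
};

// A "built-in" type that knows nothing about the interface.
struct BuiltinMemRef : TypeBase {
  BuiltinMemRef() : TypeBase("memref") {}
};

// Fallback injected, e.g., by a lowering pipeline (64-bit pointer assumed).
struct MemRefAsPointer : BitSizeFallback {
  std::optional<unsigned> getBitSize(const TypeBase &type) const override {
    if (type.name == "memref")
      return 64u;
    return std::nullopt;
  }
};
```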

This is general functionality that would be useful to upgrade our interface system anyway. It seems like a similar mechanism would allow the LLVM dialect to inject a way of resolving LowerToLLVMTypeInterface for builtin types without requiring builtin to depend on llvm.

This is analogous to a problem that arises in programming language design with “trait”-like systems. Using Rust as an example that I’m familiar with, you can say “If a type implements this other trait, then the type implements my trait too”. You can also implement your traits on the builtin types without changing the builtin types. MLIR doesn’t have that kind of flexibility, and I think it’s going to be hard to scale our trait ecosystem without that.

I’m missing what this actually means and how this is relevant: we can’t know this at compile time.

The proposal is about MLIR types carrying relevant data layout information. What’s captured in the spec shouldn’t be concerned about downstream lowering to LLVM except of course being guided by how it was designed in LLVM for reference.

This looks great to me! A few comments.

  1. I think the RFC will benefit from a paragraph (right before or after “Requirements”) on what information we’d like to see captured in the first place as part of the data layout, both to start with and going forward in the near future. For example, you have size, alignment (required and preferred), endianness, and something custom like model for memrefs for now.

  2. What happens if the data layout align attribute says 8 for i64, but an alloc on a memref<i64> op says alignment = 64 : i64 or alignment = 16 : i64?

  3. What about alignment info for vector types as part of data layout and how is that reconciled with the alignment info for the elemental types? Similarly, memref types?

  4. On a very minor note, consider renaming: abi → align_abi / alignment_abi, preferred → align_prefer / alignment_preferred.

Isn’t the key for a dictionary attribute always required to be an identifier? How does one encode the type here without being able to create a (dummy) constant of that type? Do specific hardcoded names map to specific types? For example, memref<i32> vs. memref<f32>.

This is a really nice topic. Could we spend some time brainstorming on this during the ODM tomorrow?

SGTM

I think this is an interesting extension, but I don’t really see what kind of infrastructural support it would require. The only thing such parameterized interfaces seem to have in common is the scoped lookup. That looks very straightforward to implement, about a dozen lines of code, and different interfaces may want different “compose” rules. Unless we have many such interfaces, it looks easier to just write the lookup for each of them.
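For reference, the scoped lookup in question can be as small as the sketch below. It assumes each scope holds a map from entry key to value plus a pointer to its enclosing scope; these are hypothetical types, not MLIR API. The "compose" rule here is simply that the most nested definition wins. Another interface might prefer to merge values instead.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical scope, e.g. a module with an optional enclosing module.
struct Scope {
  const Scope *parent = nullptr;
  std::map<std::string, unsigned> entries;
};

// Walk from the innermost scope outwards and return the first match.
std::optional<unsigned> lookup(const Scope *scope, const std::string &key) {
  for (; scope; scope = scope->parent) {
    auto it = scope->entries.find(key);
    if (it != scope->entries.end())
      return it->second;
  }
  return std::nullopt;
}
```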

Whether we want a LowerToLLVMTypeInterface is mostly an orthogonal discussion IMO. The memref “convention” is a quirk we have to live with because MLIR doesn’t have a way to annotate non-aliasing function arguments and translate that information to the LLVM IR. Once it does, the “convention” will hopefully disappear.

Why not?

And a side question: how do we do memref-of-memref or memref-of-custom-type?

We can’t know the size of the data the memref points to, but we can know the size of the memref object itself, e.g., sizeof(pointer).

This is a good point. I suppose it should capture any information that is relevant to answering the queries exposed by the interface (size, minimum alignment, preferred alignment).

Alignment requirements are implicitly “minimal”, so the final alignment is the least common multiple of the required alignments (i.e., the largest one, probably assuming power-of-two alignments only). If something is required to be aligned at 8 and happens to also be aligned at 64, it’s not a big deal. Same here: the attribute requires 8, the op gives 64, and 8 is still respected.
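A sketch of the compatibility check from the reply above, as in the 8-versus-64 example. It assumes power-of-two alignments in bytes. An op-provided alignment respects a data layout requirement when it is a multiple of that requirement, because every address aligned to the larger power of two is also aligned to the smaller one. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

bool isPowerOf2(uint64_t value) { return value != 0 && (value & (value - 1)) == 0; }

// `actual` satisfies `required` when every address aligned to `actual`
// is also aligned to `required` (both assumed to be powers of two).
bool respectsAlignment(uint64_t actual, uint64_t required) {
  return isPowerOf2(actual) && isPowerOf2(required) && actual % required == 0;
}
```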

Up to the type definition in both cases. For vectors, we may consider having flags/enums in the attribute that says how to treat them, e.g. same alignment as elements, different explicitly specified alignment, power-of-two-closest-to-num-elements times element alignment, etc. For memrefs, I’d go for same alignment as pointer/index (note that this does not specify how the data is aligned, only the memref itself).
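A sketch of how a vector type could interpret such a flag. The enum and function names are hypothetical. "Power-of-two closest to the number of elements" is interpreted here as rounding the element count up to the next power of two, which is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy flag carried by a vector data layout entry.
enum class VectorAlignmentPolicy {
  SameAsElement,        // vector<NxT> is aligned like T
  Explicit,             // alignment given directly in the entry
  PowerOf2TimesElement, // next power of two >= N, times T's alignment
};

uint64_t nextPowerOf2(uint64_t n) {
  uint64_t result = 1;
  while (result < n)
    result <<= 1;
  return result;
}

uint64_t vectorAlignment(VectorAlignmentPolicy policy, uint64_t numElements,
                         uint64_t elementAlignment, uint64_t explicitAlignment) {
  switch (policy) {
  case VectorAlignmentPolicy::SameAsElement:
    return elementAlignment;
  case VectorAlignmentPolicy::Explicit:
    return explicitAlignment;
  case VectorAlignmentPolicy::PowerOf2TimesElement:
    return nextPowerOf2(numElements) * elementAlignment;
  }
  return elementAlignment; // not reached for valid policy values
}
```

For a 3-element vector with 4-byte-aligned elements, the third policy gives 4 * 4 = 16 bytes.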

The top-level attribute is not a dictionary, but a list of custom attributes, each of which is conceptually a key-value pair.

A type can be wrapped in an attribute with TypeAttr.

They are types. Entries for both memref<i32> and memref<f32> (as well as any other MemRefType) will be sent to MemRefType::getSizeInBits(ArrayRef<DataLayoutEntry>). It’s up to the type how to interpret them. For memrefs specifically, I am thinking of only allowing one entry, regardless of the actual type, because memrefs don’t need to change their layout depending on the element type.

Thank you for the great discussion today @ftynse. A conversation can be much higher bandwidth than forum posts sometimes, but I still miss whiteboards :slight_smile:

-Chris

I was thinking in terms of keeping the information normalized – we need something that knows how to lower the memref into more primitive types, and so we should be able to infer the BitSizeAndAlignmentTypeInterface from that. However, I think that Chris provided some experience in today’s talk that some amount of denormalization here is useful, which made sense to me. Thus, I have changed my mind about that statement (or at least am on the fence / not feeling very strongly about it).

Thanks for driving this, @ftynse! It looks very promising!

A few comments on the “bare” memrefs:

  1. Annotating non-aliasing function arguments (a temporary workaround) is not the only use case. Another use case is to provide an alternative lowering for targets that are not able to deal with the “complexity” of the default memref descriptor. Yet another is to model calls to arbitrary functions from external libraries that take bare pointers as arguments instead of memref descriptors.
  2. As of today, the bare pointer calling convention only impacts memrefs at the boundaries of a call/function, not all the memrefs. Therefore, the memref lowering to a bare pointer is not a generic type property right now. It depends on the operation in which the memref is being used. This was implemented like this to minimize the customization impact on the LLVM lowering. We could generalize this to apply to all the types, if needed, and have a much better/cleaner implementation if we could decouple the memref lowering from the LLVM lowering itself, as we discussed in the past. This would be great!
  3. To decouple the implementation, as you suggested in the past, we would need a way to represent pointers before the LLVM dialect and add operations to extract/insert pointers from/into a memref. However, this would need a separate discussion.

I hope this clarifies the situation.

I’m not familiar with this at all, but it sounds like we have a modeling problem, or perhaps just an expedient hack afoot :wink:

If this is about calling convention lowering, then it really is a different kettle of fish, similar to but different from layout. If it is really about “there are different types that can be passed between functions that are memref-like”, then I think it should be modeled explicitly, either as a different type or as a bit on MemRefType.

-Chris

Thanks everyone for joining the live discussion. As a quick summary, we raised questions of generalizing this mechanism to other target-related information, rebalancing the separation of concerns to give ops more control, and efficiency/scalability issues of having to go through dispatch on types for repeated queries.

I am considering the following direction to address the last two points. The data layout query functions can be placed in the operation interface rather than type interface. Specific ops can adapt their implementation to consider additional, domain-specific factors in computing the answer to the query before or after dispatching the query to individual types. The extent of flexibility (e.g., can the op completely override the result from type) remains to be defined. Furthermore, routing the queries through the scoping op allows us to create a cache indexed by type for the results and remove repeated dispatches through type interfaces. To avoid the dispatch on the operation interface, we may consider a wrapper mechanism similar to SymbolTable for cases where no op-specific behavior is necessary.

It stops being just the calling convention the moment we want to allocate a memref<2 x memref<f32>> because we need sizeof( memref<f32> ). This is why the layout discussion blocks memref-of-memref…
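To make the dependency concrete, consider allocating a statically shaped memref. The allocation needs the size of the element type, which becomes a recursive query when the element is itself a memref. This sketch assumes the default descriptor layout discussed above (a rank-0 memref<f32> is two pointers and an offset) on a 64-bit target. The Type representation is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical type: either a scalar with a bit width, or a memref with a
// static shape and an element type.
struct Type {
  unsigned scalarBits = 0;             // used for scalar types
  std::vector<int64_t> shape;          // memref shape (empty for rank 0)
  std::shared_ptr<const Type> element; // non-null for memref types
};

constexpr uint64_t kPointerBits = 64, kIndexBits = 64;

// Size of a value of type `t`; for memrefs this is the descriptor only.
uint64_t sizeInBits(const Type &t) {
  if (!t.element)
    return t.scalarBits;
  return 2 * kPointerBits + (1 + 2 * t.shape.size()) * kIndexBits;
}

// Size of the buffer allocated for memref `t`: this is where the element
// size is queried, recursively when the element is another memref.
uint64_t allocationBits(const Type &t) {
  uint64_t count = 1;
  for (int64_t dim : t.shape)
    count *= static_cast<uint64_t>(dim);
  return count * sizeInBits(*t.element);
}
```

Under these assumptions, memref<2xmemref<f32>> needs 2 * 192 = 384 bits of storage.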

We can already use !llvm.ptr with built-in types, and adding extract-ptr/make-memref ops, even out-of-tree, looks straightforward. Calling functions that expect bare pointers is simple: generate a wrapper that takes a memref in whatever format is available, extracts the pointer and forwards it to the function. I have had code that does this on LLVM IR for over a year. So it is really only a matter of aliasing info before we are ready to reconsider this. But that’s a different discussion.
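A sketch of such a wrapper written in C++ rather than LLVM IR. It assumes the strided descriptor layout of the default lowering for a rank-1 memref: allocated pointer, aligned pointer, offset, sizes, strides. The struct and function names are hypothetical. The wrapper extracts the data pointer and forwards it to a function that expects a bare pointer.

```cpp
#include <cassert>
#include <cstdint>

// Assumed rank-1 memref descriptor layout (default, non-bare lowering).
template <typename T>
struct MemRefDescriptor1D {
  T *allocated;
  T *aligned;
  int64_t offset;
  int64_t sizes[1];
  int64_t strides[1];
};

// An external library function taking a bare pointer, not a descriptor.
float sumBare(const float *data, int64_t n) {
  float total = 0;
  for (int64_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}

// Wrapper: compute the data pointer (aligned + offset) and forward it.
// Assumes a contiguous buffer, i.e. strides[0] == 1.
float sumWrapper(const MemRefDescriptor1D<float> &memref) {
  return sumBare(memref.aligned + memref.offset, memref.sizes[0]);
}
```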

I’m just asking a question, and please keep in mind that I am quite unfamiliar with this topic and still catching up. Is this layout modeling supposed to capture special swizzled layouts, interleaving, and alignment like that of textures? Or are those kinds of memory allocations out of scope here?

Layouts could be

RRRR
BBBB
GGGG
AAAA

or

RGBA
RGBA
RGBA

or spatial swizzling and twiddling like

0 1 4 5
2 3 6 7

This isn’t the kind of data layout targeted here; it is closer to the LLVM terminology (see the LLVM Language Reference Manual).

What you’re describing here isn’t a property of the target but a property of each individual value in isolation, which makes me think about a type annotation or an op attribute.

That might work better yes. Sounds good, thanks.

Revised RFC

Requirements

  • The data layout mechanism should support MLIR’s open type system: we don’t know in advance which types will want to use it and how their layout can be parameterized (e.g., having size/abi-alignment/preferred-alignment tuple is likely not enough).
  • The data layout should be controllable at different levels of granularity, for example nested modules may have different layouts.
  • The data layout should be optional, i.e. types should have a well-specified default “natural” layout that can be used in absence of a layout descriptor.
  • [New in this revision] The mechanism should be controllable at the scope (op/region) level without changing the types; in particular, in some domains, there is need to impose additional restrictions on built-in types without modifying them upstream.
  • [New in this revision] Efficiency of the implementation: layout queries should be cached when possible; it is preferable to avoid interface-based dispatch for built-in types.

Proposal

The proposal is based on the existing MLIR concepts: operation and type interfaces, and attributes. For additional flexibility and to support type-independent target properties, it is using a double dispatch: first, at the operation level, then, if necessary, at the type level. This is complemented by a caching mechanism to minimize the overhead of interface calls.

For the purposes of this proposal, the data layout API consists of three properties: type size, ABI alignment requirements, preferred alignment. Since all of these provide one value for a given type and for the sake of brevity, the text below discusses only the type size property assuming the remaining properties can be implemented similarly.

Operations Defining the Data Layout

Operations willing to support a data layout implement the DataLayoutOperationInterface, which allows one to opaquely obtain the data layout to use for regions (transitively) nested in the operation. The layout parameters are described by an array attribute containing DataLayoutEntryAttr instances. Each instance is essentially a pair of PointerUnion<Identifier, Type> and Attribute. The first element of the pair identifies the entry, and the second element is an arbitrary attribute that describes the layout parameters (e.g., size and alignment) in an operation- and type-specific way. Data layout entries specific to a type or type class use Type as the first element of the pair; generic entries use an Identifier.

For example, ModuleOp will likely support the data layout attribute and may resemble the following in textual IR format:

module attributes { target.dl_spec = [
  #target.dl_entry<"target.endianness", "big">,
  #target.dl_entry<i8, {size = 8, abi = 8, preferred = 32}>,
  #target.dl_entry<i32, {size = 32, abi = 32, preferred = 32}>
  #target.dl_entry<memref<f32>, {model = "bare"}>
]} {
  // ops
}

The attributes belong to the new dialect, target, provisioned to include other target-specific attributes when they become necessary.
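One way to picture these entries in memory, with several hypothetical stand-ins: std::variant replaces PointerUnion<Identifier, Type>, and plain structs and strings replace MLIR identifiers, types and attributes. Each entry is a key/value pair. Identifier-keyed entries are generic, and type-keyed entries are type-specific.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for MLIR's Identifier and Type.
struct Identifier { std::string name; };
struct TypeKey { std::string typeClass; std::string spelling; };

// Analogous to DataLayoutEntryAttr: the key is either an identifier
// (generic entry) or a type (type-specific entry).
struct DataLayoutEntry {
  std::variant<Identifier, TypeKey> key;
  std::string value; // stand-in for an arbitrary Attribute
  bool isGeneric() const { return std::holds_alternative<Identifier>(key); }
};

// The module-level spec from the textual IR example.
std::vector<DataLayoutEntry> exampleSpec() {
  return {
      {Identifier{"target.endianness"}, "\"big\""},
      {TypeKey{"integer", "i8"}, "{size = 8, abi = 8, preferred = 32}"},
      {TypeKey{"integer", "i32"}, "{size = 32, abi = 32, preferred = 32}"},
      {TypeKey{"memref", "memref<f32>"}, "{model = \"bare\"}"},
  };
}
```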

The interface does not define functions for querying the data layout, only the hooks for handling those queries. To perform queries, users need to construct a DataLayout object (see below).

The interface provides static overridable functions that serve as hooks for implementing op-specific query behavior.

/*static*/ unsigned getTypeSize(Type t, const DataLayout &dl,
                                ArrayRef<DataLayoutEntryAttr> params);
/*static*/ unsigned getABIAlignment(Type t, const DataLayout &dl,
                                    ArrayRef<DataLayoutEntryAttr> params);
/*static*/ unsigned getPreferredAlignment(Type t, const DataLayout &dl,
                                          ArrayRef<DataLayoutEntryAttr> params);
/* ... */

These functions accept the type for which the query is performed, a reference to the DataLayout object that can be used for recursive queries, and a potentially empty list of DataLayoutEntryAttrs relevant to this type. They are required to always provide a reasonable default response to the query, even in the absence of parameters, and should not rely on any information other than that provided as function arguments. (This is partially enforced by the functions being static and thus not having access to the raw operation or its attributes.) These functions are used by DataLayout to handle data layout queries and should not be called directly. Providing custom implementations of these functions in specific operations allows these operations to control the data layout without needing to change the type.
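A minimal sketch of how an operation could override a hook to tighten the layout of a built-in type without modifying the type. It uses hypothetical stand-ins (IntegerType, IntEntry, DefaultLayoutScope, WideAlignedScope) and omits the DataLayout argument for brevity. The hooks are static and depend only on their arguments. The default hook falls back to an assumed natural alignment equal to the bit width.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct IntegerType { unsigned width; };
// Stand-in for a type-specific entry: "iN has ABI alignment A" (in bits).
struct IntEntry { unsigned width; unsigned abiAlignment; };

// Default hook: use a matching entry if present, else the assumed natural
// alignment (the bit width).
struct DefaultLayoutScope {
  static unsigned getABIAlignment(IntegerType t,
                                  const std::vector<IntEntry> &params) {
    for (const IntEntry &entry : params)
      if (entry.width == t.width)
        return entry.abiAlignment;
    return t.width;
  }
};

// A domain-specific scope op requiring every integer to be aligned to at
// least 128 bits, without changing IntegerType itself.
struct WideAlignedScope {
  static unsigned getABIAlignment(IntegerType t,
                                  const std::vector<IntEntry> &params) {
    return std::max(128u, DefaultLayoutScope::getABIAlignment(t, params));
  }
};
```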

Default Handling of Data Layout Queries in Operations

The static interface methods listed above have the following default implementation. For built-in types, the implementation derives the size and alignment requirements directly from type properties such as the bitwidth. (For non-scalar types, the type itself can provide the implementation.) Other types are expected to implement the DataLayoutTypeInterface, described below, to which the query is dispatched.

Types Subject to Data Layout

Custom types willing to opt into the data layout mechanism must implement the DataLayoutTypeInterface with the following (instance) methods:

unsigned getTypeSize(const DataLayout &dl, ArrayRef<DataLayoutEntryAttr> params);
unsigned getABIAlignment(const DataLayout &dl,
                         ArrayRef<DataLayoutEntryAttr> params);
unsigned getPreferredAlignment(const DataLayout &dl,
                               ArrayRef<DataLayoutEntryAttr> params);
/* ... */
LogicalResult verifyEntries(ArrayRef<DataLayoutEntryAttr> params);

The verification function is used to ensure the well-formedness of the list of relevant entries, e.g. the use of the expected attribute kind to describe the type-specific layout properties. All the other functions are expected to return the corresponding value for this type. Their first argument is a DataLayout object that can be used for recursive queries. Their second argument is an unordered list of DataLayoutEntryAttrs whose first element either belongs to the same type class (e.g., IntegerType will receive entries for i8, i32, i64, etc. when present) or is generic (i.e., all types receive all generic entries). Therefore, a type can only be affected by the layout properties of other types by querying those properties through the interface. The list may be empty if the layout is not specified, and the functions are still expected to return meaningful values, e.g. the natural alignment of the type, without failing the verification. Additional methods can be added to this interface later.

Each type class implements, and must document, an algorithm to compute layout-related properties. This algorithm is fixed and can use as inputs the parameters of the type instance (e.g., the integer bit width) and the data layout entries. The mechanism for interpreting the data layout entries is specified by the type class and is opaque to MLIR’s general mechanism.
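As an example of a fixed, documented per-type algorithm, here is a sketch for a hypothetical IntegerType stand-in. It assumes entries shaped like the i8/i32 entries in the module example, with sizes and alignments in bits. The documented algorithm is itself an assumption: use the entry with the same bit width if present, otherwise round the width up to whole bytes. Verification checks that alignments are powers of two, that preferred is at least ABI, and that the size can hold the value.

```cpp
#include <cassert>
#include <vector>

// Stand-in for #target.dl_entry<iN, {size = S, abi = A, preferred = P}>.
struct IntLayoutEntry {
  unsigned width, size, abi, preferred;
};

bool isPowerOf2(unsigned v) { return v != 0 && (v & (v - 1)) == 0; }

// Hypothetical IntegerType implementing the type-side hooks.
struct IntegerType {
  unsigned width;

  // Matching entry if present; otherwise the width rounded up to bytes.
  unsigned getTypeSize(const std::vector<IntLayoutEntry> &params) const {
    for (const auto &e : params)
      if (e.width == width)
        return e.size;
    return (width + 7) / 8 * 8;
  }

  // Entries must be self-consistent.
  static bool verifyEntries(const std::vector<IntLayoutEntry> &params) {
    for (const auto &e : params)
      if (!isPowerOf2(e.abi) || !isPowerOf2(e.preferred) ||
          e.preferred < e.abi || e.size < e.width)
        return false;
    return true;
  }
};
```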

Queries on a DataLayout object and caching

The DataLayout object is the central place for data layout queries. It provides both isolation at type level, i.e. hooks handling layout queries for a specific type only see attributes related to that type, and caching. It points back to the operation for which it was constructed and the original attribute to check if the cache is still valid. The implementation can resemble the following.

class DataLayout {
public:
  explicit DataLayout(DataLayoutOperationInterface op)
    : originalLayout(op ? op.getOperation()->getAttr("target.dl_spec")
                        : Attribute()),
      scope(op) {}

  unsigned getTypeSize(Type t) const {
    // Check that the layout attribute has not changed since the cache was
    // populated, i.e. the cache is still valid.
    assert(!scope || originalLayout ==
                         scope.getOperation()->getAttr("target.dl_spec"));
    auto it = sizes.find(t);
    if (it != sizes.end())
      return it->second;
    // Dispatch to the scoping op if any, otherwise directly to the type.
    unsigned size =
        scope ? scope.getTypeSize(t, *this, extractParams(t, scope))
              : t.cast<DataLayoutTypeInterface>().getTypeSize(*this, {});
    sizes[t] = size;
    return size;
  }

  // ...
private:
  const Attribute originalLayout;
  DataLayoutOperationInterface scope;
  // Caches for individual queries.
  mutable DenseMap<Type, unsigned> sizes;
  mutable DenseMap<Type, unsigned> alignmentABI;
  mutable DenseMap<Type, unsigned> alignmentPreferred;
};

A similar mechanism is used for each query. In debug mode, it asserts that the cached layout information is still correct with respect to the layout attribute. The query is first looked up in the relevant cache and, if not present, is dispatched to the operation through its interface and cached. DataLayout::mixWithAncestors takes the data layout spec attribute on the current op and combines it with the spec attributes of all ancestor ops implementing DataLayoutOperationInterface, using the most nested entry when several entries have the same key and concatenating the lists of entries otherwise. This mechanism can be extended in the future to be more type- and operation-specific, as well as to use an attribute combination mechanism if/when MLIR provides one. DataLayout::extractParams takes this combined form and extracts the entries relevant for the given type.
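A sketch of the combination rule just described. It assumes entries are keyed by a string (the identifier or the printed type) and tagged with a type class, which is empty for generic entries. The representation and names are hypothetical. The most nested entry for a key wins, and entries with distinct keys are concatenated. extractParams then keeps the entries of the queried type's class plus all generic entries.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Entry {
  std::string key;       // identifier or printed type, e.g. "i32"
  std::string typeClass; // empty for generic (identifier-keyed) entries
  std::string value;
};

// `specs` is ordered from the outermost ancestor to the current op.
std::vector<Entry> mixWithAncestors(const std::vector<std::vector<Entry>> &specs) {
  std::vector<Entry> combined;
  for (const auto &spec : specs) {
    for (const Entry &entry : spec) {
      auto it = std::find_if(combined.begin(), combined.end(),
                             [&](const Entry &e) { return e.key == entry.key; });
      if (it != combined.end())
        *it = entry; // the more nested entry overrides the outer one
      else
        combined.push_back(entry);
    }
  }
  return combined;
}

// Keep entries of the given type class and all generic entries.
std::vector<Entry> extractParams(const std::string &typeClass,
                                 const std::vector<Entry> &combined) {
  std::vector<Entry> result;
  for (const Entry &entry : combined)
    if (entry.typeClass.empty() || entry.typeClass == typeClass)
      result.push_back(entry);
  return result;
}
```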

Verification

The verification happens in multiple steps and can be customized by hooks in the operation interface and type interface. At the top level, the attribute verifier of target.dl_spec ensures that there are no duplicate entries and that the attribute is attached to an op that implements DataLayoutOperationInterface; it may additionally verify refinement correctness (see below) of nested layouts. The rest of the verification happens in the DataLayoutOperationInterface verification hook (triggered by the operation verifier), which dispatches to

LogicalResult verifyEntries(ArrayRef<DataLayoutEntryAttr> params);

after extracting the attribute values. This is an interface static method whose default implementation dispatches each group of params associated with a subclass T of Type implementing DataLayoutTypeInterface to T::verifyEntries, and dispatches non-type entries prefixed with the target. dialect namespace to TargetDialect::verifyDLEntries. Entries with other prefixes and types not implementing DataLayoutTypeInterface fail verification by default, but can be accepted by custom operations that support them.
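A sketch of this default verification flow. It uses a hypothetical representation: string keys, a type-class tag, and std::function verifiers standing in for T::verifyEntries and TargetDialect::verifyDLEntries. The flow rejects duplicate keys, groups type entries by class and dispatches each group, sends "target."-prefixed identifiers to the dialect verifier, and rejects anything else. The test registers a memref verifier that accepts at most one entry, following the suggestion made earlier in the thread.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Entry {
  std::string key;       // identifier or printed type
  std::string typeClass; // empty for identifier entries
};

using Verifier = std::function<bool(const std::vector<Entry> &)>;

bool verifySpec(const std::vector<Entry> &spec,
                const std::map<std::string, Verifier> &typeVerifiers,
                const Verifier &targetDialectVerifier) {
  // 1. No duplicate keys.
  std::set<std::string> seen;
  for (const Entry &e : spec)
    if (!seen.insert(e.key).second)
      return false;

  // 2. Group type entries by class; collect "target."-prefixed identifiers.
  std::map<std::string, std::vector<Entry>> byTypeClass;
  std::vector<Entry> targetEntries;
  for (const Entry &e : spec) {
    if (!e.typeClass.empty())
      byTypeClass[e.typeClass].push_back(e);
    else if (e.key.rfind("target.", 0) == 0)
      targetEntries.push_back(e);
    else
      return false; // other prefixes are rejected by default
  }

  // 3. Dispatch each group; type classes without a verifier are rejected.
  for (const auto &[typeClass, entries] : byTypeClass) {
    auto it = typeVerifiers.find(typeClass);
    if (it == typeVerifiers.end() || !it->second(entries))
      return false;
  }
  return targetDialectVerifier(targetEntries);
}
```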

As a specific example, a custom op may support additional parameters in the layout specification attribute that are not supported by the relevant type. In this case, the operation must redefine its verifyEntries to verify the well-formedness of these additional parameters and remove them from the list before delegating the rest of the verification to the type. Similarly, it must reimplement the query methods in the operation interface to handle these additional parameters.

Refinement

For additional verification, we introduce the notion of layout refinements. A layout newLayout is a refinement of oldLayout if the new layout respects all constraints of the old layout. For example, the new layout may introduce stricter ABI alignment requirements for some types. Note that two layouts may be mutual refinements of each other if they apply to different subsets of types. Types can specify whether a set of data layout entries is a refinement of another set by redefining the static interface method

/*static*/ bool isRefinement(ArrayRef<DataLayoutEntryAttr> oldLayout,
                             ArrayRef<DataLayoutEntryAttr> newLayout);

which returns true by default.
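As an illustration, here is a possible isRefinement for alignment-only entries. It rests on an assumption of ours, not the RFC's: respecting an old alignment constraint means the new alignment is a multiple of the old one. Types present in only one of the two layouts are left unconstrained, which is what allows two layouts to be mutual refinements. The AlignmentLayout representation is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alignment-only layout: type -> ABI alignment.
using AlignmentLayout = std::map<std::string, unsigned>;

bool isRefinement(const AlignmentLayout &oldLayout,
                  const AlignmentLayout &newLayout) {
  for (const auto &[type, newAlign] : newLayout) {
    auto it = oldLayout.find(type);
    // A type constrained by both layouts must keep the old constraint.
    if (it != oldLayout.end() && newAlign % it->second != 0)
      return false;
  }
  return true;
}
```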

Refinement is relevant in two cases. First, in case of nested data layout specifications, the op-level (or attribute-level) verifier ensures that nested specifications refine those of their ancestors so that the overall layout still makes sense. Second, in case of transformations that change the data layout, the transformation must ensure that the new layout is a refinement of the old one. There is no automated mechanism for the latter since MLIR provides various IR mutation mechanisms, including low-level attribute manipulation; it is entirely the responsibility of the user to perform the check by calling TargetDialect::isDLRefinement(ArrayRef<DataLayoutEntry>, ArrayRef<DataLayoutEntry>, function_ref<bool(DataLayoutEntry)> nonTypeRefinementCheck), which dispatches to individual types.

Alternatives Considered

I considered providing query methods directly on the operation interface. This makes it hard to reuse cached results and to restrict the set of parameters forwarded down to specific calls because op interfaces have access to the raw operation and thus raw attribute list.

I considered providing DataLayout getDataLayout() { return DataLayout(*this); } in the op interface, but it looks like it would favor the iface.getDataLayout().getTypeSize(type) pattern that ignores caching. Having to create a separate object will force the caller to consider keeping it around for further queries, and there is little readability loss.

I considered implementing the cache in the interface class itself rather than in a separate object. This would have been correct: cache information is transient, so there is no correctness problem with storing it in the *Op class even if it gets discarded when abstracting to Operation *. However, dropping the cache in this case is a performance issue, and so is passing *Op instances by value when they contain large caches.