Why can't I have memref<tuple<i32,i32>>?

Hello again,

I guess the whole question fits in the title. It appears I cannot write something like:

   func @get_first_element(memref<tuple<i32,f32>>) -> (i32)

Why is this?

More generally, how can I represent in MLIR a C function that takes as input a pointer to a struct?


The memory layout of a tuple is not fully specified, AFAIK. A tuple can have several different in-memory representations, so a memref of tuple is underspecified: padding, packing, and alignment would each depend on one convention or another. If you expect it to be two adjacent i32s, that is one such realization, and a memref of a 2-element array would match your expectation.
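To make the layout point concrete, here is a minimal C sketch. It is an illustration only: the struct names and the i32/f64 field choice are hypothetical, and `__attribute__((packed))` is a GCC/Clang extension. The same pair of fields gets a different size and field offset depending on the packing convention, while an array of two i32s has only one possible layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler may insert padding so that `b` is aligned. */
struct pair_natural {
    int32_t a;
    double  b;
};

/* Packed layout (GCC/Clang extension): no padding between the fields. */
struct __attribute__((packed)) pair_packed {
    int32_t a;
    double  b;
};

/* The "two adjacent i32s" case: an array leaves no layout choice. */
typedef int32_t pair_i32[2];

/* Small helpers so the layouts can be inspected or compared. */
size_t natural_size(void)     { return sizeof(struct pair_natural); }
size_t packed_size(void)      { return sizeof(struct pair_packed); }
size_t natural_offset_b(void) { return offsetof(struct pair_natural, b); }
size_t packed_offset_b(void)  { return offsetof(struct pair_packed, b); }
```

The natural size and offset depend on the target ABI, which is exactly the information a memref of tuple would have to pin down.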



Thanks @jpienaar for this information. Given this, two questions:

  1. Is there a way I can (tweak MLIR to) represent pointers to structs? It’s not just two i32 like in the previous example; it can be one i32, two f64, and four pointers to functions. Think of the execution state of a simulation engine, which is passed along and has to be mutable (hence memref). I need it for modular code generation from a dataflow language, and I expect everyone generating efficient code from dataflow specifications to need such a feature. I guess you would also need it to have buffers with a more complex element type, or to share code between multiple instances of the same RNN that have different parameters…

  2. OK, so if I understand correctly, optimization algorithms require storage information, for instance to perform tiling based on the cache size.
    I guess this means MLIR also has problems with types such as i1, whose storage is implementation-dependent (unless you make a choice that may be sub-optimal on some targets, either in space or in time).
    Just an idea: wouldn’t it be a good thing to separate the specification of a type from its storage requirements, which may depend not only on the type itself but also on implementation choices? Optimizations that depend on hardware or software mapping details would then use a specialized, separate facility to determine storage size. For some types, this facility would give the same size on every architecture. For others, the size may vary.
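A rough C sketch of what I mean by a separate facility. Everything here is hypothetical (`type_kind`, `target_layout`, `storage_bits` are names made up for illustration, not MLIR APIs): the types carry no storage information, and a per-target description answers size queries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical abstract types: they say nothing about storage. */
enum type_kind { TY_I1, TY_I32, TY_F64 };

/* Hypothetical per-target layout description, kept apart from the types. */
struct target_layout {
    const char *name;
    size_t i1_bits; /* implementation choice, e.g. 1 (bit-packed) or 8 (one byte) */
};

/* Storage size in bits of `kind` on `target`. */
size_t storage_bits(const struct target_layout *target, enum type_kind kind) {
    switch (kind) {
    case TY_I1:  return target->i1_bits; /* varies with the target */
    case TY_I32: return 32;              /* same on every target */
    case TY_F64: return 64;              /* same on every target */
    }
    return 0; /* not reached for valid kinds */
}
```

A tiling pass would then call `storage_bits` with the layout of the target it is compiling for, instead of assuming a size baked into the type.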


Ping: @albertcohen @Ulysse @qaco

It seems to me like both of these questions can be answered by a target-specific lowering into whatever representation you’re thinking about. For instance, in the first case, you could have a type that represents the ‘simulation state’ and then lower that in whatever way you want. In the second case, you could promote i1 to whatever type you want as part of a transformation.


I think you can easily use the MLIR LLVM dialect to represent something like this. If you specifically want to use memrefs, then I guess @jpienaar has said what I wanted to say here. The element type of a memref is restricted to an int, a float, or a vector of those, to match the common use case of multidimensional arrays of such types. Also, the elements are always stored contiguously in memory (valid and accessible data). Extending memref to other element types would require a lot of conditional code and special-casing to check that the element is not a tuple, across all the use cases memref was designed for, especially the ML dialect ops that work on the memref type system.

If such an abstraction is really needed in the standard type system, one may just be better off creating a new struct memref type.


In line with Steve’s and Uday’s answers, one option would be to use an unstructured vector of int32 of the appropriate size, and lower to a memref of these from your higher-level dialect.
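Seen from the C side, this flattening could look like the sketch below. The state contents (one i32 counter and one f64 value), the word indices, and the accessor names are hypothetical, and the code assumes `sizeof(double) == 8`. The state is a plain buffer of 32-bit words, and typed accessors play the role of the lowering from the higher-level dialect.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulation state flattened into 32-bit words:
   word 0 holds an i32 counter, words 1-2 hold one f64 value. */
enum { COUNTER_WORD = 0, VALUE_WORD = 1, STATE_WORDS = 3 };

int32_t get_counter(const int32_t *state) { return state[COUNTER_WORD]; }
void set_counter(int32_t *state, int32_t v) { state[COUNTER_WORD] = v; }

/* memcpy avoids aliasing and alignment problems when reading the f64. */
double get_value(const int32_t *state) {
    double v;
    memcpy(&v, &state[VALUE_WORD], sizeof v);
    return v;
}

void set_value(int32_t *state, double v) {
    memcpy(&state[VALUE_WORD], &v, sizeof v);
}
```

The layout decision (which word holds which field) lives entirely in the accessors, so the memref itself stays a flat array of i32.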

Regarding the (lack of) modularity of this design, one should remember that memref is only what its name says: an unstructured memory reference. It abstracts a very flat and non-composable type (it composes neither as a product type nor as a sum type). Anyone modeling general-purpose front-end languages would need a richer data type abstraction. Maybe this should lead to the design of a recursive data type dialect, common to a bunch of high-level languages, rather than F18 and the like each coming up with their own design?