[RFC] MLIR types with encoding

Right now, if we want to model how data is laid out in registers across the threads of a warp, the options are:

  1. RankedTensorType - it has the encoding attribute and is supported by arith ops. The problem is that it is the wrong level of abstraction and the IR becomes confusing. It is also a major footgun, because we have to distinguish between tensors with and without an encoding.

  2. VectorType - it is supported by arith ops, but has no encoding. I sent PR99029 to add the encoding attribute. That seems like a useful and harmless change, and it was already done for RankedTensorType.

  3. !magic_custom_type - if we create a custom shaped type, then we can add all the necessary attributes to it, but that would also mean reimplementing half of arith, because the arith ops are restricted to scalars, VectorType and TensorType, which seems unnecessary. I sent PR99028 to make arith ops support all shaped types. Judging by "RFC: remove arith/math ops on tensors", this restriction is mostly a philosophical exercise.
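
To make the trade-offs concrete, here is roughly how the same warp-distributed addition would look under each option (the #warp_layout attribute and the !magic_custom_type syntax are hypothetical sketches, not existing upstream syntax):

```mlir
// Option 1: RankedTensorType with an encoding (works today, wrong abstraction).
%r1 = arith.addi %a, %b : tensor<16x8xi32, #warp_layout>

// Option 2: VectorType with an encoding (would require PR99029).
%r2 = arith.addi %c, %d : vector<16x8xi32, #warp_layout>

// Option 3: a custom shaped type carrying the layout (would require PR99028
// so that arith ops accept any shaped type).
%r3 = arith.addi %e, %f : !magic_custom_type<16x8xi32, #warp_layout>
```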

I would like to collect some feedback and then, ideally, use options 2 and/or 3. I am pretty sure that there are folks who are having similar concerns right now.

@MaheshRavishankar @ThomasRaoux @nicolasvasilache @ftynse @jpienaar @kuhar @rengolin @stellaraccident

I agree with that statement, but it is vague. It’d be better to demonstrate what you’re trying to do, why the encoding doesn’t work, and what would be better. @Groverkss had a presentation at EuroLLVM about this. You will want to sync with their group to produce a common proposal.

You’re not considering the upstream cost. This is far from harmless. As @MaheshRavishankar said, there are discussions on this topic in the forum; please participate in them if you want to help design this space. I can’t find the specific thread, but others can chime in.

Again you’re assuming there’s no upstream cost. This is not philosophical, it’s not harmless and it’s definitely not NFC. This characterization does not help your case. Please participate in the existing discussions if you want to change anything.

On this particular PR, I’d like to raise an important upstream point: please do not approve PRs from your own team shortly after they are created, and ideally not at all. If that PR got merged without other reviews, it would go against the RFC that removed it in the first place.

If you want to participate in the upstream discussion, please do so in the right way:

There was a similar RFC and a similar PR some time back:

I believe the consensus at the time was that adding an encoding to the existing VectorType was not an acceptable solution, since it would require every transformation on ops taking vectors to be aware of the encoding in order to be correct.

I agree with the RFC analysis, and in my opinion option 3 is the way to go. Arith operations are elementwise, and therefore it feels natural to allow them on any shaped type. I mentioned in a past discussion that it would also help with other use cases, like the existing wmma type from the GPU dialect.

What would be the meaning for memref types (which are shaped types)? Does the op allocate a new buffer and return it? (It would need to have a MemAlloc side effect in that case.)

Does it actually matter if the type is “shaped” or not? I think the meaning of “shaped” is a bit fuzzy. E.g., tensor<*xf32> has a shape but it is not encoded in the type. tensor<f32> does not really have a shape at all. Maybe the only thing that’s important for element-wise arith ops is that the type has immutable semantics.

@matthias-springer That’s a good point. We need to restrict it to “values” not “buffers”. Could it be done via type interfaces?

@ThomasRaoux I guess there is an option 2.1: create a new type upstream, as mentioned in [mlir] Add optional layout attribute to VectorType by harsh-nod · Pull Request #71916 · llvm/llvm-project · GitHub.

I am not sure I understand: a common proposal to add an attribute?

I didn’t say that the arith change was NFC or harmless. But it is philosophical: we can spend a lot of time debating whether arith.addi should work on tensors or not. Should it be moved to the tensor dialect? Or to some other dialect? And if we move it to some other dialect, why is it an addi or addf op and not just add? Or maybe we could use linalg, but that is destination-passing style. So, it is pretty philosophical.

For some time, the “answer” that would be accepted by all has been to create a new arith.vector (or similarly named) type, give it the desired semantics, and adapt the implementation.

But so far, people have not wanted to do that and have either semi-successfully or unsuccessfully tried to work around it.

I’m just reporting what I recall from the various sides.

Unless CL has changed his opinion, such a PR to modify the builtin vector type will not be accepted. Having discussed it with him, I understand his viewpoint, and the recommendation is a new type that does what is needed.

These are all technical issues. IR construction is not an easy task when involving multiple parties from different parts of the design space.

It looks easy when you do it alone because you just need to “add a new attribute” and you’re done. But when you work with other people, the new attribute can become a pretty fundamental change to their usage.

That’s why we discuss these issues in the open and try to reach consensus. There’s no philosophy there, just complex requirements design.

If this feels like bike-shedding, then it may be because you haven’t spent the time to understand the other side’s requirements.

Nope, a common proposal to solve the vector semantics, as @stellaraccident mentions above. Make sure you understand the whole problem before assuming the other solutions are wrong.

I think it’s more a trait than an interface, right? (Well, we could make an interface that has a bool member :slight_smile: )

Maybe the answer depends on “How many layouts do you want to support?” If you don’t have many layouts, a local solution might be the most low-cost and won’t require debate.

The NVGPU dialect has nvgpu.warpgroup.accumulator, which represents registers owned by a warpgroup (4 warps). It works fine for the Hopper GPU. It won’t work for other architectures out of the box; one needs to extend it.
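
For reference, that type pairs the distributed registers with the shape they hold; it looks roughly like this (check the NVGPU dialect docs for the exact current syntax):

```mlir
// A warpgroup-owned accumulator holding a 128x128xf32 matmul result,
// fragmented across the registers of 4 warps (Hopper wgmma).
!nvgpu.warpgroup.accumulator<fragmented = vector<128x128xf32>>
```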

I’m sharing the previous implicit consensus state that this should be a new type. Vector type is beyond the Hyrum’s law point (mandatory xkcd), and so is Tensor given the discussion on my attempt to remove arithmetic on tensors.

It feels like we are missing an abstraction for a collection of values that can be loaded/stored in a buffer (so not a tensor) and not necessarily related to some hardware concept (so not a vector). It is however very close to both structurally, so hopefully we can reuse most of the implementation. This is also something that may be relevant to the “tiling on vectors” discussion.

As for ops, we have an ElementwiseMappable trait for ops. Maybe we should have a matching type trait indicating that values of this type can be elementwise-mapped over. This is different from shaped type, which is in reality a fixed-rank hyperrectangularly-indexable type or a dynamic-rank unindexable type, and the new trait may apply to other kinds of types in the future.
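
Sketching the idea (the trait, the !my.fragment type and the #my.warp_layout attribute are all hypothetical): with a type-level counterpart to ElementwiseMappable, an elementwise op could accept any value-semantic container without knowing its layout:

```mlir
// Hypothetical: !my.fragment implements an elementwise-mappable *type* trait,
// so arith ops can map over its elements regardless of the warp layout.
%sum = arith.addf %a, %b : !my.fragment<16x8xf32, #my.warp_layout>
```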

I didn’t follow this discussion closely, but my 2c:

  1. I generally support adding an encoding to the vector type.
  2. there may be another approach for option 3: instead of extending arith to any shaped type or reimplementing arith ops for your custom type, you can just add a single wrapper op to your custom dialect that works on your types:
%1 = my_dialect.elementwise %a, %b : !my_dialect.custom_type {
^bb0(%lhs: i32, %rhs: i32):
  %sum = arith.addi %lhs, %rhs : i32
  my_dialect.yield %sum : i32
}

(this may also be a solution for the previous arith-on-tensors discussions)

I like the idea of the type trait. I can implement it if everyone agrees that it could be landed.

Good point. Yes a trait or an interface would be better as suggested by others.

One limitation of making it part of arith is that it wouldn’t cover the math dialect. In my experience math support is also required, for the same reasons as arith.

Makes sense to me.

ElementwiseMappableValueBasedContainerType - starting with the “simplest” name I can think of so we get it out of the way :slight_smile:

Yeah. Note I’m not advocating a specific design here. Just reporting what I think the state of thoughts and constraints are.

We are missing the AbstractFactoryProducer suffix for the full Java vibe.

More seriously, UniformElementsType or something along those lines could work.

MappableContainerType?

This proposal is motivated by warp distribution for tensor-core-like operations, but to me it is being approached from the wrong direction (I say this from experience, because I have done this too). This should be designed from the bottom up: look at what the hardware needs and work backwards to connect it to linalg/vector. Jamming through encodings looks like a short-term fix rather than a fully fleshed-out solution (at least in core).

I don’t think any of this needs to live in the vector dialect. These are really low-level hardware details that need to be modeled by operations/types in dialects meant for that hardware, with the conversion from linalg/vector done using something like dialect conversion (yes, I went there; I know people hate dialect conversion, but that’s the tool for the job IMO).

The best solution I have actually seen is opaque types, the way the SPIR-V community is modeling them. They face the same constraint of having to deal with multiple vendors, and it is not fully solved by any means, but that seems like a more robust approach to me.

But in the interest of experimentation, my vote would be to have a type interface, and allow downstream users to work with implementations of the interface that work with encodings.

This seems way too complicated to me, and I don’t see how it would work. This data is necessarily not elementwise-mapped. IIUC, at the root of this proposal are operations like tensor cores, where the encoding tells you how the data from each thread is used in a group operation. So I don’t know what the type trait would achieve.