Right now, if we want to model how the data is laid out in registers with respect to the threads of a warp, the options are:
RankedTensorType - it has the encoding attribute and is supported by arith ops. The problem is that it is the wrong level of abstraction and the IR becomes confusing. It is also a major footgun, because we have to distinguish between tensors with and without an encoding.
VectorType - it is supported by arith ops, but has no encoding attribute. I sent PR99029 to add one. That seems like a useful and harmless change, and it was already done for RankedTensorType.
!magic_custom_type - if we create a custom shaped type, we can add all the necessary attributes to it, but that would also mean reimplementing half of arith, because the arith ops are restricted to scalars, VectorType and TensorType, which seems unnecessary. I sent PR99028 to make arith ops support all shaped types. Judging by the "RFC: remove arith/math ops on tensors" thread, this is mostly a philosophical exercise.
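To make the three options concrete, here is a rough IR sketch. The `#my.layout` attribute and `!my.frag` type are hypothetical names, and options 2 and 3 show syntax that the respective PRs would enable, not IR that is valid today:

```mlir
// Option 1: RankedTensorType with an encoding attribute (valid today).
%0 = arith.addf %a, %b : tensor<16x16xf32, #my.layout>

// Option 2: VectorType with an encoding attribute (what PR99029 would enable).
%1 = arith.addf %c, %d : vector<16x16xf32, #my.layout>

// Option 3: a custom shaped type accepted by arith ops
// (what PR99028 would enable).
%2 = arith.addf %e, %f : !my.frag<16x16xf32, #my.layout>
```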
I would like to collect some feedback and then, ideally, pursue options 2 and/or 3. I am pretty sure there are folks with similar concerns right now.
I agree with that statement, but it is vague. It'd be better to demonstrate what you're trying to do, why the encoding doesn't work, and what would be better. @Groverkss had a presentation at EuroLLVM about this. You will want to sync with their group to have a common proposal.
You're not considering the upstream cost. This is far from harmless. As @MaheshRavishankar said, there are discussions on that topic in the forum; please participate in them if you want to help design this space. I can't find the specific thread, but others can chime in.
Again, you're assuming there's no upstream cost. This is not philosophical, it's not harmless, and it's definitely not NFC. This characterization does not help your case. Please participate in the existing discussions if you want to change anything.
On this particular PR, I'd like to raise an important upstream point: please do not approve PRs from your own team shortly after they are created, and ideally not at all. If that PR got merged without other reviews, it would go against the RFC that removed it in the first place.
If you want to participate in the upstream discussion, please do so in the right way:
There was a similar RFC and a similar PR some time back:
I believe the consensus at the time was that adding an encoding to the existing VectorType was not an acceptable solution, since it would require every transformation on vector-typed ops to be aware of the encoding in order to remain correct.
I agree with the RFC analysis, and in my opinion option 3 is the way to go. Arith operations are elementwise, and therefore it feels natural to allow them with any shaped type. I mentioned in the past discussion that it would also help with other use cases, like the existing WMMA type from the GPU dialect.
What would be the meaning for memref types (which are shaped types)? Does the op allocate a new buffer and return it? (It would need to have a MemAlloc side effect in that case.)
Does it actually matter if the type is "shaped" or not? I think the meaning of "shaped" is a bit fuzzy. E.g., tensor<*xf32> has a shape, but it is not encoded in the type. tensor<f32> does not really have a shape at all. Maybe the only thing that's important for element-wise arith ops is that the type has immutable semantics.
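To illustrate the question: with immutable tensor semantics the op simply produces a new SSA value, while with memref (mutable buffer) semantics it is unclear what the op would mean. A hypothetical sketch (the memref form is not valid IR today):

```mlir
// Tensor operands are immutable values; the result is a new value.
%t = arith.addf %ta, %tb : tensor<4xf32>

// Memref operands are mutable buffers; would this allocate a new
// buffer and return it (requiring a MemAlloc side effect)?
%m = arith.addf %ma, %mb : memref<4xf32>  // hypothetical
```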
I am not sure I understand: a common proposal to add an attribute?
I didn't say that the arith change was NFC or harmless. But it is philosophical. We can spend a lot of time debating whether arith.addi should work on tensors or not. Or maybe it should be moved to the tensor dialect? Or to some other dialect? And if we move it to some other dialect, why is it an addi or addf op and not just an add? Or maybe we could use linalg, but that is destination-passing style? So, it is pretty philosophical.
The "answer" that would be accepted by all for some time has been to create a new arith.vector (or something) type, give it the desired semantics, and adapt the implementation.
But so far, people have not wanted to do that and have instead been semi-successfully or unsuccessfully trying to work around it.
I'm just reporting what I recall from the various sides.
Unless CL has changed his opinion, such a PR to modify the builtin vector type will not be accepted. Having discussed it with him, I understand his viewpoint, and the recommendation is a new type that does what is needed.
These are all technical issues. IR construction is not an easy task when it involves multiple parties from different parts of the design space.
It looks easy when you do it alone, because you just need to "add a new attribute" and you're done. But when you work with other people, the new attribute can become a pretty fundamental change to their usage.
That's why we discuss these issues in the open and try to reach consensus. There's no philosophy there, just complex requirements design.
If this feels like bike-shedding, it may be because you haven't spent the time to understand the other side's requirements.
Nope, a common proposal to solve the vector semantics, as @stellaraccident mentions above. Make sure you understand the whole problem before assuming the other solutions are wrong.
Maybe the answer depends on "How many layouts do you want to support?" If you don't have many layouts, a local solution might be the lowest-cost option and won't require debate.
The NVGPU dialect has nvgpu.warpgroup.accumulator, which represents registers owned by a warpgroup (4 warps). It works fine for the Hopper GPU. It won't work for other architectures out of the box; one would need to extend it.
I'm sharing the previous implicit consensus state that this should be a new type. Vector type is beyond the Hyrum's law point (mandatory xkcd), and so is tensor, given the discussion on my attempt to remove arithmetic on tensors.
It feels like we are missing an abstraction for a collection of values that can be loaded/stored in a buffer (so not a tensor) and not necessarily related to some hardware concept (so not a vector). It is, however, structurally very close to both, so hopefully we can reuse most of the implementation. This is also something that may be relevant to the "tiling on vectors" discussion.
As for ops, we have an ElementwiseMappable trait for ops. Maybe we should have a matching type trait indicating that values of this type can be elementwise-mapped over. This is different from ShapedType, which is in reality a fixed-rank, hyperrectangularly indexable type or a dynamic-rank, unindexable type, and the trait may apply to other kinds of types in the future.
I didnât follow this discussion closely, but my 2c:
I'm generally supportive of adding an encoding to the vector type.
there may be another approach for option 3: instead of extending arith to any shaped type or reimplementing arith ops for your custom type, you can just add a single wrapper op to your custom dialect that works on your types
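For example, such a wrapper could be a single op that names the scalar computation and maps it over the custom type; all names below (`my.elementwise_map`, `!my.frag`) are hypothetical, not existing IR:

```mlir
// Hypothetical downstream op: elementwise-map a scalar arith op over
// values of a custom layout-carrying type, without teaching upstream
// arith about the type.
%r = my.elementwise_map "arith.addf"(%a, %b)
    : (!my.frag<16x16xf32>, !my.frag<16x16xf32>) -> !my.frag<16x16xf32>
```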
Good point. Yes, a trait or an interface would be better, as suggested by others.
One limitation of making it part of arith is that it wouldn't work with the math dialect. In my experience the math dialect is also required, for the same reasons as arith.
This proposal is motivated by warp distribution for tensor-core-like operations, but to me it is being approached from the wrong direction (also speaking from experience, because I have done this too). To me this should be designed from the bottom up: look at what the hardware needs and work backwards to connect it to linalg/vector. Jamming encodings through seems like a short-term fix rather than a fully fleshed-out solution (at least in core). I don't think any of this needs to live in the vector dialect. These are really low-level hardware details that need to be modeled by operations/types in dialects meant for that hardware, with the conversion from linalg/vector done using something like dialect conversion (yes, I went there; I know people hate dialect conversion, but that's the tool for the job IMO). The best solution I have actually seen is opaque types the way the SPIR-V community is modeling them. They face the same constraint of having to deal with multiple vendors, and it is not fully solved by any means, but that seems like a more robust approach to me.
But in interest of experimentation my vote would be to have a type interface, and allow downstream users to work with implementation of the interface that work with encodings.
This seems way too complicated to me, and I don't see how it works. These operations are necessarily not elementwise. IIUC, the root of this proposal is operations like tensor-core ops, where the encoding tells you how the data from each thread is used in a group operation. So I don't know what the type trait would achieve.