Right now, if we want to model how the data is laid out in registers with respect to the threads of a warp, the options are:
RankedTensorType - it has the encoding attribute and is supported by arith ops. The problem is that it is the wrong level of abstraction and the IR becomes confusing. It is also a major footgun, because we have to distinguish between tensors with and without an encoding.
VectorType - it is supported by arith ops, but has no encoding attribute. I sent PR99029 to add one. That seems like a useful and harmless change, and it was already done for RankedTensorType.
!magic_custom_type - if we create a custom shaped type, we can add all the necessary attributes to it, but that would also mean reimplementing half of arith, because the arith ops are restricted to scalars, VectorType and TensorType, which seems unnecessary. I sent PR99028 to make arith ops support all shaped types. Judging by the "RFC: remove arith/math ops on tensors" thread, this is mostly a philosophical exercise.
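To make the three options concrete, here is a rough IR sketch. The `#my.layout` attribute and `!my.frag` type are hypothetical names, and options 2 and 3 show syntax that the respective PRs would enable, not IR that is valid today:

```mlir
// Option 1: RankedTensorType with an encoding attribute (valid today).
%0 = arith.addf %a, %b : tensor<16x16xf32, #my.layout>

// Option 2: VectorType with an encoding attribute (what PR99029 would enable).
%1 = arith.addf %c, %d : vector<16x16xf32, #my.layout>

// Option 3: a custom shaped type accepted by arith ops
// (what PR99028 would enable).
%2 = arith.addf %e, %f : !my.frag<16x16xf32, #my.layout>
```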
I would like to collect some feedback and then, ideally, pursue options 2 and/or 3. I am pretty sure there are folks with similar concerns right now.
I agree with that statement, but it is vague. It'd be better to demonstrate what you're trying to do, why the encoding doesn't work, and what would be better. @Groverkss had a presentation at EuroLLVM about this. You will want to sync with their group to have a common proposal.
You're not considering the upstream cost. This is far from harmless. As @MaheshRavishankar said, there are discussions on that topic in the forum; please participate in them if you want to help design this space. I can't find the specific thread, but others can chime in.
Again, you're assuming there's no upstream cost. This is not philosophical, it's not harmless, and it's definitely not NFC. This characterization does not help your case. Please participate in the existing discussions if you want to change anything.
On this particular PR, I'd like to raise an important upstream point: please do not approve PRs from your own team shortly after they are created, and ideally not at all. If that PR got merged without other reviews, it would go against the RFC that removed it in the first place.
If you want to participate in the upstream discussion, please do so in the right way:
There was a similar RFC and a similar PR some time back:
I believe the consensus at the time was that adding an encoding to the existing VectorType was not an acceptable solution, since it would require every transformation on vector-typed ops to be aware of the encoding in order to remain correct.
I agree with the RFC analysis, and in my opinion option 3 is the way to go. Arith operations are elementwise, and therefore it feels natural to allow them with any shaped type. I mentioned in the past discussion that it would also help with other use cases, like the existing WMMA type from the GPU dialect.
What would be the meaning for memref types (which are shaped types)? Does the op allocate a new buffer and return it? (It would need to have a MemAlloc side effect in that case.)
Does it actually matter if the type is "shaped" or not? I think the meaning of "shaped" is a bit fuzzy. E.g., tensor<*xf32> has a shape, but it is not encoded in the type. tensor<f32> does not really have a shape at all. Maybe the only thing that's important for element-wise arith ops is that the type has immutable semantics.
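To illustrate the question: with immutable tensor semantics the op simply produces a new SSA value, while with memref (mutable buffer) semantics it is unclear what the op would mean. A hypothetical sketch (the memref form is not valid IR today):

```mlir
// Tensor operands are immutable values; the result is a new value.
%t = arith.addf %ta, %tb : tensor<4xf32>

// Memref operands are mutable buffers; would this allocate a new
// buffer and return it (requiring a MemAlloc side effect)?
%m = arith.addf %ma, %mb : memref<4xf32>  // hypothetical
```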
I am not sure I understand: a common proposal to add an attribute?
I didn't say that the arith change was NFC or harmless. But it is philosophical. We can spend a lot of time debating whether arith.addi should work on tensors or not. Or maybe it should be moved to the tensor dialect? Or to some other dialect? And if we move it to some other dialect, why is it an addi or addf op and not just an add? Or maybe we could use linalg, but that is destination-passing style? So, it is pretty philosophical.
The "answer" that would be accepted by all for some time has been to create a new arith.vector (or something) type, give it the desired semantics, and adapt the implementation.
But so far, people have not wanted to do that and have instead been semi-successfully or unsuccessfully trying to work around it.
I'm just reporting what I recall from the various sides.
Unless CL has changed his opinion, such a PR to modify the builtin vector type will not be accepted. Having discussed it with him, I understand his viewpoint, and the recommendation is a new type that does what is needed.
These are all technical issues. IR construction is not an easy task when it involves multiple parties from different parts of the design space.
It looks easy when you do it alone, because you just need to "add a new attribute" and you're done. But when you work with other people, the new attribute can become a pretty fundamental change to their usage.
That's why we discuss these issues in the open and try to reach consensus. There's no philosophy there, just complex requirements design.
If this feels like bike-shedding, it may be because you haven't spent the time to understand the other side's requirements.
Nope, a common proposal to solve the vector semantics, as @stellaraccident mentions above. Make sure you understand the whole problem before assuming the other solutions are wrong.
Maybe the answer depends on "How many layouts do you want to support?" If you don't have many layouts, a local solution might be the lowest-cost option and won't require debate.
The NVGPU dialect has nvgpu.warpgroup.accumulator, which represents registers owned by a warpgroup (4 warps). It works fine for the Hopper GPU. It won't work for other architectures out of the box; one would need to extend it.
I'm sharing the previous implicit consensus state that this should be a new type. Vector type is beyond the Hyrum's law point (mandatory xkcd), and so is tensor, given the discussion on my attempt to remove arithmetic on tensors.
It feels like we are missing an abstraction for a collection of values that can be loaded/stored in a buffer (so not a tensor) and not necessarily related to some hardware concept (so not a vector). It is, however, structurally very close to both, so hopefully we can reuse most of the implementation. This is also something that may be relevant to the "tiling on vectors" discussion.
As for ops, we have an ElementwiseMappable trait for ops. Maybe we should have a matching type trait indicating that values of this type can be elementwise-mapped over. This is different from ShapedType, which is in reality a fixed-rank, hyperrectangularly indexable type or a dynamic-rank, unindexable type, and the trait may apply to other kinds of types in the future.
I didnât follow this discussion closely, but my 2c:
I'm generally supportive of adding an encoding to the vector type.
there may be another approach for option 3: instead of extending arith to any shaped type or reimplementing arith ops for your custom type, you can just add a single wrapper op to your custom dialect that works on your types
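For example, such a wrapper could be a single op that names the scalar computation and maps it over the custom type; all names below (`my.elementwise_map`, `!my.frag`) are hypothetical, not existing IR:

```mlir
// Hypothetical downstream op: elementwise-map a scalar arith op over
// values of a custom layout-carrying type, without teaching upstream
// arith about the type.
%r = my.elementwise_map "arith.addf"(%a, %b)
    : (!my.frag<16x16xf32>, !my.frag<16x16xf32>) -> !my.frag<16x16xf32>
```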
Good point. Yes, a trait or an interface would be better, as suggested by others.
One limitation of making it part of arith is that it wouldn't work with the math dialect. In my experience the math dialect is also required, for the same reasons as arith.
This proposal is motivated by warp distribution for tensor-core-like operations, but to me it is being approached from the wrong direction (also speaking from experience, because I have done this too). To me this should be designed from the bottom up: look at what the hardware needs and work backwards to connect it to linalg/vector. Jamming encodings through seems like a short-term fix rather than a fully fleshed-out solution (at least in core). I don't think any of this needs to live in the vector dialect. These are really low-level hardware details that need to be modeled by operations/types in dialects meant for that hardware, with the conversion from linalg/vector done using something like dialect conversion (yes, I went there; I know people hate dialect conversion, but that's the tool for the job IMO). The best solution I have actually seen is opaque types the way the SPIR-V community is modeling them. They face the same constraint of having to deal with multiple vendors, and it is not fully solved by any means, but that seems like a more robust approach to me.
But in interest of experimentation my vote would be to have a type interface, and allow downstream users to work with implementation of the interface that work with encodings.
This seems way too complicated to me, and I don't see how it works. These operations are necessarily not elementwise. IIUC, the root of this proposal is operations like tensor-core ops, where the encoding tells you how the data from each thread is used in a group operation. So I don't know what the type trait would achieve.