[RFC] Improving gather codegen for Vector Dialect

  • We know that the vector.gather operation is incomplete, especially when it comes to multi-dim gathers. We have to make sure that the solution doesn’t lead to a parallel vector gather representation that leaves the existing “dysfunctional” one hanging in there. That wouldn’t be a great outcome. Instead, I think the approach should focus on filling the gaps in multi-dimensional semantics and adding whatever functionality vector.gather is missing today.

I think that’s a good idea. We shouldn’t leave vector.gather hanging. I’ll start by improving vector.gather and vector.scatter support and then we can move on to discussing if we need vector.transfer_gather / vector.transfer_scatter.

  • When bringing an operation to the multi-dimensional domain it’s expected that certain traits need to be encoded per dimension. We do that for in_bounds, for example, among many other cases. For the gather case, it shouldn’t be different! We should aim to encode whether any dimension is contiguous or not, not only the innermost one. Knowing that the outer dimensions of a memory access are contiguous, even when the innermost is not, can enable some optimizations. This information could be encoded per dimension by adding new attributes (e.g., [contiguous, random, strided]) to vector.gather.

Good idea, I’ll start by adding something similar to vector.gather / vector.scatter.
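
To make the per-dimension idea concrete, here is a minimal Python sketch of how each dimension of a 2-D gather could be labelled independently. It only models the concept; it is not MLIR syntax or an existing vector dialect API. The names `Access`, `classify` and `gather_2d` are hypothetical, and the three kinds simply mirror the [contiguous, random, strided] example above.

```python
from enum import Enum


class Access(Enum):
    """Hypothetical per-dimension access kinds, mirroring the example attribute."""
    CONTIGUOUS = "contiguous"
    STRIDED = "strided"
    RANDOM = "random"


def classify(indices):
    """Classify the index list used along a single dimension.

    Assumption of this sketch: a constant step of 1 is contiguous, any
    other constant step (including 0 or negative) is strided, and
    everything else is random.
    """
    if len(indices) < 2:
        return Access.CONTIGUOUS
    steps = {b - a for a, b in zip(indices, indices[1:])}
    if steps == {1}:
        return Access.CONTIGUOUS
    if len(steps) == 1:
        return Access.STRIDED
    return Access.RANDOM


def gather_2d(base, row_idx, col_idx):
    """Reference semantics: result[i][j] = base[row_idx[i]][col_idx[j]]."""
    return [[base[r][c] for c in col_idx] for r in row_idx]


# Outer dimension walks consecutive rows, inner dimension is arbitrary.
base = [[r * 10 + c for c in range(8)] for r in range(4)]
row_idx = [1, 2]
col_idx = [5, 0, 3]

kinds = [classify(row_idx), classify(col_idx)]
result = gather_2d(base, row_idx, col_idx)
```

Here the outer dimension is classified as contiguous even though the innermost one is random, which is exactly the case where only tracking the innermost dimension would lose information.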

I hope this makes sense! I’m open to discussing this further in a higher bandwidth venue (call?, Euro LLVM?, …) or whatever works for you!

Let’s have an open call / meet at EuroLLVM after I’ve improved vector.gather / vector.scatter support for multi-dimensional indices. I doubt there is any opposition to that, and it’s an overall improvement that we can build on.

I would leave any kind of indexing/permutation maps out of the picture to simplify an already highly complex operation. I wouldn’t scope this within the transfer op family either. Transfer ops have been great at abstracting away all the details of memory loads and stores for cases where we really don’t care much about the memory access pattern or how the data is loaded. However, they have been a struggle when we need to reason about that information because they concentrate too much information. For the gather case, we are looking at encoding specifics of the memory access itself, which doesn’t align with the aforementioned main goals of transfer ops.

I have a different opinion on this, but let’s discuss this later when we have the call about vector.transfer_gather.

Thanks for the reply, Diego. These are really useful technical points; they make sense, and I can make progress on them.