Lowering of scatter operations

There seem to be two representations of scatter operations in the core MLIR repo: tosa.scatter and tensor.scatter.

As far as I can tell, there are no lowerings out of either of these representations.

For tosa.scatter, this was noted in the thread "TOSA to Linalg lowering (tosa.scatter)" and still seems to be true. The difficulty there is that no existing Linalg operation can represent a scatter-like operation.
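To make the difficulty concrete, here is a rough Python sketch of what tosa.scatter computes, as I understand the TOSA spec (shapes [N, K, C] for the values, [N, W] for the indices, [N, W, C] for the data being written). The function and parameter names are mine, not from any MLIR API. The point is that the output position k is read from the indices tensor at runtime, whereas linalg.generic addresses its outputs through affine indexing maps of the loop induction variables.

```python
def tosa_scatter_reference(values_in, indices, updates):
    """Illustrative reference semantics only, using nested Python lists.

    values_in: [N][K][C], indices: [N][W], updates: [N][W][C].
    Returns a new [N][K][C] list; values_in is not modified.
    """
    # Start from a copy of the input values.
    values_out = [[list(row) for row in batch] for batch in values_in]
    for n in range(len(indices)):
        for w in range(len(indices[n])):
            # The write position is data-dependent: it comes from `indices`,
            # not from an affine function of the loop variables (n, w).
            k = indices[n][w]
            values_out[n][k] = list(updates[n][w])
    return values_out


if __name__ == "__main__":
    vals = [[[0, 0], [0, 0], [0, 0]]]   # N=1, K=3, C=2
    idx = [[2, 0]]                      # N=1, W=2
    upd = [[[1, 2], [3, 4]]]            # N=1, W=2, C=2
    print(tosa_scatter_reference(vals, idx, upd))
```

This is only meant to show the shape of the loop nest a naive lowering would have to produce; it says nothing about how duplicate indices should be handled.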

The tensor.scatter operation appears to have been introduced some time after that discussion. tensor.scatter seems like a possible lowering target for tosa.scatter except that it also lacks any lowerings.

My question is: are there any plans to add lowerings for these operations to the core MLIR repo at some point? Does everyone just implement their own lowerings for these ops, or is there simply little demand for such a lowering?

We also have vector.scatter.

Normally, if someone added an op like this to an upstream dialect, the expectation is that there is a lowering path to the LLVM dialect or an imminent plan to support that. Can you check via git blame and tag the author who added it (or look up the commit summary)? That will answer your questions. I personally don’t find it ideal if ops are added to such core dialects without a path or an imminent plan to lower them to the LLVM dialect.

We should add bufferization support for tensor.gather and tensor.scatter (in Tensor/Transforms/BufferizableOpInterfaceImpl.cpp). The implementation does not have to be very efficient; it just has to lower to MemRef in some way, so that it is executable. In fact, tensor.gather will always bufferize to a new memory allocation, which may not be desirable.
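As a rough picture of what "lower to MemRef in some way" could look like, here is a deliberately simplified 1-D Python sketch. It is not the actual bufferization and the function names are hypothetical; the real tensor.gather and tensor.scatter ops work on N-D tensors with gather_dims/scatter_dims. The gather always writes into a fresh buffer, while the scatter can update the destination buffer in place.

```python
def gather_1d(source, indices):
    """Gather into a newly allocated buffer (simplified 1-D case)."""
    result = [None] * len(indices)      # stands in for a new allocation
    for i, idx in enumerate(indices):
        result[i] = source[idx]
    return result


def scatter_1d_in_place(dest, source, indices):
    """Scatter `source` into `dest` in place (simplified 1-D case).

    Assumes the indices are unique, so the write order does not matter.
    """
    for i, idx in enumerate(indices):
        dest[idx] = source[i]
    return dest


if __name__ == "__main__":
    print(gather_1d([10, 20, 30, 40], [3, 1]))
    buf = [0, 0, 0, 0]
    scatter_1d_in_place(buf, [7, 9], [3, 1])
    print(buf)
```

A plain loop like this is slow, but it would be enough to make the ops executable.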

Alternatively, a vectorization pass could lower those two ops more efficiently. (Maybe we can already vectorize tensor.gather/tensor.scatter, given that we also have vector.gather/vector.scatter?)

This is a pattern that we have for many other ops. For example, linalg.generic can be bufferized and lowered to loops (which is somewhat inefficient in most cases), or it can be vectorized.