[RFC] Adding Gather, Scatter Ops

Thanks for your perspective @_sean_silva.

In this RFC, I left out the parallel part of the equation. I think there are two avenues along which rewrites could bottom out into the strategy you mention.

The first avenue is related to fusion through gather and scatter. As we create loops and take slices, the ops become lower-dimensional and “more expressive” (see the section on expressiveness vs transformation power). I can certainly see slices of (histogram + scatter) requiring a rewrite to RMW (read-modify-write) operations in order to allow fusion/tiling into parallel loops. It is possible that “a scatter with a histogram region” is the better abstraction to target for such a rewrite, but I don’t think we are there yet. A first step in that direction, IMO, is getting the design right by making tiling + in-place bufferization work as well in the gather/scatter case as in the dense case.
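To make the RMW point concrete, here is a rough sketch in plain Python (not MLIR, and not a proposed lowering). It assumes a 1-D histogram whose combiner is `+`; the function names are made up for illustration. The tiled version shows why slices of the scatter still read and write the same destination, which is what would force atomics or a later reduction once the tile loop becomes parallel.

```python
# Hypothetical illustration only: plain Python lists stand in for tensors/buffers.

def histogram_scatter(dest, indices, updates):
    """Sequential scatter whose combiner is `+`: one read-modify-write per update."""
    out = list(dest)  # value semantics: the destination is copied, not mutated
    for idx, upd in zip(indices, updates):
        out[idx] = out[idx] + upd  # read, modify, write
    return out


def histogram_scatter_tiled(dest, indices, updates, tile_size):
    """Same computation after tiling the update dimension.

    Each tile is a smaller scatter into the same destination. Different tiles
    may hit the same bin, so running the outer loop in parallel would need the
    RMW to be atomic (or per-tile partial results followed by a reduction).
    """
    out = list(dest)
    for start in range(0, len(indices), tile_size):
        tile_idx = indices[start:start + tile_size]
        tile_upd = updates[start:start + tile_size]
        for idx, upd in zip(tile_idx, tile_upd):
            out[idx] += upd
    return out


bins = [0, 0, 0, 0]
indices = [1, 3, 1, 0, 3, 1]
ones = [1] * len(indices)

reference = histogram_scatter(bins, indices, ones)
tiled = histogram_scatter_tiled(bins, indices, ones, tile_size=2)
```

Run sequentially, both versions agree. The interesting question is only what the inner update has to become when the tile loop is parallel.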

The second avenue is that I see similarities between gather/scatter/parallel_scatter and extract_slice/insert_slice/parallel_insert_slice that could generalize into a SubsetOpInterface. This similarity relates to the discussion on abstractions for parallelism and tensors, and to how to represent the reduction part. The reduction aspect is not solved yet (our friends on the XLA side have something similar with a reduction region that we may want to adopt), and I can see a parallel_scatter with a region that contains a histogram_compute-like op as a way to represent some of this without losing information until after bufferization.
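As a rough illustration of that similarity, here is a plain-Python sketch, restricted to 1-D and to unique positions so that no reduction is involved. The names `slice_positions`, `extract_subset`, and `insert_subset` are hypothetical and are not MLIR APIs; the point is only that the slice ops and the gather/scatter ops differ in how the set of positions is described, not in what is done with it.

```python
# Hypothetical sketch: these helper names are illustrative, not MLIR APIs.

def slice_positions(offset, size, stride):
    """Positions touched by a 1-D extract_slice/insert_slice."""
    return [offset + i * stride for i in range(size)]


def extract_subset(source, positions):
    """Read side: extract_slice when positions are strided, gather otherwise."""
    return [source[p] for p in positions]


def insert_subset(dest, positions, values):
    """Write side: insert_slice when positions are strided, scatter otherwise.

    Assumes positions are unique, so no combiner/reduction is needed.
    """
    out = list(dest)  # value semantics: return a new "tensor"
    for p, v in zip(positions, values):
        out[p] = v
    return out


data = [10, 11, 12, 13, 14, 15, 16, 17]
strided = slice_positions(offset=1, size=3, stride=2)  # regular subset
arbitrary = [6, 0, 3]                                  # irregular subset

sliced = extract_subset(data, strided)       # behaves like extract_slice
gathered = extract_subset(data, arbitrary)   # behaves like gather
round_trip = insert_subset(data, arbitrary, gathered)
```

Once positions may repeat, the write side needs a combiner, which is where the unsolved reduction part comes back in.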

In any case, I think both of these avenues are complementary to our ability to represent n-D gather/scatter ops that compose well with the lower-level abstractions we already have. My take is that we can reevaluate adding a region to the sequential n-D scatter op once the semantics, tiling/fusion, and lowerings work well.