[RFC] Interface for destination-style ops

[RFC] DestinationStyleOpInterface

Destination-style operation

A destination-style operation with n input arguments and m tensor results is an op with the following structure:


%result:m = dst_op 
  ins(%in_1:TensorOrScalarType_1, ..., %in_n:TensorOrScalarType_n)
  outs(%out_1:TensorType_1, ..., %out_m:TensorType_m)
  optional-attrs optional-body

where type(%result_i) == type(%out_i). Output tensors out_i provide “initial” values for the corresponding results.

After bufferization it is transformed into


dst_op
  ins(%in_1:MemRefOrScalarType_1, ..., %in_n:MemRefOrScalarType_n)
  outs(%out_1:MemRefType_1, ..., %out_m:MemRefType_m)
  optional-attrs optional-body
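
To make the two forms above concrete, here is a minimal standalone C++ sketch. It does not use MLIR at all: `Kind`, `Type`, `DstStyleOp`, `verifyTensorForm` and `bufferize` are hypothetical names invented for illustration. It only models the rule type(%result_i) == type(%out_i) and the fact that the bufferized form has memref operands and no results.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for MLIR types; every name here is hypothetical.
enum class Kind { Scalar, Tensor, MemRef };

struct Type {
  Kind kind;
  std::string desc;  // Shape and element type, e.g. "4x8xf32".
  bool operator==(const Type &other) const {
    return kind == other.kind && desc == other.desc;
  }
};

struct DstStyleOp {
  std::vector<Type> ins;      // Tensors/memrefs or scalars.
  std::vector<Type> outs;     // Tensors before bufferization, memrefs after.
  std::vector<Type> results;  // One per tensor `out`; empty after bufferization.
};

// Checks the tensor-form rule: every `out` is a tensor and
// type(result_i) == type(out_i).
bool verifyTensorForm(const DstStyleOp &op) {
  if (op.results.size() != op.outs.size()) return false;
  for (std::size_t i = 0; i < op.outs.size(); ++i) {
    if (op.outs[i].kind != Kind::Tensor) return false;
    if (!(op.results[i] == op.outs[i])) return false;
  }
  return true;
}

// Models bufferization of the op "to itself": tensors become memrefs,
// scalars are untouched, and the results are dropped.
DstStyleOp bufferize(const DstStyleOp &op) {
  assert(verifyTensorForm(op) && "expected a valid tensor-form op");
  auto toMemRef = [](Type t) {
    if (t.kind == Kind::Tensor) t.kind = Kind::MemRef;
    return t;
  };
  DstStyleOp bufferized;
  for (const Type &t : op.ins) bufferized.ins.push_back(toMemRef(t));
  for (const Type &t : op.outs) bufferized.outs.push_back(toMemRef(t));
  return bufferized;
}
```

In this toy model the `outs` survive bufferization as the buffers that are written in place, which is why the results can disappear without losing information.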

Background

LinalgStructuredInterface in LinalgInterfaces.td contains methods for linalg.generic and LinalgNamedOps. These methods can be categorized into two types.

  • Methods that handle indexing maps, iterator types, library calls and body regions of Linalg ops

  • Methods relevant for destination-style ops

The reason why both types of methods can be found in LinalgStructuredInterface is purely historical. In 2020 Linalg added support for ops and transformations on tensors, and it became possible to bufferize Linalg operations to themselves, i.e. linalg.generic with tensor arguments and results gets converted to linalg.generic with memref arguments and no results.

The class of destination-style operations is wider than linalg.generic-like ops and it includes LinalgExtOps and GmlStExtensionOps.

In order to improve code sharing, I suggest moving some of the methods in LinalgStructuredInterface into a separate DestinationStyleOpInterface within LinalgInterfaces.td.

New Interface

Here is the list of all current methods of LinalgStructuredInterface. A method labeled [MOVES] will be moved to DestinationStyleOpInterface; a method labeled [STAYS] won’t be affected.

Loop types handling operations are specific to linalg.generic.

// Loop types handling.

// [STAYS] Return the number of parallel loops.
unsigned getNumParallelLoops();

// [STAYS] Return the dims that are parallel loops.
void getParallelDims(SmallVectorImpl<unsigned>&);

// [STAYS] Return the number of reduction loops.
unsigned getNumReductionLoops();

// [STAYS] Return the dims that are reduction loops.
void getReductionDims(SmallVectorImpl<unsigned> &);

// [STAYS] Return the number of window loops.
unsigned getNumWindowLoops();

// [STAYS] Return the dims that are window loops.
void getWindowDims(SmallVectorImpl<unsigned> &);

// [STAYS] Return the total number of loops within the current operation.
unsigned getNumLoops();

// [STAYS] Returns true if the current operation has only one loop and
// it's a reduction loop.
bool hasSingleReductionLoop();

Input and output operands handling is defined by the structure of dst-style ops.

// Num input/output arguments handling.

// [MOVES] Return the input shape operands.
ValueRange inputs();

// [MOVES] Return the number of inputs.
int64_t getNumInputs();

// [MOVES] Return the output shape operands.
ValueRange outputs();

// [MOVES] Return the number of outputs.
int64_t getNumOutputs();

// [MOVES] Return the number of inputs and outputs.
int64_t getNumInputsAndOutputs();

// Input operands handling.

// [MOVES] Return the input operands.
OpOperandVector getInputOperands();

// [MOVES] Return the `i`-th input operand.
OpOperand* getInputOperand(int64_t);

// [MOVES] Return the subset of input operands that are of buffer type.
OpOperandVector getInputBufferOperands();

// [MOVES] Return the subset of input operands that are of tensor type.
OpOperandVector getInputTensorOperands();

// Output operands handling.

// [MOVES] Return the output operands.
OpOperandVector getOutputOperands();

// [MOVES] Return the `i`-th output operand.
OpOperand* getOutputOperand(int64_t);

// [MOVES] Set the `i`-th output operand.
void setOutputOperand(int64_t i, Value value);

// [MOVES] Return the subset of output operands that are of buffer type.
OpOperandVector getOutputBufferOperands();

// [MOVES] Return the subset of output operands that are of tensor type.
OpOperandVector getOutputTensorOperands();

// [MOVES] Return the types of the subset of output operands that are
// of buffer type.
SmallVector<MemRefType> getOutputBufferTypes();

// [MOVES] Return the types of the subset of output operands 
// that are of tensor type.
SmallVector<RankedTensorType> getOutputTensorTypes();

// Input and Output arguments handling.

// [MOVES] Return the range over input and output operands.
OpOperandVector getInputAndOutputOperands();

// [STAYS] Return true if the payload uses the value loaded from
// `opOperand`. This is useful to avoid loading from "write-only"
// memory that may be uninitialized, as well as properly cloning
// "read-write" operands.
bool payloadUsesValueFromOperand(OpOperand *);

// [MOVES] Return true if `opOperand` is an input tensor.
bool isInputTensor(OpOperand *);

// [MOVES] Return true if `opOperand` is an output tensor.
bool isOutputTensor(OpOperand *);

// [STAYS] Return true if `opOperand` is an init tensor.
// This is true when it is an output tensor
// operand whose value is used in the payload region.
bool isInitTensor(OpOperand *);

// [DOES-NOT-HAVE-TO-BE-IN-ANY-INTERFACE] Return 
// the `opOperand` rank or zero for scalars.
int64_t getRank(OpOperand*);

// [STAYS] Return the output block arguments of the region.
Block::BlockArgListType getRegionOutputArgs();

// [DOES-NOT-HAVE-TO-BE-IN-ANY-INTERFACE] Return
// the `opOperand` shape or an empty vector for scalars.
ArrayRef<int64_t> getShape(OpOperand*);

// [DOES-NOT-HAVE-TO-BE-IN-ANY-INTERFACE] Return 
// true if the `opOperand` is a scalar value.
bool isScalar(OpOperand*);

// [STAYS] Return the block argument for an `opOperand`.
BlockArgument getTiedBlockArgument(OpOperand *);

// [STAYS] Return the operand for a `blockArgument`.
OpOperand* getTiedOpOperand(BlockArgument);

// [STAYS] Return the input or output indexing map for `opOperand`.
AffineMap getTiedIndexingMap(OpOperand*);

// [STAYS] Return the indexing map for a `result`.
AffineMap getTiedIndexingMapForResult(OpResult);

// [MOVES] Return the result tied to `opOperand`.
OpResult getTiedOpResult(OpOperand*);

// [STAYS] Return the value yielded by the region corresponding
// to an output `opOperand`.
OpOperand * getTiedYieldValue(OpOperand*);

The most important methods here are hasBufferSemantics and hasTensorSemantics, which also follow from how dst-style ops are bufferized.

// Other interface methods.

// [STAYS] Return the single block constituting the body of the 
// operation by calling the getBody method on the concrete
// operation.
Block* getBlock();

// [STAYS] Return the iterator types attribute.
ArrayAttr iterator_types();

// [STAYS] Return true if the indexing map depends on the current
// op instance. This means that the indexing map is dynamically
// synthesized by using the op instance's concrete attributes,
// instead of being static for all instances of the same op kind.
bool hasDynamicIndexingMaps();

// [STAYS] Verify all attributes used by indexing maps are valid.
LogicalResult verifyIndexingMapRequiredAttributes();

// [STAYS] Return the indexing maps attribute.
ArrayAttr getIndexingMaps();

// [STAYS] Return the indexing maps within the current operation.
SmallVector<AffineMap> getIndexingMapsArray();

// [STAYS] Return true if any of the operands has a dynamic shape.
bool hasDynamicShape();

// [MOVES] Return whether the op has only MemRef input and outputs.
bool hasBufferSemantics();

// [MOVES] Return whether the op has only RankedTensor input and outputs.
bool hasTensorSemantics();

// [STAYS] Return the name registered for this op when lowering to an
// external library call.
std::string getLibraryCallName();

// [STAYS] Return whether the op accesses the iteration indices.
bool hasIndexSemantics();

// [STAYS] Linalg generalization hooks.
AffineMap getLoopsToShapesMap();
AffineMap getShapesToLoopsMap();
bool canOpOperandsBeDropped(ArrayRef<OpOperand *>);
std::pair<int64_t, int64_t> getResultsPositionInLoopsToShapeMap();
SmallVector<int64_t> getStaticShape();
SmallVector<int64_t, 4> getStaticLoopRanges();

// Other interface methods.

// [MOVES] Clone the current operation with the given location
// and operands. This is used to abstract away the optional
// underlying region creation. This
// does not change the balance between input, output_buffer and
// init_tensors operands.
Operation* clone(OpBuilder &, Location, TypeRange, ValueRange);

// [NOT USED ANYWHERE:CAN IT BE REMOVED?] Clone the current
// operation with the given location, operands
// and BlockAndValueMapping. This is used to abstract away the
// optional underlying region creation. This does not change the
// balance between input, output_buffer and init_tensors operands.
Operation* cloneWithMapper(OpBuilder &, Location, TypeRange,
                           ValueRange, BlockAndValueMapping &);

// [MOVES] Clone the current operation with the given location,
// operands and BlockAndValueMapping but leave the regions 
// empty. This is used to abstract away the optional underlying
// region creation. This does not change the balance between 
// input, output_buffer and init_tensors operands.
Operation* cloneWithoutRegions(OpBuilder &, Location,
                        TypeRange, ValueRange);

// [STAYS] Returns the region builder for constructing the body for
// linalg.generic.
// Returns a null function if this named op does not define a region
// builder.
std::function<void(ImplicitLocOpBuilder &, Block &,
  ArrayRef<NamedAttribute>)> getRegionBuilder();

// [STAYS] Return true if all the indexing maps are projected permutations.
// Otherwise return false.
bool hasOnlyProjectedPermutations();

// [STAYS]
let extraClassDeclaration = [{
  SmallVector<Value, 4> createFlatListOfOperandDims(OpBuilder &, Location);
  SmallVector<int64_t, 4> createFlatListOfOperandStaticDims();
  SmallVector<Range, 4> createLoopRanges(OpBuilder &b, Location loc);
  SmallVector<int64_t, 4> computeStaticLoopSizes();
  LogicalResult reifyResultShapes(OpBuilder &b,
      ReifiedRankedShapedTypeDims &reifiedReturnShapes);
  ArrayAttr getIteratorTypes() { return iterator_types(); }
  void setNumInputs(unsigned num) { setOperandSegmentAt(0, num); }
  void setNumOutputBuffers(unsigned num) { setOperandSegmentAt(1, num); }
}];

@MaheshRavishankar, @stellaraccident, @nicolasvasilache, @matthias-springer, @frgossen, @herhut

I think it is going in the right direction, but I’d like to see better documentation for this interface in itself. Right now it is hard to judge the merits of the proposal just by looking at what moves and what stays. Do you already have a patch, maybe?

It may also be an opportunity to revisit the naming: these are init tensors rather than output tensors here.

Yes! Let’s finally rename outs to inits.

I would try to avoid preparing a patch before we reach some agreement. The majority of the “moved” methods are just utilities to get inputs, outputs and “tied” results.

The documentation for the interface itself will be about what a dst-style op is and what structure we impose. So, it will be very similar to the first section of the RFC.

Since destination-passing style is relevant for bufferization, should the interface rather live in the bufferization dialect?

I thought that BufferizationDialect was mostly for the ops related to the bufferization passes, like bufferization.to_memref, bufferization.to_tensor, bufferization.alloc_tensor, and not for the ops/interfaces that we want to bufferize. Since LinalgExt ops are becoming Linalg ops that don’t implement LinalgStructuredInterface, there won’t be any ops outside of the Linalg dialect that use the DestinationStyleOpInterface, at least in MLIR Core.

There are some operations like the following that were intended for a use case of mixed buffer and tensor types inside of a single operation, which as far as I know, has not turned out to be used significantly. If my impression is correct, I would propose not moving methods like the following until someone actually needs them.

Yes, the mixed buffer-tensor semantics won’t be allowed. The verifier will check that an op is either buffer- or tensor-based, not mixed. These methods would then become just getOutputOperands and getOutputTypes.
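
A rough idea of what such a verifier check could look like, as a standalone C++ sketch rather than actual MLIR code. `OperandKind`, `ToyDstOp`, `anyOperandIs` and `verifyNoMixedSemantics` are hypothetical names, and treating scalar operands as neutral (allowed under both semantics) is an assumption made for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical classification of an operand's type.
enum class OperandKind { Scalar, Tensor, MemRef };

// Hypothetical toy op: just the kinds of its ins and outs.
struct ToyDstOp {
  std::vector<OperandKind> inputs;
  std::vector<OperandKind> outputs;
};

// True if any input or output operand has the given kind.
static bool anyOperandIs(const ToyDstOp &op, OperandKind kind) {
  auto matches = [kind](OperandKind k) { return k == kind; };
  return std::any_of(op.inputs.begin(), op.inputs.end(), matches) ||
         std::any_of(op.outputs.begin(), op.outputs.end(), matches);
}

// Assumption: scalars are neutral, so "tensor semantics" means that
// no operand is a memref, and "buffer semantics" that none is a tensor.
bool hasTensorSemantics(const ToyDstOp &op) {
  return !anyOperandIs(op, OperandKind::MemRef);
}

bool hasBufferSemantics(const ToyDstOp &op) {
  return !anyOperandIs(op, OperandKind::Tensor);
}

// The verifier rule discussed above: an op must be either tensor-based
// or buffer-based, never a mix of the two.
bool verifyNoMixedSemantics(const ToyDstOp &op) {
  return hasTensorSemantics(op) || hasBufferSemantics(op);
}
```

With this rule in place, separate accessors for the tensor and buffer subsets of the outputs carry no extra information, since every shaped operand of a verified op has the same kind.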

Sounds interesting, just a few random thoughts:

This reminds me of TiedOpInterface in IREE. Is the concept of input/output operands important? Or do we just need a way to express the fact that some operands are tied to some results?

I associate “Destination-style”/“Destination-passing style” with “memory destination” and bufferization. I am wondering how bufferization-specific this really is. E.g., this is how IREE defines tying:

An operation that "ties" one or more results to its operands indicating
that the result is directly related to the operand in an operation-defined
way. Results are still SSA values distinct from the operands and the tie is
strictly a relationship relevant to transformations and not something that
modifies IR definitions.

What about ops that are in destination-passing style but do not have explicit in/out operands (at least not in the assembly format and/or C++ API), e.g., tensor.insert_slice?

+1 to continuing to deprivilege and disaggregate linalg in general.
+1 to renaming outs → inits (in this or a future step)

I intuitively identify with the questions that @ftynse and @matthias-springer raise: in prior work along this line when trying to boil a distinct concept out of linalg, it has been important to isolate what is being done with a facet of the existing interface and align to that (vs just extracting). What remains may just be an ergonomic helper for that core facet.

I also think this is going in the right direction so don’t want to get in the way of that: but let’s at least stop and ask/answer those questions before making the move.

Is the “tied operand” concept useful for anything other than bufferization? Side effects / aliasing analysis perhaps? If so, we may want to put it in “lib/Interfaces” without associating it with a specific dialect.

Having a destination-passing-style op interface makes sense. A lot more operations apart from Linalg can move into this and make the implementation of the bufferization interface much simpler. For example, scf.for could implement this interface too (unless there are some cases where it can’t).

One suggestion though. Linalg has explicit methods that differentiate between operations that have tensor semantics and operations that have buffer semantics. The new interface should be able to handle both of these without having to differentiate between the two. So something like “getTiedOperands” from IREE.
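
As a sketch of what a semantics-agnostic query could look like, here is a standalone C++ toy. It is not IREE's TiedOpInterface and not MLIR's API: `ToyTiedOp` and `getTiedResultIndex` are hypothetical names, and the operand layout (all inputs first, then all outputs, with result i tied to output i) is an assumption taken from the op structure described in the RFC.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical toy op: operands are laid out as [inputs..., outputs...].
struct ToyTiedOp {
  std::size_t numInputs;
  std::size_t numOutputs;
  std::size_t numResults;  // == numOutputs on tensors, 0 on buffers.
};

// Returns the index of the result tied to operand `operandIdx`, or
// std::nullopt if there is none. Callers never need to ask whether the
// op has tensor or buffer semantics: a buffer-based op simply has no
// results, so no operand is tied to one.
std::optional<std::size_t> getTiedResultIndex(const ToyTiedOp &op,
                                              std::size_t operandIdx) {
  assert((op.numResults == 0 || op.numResults == op.numOutputs) &&
         "results must be absent or match the outputs one-to-one");
  if (operandIdx < op.numInputs) return std::nullopt;  // Inputs are never tied.
  std::size_t outputIdx = operandIdx - op.numInputs;
  if (outputIdx >= op.numOutputs || outputIdx >= op.numResults)
    return std::nullopt;
  return outputIdx;
}
```

For a tensor-based toy op with two inputs and one output, operand 2 maps to result 0; for the bufferized version of the same op, the same call returns an empty optional.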

That is true for operation with tensor semantics. It is not the case for operations with buffer semantics.

“inits” wouldn’t be incorrect for ops operating on buffers though? I agree that they are also written to though.

Yeah, agreed. If init makes more sense broadly, that’s fine for me. I know a number of people get tripped up by it.

I think even on buffers “inits” make a lot of sense.

Good point about tensor.insert_slice, @matthias-springer. Shall we put the interface in mlir/Interfaces/DestinationStyleOpInterface then?

Before we move forward on this though, I’d suggest waiting for @nicolasvasilache to chime in. He is on vacation and should be back in a couple of weeks.

I guess you are talking about all these getInputTensorOperands() and such? Yes, we can just make it all getInputOperands, and they will return tensors or buffers depending on what semantics the operation has.

I talked to Nicolas on Wednesday and Thursday and he was supportive of the DestinationStyleOpInterface idea.