[RFC] ValueBoundsOpInterface: Compute bounds of index values/dyn. dim. sizes

Summary

Provide an op interface (ValueBoundsOpInterface) that adds constraints about an OpResult / BlockArgument into a constraint set (FlatAffineValueConstraints). Constraints can be added for index-typed values and dimension sizes of shaped values. The constraint set can be queried to compute lower/upper/equality bounds of index-typed values or dynamic dimension sizes in terms of a specified set of values.

Prototype: ⚙ D145681 [mlir][linalg] Add ValueBoundsOpInterface and tensor dialect op impl. Stack of revisions contains interface implementations for various ops.

Example use cases: Padding dynamic tensor/memref dimensions to static sizes (to make IR vectorizable, etc.), hoisting of buffers with dynamic sizes from loops, vector masking. Some concrete example test case are here.

There have been attempts to implement similar functionality in MLIR and downstream projects (ONNX, TensorFlow, IREE). ValueBoundsOpInterface could also generalize and replace the current ReifyRankedShapedTypeOpInterface.

Interface Methods

  • void populateBoundsForIndexValue(Value, ValueBoundsConstraintSet &)
  • void populateBoundsForShapedValueDim(Value, int64_t dim, ValueBoundsConstraintSet &)

ValueBoundsConstraintSet is a thin wrapper around FlatAffineValueConstraints. It maintains a mapping of columns in constraint set to constrained values (index-typed SSA values or dims of shaped values). The mapping of columns to SSA values in FlatAffineValueConstraint cannot be used because it does not support “dim of shaped value”.

ValueBoundsConstraintSet also provides helper functions for retrieving constrained values/dims (getExpr) from the constraint set as AffineExprs and adding new bounds to the constraint set (addBound). In particular, AffineExprs make it easy to implement this interface with few lines of code.

ValueBoundsConstraintSet maintains an internal worklist for SSA values that were used in bounds of other values but who’s value bounds are not yet known. The class provides a static helper function reifyBound that iteratively processes values in the backward slice (reverse use-def chain) of a given value until a stop condition is met. It then projects out all columns for which the stop condition is not met and computes a bound as an IntegerAttr or Value. In the latter case, this can be an already existing SSA value (no new ops created) or the result of an affine.apply operation.

/// Reify a bound for the given index-typed value or shape dimension size in
/// terms of SSA values for which `stopCondition` is met.
static FailureOr<OpFoldResult>
ValueBoundsConstraintSet::reifyBound(
    OpBuilder &b, Location loc, presburger::IntegerPolyhedron::BoundType type,
    Value value, int64_t dim, function_ref<bool(Value)> stopCondition);

Example: tensor.pad

Compare this with the ReifyRankedShapedTypeOpInterface implementation of tensor.pad, which is doing roughly the same thing.

struct PadOpInterface
    : public ValueBoundsOpInterface::ExternalModel<PadOpInterface, PadOp> {
  void populateBoundsForShapedValueDim(Operation *op, Value value, int64_t dim,
                                       ValueBoundsConstraintSet &cstr) const {
    auto padOp = cast<PadOp>(op);
    assert(value == padOp.getResult() && "invalid value");

    AffineExpr expr = cstr.getExpr(padOp.getSource(), dim) +
                      cstr.getExpr(padOp.getMixedLowPad()[dim]) +
                      cstr.getExpr(padOp.getMixedHighPad()[dim]);
    cstr.addBound(IntegerPolyhedron::BoundType::EQ, value, dim, expr);
  }
};

Example: scf.for

void populateBoundsForIndexValue(Operation *op, Value value,
                                 ValueBoundsConstraintSet &cstr) const {
  auto forOp = cast<ForOp>(op);
  // Only IV is supported at the moment.
  if (value != forOp.getInductionVar())
    return;

  // TODO: Take into account step size.
  cstr.addBound(IntegerPolyhedron::BoundType::LB, value,
                cstr.getExpr(forOp.getLowerBound()));
  cstr.addBound(IntegerPolyhedron::BoundType::UB, value,
                cstr.getExpr(forOp.getUpperBound()));
}

Example: API usage

Reify an upper bound for the size of dimension 0 of tensorValue in terms of vals or return failure if no such bound could be computed.

SmallVector<Value> vals = ...;
auto stopCondition = [&](Value v) { return llvm::contains(vals, v); };
FailureOr<OpFoldResult> bound = ValueBoundsConstraintSet::reifyBound(
    b, loc, BoundType::UB, tensorValue, /*dim=*/0, stopCondition);

Generalization of ReifyRankedShapedTypeOpInterface

When reifying the dynamic shape of opResult, ValueBoundsOpInterface::reifyBound can be used as a replacement for the current reifyResultShapes when using the following stop condition.

auto stopCondition = [&](Value v) {
  // Reify in terms of SSA values that are different from `value`.
  return v != opResult;
};

Pros:

  • Reifies only the specified dimension of the specified value, not every dimension of every result.
  • The ValueBoundsOpInterface implementation is typically shorter (LoC) than the ReifyRankedShapedTypeOpInterface implement. E.g., compare the above tensor.pad implementation with the corresponding implementation of the existing interface.
  • No code/functionality duplication between ValueBoundsOpInterface and ReifyRankedShapedTypeOpInterface.
  • If ValueBoundsOpInterface is located in mlir/Transforms, ops can implement the interface directly, without having to depend on AffineDialect etc. (We would not longer need a separate build target in Dialect/Tensor etc.)

Cons:

  • ValueBoundsOpInterface implementations do not build IR. This is done as part of ValueBoundsOpInterface::reifyBound. This helper function must depend on at least one dialect (affine or arith) to materialize the IR that computes the shape. Therefore, it cannot be part of mlir/Interfaces. (If we use IntegerPolyhedron instead of FlatAffineValueConstraints, the interface itself can, just the reifyBound cannot. reifyBound could be placed in AffineTransforms and/or ArithTransforms.)
  • Creating a constraint set adds some overhead. If this overhead is too large, we could add a “fast path” specifically for reifyResultShapes, where we do not create a constraint set. (The constraint set is not really needed because no columns are projected out.)
1 Like

Nice proposal!

I have to say that I’m not sure about what “columns” in your description?

This is really something we should solve. Looking at the implementation it’s not clear to me why it can’t be addressed with an extra level of indirection?
It may be because memref and tensor types aren’t in the respective dialects but are builtin?

There is also tensorflow/same_shape_propagation.cc at master · tensorflow/tensorflow · GitHub and xla/shape_component_analysis.cc at main · openxla/xla · GitHub (and I may have missed others).

Could you please contrast these and say what each does and what this interface will provide and those downstream projects could use?

I’d actually prefer we go the other way, instead of generalizing ReifyRankedShapedTypeOpInterface, return it to reifying in the IR as it did before and then the above interface is for folks who want intermediate results rather than something that can be codegen’d and runtime queried, then one can implement reify in terms of inferBounds. But it is the final step only and no intermediate results get materialized, both interfaces are simple and callers don’t need to isa to determine what got returned and what happens. This is then a pure analysis that doesn’t even need to mutate IR (and potentially even context).

Could this go further and avoid IntegerAttr completely here? Using Attributes would create in context the intermediate value that may never be in the IR and occupies space throughout. Given the goal here is for intermediate results/Analysis, capture those in its own context of sorts rather than reuse Attribute (seems like Presburger ones support this).

1 Like

This refers to columns/positions in the FlatAffineValueConstraints API. A column can be dimensional variable, symbolic variable or local variable. This is an implementation detail. Users of ValueBoundsConstraintSet do not have to care about it.

I was thinking about the affine.apply op, that is created by ValueBoundsConstaintSet::reifyBound. This requires a dependency on the Affine dialect. This could be solved by putting reifyBound into the AffineDialect or AffineTransforms build target. The more I think about it, the more I like this approach, because we can have another reifyBound that in ArithTransforms. ValueBoundsOpInterface itself (and its helpers) would really just populate/manage the constraint set.

Creating the respective DimOps is another issue. If reifyBounds is in AffineTransforms, it is not a problem anymore because this build target can depend on (already depends on?) TensorDialect and MemRefDialect. Alternatively, have a type interface for creating DimOps to avoid a switch-case.

Are you referring to ⚙ D145250 [mlir][Interfaces] ReifyRankedShapedTypeOpInterface returns OpFoldResults (i.e., creating OpFoldResult instead of Value)?

This was based on an observation that we often reify the result shape and then immediately call getConstantIntValue/m_Constant/getAsOpFoldResult on the result. There are few examples of this in D145250. I found that this composes well with many memref/tensor ops (and ViewLikeOpInterface) that provide helper functions that return OpFoldResults (getMixedSizes etc.) and accept OpFoldResults as part of their builder API.

We may want to update the remaining operations that do not yet take advantage of Attributes/OpFoldResult. E.g., the index operand of tensor.dim could have an additional static_index attribute and a getMixedIndex() helper function that returns an OpFoldResult. This could replace helpers such as DimOp::getConstantIndex. Same for scf.for etc.

This sounds good. I’m modifying the revisions to do that. ValueBoundsConstaintSet::reifyBound will become a standalone function in AffineTransforms (as described above). The interface and ValueBoundsConstaintSet are “analysis only”. (It will still need an MLIRContext, but only to build AffineExprs.)

It is already doing that. Kind of… Inside the constraint set, constants are stored as plain int64_t. Only at the very end, reifyBound may create an IntegerAttr. With the above mentioned change, this will no longer be part of ValueBoundsConstaintSet.

Can this be made to interact with IntegerRangeAnalysis?

I think it can. The inferResultRanges interface method looks very similar to populateBoundsForIndexValue of the proposed interface. The API is a bit different (AffineExpr and ValueBoundsConstraintSet instead of ConstantIntRanges) but the implementations are essentially the same. InferIntRangeInterface and ValueBoundsOpInterface should probably be the same interface.