[RFC] ValueBoundsOpInterface: Compute bounds of index values/dyn. dim. sizes

Summary

Provide an op interface (ValueBoundsOpInterface) that adds constraints about an OpResult / BlockArgument into a constraint set (FlatAffineValueConstraints). Constraints can be added for index-typed values and dimension sizes of shaped values. The constraint set can be queried to compute lower/upper/equality bounds of index-typed values or dynamic dimension sizes in terms of a specified set of values.

Prototype: ⚙ D145681 [mlir][linalg] Add ValueBoundsOpInterface and tensor dialect op impl. Stack of revisions contains interface implementations for various ops.

Example use cases: Padding dynamic tensor/memref dimensions to static sizes (to make IR vectorizable, etc.), hoisting of buffers with dynamic sizes from loops, vector masking. Some concrete example test cases are here.

There have been attempts to implement similar functionality in MLIR and downstream projects (ONNX, TensorFlow, IREE). ValueBoundsOpInterface could also generalize and replace the current ReifyRankedShapedTypeOpInterface.

Interface Methods

  • void populateBoundsForIndexValue(Value, ValueBoundsConstraintSet &)
  • void populateBoundsForShapedValueDim(Value, int64_t dim, ValueBoundsConstraintSet &)

ValueBoundsConstraintSet is a thin wrapper around FlatAffineValueConstraints. It maintains a mapping of columns in the constraint set to constrained values (index-typed SSA values or dims of shaped values). The mapping of columns to SSA values in FlatAffineValueConstraints cannot be used because it does not support “dim of shaped value”.

ValueBoundsConstraintSet also provides helper functions for retrieving constrained values/dims from the constraint set as AffineExprs (getExpr) and for adding new bounds to the constraint set (addBound). In particular, AffineExprs make it possible to implement this interface in just a few lines of code.

ValueBoundsConstraintSet maintains an internal worklist for SSA values that were used in bounds of other values but whose value bounds are not yet known. The class provides a static helper function reifyBound that iteratively processes values in the backward slice (reverse use-def chain) of a given value until a stop condition is met. It then projects out all columns for which the stop condition is not met and computes a bound as an IntegerAttr or Value. In the latter case, this can be an already existing SSA value (no new ops created) or the result of an affine.apply operation.

/// Reify a bound for the given index-typed value or shape dimension size in
/// terms of SSA values for which `stopCondition` is met.
static FailureOr<OpFoldResult>
ValueBoundsConstraintSet::reifyBound(
    OpBuilder &b, Location loc, presburger::IntegerPolyhedron::BoundType type,
    Value value, int64_t dim, function_ref<bool(Value)> stopCondition);

Example: tensor.pad

Compare this with the ReifyRankedShapedTypeOpInterface implementation of tensor.pad, which is doing roughly the same thing.

struct PadOpInterface
    : public ValueBoundsOpInterface::ExternalModel<PadOpInterface, PadOp> {
  void populateBoundsForShapedValueDim(Operation *op, Value value, int64_t dim,
                                       ValueBoundsConstraintSet &cstr) const {
    auto padOp = cast<PadOp>(op);
    assert(value == padOp.getResult() && "invalid value");

    // The result size along `dim` is the source size plus the low and high
    // padding amounts along that dimension.
    AffineExpr expr = cstr.getExpr(padOp.getSource(), dim) +
                      cstr.getExpr(padOp.getMixedLowPad()[dim]) +
                      cstr.getExpr(padOp.getMixedHighPad()[dim]);
    cstr.addBound(IntegerPolyhedron::BoundType::EQ, value, dim, expr);
  }
};

Example: scf.for

void populateBoundsForIndexValue(Operation *op, Value value,
                                 ValueBoundsConstraintSet &cstr) const {
  auto forOp = cast<ForOp>(op);
  // Only IV is supported at the moment.
  if (value != forOp.getInductionVar())
    return;

  // TODO: Take into account step size.
  cstr.addBound(IntegerPolyhedron::BoundType::LB, value,
                cstr.getExpr(forOp.getLowerBound()));
  cstr.addBound(IntegerPolyhedron::BoundType::UB, value,
                cstr.getExpr(forOp.getUpperBound()));
}

Example: API usage

Reify an upper bound for the size of dimension 0 of tensorValue in terms of vals or return failure if no such bound could be computed.

SmallVector<Value> vals = ...;
auto stopCondition = [&](Value v) { return llvm::is_contained(vals, v); };
FailureOr<OpFoldResult> bound = ValueBoundsConstraintSet::reifyBound(
    b, loc, BoundType::UB, tensorValue, /*dim=*/0, stopCondition);

Generalization of ReifyRankedShapedTypeOpInterface

When reifying the dynamic shape of opResult, ValueBoundsConstraintSet::reifyBound can be used as a replacement for the current reifyResultShapes when using the following stop condition.

auto stopCondition = [&](Value v) {
  // Reify in terms of SSA values that are different from `opResult`.
  return v != opResult;
};
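
For illustration, here is a rough sketch of how reifyResultShapes could be layered on top of reifyBound with this stop condition. This is not part of the prototype: the wrapper name reifyResultShapesViaValueBounds is made up, and returning OpFoldResults assumes the change from D145250.

// Hypothetical wrapper (name is made up), not part of the prototype: reify
// all result dimensions of `op` via ValueBoundsConstraintSet::reifyBound,
// using an EQ bound in terms of values other than the respective result.
LogicalResult reifyResultShapesViaValueBounds(
    OpBuilder &b, Operation *op,
    SmallVector<SmallVector<OpFoldResult>> &reifiedShapes) {
  for (OpResult opResult : op->getResults()) {
    auto shapedType = dyn_cast<ShapedType>(opResult.getType());
    if (!shapedType || !shapedType.hasRank())
      continue;
    SmallVector<OpFoldResult> dims;
    for (int64_t dim = 0, e = shapedType.getRank(); dim < e; ++dim) {
      // Request an EQ bound in terms of values other than `opResult`.
      FailureOr<OpFoldResult> bound = ValueBoundsConstraintSet::reifyBound(
          b, op->getLoc(), presburger::IntegerPolyhedron::BoundType::EQ,
          opResult, dim, [&](Value v) { return v != opResult; });
      if (failed(bound))
        return failure();
      dims.push_back(*bound);
    }
    reifiedShapes.push_back(dims);
  }
  return success();
}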

Pros:

  • Reifies only the specified dimension of the specified value, not every dimension of every result.
  • The ValueBoundsOpInterface implementation is typically shorter (LoC) than the ReifyRankedShapedTypeOpInterface implementation. E.g., compare the above tensor.pad implementation with the corresponding implementation of the existing interface.
  • No code/functionality duplication between ValueBoundsOpInterface and ReifyRankedShapedTypeOpInterface.
  • If ValueBoundsOpInterface is located in mlir/Transforms, ops can implement the interface directly, without having to depend on AffineDialect etc. (We would no longer need a separate build target in Dialect/Tensor etc.)

Cons:

  • ValueBoundsOpInterface implementations do not build IR. This is done as part of ValueBoundsConstraintSet::reifyBound. This helper function must depend on at least one dialect (affine or arith) to materialize the IR that computes the shape. Therefore, it cannot be part of mlir/Interfaces. (If we use IntegerPolyhedron instead of FlatAffineValueConstraints, the interface itself can be, but reifyBound cannot; reifyBound could be placed in AffineTransforms and/or ArithTransforms.)
  • Creating a constraint set adds some overhead. If this overhead is too large, we could add a “fast path” specifically for reifyResultShapes, where we do not create a constraint set. (The constraint set is not really needed because no columns are projected out.)

Nice proposal!

I have to say that I’m not sure what “columns” refers to in your description?

This is really something we should solve. Looking at the implementation it’s not clear to me why it can’t be addressed with an extra level of indirection?
It may be because memref and tensor types aren’t in the respective dialects but are builtin?

There is also https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/tools/kernel_gen/transforms/same_shape_propagation.cc and https://github.com/openxla/xla/blob/main/xla/mlir_hlo/mhlo/analysis/shape_component_analysis.cc (and I may have missed others).

Could you please contrast these, say what each does, and explain what this interface will provide that those downstream projects could use?

I’d actually prefer we go the other way: instead of generalizing ReifyRankedShapedTypeOpInterface, return it to reifying in the IR as it did before, and let the above interface be for folks who want intermediate results rather than something that can be codegen’d and runtime queried. One can then implement reify in terms of inferBounds, but reification is the final step only and no intermediate results get materialized; both interfaces stay simple, and callers don’t need to isa to determine what got returned and what happens. This is then a pure analysis that doesn’t even need to mutate the IR (and potentially not even the context).

Could this go further and avoid IntegerAttr completely here? Using Attributes would create, in the context, intermediate values that may never appear in the IR and that occupy space for the lifetime of the context. Given that the goal here is intermediate results/analysis, capture those in their own context of sorts rather than reusing Attribute (it seems like the Presburger data structures support this).


This refers to columns/positions in the FlatAffineValueConstraints API. A column can be a dimensional variable, a symbolic variable, or a local variable. This is an implementation detail. Users of ValueBoundsConstraintSet do not have to care about it.

I was thinking about the affine.apply op that is created by ValueBoundsConstraintSet::reifyBound. This requires a dependency on the Affine dialect. It could be solved by putting reifyBound into the AffineDialect or AffineTransforms build target. The more I think about it, the more I like this approach, because we can have another reifyBound in ArithTransforms. ValueBoundsOpInterface itself (and its helpers) would really just populate/manage the constraint set.

Creating the respective DimOps is another issue. If reifyBound is in AffineTransforms, it is not a problem anymore because that build target can depend on (already depends on?) TensorDialect and MemRefDialect. Alternatively, we could have a type interface for creating DimOps to avoid a switch-case.

Are you referring to ⚙ D145250 [mlir][Interfaces] ReifyRankedShapedTypeOpInterface returns OpFoldResults (i.e., creating OpFoldResult instead of Value)?

This was based on an observation that we often reify the result shape and then immediately call getConstantIntValue/m_Constant/getAsOpFoldResult on the result. There are a few examples of this in D145250. I found that this composes well with many memref/tensor ops (and ViewLikeOpInterface) that provide helper functions that return OpFoldResults (getMixedSizes etc.) and accept OpFoldResults as part of their builder API.

We may want to update the remaining operations that do not yet take advantage of Attributes/OpFoldResult. E.g., the index operand of tensor.dim could have an additional static_index attribute and a getMixedIndex() helper function that returns an OpFoldResult. This could replace helpers such as DimOp::getConstantIndex. Same for scf.for etc.
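
A minimal sketch of what such a helper could look like on tensor.dim, purely for illustration: the static_index attribute and the getStaticIndexAttr()/getMixedIndex() helpers do not exist today, so these names are hypothetical.

// Hypothetical helper; `static_index` and `getStaticIndexAttr()` do not exist
// on tensor.dim today.
OpFoldResult DimOp::getMixedIndex() {
  // If the index is known statically, return it as an attribute ...
  if (IntegerAttr staticIndex = getStaticIndexAttr())
    return staticIndex;
  // ... otherwise, return the dynamic index operand.
  return getIndex();
}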

This sounds good. I’m modifying the revisions to do that. ValueBoundsConstraintSet::reifyBound will become a standalone function in AffineTransforms (as described above). The interface and ValueBoundsConstraintSet are “analysis only”. (It will still need an MLIRContext, but only to build AffineExprs.)

It is already doing that. Kind of… Inside the constraint set, constants are stored as plain int64_t. Only at the very end, reifyBound may create an IntegerAttr. With the above-mentioned change, this will no longer be part of ValueBoundsConstraintSet.

Can this be made to interact with IntegerRangeAnalysis?

I think it can. The inferResultRanges interface method looks very similar to populateBoundsForIndexValue of the proposed interface. The API is a bit different (AffineExpr and ValueBoundsConstraintSet instead of ConstantIntRanges) but the implementations are essentially the same. InferIntRangeInterface and ValueBoundsOpInterface should probably be the same interface.
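
To make the similarity concrete, here is an illustrative sketch (not taken from the revision stack) of what a ValueBoundsOpInterface external model for an index-typed arith.addi could look like, using only the getExpr/addBound API shown above; the corresponding InferIntRangeInterface implementation computes essentially the same fact as a ConstantIntRanges pair.

// Illustrative external model for arith.addi on index-typed values; not
// taken from the prototype revisions.
struct AddIOpInterface
    : public ValueBoundsOpInterface::ExternalModel<AddIOpInterface,
                                                   arith::AddIOp> {
  void populateBoundsForIndexValue(Operation *op, Value value,
                                   ValueBoundsConstraintSet &cstr) const {
    auto addOp = cast<arith::AddIOp>(op);
    assert(value == addOp.getResult() && "invalid value");
    // result == lhs + rhs
    cstr.addBound(IntegerPolyhedron::BoundType::EQ, value,
                  cstr.getExpr(addOp.getLhs()) + cstr.getExpr(addOp.getRhs()));
  }
};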

Status update: I uploaded a revision that addresses some of the comments mentioned in here. ValueBoundsOpInterface no longer builds IR. It is now analysis only. There are two helpers (in the respective dialects) that manifest a computed bound in IR; either as an affine.apply op or as a combination of arith dialect ops.
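
From a caller’s perspective, the split might look roughly like the following; the helper names and signatures below are placeholders, not the actual API of the revision.

// Placeholder helper names/signatures, for illustration only.
// Analysis: ValueBoundsOpInterface/ValueBoundsConstraintSet compute the bound.
// Reification: a dialect-specific helper materializes it in IR.
FailureOr<OpFoldResult> affineBound = reifyBoundWithAffineOps(
    b, loc, presburger::IntegerPolyhedron::BoundType::UB, value, /*dim=*/0,
    stopCondition);  // may create an affine.apply op
FailureOr<OpFoldResult> arithBound = reifyBoundWithArithOps(
    b, loc, presburger::IntegerPolyhedron::BoundType::UB, value, /*dim=*/0,
    stopCondition);  // may create arith ops instead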

tensorflow/tensorflow/compiler/mlir/tools/kernel_gen/transforms/same_shape_propagation.cc

This component builds equivalence classes of SSA values that have the same shape.

ValueBoundsOpInterface could be a complete upstream replacement for this component. Some issues with same_shape_propagation.cc that could be addressed with ValueBoundsOpInterface:

  • Hard-coded list of supported ops (not interface-based).
  • Can only handle cases where the shape of a result is the same as the shape of another value; no arithmetic is possible. E.g., tensor.pad could not be handled by same_shape_propagation.cc.

There’s also this comment. ValueBoundsOpInterface would address that comment.

// A basic shape equality inference. This should be superceeded by a proper
// inference once available. Until then, we just build this out to the needs of
// the kernel generator project.

xla/xla/mlir_hlo/mhlo/analysis/shape_component_analysis.cc

This is a symbolic interpreter for MHLO ops. It can compute dimension sizes (GetShapeInfo) and values of SSA values (GetValueInfo).

The handling of SSA values is slightly different from ValueBoundsOpInterface: it can be used to compute the elements stored in a shape tensor, whereas ValueBoundsOpInterface only works on index/integer-typed SSA values at the moment.

There are many similarities in the design: both use a worklist-driven approach to traverse the backward slice, and both utilize AffineExpr to encode bounds. The SymbolicExpr data structure is very similar to the combination of AffineMap and ValueDimList in ValueBoundsOpInterface.

shape_component_analysis.cc is currently not interface-based and has a hard-coded list of supported ops. If we add support for shape tensor values to ValueBoundsOpInterface, it could be an upstream replacement for shape_component_analysis.cc.

MLIR InferIntRangeInterface

This interface can be used to compute constant integer LBs and UBs for integer-typed SSA values. ValueBoundsOpInterface is a more generalized variant, which can also compute non-constant bounds w.r.t. other SSA values.

The data structures are different. InferIntRangeInterface uses a lattice. I’m not super familiar with this data structure. ValueBoundsOpInterface uses a constraint set (FlatLinearConstraints).

ValueBoundsOpInterface can be seen as a generalization of InferIntRangeInterface. I think we should try to merge the two. I think ValueBoundsConstraintSet::computeConstantBound is doing roughly the same as querying a single bound from the lattice. Maybe the authors of InferIntRangeInterface and related infrastructure could also weigh in a bit. @krzysz00 @Mogball
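
For reference, a usage sketch of such a query; the exact computeConstantBound signature is not shown in this thread, so the parameters below (mirroring reifyBound, minus the builder) are an assumption.

// Assumed signature, mirroring reifyBound without the OpBuilder/Location:
// query a constant upper bound for the size of dimension 0 of `tensorValue`.
FailureOr<int64_t> constantUb = ValueBoundsConstraintSet::computeConstantBound(
    presburger::IntegerPolyhedron::BoundType::UB, tensorValue, /*dim=*/0,
    /*stopCondition=*/nullptr);
if (succeeded(constantUb))
  llvm::dbgs() << "constant upper bound: " << *constantUb << "\n";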

InferIntRangeInterface is aware of integer type ranges:

Bounds are tracked separately for the signed and unsigned
interpretations of a value, which ensures that the impact of
arithmetic overflows is correctly tracked during the analysis.

ValueBoundsOpInterface currently does not take this into account. The underlying constraint set is based on MPInt (similar to APInt). Once a bound is added to the constraint set, the type information is “erased”; there are no restrictions on how small/large a value can get.

In general, when analyzing a non-constant value, it may be impossible to know if a value overflows at runtime or not. For the special case where a constant bound is computed, it should be possible to take into account such overflows. But this will require some extra work.

I have a preliminary revision that adds support for branching ops to ValueBoundsOpInterface (arith.select, scf.if). These are tricky because their value bounds cannot be expressed as linear constraints. E.g., for %0 = arith.select %cond, %a, %b : index, the lower bound constraint would be %0 >= min(lb(%a), lb(%b)). For such ops, ValueBoundsOpInterface can infer constant bounds (if there are any), so the constraint set-based approach can still do as well as the lattice-based approach (but no better).

MLIR ReifyRankedShapedTypeOpInterface

This is a special case of ValueBoundsOpInterface, where a bound is computed and reified as IR in terms of the op’s operands. The ReifyRankedShapedTypeOpInterface would no longer be needed. mlir::reifyResultShapes can directly query the ValueBoundsOpInterface. This would also allow dialects to implement the interface without depending on the arith/affine dialect.

There is a related discussion about inefficiencies when using ReifyRankedShapedTypeOpInterface to reify the shape of a value over a chain of multiple ops, which would be improved with ValueBoundsOpInterface: Is there a way to avoid creating instructions that are going to be CSE-ed anyway

I’m not sure that InferIntRanges can be replaced by the current value bounds infrastructure. Specifically, in our downstream, we use that interface primarily to eliminate statically unneeded “cruft” that comes out of expandAffineMap() and similar code. That is, while, in full generality, operations like mod or floordiv involve select operations and signed arithmetic to correctly handle, for example, -2 mod 3 or -2 floordiv -4, we can, by running bounds inference, determine that all that code can be simplified to the equivalent unsigned code in our case, as we have bounds (on things like gpu.thread_id) that tell us that all the inputs are well within [0, iN::signed_max].

The code for those rewrites is in mlir/test/lib/Transforms/TestIntRangeInference.cpp (which we have in a non-test context in rocMLIR) and in mlir/lib/Dialect/Arith/Transforms/UnsignedWhenEquivalent.cpp, which run in that order (with a CSE between them).

I would prefer that this functionality keep working, as, for things like the GPU block/grid sizes, we have bounds information in MLIR that doesn’t make its way to LLVM.