[RFC] Update valueBoundsOpInterface to compute bounds of dynamic stride/offset sizes

Background

The valueBoundsOpInterface is a great design in mlir, widely used for analysis and transformation passes. However, its inability to compute dynamic stride or offset sizes presents a limitation.

Given the prominence of stride and offset in the mlir::MemRef dialect, addressing this deficiency is essential. We propose extending the valueBoundsOpInterface to incorporate dynamic stride and offset calculations.

Extend Interface Methods

void populateBoundsForOffsetValue(Value value, ValueBoundsConstraintSet &cstr)

void populateBoundsForStridedValueDim(Value value, int64_t dim, ValueBoundsConstraintSet &cstr)

The ValueDim and ValueDimList within ValueBoundsConstraintSet are insufficient for capturing the necessary information about IndexValue, ShapedValueDim, StridedValueDim, and OffsetValue. To address this, we propose introducing a PopulateMode enum and a ColumnNode class. PopulateMode specifies the target computation (e.g., IndexValue, ShapedValue, StridedValue, or OffsetValue). The ColumnNode containing value, dim, and PopulateMode, can provide comprehensive information required for dynamic size calculations.

The demo code is as follows:

enum PopulateMode {
  INDEX = 0,
  SHAPE,
  STRIDE,
  OFFSET,
};

struct ColumnNode {
  Value value;
  int64_t dim;
  PopulateMode pMode;
};

using ColumnNodeList = SmallVector<ColumnNode>;

class Variable {
// ...
// other code
// ...
private:
  friend class ValueBoundsConstraintSet;
  AffineMap map;
  ColumnNodeList mapOperands;
};

Some other functions within ValueBoundsConstraintSet, including getExpr, getPos, and processWorklist, require corresponding adjustments due to the integration of ColumnNode and PopulateMode. However, these modifications are relatively straightforward and can be implemented efficiently.

Example: memref::CastOp

struct CastOpInterface
    : public ValueBoundsOpInterface::ExternalModel<CastOpInterface, CastOp> {
  void populateBoundsForShapedValueDim(Operation *op, Value value, int64_t dim,
                                       ValueBoundsConstraintSet &cstr) const {
    auto castOp = cast<CastOp>(op);
    assert(value == castOp.getResult() && "invalid value");

    if (llvm::isa<MemRefType>(castOp.getResult().getType()) &&
        llvm::isa<MemRefType>(castOp.getSource().getType())) {
      cstr.bound(value)[dim] == cstr.getExpr(castOp.getSource(), dim);
      cstr.bound({value, dim, PopulateMode::SHAPE}) ==
          cstr.getExpr(castOp.getSource(), dim, PopulateMode::SHAPE);
    }
  }

  void populateBoundsForOffsetValue(Operation *op, Value value,
                                    ValueBoundsConstraintSet &cstr) const {
    auto castOp = cast<CastOp>(op);
    assert(value == castOp.getResult() && "invalid value");

    cstr.bound(value, std::nullopt, PopulateMode::OFFSET) ==
        cstr.getExpr(castOp.getSource(), std::nullopt, PopulateMode::OFFSET);
  }

  void populateBoundsForStridedValueDim(Operation *op, Value value, int64_t dim,
                                        ValueBoundsConstraintSet &cstr) const {
    auto castOp = cast<CastOp>(op);
    assert(value == castOp.getResult() && "invalid value");

    if (llvm::isa<MemRefType>(castOp.getResult().getType()) &&
        llvm::isa<MemRefType>(castOp.getSource().getType())) {
      cstr.bound(value, dim, PopulateMode::STRIDE) ==
          cstr.getExpr(castOp.getSource(), dim, PopulateMode::STRIDE);
    }
  }
};

Example: memref::ReinterpretCastOpInterface

struct ReinterpretCastOpInterface
    : public ValueBoundsOpInterface::ExternalModel<ReinterpretCastOpInterface,
                                                   memref::ReinterpretCastOp> {
  void populateBoundsForOffsetValue(Operation *op, Value value,
                                    ValueBoundsConstraintSet &cstr) const {
    auto reinterpretCastOp = cast<memref::ReinterpretCastOp>(op);
    assert(value == reinterpretCastOp.getResult() && "invalid value");

    OpFoldResult offset = reinterpretCastOp.getConstifiedMixedOffset();
    cstr.bound(value, std::nullopt, PopulateMode::OFFSET) == offset;
  }

  void populateBoundsForShapedValueDim(Operation *op, Value value, int64_t dim,
                                       ValueBoundsConstraintSet &cstr) const {
    auto reinterpretCastOp = cast<memref::ReinterpretCastOp>(op);
    assert(value == reinterpretCastOp.getResult() && "invalid value");

    OpFoldResult shape = reinterpretCastOp.getConstifiedMixedSizes()[dim];
    cstr.bound(value, dim, PopulateMode::SHAPE) == shape;
  }

  void populateBoundsForStridedValueDim(Operation *op, Value value, int64_t dim,
                                        ValueBoundsConstraintSet &cstr) const {
    auto reinterpretCastOp = cast<memref::ReinterpretCastOp>(op);
    assert(value == reinterpretCastOp.getResult() && "invalid value");

    OpFoldResult stride = reinterpretCastOp.getConstifiedMixedStrides()[dim];
    cstr.bound(value, dim, PopulateMode::STRIDE) == stride;
  }
};

summary:

To effectively address the increasing need for dynamic stride and offset calculations in various Dialect, the valueBoundsOpInterface should be extended to support these computations. Introducing PopulateMode and ColumnNode provides a direct and efficient solution.

Thank you for proposing this RFC. I believe it’s a significant step towards improving the coverage of value bound support. However, there are a few issues that warrant further discussion:

Should we consider a more general approach to extending this data structure? For example, if I want to add a quantization parameter, would that require another extension? (I haven’t figured out a satisfactory solution for this yet.)

We can simply pass in ColumnNode here; there’s no need to decompose the function into three parts. We can avoid adding new interfaces in the next extension.

@matthias-springer I hope you’ll be able to review this RFC. Your deep knowledge of value bounds would be extremely helpful.

1 Like

Overall, this makes sense. But this is very memref-specific and I’m wondering what your use case is. When is it beneficial to populate constraints about offsets and strides?

If I remember correctly, @Groverkss mentioned that he’s working on attaching SSA values (or other kind of information) to a constraint set. Maybe we could use that mechanism to simplify the tracking data structures a bit.

Firstly, thank you very much for taking the time to reply. @matthias-springer

Well, the most important application of valueBoundsOpInterface is in memref dimension folding pass in my wrok.

Example about memref dimension folding pass

func.func @demo(%arg0: memref<2x4xf32, 101>, %arg1: memref<2x4xf32, 101>) {
  toy.copy(%arg1, %arg0) : (memref<2x4xf32, 101>, memref<2x4xf32, 101>) -> ()
  return
}

--------> after dimension folding pass

func.func @demo(%arg0: memref<2x4xf32, 101>, %arg1: memref<2x4xf32, 101>) {
  %reinterpret_cast = memref.reinterpret_cast %arg0 to offset: [0], sizes: [8], strides: [1] : memref<2x4xf32, 101> to memref<8xf32, 101>
  %reinterpret_cast_0 = memref.reinterpret_cast %arg1 to offset: [0], sizes: [8], strides: [1] : memref<2x4xf32, 101> to memref<8xf32, 101>
  toy.copy(%reinterpret_cast_0, %reinterpret_cast) : (memref<8xf32, 101>, memref<8xf32, 101>) -> ()
  return
}

After dimension folding pass, the two-dimensional memref is folded into one dimension, making it more convenient for subsequent pass optimization.

In dimension folding pass, the key task is to determine whether the memref is continuous or identity.

For example:

%arg0: memref<2x4xf32, 101>

%dim0 = 2

%dim1 = 4

%stride0 = 4

%stride1 = 1

if %stride0 == %dim1 * %stride1
  isContinuous = true
else 
  isContinuous = false

If the data in dim0 and dim1 is continuous, the dim0 and dim1 can be folded into one dimension.

While the provided example demonstrates a simple scenario with static shape and stride sizes, real-world applications often involve dynamic values. To effectively handle these cases, it is essential to reify dynamic stride and shape sizes using the valueBoundsOpInterface. This capability is crucial for accurate analysis and optimization.

Thank you for your advice and help. The advice about ColumnNode is very great.

Your use case makes sense to me.

You’re going to have to generalize this part of the codebase: llvm-project/mlir/lib/Dialect/Affine/Transforms/ReifyValueBounds.cpp at main · llvm/llvm-project · GitHub. In case of a stride or offset, a memref.extract_strided_metadata must be inserted.

It would be nice to generalize the infrastructure without tieing it to memref types. Maybe, something like this would work:

// Instead of ValueDim:
// Maybe find a better name for `Quantity`.
struct Quantity {
  Value owner;
  std::function<Value(OpBuilder&, Value)> buildQuantityFn;
}

The idea is that we don’t store (Value,int64_t) pairs anymore. But instead a Value and a helper function that build IR to get the quantity.

In case of an index-typed value, buildQuantityFn is nullptr. I.e., use the index-typed value directly.

In case of a memref dimension:

buildQuantityFn = [dim](OpBuilder &b, Location loc, Value val) {
  return b.create<memref::DimOp>(loc, val, dim);
};

The dimension is stored in the lambda (captured by value). Or we can make it explicit in the Quantity struct.

In case of a memref offset:

buildQuantityFn = [dim](OpBuilder &b, Location loc, Value val) {
  return b.create<memref::ExtractStridedMetadataOp>(loc, val).getResult(1);
};

Basically instead having PopulateMode and ColumnNode, maybe we can just have a lambda. I haven’t fully thought this through yet…

1 Like

Thanks for your reply.

However, avoiding the creation of new operations is a strong design choice for the valueBoundsOpInterface. This allows us to use it as an effective analysis tool without altering the underlying IR structure. We’ve observed that modifications to the IR can introduce unexpected issues, particularly in asynchronous scenarios.

You wouldn’t actually build any IR. Unless materializeComputedBound is called. Note that the b.create is inside a lambda that would only be executed in materializeComputedBound.

Thanks for your reply very much.

If you have new thought about stride/offset for ValueBoundsOpInterface, please tell me. Thank you very much ! :grin:

I apologize for the interruption, but I have another query. Would you happen to have any thoughts on how to resolve semi-affine problem, such as mul operator which rhs and lhs are all dynmaic ?

Multiplications etc. are not supported by the underlying constraint set. You could treat the result of a multiplication as a dimension, without adding an equation that describes how the result is defined. And storing this kind of information in a separate data structure next to the constraint set. That way, you can still reify a bound for it. But it won’t be possible to, e.g., project one operand of the multiplication without also projecting out the other one.

Ok, I got it !

Thank you very much.