[RFC] More OpFoldResult and "mixed indices" in ops that deal with Shaped Values

Background

tensor.extract_slice, tensor.insert_slice and memref.subview support mixed SSA values/Attributes for offsets, sizes and strides. The idea of this RFC is to extend this “mixed” representation to other ops that index into ranked shaped types.

%0 = tensor.extract_slice %t [%offset, 5] [10, 10] [1, 1] : ...
                               ^       ^
                           dynamic   static

Proposal

  • Provide an op interface that has convenience methods for dealing with “mixed values”. Things such as getMixedOffsets etc. Basically OffsetSizeAndStrideOpInterface, but without the sizes and strides. Prototype: https://reviews.llvm.org/D156899
  • Gradually extend operations that operate on shaped types to support mixed offsets: tensor.extract, tensor.insert, memref.load, memref.store, vector.extract, vector.load, vector.store, vector.maskedload, vector.maskedstore, vector.transfer_read, vector.transfer_write. (A hypothetical example follows this list.)
  • Change the signature of OpBuilder::createOrFold to return OpFoldResult instead of Value. (At the moment, it tries to fold the result to an Attribute, and then materializes it as an SSA value with a constant operation.)
  • Maybe: Use a similar approach to support static/dynamic values for tensor.dim, memref.dim, scf.for, etc.
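
To make the second bullet concrete, here is a sketch of what a mixed-index memref.load could look like. The assembly format is purely hypothetical and not something this RFC pins down:

%c3 = arith.constant 3 : index
%v0 = memref.load %m[%i, %c3] : memref<?x8xf32>  // today: SSA-only indices
%v1 = memref.load %m[%i, 3] : memref<?x8xf32>    // hypothetical mixed form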

Benefits

  • More opportunities for op verification: Out-of-bounds access of static indices can be verified. Dynamic indices could be verified in some cases, but such verifiers are discouraged according to the MLIR developer guide (“only verify local aspects of an operation”). Example for tensor.extract_slice/tensor.insert_slice: https://reviews.llvm.org/D156061 ([mlir][tensor] Improve verifiers: detect out-of-bounds accesses)
  • Better integration with various MLIR helpers that produce or accept OpFoldResults. E.g.: affine::makeComposedFoldedAffineApply, various op builders, …
  • Fewer operations and easier-to-read IR. In particular, unrolling vector transfer ops currently generates many arith.constant ops (one for each index from 0 to the dimension size).
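
As an illustration of the last point, unrolling a transfer op today materializes one arith.constant per unrolled index, whereas a mixed form could carry the static indices inline. The mixed syntax below is again hypothetical, and %pad stands for the padding value:

%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%r0 = vector.transfer_read %t[%c0, %c0], %pad : tensor<2x8xf32>, vector<8xf32>
%r1 = vector.transfer_read %t[%c1, %c0], %pad : tensor<2x8xf32>, vector<8xf32>

// Hypothetical mixed form: no constant ops needed.
%s0 = vector.transfer_read %t[0, 0], %pad : tensor<2x8xf32>, vector<8xf32>
%s1 = vector.transfer_read %t[1, 0], %pad : tensor<2x8xf32>, vector<8xf32>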

I have heard mixed opinions about “mixed values”, so I wanted to discuss this before doubling down on this design principle.

Thanks, Matthias!

The refactoring part looks good to me. However, I think I’m missing the context of when this was introduced. Playing a bit of a devil’s advocate here, I’m trying to understand the value proposition of having mixed indices vs the added overhead of having to deal with mixed Value and Attribute indices and converting them to Value-only indices when needed. What is the benefit of representing static indices as an Attribute vs just using a constant op? Could you please elaborate on that?

  • Gradually extend operations that operate on shaped types to support mixed offsets: tensor.extract, tensor.insert, memref.load, memref.store, vector.extract, vector.load, vector.store, vector.maskedload, vector.maskedstore, vector.transfer_read, vector.transfer_write.

This may lead to massive changes, as many of these ops, and their respective patterns and transformations, currently deal with constant ops instead of attributes for static indices.

The “benefits” aren’t clear enough to me right now, beyond some “nice to have” properties.
Before we encourage or generalize this kind of thing, I feel it should be fleshed out a bit more. There is a tradeoff in terms of cost/complexity (implementation complexity, API complexity and non-uniformity, and cognitive load with respect to “mixed values”).

That’s interesting, but that also means that we can’t just fold / canonicalize the dynamic case directly into the static one by matching a constant operand.

We could just look at the operand, check that it is a constant, and get the value, right? Even if this verification doesn’t fit into the Op::verify method, we can have a pass that does this. We should be allowed to represent OOB indexing and define it as UB.

I don’t know what you’re answering here @dcaballe: you’re quoting me, but my remark was about canonicalization and folding being more difficult because they may break the IR verifier when turning a constant into an attribute.

If you define something as UB, then it’s out of scope for the verifier, and I’m not sure what a separate pass would do about it either.

Generally speaking, there are some ops that only accept static values. For those, we should use Attributes instead of constant ops. (Mixed values do not make sense there, either.) The reasons are op verification and op documentation: if an op accepts SSA values, I would expect non-static values to be supported. An example is vector.extract, although this may change with one of your recent WIP revisions.
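
For example, vector.extract (with its assembly format at the time of this discussion) only accepts static positions, which are printed directly as attribute values:

%e = vector.extract %v[3] : vector<4x8xf32>  // static position only; yields vector<8xf32>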

Coming back to “mixed values”: it looks like this is not something that others were missing. The additional verification may not be worth it due to the added complexity (esp. API and special cases wrt. folding/canonicalizations).

-generate-runtime-verification is a pass that inserts cf.assert runtime assertions. It’s similar to what you suggest, but verifies at runtime and not at compile time. It also works with non-constant SSA values and there is no problem with “invalid ops that are guarded by an if-check and thus not a problem” etc.

This pass is not used much yet, but the additional op verification that I was looking for could be added there. So instead of using more mixed values, we could just improve runtime op verification and encourage users to run that pass in debug builds or for debugging purposes.
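
As a sketch of the kind of check the pass can insert for a load (the actual IR it produces differs in its details):

%c0 = arith.constant 0 : index
%dim = memref.dim %m, %c0 : memref<?xf32>
// ult also rejects negative indices, which wrap around to large unsigned values.
%ok = arith.cmpi ult, %i, %dim : index
cf.assert %ok, "out-of-bounds access"
%v = memref.load %m[%i] : memref<?xf32>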

Yes, that sounds good to me! I think there is still a compile-time verification part that we could leverage by analyzing constant ops, but runtime verification for the rest sounds like a great idea.

I would go even further… we could incrementally remove existing mixed types in favor of using plain SSA values. That would lead to massive changes, especially on the testing side, but we could take small steps towards that goal (after landing what we currently have in flight!)

Not as an “IR verifier”; see the MLIR developer guide link that Matthias posted.
A specific lowering can always refuse to process some IR and fail a pipeline, though (however, that may not be a good idea; for example, the code may be unreachable).

I thought about that. The three ops that use them in MLIR are memref.subview, tensor.extract_slice, and tensor.insert_slice.

These support rank reductions/expansions. In that case the corresponding “size” must be a static 1. Without static values we lose quite a lot of (valuable) verification; see the example below. I’m not even sure if the op would still be well-defined. (I would have preferred rank reductions/expansions to be separate ops, but that is another question…)

Also, without static sizes, the result types of these three ops would always be fully dynamic.
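
For example, in this rank-reducing slice the two unit sizes must be the static constant 1 for those dimensions to be dropped, and the static sizes are also what allow the precise result type:

%0 = tensor.extract_slice %t[0, 0, 0] [1, 16, 1] [1, 1, 1]
    : tensor<8x16x4xf32> to tensor<16xf32>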

Speaking from similar experiences in the LLVM Dialect, having mixed static and dynamic values is IMO rather painful to use. llvm.getelementptr also has either dynamic or static indices, since indexing into an LLVM struct type requires static indices.
This initially led to an odd interface, both in terms of getters and build methods, which later also resulted in bugs in fold implementations. This was later made easier with an adaptor class.
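
For reference, a sketch of such a GEP: the index into the struct body must be a static (attribute) value, while the leading array index may be dynamic:

%p = llvm.getelementptr %base[%i, 1] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.struct<(i32, f32)>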

Unless required for op semantics, I’d therefore be against making ops mixed static or dynamic. Making all operands dynamic and requiring an input to come from a constant op does not work either, as this is easily broken by optimization passes (e.g. by introducing a block argument); it also violates the local-verification guidance in the developer guide mentioned previously.

If the constant value leads to undefined behaviour, then the constant attribute would too. This is independent of whether we use values or attributes.

Attributes would be “correct by construction”, while values can change (inlining, constant propagation, range analysis) and lead to UB during the compilation process. An invalid transform should be fixed, so we want the error to crop up as soon as possible.

However, any valid transform that leads to UB means the code was broken to begin with (either user code, or “library” code, or the interaction between them). This goes back to the poison discussions and other similar problems.

If an SSA value is proven to be constant, representing it as an attribute will not change its danger of generating UB, and having it crash on an early verifier (or just insert poison) is preferable to propagating through multiple passes and leading to UB at runtime.

If this makes canonicalization and folding harder (as in - needs to check it’s actually doing the right thing), then it’s for a good reason, IMHO.

while values can change (inlining, constant propagation, range analysis) and lead to UB during the compilation process.

Could you elaborate? How could a constant operand lead to UB during compilation in a way that isn’t considered a bug in the compiler? Compiler passes need to correctly handle ops that are known to have UB.

If an SSA value is proven to be constant, representing it as an attribute will not change its danger of generating UB, and having it crash on an early verifier (or just insert poison) is preferable to propagating through multiple passes and leading to UB at runtime.

But can you prove that it will always lead to UB at runtime? If the op with the constant operand is in a dead-code path, you’d have a verifier aborting compilation despite the program possibly never executing the UB.

A simple example from C would be an inliner inlining:

void foo(int *p, bool cond) {
   if (cond)
      *p = 3;
}

into a callsite given by foo(nullptr, false) would create an operation that is UB if it were to be executed, but since it’s dead code it is fine and does not change the semantics of the program.

This was also discussed in the revision Matthias mentioned above, where it was concluded that verifiers should not check such properties and needlessly abort compilation.

Bad “user code”.

  %t = tensor.empty() : tensor<5xf32>
  %0 = arith.constant 10 : index
  %1 = tensor.extract_slice %t [%0] [1] [1] : ...

This doesn’t trigger verifier errors, but it is UB. Converting %0 to an attribute would lead to a verifier error instead of UB.

Bad user code can be directly bad, or it can be bad usage of a generic function (e.g. a dynamic-type helper used in the wrong way, where the UB is only exposed after shape and constant propagation).

Exactly! Hence my poison remarks above. Some languages allow UB, others do not. The compiler should be able to handle both gracefully.

If the access is constant, yes. The 10th element of a 5-element array will always be UB, regardless of whether it “works” or not. If the offset is dynamic, you can keep it as an SSA value and UB will be a runtime thing.

Exactly! Dynamic things remain dynamic, static things can be “promoted” to attributes, to encourage verifier checks before run time UB (by either crashing or creating poison).

If (broken) passes just create code without thinking about UB, then broken code will be generated. Sure, we need to fix those passes, but without checks (or poison, or Alive), it’s hard to spot them.

Your example does not have any static behaviour as is, and would not be using attributes or failing verifiers or even getting poison values.

But if it were inlined in another code, after propagation, it could be provably UB by the compiler, and therefore you want it to handle it (poison or verifier error).

Example:

#include <stdbool.h>
#include <stdlib.h>

// Your original function, potential UB, not provably so
void foo(int *p, bool cond) {
   if (cond)
      *p = 3;
}

// Examples of users
void user(int *other_p) {

  // After inlining/propagation, the compiler knows some things, others not
  foo(other_p, true); // Potentially UB, depends on other_p
  foo(other_p, false); // Never UB, no changes to pointer, DCE'd

  int *p; // uninitialized
  foo(p, true); // clearly UB, p points to random memory

  p = malloc(2);
  foo(p, true); // still UB: 2 bytes is less than sizeof(int)
  foo(p, false); // never UB, no changes to pointer, DCE'd

  p = realloc(p, 10);
  foo(p, true); // not UB anymore: 10 bytes is enough for an int

  free(p);
}

Thank you for the elaboration, I think we are mostly on the same page.
As you said, the transformations need to be aware of UB and therefore can’t just create an attribute out of a statically known constant operand without ensuring it is not UB.
Whether you want an op with attributes as indices to create verifier errors or fold to poison is a matter of design, I’d say.

I don’t follow what you mean, but this line of reasoning hits the usual confusion between “invalid IR” and “UB”, which to me are disjoint.
We can’t “crash” on finding a UB condition, because we don’t know whether that path is ever executed at runtime.

I would need to see an example to understand really. Seems like you’re arguing against the current guidelines here.

Actually, the example in that policy page is good enough. For the record, I am very much in favour of that policy. I don’t want to break that for the reasons stated in the document.

My point is: if there is a transform that moves the SSA constant into an attribute, then the verifier can check it. If we want the verifier to check ranges, it will crash. If no attributes are ever produced, the verifier cannot check it. If we detect conflicts as a pass, we can return poison values. Note there are a lot of IFs there.

I think there’s space for different design here…

  • An op only allows dynamic (SSA) operands because it doesn’t want to check static properties
  • An op only has static (attribute) operands because it doesn’t want to deal with dynamic ranges
  • An op can interchange values and attributes as operands and only checks when they’re attributes
  • An op can choose to crash (verifier) if the operands are conflicting, return poison, or do nothing

It’s up to the dialect/op to know what to do, but allowing intermixing values and attributes as operands sounds like a useful mechanism to express semantics.

For instance, user code calls a function in some reasonable way, and the compiler works fine. Then the user calls the same function in a different way, inlining/propagation happen, the verifier fails with some message, the source location gives the user a hint, the user sees they did a bad thing, corrects it, tries again, and the compiler passes fine.

That kind of iteration can only work if the compiler fails. If we always assume UB is fine, then the only way we can detect it is to have sanitizers in MLIR, or Valgrind, etc. C/C++ compilers love their UB superpowers, but more high-level code (ML, HPC) is generally not so forgiving.

I’m not saying: let’s do that to all things. Just allowing the possibility would let front-end developers tune the behaviour to their needs, without assuming “expected” behaviour.

Some languages/frameworks don’t allow UB to exist. In those cases, reading uninitialized memory or past the end of an allocation is not undefined behaviour but illegal behaviour, and compilation therefore needs to fail rather than produce a binary.

We should allow ops to crash on whatever their dialects express to be illegal. Of course, this is already possible (dialects write their own verifiers), but having an interface that allows mixing them is helpful to avoid creating multiple versions of the same op or complex verifiers.

As to doing that on vector and tensor, I’m not sure.

Isn’t that what a constant op (a wrapper around an attribute) is doing to some extent? 🙂

There’s also the fact that, regardless of whether a constant op or an attribute is used, they both should represent the same semantics, and that is not the case based on the discussion above. It sounds like we are making the constant op a bit “less constant”…

More concerning to me is that the existing verification constraints are limiting what we can represent with the IR and making the use of valid IR (a constant op) difficult. Naive question: would it make sense to have two stages in the verification mechanism, one with the existing constraints (to be applied when the IR might be in an inconsistent state), and another one that could look beyond the operation itself (to be applied when the IR is expected to be stable)? Or something along those lines?

To me there are two things with respect to verification:

  • the IR verifier, which is about invariants of the compiler and is there to detect when the compiler implementation is broken.
  • the programming-level “legality” aspect: in general it is UB, but in cases like the ones Renato mentions there is interest in catching issues with a user-provided program at compile time. To me this is in the realm of a “static analyzer” and thus belongs in custom, use-case-specific passes.

Back to the generic topic of “mixed indices”, something I was told at some point is that having some APIs taking ArrayRef<int> instead of ValueRange for building these ops allows for type inference in a way that you can’t get relying on ConstantOp.

You can infer the type from the ConstantOp, but you can’t verify that the inferred type is correct, because for that the verifier would have to look at the ConstantOp, and that would break the verification constraints. That’s why I mentioned the second verification stage above.
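
To illustrate the type-inference point with existing IR: with static sizes, the result type of tensor.extract_slice can be inferred (and verified) precisely, while SSA sizes only allow inferring a fully dynamic result type:

%0 = tensor.extract_slice %t[0, 0] [10, 10] [1, 1] : tensor<64x64xf32> to tensor<10x10xf32>
%1 = tensor.extract_slice %t[0, 0] [%a, %b] [1, 1] : tensor<64x64xf32> to tensor<?x?xf32>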