[RFC] Runtime Op Verification

We currently have static op verifiers that check for violated op invariants / assumptions based on static program information.

Op verifiers are unable to detect certain cases where an op may turn out to be invalid at runtime. Such cases are hard to debug because they can crash the program execution at a seemingly unrelated position or (even worse) compute an invalid result without crashing.

Example: memref::ExpandShapeOp

Invalid at compile time: memref.expand_shape %m [[0, 1]] : memref<11xf32> into memref<2x5xf32>

Valid at compile time (because we do not know any better): memref.expand_shape %m [[0, 1]] : memref<?xf32> into memref<?x5xf32>. This op may or may not be valid at runtime depending on the runtime shape of %m.
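For the dynamic case, the runtime check boils down to a divisibility condition. A sketch of what such verification IR could look like (value names are illustrative, not what the actual pass emits):

```mlir
// Runtime check for:
//   memref.expand_shape %m [[0, 1]] : memref<?xf32> into memref<?x5xf32>
// The dynamic size of %m must be divisible by 5.
%c0 = arith.constant 0 : index
%c5 = arith.constant 5 : index
%dim = memref.dim %m, %c0 : memref<?xf32>
%rem = arith.remsi %dim, %c5 : index
%ok = arith.cmpi eq, %rem, %c0 : index
// %ok must be true at runtime for the op to be valid.
```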

Example: memref::CastOp

Invalid at compile time: memref.cast %m : memref<2xf32> to memref<5xf32>

Valid at compile time (because we do not know any better): memref.cast %m : memref<?xf32> to memref<5xf32>.

Example: memref::CollapseShapeOp

Non-contiguous dimensions cannot be collapsed. This is easy to detect when all dimensions and strides are static, but can only be detected at runtime in the dynamic case.

Op documentation sometimes contains lingo such as “it acts as an assertion that fails at runtime” (memref.cast), but in practice such assertions are often not generated.

Proposal: ⚙ D138576 [mlir] Add RuntimeVerifiableOpInterface and transform

Add a new op interface RuntimeVerifiableOpInterface that can be implemented by ops that provide runtime verification. The interface has one method:

/// Generate IR to verify this op at runtime. Abort runtime execution if
/// verification fails. Return `success` if the runtime verification IR was
/// successfully built.
LogicalResult runtimeVerify(OpBuilder &b, Location loc);

A typical implementation of runtimeVerify may compute some checks and pass the result into cf.assert.
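For the memref.cast example above, the generated verification IR could look roughly like this (a sketch; the error message and value names are illustrative):

```mlir
// Runtime check for: memref.cast %m : memref<?xf32> to memref<5xf32>
%c0 = arith.constant 0 : index
%c5 = arith.constant 5 : index
%dim = memref.dim %m, %c0 : memref<?xf32>
%ok = arith.cmpi eq, %dim, %c5 : index
cf.assert %ok, "memref.cast: size mismatch of dim 0"
```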

Runtime verifiers are not expected to verify assumptions that can be verified statically.

Runtime verification can be computationally expensive (e.g., for memref::CollapseShape), so the IR is only generated on an opt-in basis by running -runtime-op-verify.

Alternative considered

Generate assertions during the lowering to LLVM. Example for memref::CollapseShapeOp: ⚙ D138444 [mlir][LLVM] Add runtime debug assertions for memref::CollapseShapeOp.

Disadvantages:

  • Works only for LLVM backend. (Must be implemented separately for every backend.)
  • Tedious to write LLVM IR, especially when loops are involved.
  • Only works for ops that lower directly to LLVM.

@River707 @mehdi_amini @nicolasvasilache @ftynse @herhut @aartbik @_sean_silva @jpienaar @qcolombet

4 Likes

I love this concept: we are basically offering a multi-level UBSAN interface where each dialect can add runtime assertions for the current invariant at a given level of abstractions :slight_smile:
Something important for good usability would be for the runtime support to accept good custom error messages along with the failures.

3 Likes

As someone mentally preparing myself for a lot of dynamic shape codegen work next year, I’m a big fan of this. However, I wonder about leaving out the assertion part of this interface. My reasoning for this is that assertions provide more strict barriers to optimizations, and assertions can only be used as assertions.

If instead, this verification builder created a boolean value or a set of booleans with related error messages, a user could then use this information in a few different ways:

  1. This interface could be used by a frontend that wants to return an error code for some invalid inputs rather than asserting. Assertions could be lowered to something similar, but that would limit the pipeline to only returning errors or only having assertions, without the flexibility to have both.
  2. This interface could be used by code that wants a less strict barrier to optimizations, such as the Shape dialect’s witness types and constraint/assuming ops ('shape' Dialect - MLIR).
  3. This interface could be used as proposed for assertions.
1 Like

+1 to the proposal. This is useful for guarding the correctness of dynamic behavior. Several comments though.

When reading the proposal, my main concern was the layering, in particular due to cf.assert being the hardcoded checking mechanism. So I second @tpopp’s proposal to relax this and give the caller of the interface control over how the results are verified. Then different passes can leverage the interface in different ways.

We also need some mechanism for the ops implementing this interface to specify the dialects in which the verification IR ops live (e.g., memref verifications are likely generating arith and/or index dialects). The runtime verifier generation pass will then have to declare these dialects as dependent somehow, unless the verified dialect itself does so. This part of the story is not very clear.

Finally, everyone’s favorite naming discussion. “-runtime-op-verify” sounds like it would run the verification and fail the pass pipeline, but it wouldn’t. It would instead generate additional IR (that one may also need to lower), which may cause further execution to fail. Similarly, runtimeVerify sounds like it will actually perform verification, especially given that it returns LogicalResult, and I can guarantee you that someone will write if (succeeded(runtimeVerify(...))) really soon and will assume that verification succeeded. Let’s find names that actually reflect what the pass and the interface are doing – building or generating the runtime verification IR. Something like -generate-runtime-verifications and buildRuntimeVerification are much less prone to confusion.

Separately, can you please give examples of how the generation of runtime verifiers can fail, i.e., when runtimeVerify is supposed to return failure? The entire op builder infrastructure never returns any sort of failure. How is this failure interpreted? Are implementations allowed to build some IR and then return failure?

1 Like

This makes sense. So the signature could be like this:

/// Generate IR to verify this op at runtime. Return a list of i1-typed values
/// that are assumed (can be asserted to be) satisfied (`true`) at runtime,
/// along with corresponding error messages in case they are not satisfied.
SmallVector<std::pair<Value, StringAttr>>
generateRuntimeVerification(OpBuilder &b, Location loc);

I think we can drop the LogicalResult status. There is no particular reason why I had it in the proposal and I cannot think of a good reason to have it.

The “default” -generate-runtime-verification pass can then generate the cf.assert ops.

+1 on adding this!

Is there any way we can have the interface allow the predicates to be non-i1? In Torch-MLIR we would want it to be !torch.bool. Within function bodies we only allow torch dialect types.

Also limiting this to a fixed string probably is not enough for a “UBSAN-like” diagnostic experience. We would want the flexibility to interpolate values and stuff in my error messages using our own ops. Like "matmul contracting dimension sizes must match, but LHS has shape {lhsShape} and RHS has shape {rhsShape}". Of course, this would require a dialect specific assertion op or somehow require a way to “return a string Value”.

1 Like

Based on some offline discussions with Sean, I just want to mention a trade-off: the original proposal is more generic, and the second proposal can be recovered from it by a later pass that converts assertions into something other than an assertion. I personally would still want to use the second version as being more ergonomic and immediately useful without more work, but I think Sean raised good points that the first version, without expectations of cf.assert, would be the most generic (by not restricting the interface to a given type (i1), which is similar to @ftynse’s concerns over cf.assert), and I think it would not make anything impossible that the second version would allow.

Note, I’m assuming a pass to convert away assertions would exist, but it would not be trivial. For example, an assertion inside a for loop would also require modifying the containing for loop.

I currently put the interface impls in external models that live in a separate file in Dialect/.../Transforms. Dependent dialects are loaded explicitly when adding the dialect extension to the dialect registry. (Same approach as with the bufferization external models.)

Can you wrap/rewrite the i1 in custom Torch-MLIR types in a subsequent pass? Also, are you using any standard MLIR control flow ops such as scf.if, scf.while, scf.for, etc. (which only work on i1), or are you basically not using core MLIR ops?

Alternatively, maybe it is sufficient to allow arbitrary values to be returned from generateRuntimeVerification. If they are i1, -generate-runtime-verification creates cf::AssertOps. Otherwise, -generate-runtime-verification fails and you have to write your own pass.

Also, I noticed that we have BoolLike. But it seems like it is not useful here.

Our dialect is self-contained. It has its own control flow and all other ops. For legacy reasons we actually rely still on builtin.module and func.func but if I had infinite time I would remove them too. We actually started with the builtin types but removed them over time because they did not adequately model our domain.

When lowering to linalg we of course convert !torch.bool to i1 and such, but that is later in the pipeline.

I think one question here is how general do we want to make this interface. I think for this to be a true peer to the static “verify” method and to provide a true multilevel UBSAN-like experience we need to handle arbitrary dialect predicate types and also have support for interpolated error messages.

The original approach presented in this PR where the callback was responsible for materializing the “abort op” itself lends itself naturally to this, since I can easily use my own abort op that already supports rich error message construction. I don’t see how to make this work with the “return the bool and error message” API, since that would require a “string type” for the rich error message and handling the custom predicate type somehow.

We don’t necessarily need to make something so general though, but we should name the interface appropriately, to avoid giving the false impression that MLIR is just the family of low-level dialects using builtin types/arith/scf/etc.

Why though? At some point a bool is a bool and any dialect could convert their boolean result to an i1…
We need to be able to bottom out to a “call to the ubsan runtime” and I don’t quite get why this would have to be a “multi-level thing”?
On another take: if you’re in your own dialect and don’t intend to mix, there is not much value in generic passes like this one or even like canonicalization: you can write your own passes!

I agree that without some common notion of what it means to build a string at runtime we’ll run in a dead-end here. I’m not totally sure how to solve this though.

That said, we can get pretty far without this. For example, I suspect we could have a “printf”-like primitive for the “UBSAN runtime”, where we’d support only Python-style formatting (a very small subset of it).
For example, let’s say we only support {} and {<n>}, and we take a variadic number of operands. With a static format string (a string attribute) we can statically check that the format matches the arguments.
We can start with support for builtin types only, and then it is just a matter of dialect interface for lowering. That is:
ub.assert(%predicate, "matmul contracting dimension sizes must match, but LHS has shape {} and RHS has shape {}", %LHSshape, %RHSshape).

The generic ub.assert could accept any type of the %predicate and any type for the variadic list of operands as well: it is a matter of lowering to make it bottom out to something that the runtime can support. At this point if you stay purely in the torch dialect for example, you could even convert this ub.assert to whatever runtime you’re targeting.

I would expect that upstream defines some minimal runtime support for an assert that supports printing builtin types, and possibly some extensibility for printing custom types.

These builtin types are quite privileged though: we don’t make every effort to support integers other than the builtin ones at the same level, for example. These “low-level” primitives as a common foundation seem incredibly useful to me: this is basically our “FFI” mechanism, where you can have your own types and you interact with the ecosystem through these builtin types somehow.

I came to the same conclusion.

I wouldn’t mind renaming the interface if it can make the intention of the interface clearer. Two candidates that came up when talking with Sean offline were I1RuntimeVerifiableOpInterface and GenerateDebugPreconditions. (Not sure if these make the intention clearer or more confusing…) Or maybe expanding the pass/interface description could be sufficient.

Side note: I have WIP revision that I hadn’t sent out yet for exactly that: ⚙ D138775 [WIP][mlir] Support SSA values in runtime op verification errors

Canonicalization is 100% applicable to arbitrary closed domains of MLIR dialects. So is folding, static verification, CSE, symbol-dce, parsing, printing, call graph interfaces. A good example of this upstream is PDL.

I’m not saying we should not have an interface that forces i1, but let’s temper the genericness claims to avoid equating this with the genericness of the regular static verifier. Forcing a user to use i1 is really not very different from forcing a user to use func.func – we do it for practical reasons sometimes, but we don’t call such passes truly multilevel.

<rant>
There seems to be a clique of (relatively “low-level” / “machine-oriented”) dialects – linalg, func, memref, arith, scf, and the associated types and attributes that some people equate with “MLIR”. But in reality those are just a particular set of dialects. MLIR’s ability to support totally domain-specific compilers is really, really important and valuable, and IMO we should be using language or directory structure that is inclusive to projects that use it in the broader way.
</rant>

I personally don’t really care that much for this particular interface. I’m just pointing out that there is tech debt and possibly confusion being created here. Hopefully at some point we get a directory restructure that makes the layering here more clear.

1 Like

Haven’t had time to digest this (coming back from a long US holiday over the past 1.5 weeks). Will form better thoughts over the next day or so.

100% agree with everything written here. I don’t consider things that limit to specific types/ops/etc. to be that “generic”.

– River

1 Like

Here is an attempt at making the interface entirely independent of specific ops and types, based on the original proposal. I do not have a preference towards one or the other.

/// A callback that builds IR to abort program execution.
using AbortBuilderFn = std::function<void(OpBuilder &, Location loc)>;

/// Generate IR to verify this op at runtime. If runtime verification fails,
/// the generated IR is supposed to abort program execution using IR generated
/// by `abortBuilder`.
void generateRuntimeVerification(OpBuilder &builder, Location loc,
                                 AbortBuilderFn abortBuilder);

This interface is not tied to any type or operation. Instead of emitting cf.assert, interface implementations must generate if checks as needed and populate the “then” block with abortBuilder().

We may want to add a cf.abort op to the control flow dialect, so that we have a backend-independent cf-generate-runtime-verification pass: This pass would provide a builder that builds cf.abort ops:

static void cfAbortBuilder(OpBuilder &b, Location loc) {
  b.create<cf::AbortOp>(loc);
}

Alternatively, we could also just generate cf.assert(%c0).
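For illustration, an implementation given a cf.abort-based builder might emit IR along these lines (a sketch; value names are illustrative):

```mlir
// Invert the check result and abort in the failure branch.
%true = arith.constant true
%failed = arith.xori %ok, %true : i1
scf.if %failed {
  // This block is populated via `abortBuilder`; here, the proposed cf.abort.
  cf.abort
}
```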

This can then be extended to support richer error messages with format strings.

/// A callback that builds IR to abort program execution.
using AbortBuilderFn = std::function<void(OpBuilder &, StringRef msg,
                                          ValueRange placeholders)>;

Similarly, cf::AbortOp now has two arguments. (But depending on what AbortBuilderFn is specified, the program could be aborted in a different way.)

let arguments = (ins StrAttr:$msg, Variadic<AnyType>:$placeholders)

By providing different AbortBuilderFn callbacks, users of this interface could choose to, e.g., return error codes instead of simply aborting.

It is vastly different to me; i1 is as builtin as it gets.

I agree with the sentiment of your message and the note about linalg and other stuff, I just totally differ on the “genericness” of using i1 for denoting a bool in a generic interface: I don’t see any tech debt here, or any need to lower the claim about how generic the interface would be.

Moving i1 out of builtin and making such an interface more generic for the sake of it seems like unnecessary generalization and added complexity: IMO this is just as much tech debt in the other direction.

We had an offline discussion about this and concluded that this interface cannot be easily used to support this use case in any of the proposed variants.

The main issue is that the generated verification IR may contain a runtime loop. (Or other complex IR.)

scf.for %i = %c0 to %dynamic_rank step %c1 {
  %v = "some_check"() : () -> (i1)
}

To send back an error code to the frontend, all %v values must be accumulated. So returning a Value for each generated check is not useful. We could allow only a single returned Value, but it is unclear how that works with error messages.
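Accumulating into a single value is possible in principle, but it pushes the complexity into every interface implementation. A sketch of what the accumulation would look like, threading the per-iteration results through the loop as iter_args:

```mlir
// Accumulate all per-iteration check results into one i1.
%true = arith.constant true
%all_ok = scf.for %i = %c0 to %dynamic_rank step %c1
    iter_args(%acc = %true) -> (i1) {
  %v = "some_check"() : () -> (i1)
  %acc_next = arith.andi %acc, %v : i1
  scf.yield %acc_next : i1
}
// %all_ok could then be turned into an error code by the frontend.
```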

I don’t see any other benefits of returning Values, so I am in favor of sticking with the simpler (original) proposal: generating aborts (e.g., with cf.assert) directly. This also fits better with the “verification” name: the dynamic verifier behaves like a static verifier in the sense that it crashes the program, just as the static verifier aborts compilation.

If you have a use case where an op can legitimately be invalid at runtime, you can add your own project-specific error checking that knows how to return error codes etc. Also, the check will probably be less expensive because you can selectively check for error conditions in the right place (e.g., when entering the function) instead of in front of every single op.

Finally, it is not clear to me whether an abortBuilderFn (presented in the latest proposal) is really useful, so I’d leave that for a later point in time when we find a good use case. (Unless somebody has a good use case.) It can be extended easily if needed.

2 Likes

I just want to start this by saying thank you to @matthias-springer for going back and forth on the design points (very appreciated). I had a half written response, but never posted (oops). Updated to account for your most recent post.

I understand where you are coming from, but we have not defined “generic” this way in practice. For example, the inliner doesn’t require the builtin function type. Just like with function types, there are a lot of rich self-contained systems that (for better or worse) don’t use i1 to mean “bool”. I agree with @_sean_silva in that defining supposed “core” interfaces this way severely limits the abstractions that could make use of it. Aside from just “scalar” boolean checks, I’ve worked on various systems that predicate assert/print messages on vector operations. I also have use cases that could benefit from such an interface, but wouldn’t (at least easily) be able to abide by the i1 requirement.

This one left the construction of the assert op up to the interface implementation? If so, I’m on board with that!

The requirements on the abort builder weren’t very clear to me in the above. Is the interface going to define a baseline of support for what the abort can do? I.e., what is the expected structure of the “msg”? This would have to be well defined to be taken advantage of; otherwise we’re going to get weird cases where the format provided isn’t understood or supported. I’d be slightly wary that we’d over-constrain the interface to some very, very basic “abort” that didn’t support what real users want from an error experience.

– River

2 Likes

Yes. I updated the Phabricator revisions (⚙ D138576 [mlir] Add RuntimeVerifiableOpInterface and transform and stack).

Yes, the AbortBuilderFn doc string would specify the expected structure of msg. But the current stack of revisions does not have an abort builder, because it is not clear to me when this would be useful. (I.e., memref::CastOp::generateRuntimeVerification always generates cf::AssertOps.)

Ref please? Because I don’t think we have defined “generic” at all actually :slight_smile:

I agree with this sentence, but it’s not clear to me how you make the jump that “i1” isn’t “generic enough” here. That is: what we need here is a true/false answer provided to “assert” that does not need to carry anything richer.
I’d like to see at minimum an example that justifies any added complexity and shows how i1 would be a significant hurdle.

It’s not clear to me how that applies here? Can you provide an example of how we’d assert on a vector here?

To me, the more important realization regarding complexity was that using i1 still creates a complicated situation in the presence of any control flow, so an abortBuilderFn callback does not add complexity in this regard; my hope of using these i1s in complicated situations would have required complicated logic anyway.

In return, an abortBuilderFn seems to not cause much more complexity, but allows pipelines to specialize the abort to their runtime. This could be done by creating cf.asserts with casts between i1s and some other type specific to the relevant dialects, but that creates i1s that might not otherwise exist at this level of the pipeline. I think casting back and forth between i1s is bad for 2 reasons:

  1. I don’t like this situation, but MLIR pipelines that I see have a large amount of very specific ordering of passes, and suddenly forcing in another type not seen before can complicate those pipelines.
  2. Similarly, I think MLIR allowing a mixture of many dialects should not actually be used to mix too many dialects. It is better to have several stages that each are like their own specialized LLVM IR. Forcing i1s is similar to the first description of mixing all dialects at the same time and prevents as clear of separations between different sets of dialects and their transformations at a given point in the pipeline.
1 Like