[RFC] Introduce the concept of IR listeners in MLIR

I think the concept of IR listener is needed but I don’t really like the fact that it is local to every operation.

Have you considered attaching the listener to every op that is IsolatedFromAbove? When an operation is modified, it would notify the listener on the closest parent that is IsolatedFromAbove.

No, I haven’t considered this; I’m a bit worried about the cost though: now, whether or not there is a listener, you need to walk back to the closest isolated parent.
Every op->getParentOp() is three pointers to chase, I believe: getParentBlock()->getRegion()->getParent(), and you may have to go up more than one level! Considering the granularity of the tracking (an operand swap, for example), we need something that is “free” to check.

Thinking out loud: we could add more machinery and keep a bit on the operation for “does my isolated parent have a listener”; that may be a good tradeoff.
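For illustration, here is a minimal sketch of what every notification would have to do under that scheme; the helper name is hypothetical and not part of the patch:

```cpp
#include "mlir/IR/Operation.h"

using namespace mlir;

// Walk up to the closest IsolatedFromAbove ancestor. Every getParentOp() call
// chases several pointers (block -> region -> parent op), and there may be
// several nesting levels to cross before an isolated op is found.
static Operation *findIsolatedParent(Operation *op) {
  for (Operation *parent = op->getParentOp(); parent;
       parent = parent->getParentOp())
    if (parent->hasTrait<OpTrait::IsIsolatedFromAbove>())
      return parent;
  return nullptr;
}
```

The per-operation bit mentioned above would turn the common no-listener case into a single flag test before this walk is ever attempted.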

> No, I haven’t considered this; I’m a bit worried about the cost though: now, whether or not there is a listener, you need to walk back to the closest isolated parent.
> Every op->getParentOp() is three pointers to chase, I believe: getParentBlock()->getRegion()->getParent(), and you may have to go up more than one level! Considering the granularity of the tracking (an operand swap, for example), we need something that is “free” to check.

You are right that it could be too expensive.

> Thinking out loud: we could add more machinery and keep a bit on the operation for “does my isolated parent have a listener”; that may be a good tradeoff.

An alternative is to have a thread_local pointer to the current listener.

Can you elaborate on this? Do you mean a global variable? It seems problematic to me because there is no strong correlation between IR and threads in general (we use a thread pool), and it also means there is a single listener for the entire thread regardless of the IR?
(And I’m always very wary of mutable global state; it doesn’t mesh well with library uses where multiple users of MLIR may exist in the same process space.)

> Can you elaborate on this? Do you mean a global variable? It seems problematic to me because there is no strong correlation between IR and threads in general (we use a thread pool), and it also means there is a single listener for the entire thread regardless of the IR?

Yes, I mean a global variable.
If listeners don’t outlive passes, it is not an issue.
Inside one pass, the given thread will only touch the IR the pass was provided.
It is possible for a pass to set up a listener for its own scope with an RAII object that sets the listener back to null after the pass is done, and the pass manager can assert that the listener is null at the end of every pass.
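For concreteness, a minimal sketch of that RAII idea; the thread_local variable and the IRListener type are hypothetical here, not part of the patch:

```cpp
// Hypothetical per-thread "current listener" global.
thread_local IRListener *currentListener = nullptr;

// RAII helper a pass can use to install a listener for its own scope; the
// previous value (normally null) is restored when the object goes out of
// scope, and the pass manager can assert(currentListener == nullptr) after
// each pass runs.
struct ScopedListener {
  IRListener *previous;
  ScopedListener(IRListener *listener) : previous(currentListener) {
    currentListener = listener;
  }
  ~ScopedListener() { currentListener = previous; }
};
```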

> (And I’m always very wary of mutable global state; it doesn’t mesh well with library uses where multiple users of MLIR may exist in the same process space.)

I understand the concern, but it can be greatly mitigated with proper abstraction that enforces the rules.

If the intent is to make listeners last longer than a single pass, the pass manager needs to cooperate by setting the listener attached to some IR unit before a pass operates on it.

@mehdi_amini Thank you for making this proposal.

I am currently exploring ways to develop a debug utility that can be used to derive end-to-end traceability of the journey (transformation history) of an MLIR Op/Module from, say, the TF dialect to the TFL dialect or any downstream dialect. This traceability of Op provenance is going to be particularly useful for graphically visualizing the transformation history of a TF model in a compiler pipeline such as the TFLite Converter.

As pointed out by other members earlier in the thread, the information available from listeners is useful, but it is going to be difficult to build context knowing only what changed.

I have been experimenting with the available MLIR source location information. I think this information, paired with LocationSnapshot, is useful for deriving some context for the changes that occurred after a given MLIR pass.

I created a prototype to demonstrate this here: Utility to create a list of MLIR Mutations after a transform. I am interested in getting some early feedback from the community and can send a more formal proposal if there is interest.

Being able to trace from any level of the IR (and ultimately the codegen) back to every decision taken by the compiler during compilation is definitely an important use case I had in mind behind the IRListener concept: there are many applications in security, obfuscation, etc., but also in debugging: if you have a miscompute, you may want to trace back how you got there. In the current framework this means print-ir-after-all and manually inspecting the IR before/after every pass (which is tedious and has a granularity issue; see the comment on locations below).
Beyond that, just being able to debug and understand the compiler decisions during development seems like a very useful thing to have.

Tracing isn’t the only thing though: in a modular and composable design, decoupling is key, and being able to invoke other components without completely invalidating an analysis (avoiding dangling pointers in maps, for example) is necessary (this is how ValueHandle is used today in LLVM).

Locations are an attractive easy reach for a “best effort” solution to tracing the IR, but they can’t really be an infrastructure to build upon, for various reasons. Chief among them: they rely on the transformations to update them, and they just leave a “paper trail” that is quite imperfect for reconstructing the changes. Also, they do not capture any intermediate steps: if you look at the locations after “canonicalization” for example, you may be able to infer some correlation between the resulting IR and the producer IR, but you won’t know how you got there, and you won’t have much info on what the sequence of transformations was that got you there (even though Actions will give some finer granularity there).

Locations are likely “good enough” to get 90% of the way there for multiple specific use cases and workflows, but I don’t see them as a building block for infrastructure that we can scale in MLIR generically.


@mehdi_amini Thank you for the reply! Really appreciate your time.

Can you provide your thoughts on how we could use the information from IR listeners to build IR-level op traceability? I agree, listeners provide a lot of useful and fine-grained information. But something I don’t see available is the semantic connection between ops (IRUnits) before and after a transform, using just the data available from listeners and without any location info.

Thinking out loud, one way to go about this: all OpBuilder- and PatternRewriter-level mutations are performed at an [InsertPoint](https://mlir.llvm.org/doxygen/classmlir_1_1OpBuilder_1_1InsertPoint.html). An InsertPoint is the location of an existing op → if I am able to listen to all the op-level changes (insert, delete, move, etc.) at an InsertPoint, is this enough information to construct the connections?
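As a starting point, here is a minimal sketch using the existing OpBuilder::Listener hook (not the proposed IRListener API) to record the ops created during a rewrite; correlating each insertion with the op at the insertion point is exactly the open question above:

```cpp
#include "mlir/IR/Builders.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

// Records every operation created through a builder/rewriter that has this
// listener attached.
struct InsertionTracker : public OpBuilder::Listener {
  SmallVector<Operation *> inserted;

  void notifyOperationInserted(Operation *op) override {
    inserted.push_back(op);
  }
};
```

A builder constructed as OpBuilder(ctx, &tracker) would then report each created op, but as noted, this alone does not tell you which pre-existing op the change relates to.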

> Also, they do not capture any intermediate steps: if you look at the locations after “canonicalization” for example.

Can you please elaborate on this? What are the types of intermediate steps you have in mind? One example I looked at is constant folding happening via rewriter.create. Depending on the situation, the information from the constant folding steps could end up being too much unnecessary data. For example, consider the following TFL optimization that tries to fuse a tfl.fully_connected with a tfl.add.
Before:

%cst = arith.constant dense<2.000000e+00> : tensor<1xf32> loc(#loc5)
%2 = "tfl.no_value"() {value} : () -> none loc(#loc8)
%3 = "tfl.fully_connected"(%1, %0, %2) {fused_activation_function = "NONE", keep_num_dims = false, weights_format = "DEFAULT"} : (tensor<128x2xf32>, tensor<*xf32>, none) -> tensor<128x1xf32> loc(#loc9)
%4 = tfl.add(%3, %cst) {fused_activation_function = "NONE"} : (tensor<128x1xf32>, tensor<1xf32>) -> tensor<128x1xf32> loc(#loc10)

After:

%cst = arith.constant dense<2.000000e+00> : tensor<1xf32> loc(#loc5)
%2 = "tfl.fully_connected"(%1, %0, %cst) {fused_activation_function = "NONE", keep_num_dims = false, weights_format = "DEFAULT"} : (tensor<128x2xf32>, tensor<*xf32>, tensor<1xf32>) -> tensor<128x1xf32> loc(fused[#loc8, #loc9, #loc10])

In this optimization, tfl.add(%2, %cst) was inserted and then folded away as part of an intermediate constant folding. From a traceability standpoint, the information available via locations is probably enough, although one might argue that it’s useful to know the full picture of the events that took place for debugging.

Yes, I 100% agree that the fact that location information relies on transforms to update it correctly makes the approach and its output misleading.

Can you please provide us with some advice on how to proceed further?

> Can you provide your thoughts on how we could use the information from IR listeners to build IR-level op traceability? I agree, listeners provide a lot of useful and fine-grained information. But something I don’t see available is the semantic connection between ops (IRUnits) before and after a transform, using just the data available from listeners and without any location info.

I’m not sure what you’re looking for exactly? Can you provide an example of what you would want to see in the end?

The API has notifyOpInserted, notifyOpMoved, etc.; this should be what you need, right?

“Application of pattern XYZ” or “fuse these two loops” are examples of intermediate steps.

For next steps here, I plan to try the alternative encoding that I thought of here: [RFC] Introduce the concept of IR listeners in MLIR - #22 by mehdi_amini (that is, storing a single bit per non-isolated operation instead of a full pointer).

@mehdi_amini Thank you for the response and your help with this!

My goal is to programmatically know, after a pass, the source of an op in the module. By source, I mean an op (or IRUnit, or group of ops) from the version of the module before the pass.

For example, see the following list of mutations (a rough sketch of the provenance map they imply follows the list):

  1. conversion => (tf.A) → (tfl.B) => source of tfl.B is tf.A
  2. fusion => (tfl.C, tfl.D) → (tfl.E) => source of tfl.E is [tfl.C, tfl.D]
  3. unfuse => (tf.BatchMatMul) → (tf.split, tf.FullyConnected, tf.FullyConnected, tf.pack) => source of tf.split is tf.BatchMatMul
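Here is a very rough sketch of that provenance map, assuming listener callbacks along the lines of the notifyOpInserted/notifyOpReplaced names mentioned later in the thread; the exact hooks and signatures are guesses, not the actual API:

```cpp
#include "mlir/IR/Operation.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

// Accumulates, for every op created during a pass, the pre-existing op(s) it
// was derived from.
struct ProvenanceTracker {
  llvm::DenseMap<Operation *, llvm::SmallVector<Operation *>> sources;

  // A replacement directly records that newOp came from oldOp.
  void notifyOpReplaced(Operation *oldOp, Operation *newOp) {
    sources[newOp].push_back(oldOp);
  }

  // For a plain insertion, one heuristic is to attribute the new op to the op
  // at the insertion point, as suggested earlier in the thread.
  void notifyOpInserted(Operation *newOp, Operation *insertionPointOp) {
    if (insertionPointOp)
      sources[newOp].push_back(insertionPointOp);
  }
};
```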

It is possible that, in 3, there were a few intermediate changes that were not captured due to the execution of multiple rewriters in the same pass. My understanding is that MLIR Actions will enable instrumentation to achieve that granularity (thank you for that proposal as well!).

I don’t have much to add to this discussion, just a few thoughts:

I cleaned up around 15 cases of incorrect API usage in rewrite patterns of multiple MLIR dialects over the last few days (see the referenced revisions in D144552 if you’re curious), and a few more in downstream projects (IREE). Some of these have led to (hard-to-debug) issues in the GreedyPatternRewriteDriver and the transform dialect, which is what prompted this investigation.

None of these bugs would have happened if we had IR listeners. IR listeners could also make the RewritePattern API easier to use (a sketch of the boilerplate patterns currently have to write follows the list below):

  • No more LogicalResult return value for matchAndRewrite
  • No more updateRootInPlace etc.
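For context, a sketch of what the current API forces patterns to write; SomeOp is a placeholder op type, and the point is the LogicalResult plumbing plus the updateRootInPlace wrapper around in-place mutation, which is precisely what the cleanups above kept finding missing:

```cpp
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// With the current API, a pattern that mutates an op in place must return a
// LogicalResult and wrap the mutation in updateRootInPlace so that the driver
// is notified of the change.
struct DropFooAttr : public OpRewritePattern<SomeOp> {
  using OpRewritePattern<SomeOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(SomeOp op,
                                PatternRewriter &rewriter) const override {
    if (!op->hasAttr("foo"))
      return failure();
    rewriter.updateRootInPlace(op, [&] { op->removeAttr("foo"); });
    return success();
  }
};
```

With IR listeners tracking mutations directly, both pieces of ceremony could in principle go away, as listed above.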

Three MLIR components come to mind that would immediately benefit from IR listeners: the GreedyPatternRewriteDriver, dialect conversion, and the transform dialect.


Hello everybody,

What is the status of this work? I can see the patch under code review (D142291: Introduce the concept of IR listeners in MLIR), but the last event dates back to January.

Regards,
Dumitru

I haven’t pursued this further for now. I still think it is interesting, but since then I also landed the work on properties, which makes it impossible to intercept every mutation.