[RFC] Introduce the concept of IR listeners in MLIR

I am actually not sure what you mean here; I struggle to connect it to the proposed mechanism concretely.
For example, what does it mean that “some producers will be more diligent than others”? Are you talking about my proposal or referring to the “Rewriter” instead?
The way I implemented it here means there is no notion of “producer” (external to the core of MLIR), and there is nothing anyone writing transformations should know or adjust: it’s fully transparent.

I was referring to:

But there is (IMO) a much better way to accomplish the same task: use a
ValueHandle to trigger the update on deletion. This will prevent widespread
failure to use the deleteValue API to update alias analysis.

The last sentence makes it pretty clear that this is a superior solution. The notion of being “the best” is rather me concluding this in the absence of other solutions. There may be other hypothetical solutions, but I’m unaware of them. If someone has experience from other compiler frameworks about how to handle this, I’d love to brainstorm more!

What I am doing here is not much different from ValueHandle conceptually, as far as I can tell. Of course we can look at whether we use a side map in the context instead of a pointer on the operation, but that’s an implementation detail we can sort out after figuring out the conceptual “feature” the framework should provide.

If you’re talking about code reuse, we could look into it, but I’m not sure the component is complex enough to warrant reuse across LLVM IR and MLIR: this kind of thing always looks appealing on the surface but is not very practical when we dig in (for example, we looked into whether we should refactor the LLVM “new” pass manager at the time instead of building one for MLIR).

IMO, Chandler’s emails show what happens when dedicated/coupled APIs are used instead of having the framework handle it.
The “not using ValueHandle” was the problem reported; I just read this as ValueHandle being the more principled approach to the problem (in particular, no coupling to transformations).

I would expect listeners not attached to any operation to be useful for “automated” analysis updates.
For example, every analysis could provide its own updater/listener. Transformation passes could register listeners for the analyses they use in the IRRewriter used to edit the IR (a rough sketch of this usage is below).
This would allow sharing analysis update code across transformation passes, which is barely done in LLVM.
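A minimal sketch of that usage, with purely illustrative names (the IRListener interface, its notification methods, and setListener are not existing MLIR APIs; IRRewriter and Operation are):

struct IRListener {                              // hypothetical notification interface
  virtual ~IRListener() = default;
  virtual void notifyOpInserted(Operation *op) = 0;
  virtual void notifyOpErased(Operation *op) = 0;
};

// Shipped by the analysis itself, so every pass reuses the same update code.
struct DominanceUpdater : public IRListener {
  void notifyOpInserted(Operation *op) override { /* extend cached info for op */ }
  void notifyOpErased(Operation *op) override { /* drop cached facts mentioning op */ }
};

// A transformation pass only registers the updater; it carries no
// analysis-specific update logic of its own.
void runMyTransform(Operation *root, DominanceUpdater &updater) {
  IRRewriter rewriter(root->getContext());
  rewriter.setListener(&updater);                // illustrative registration hook
  // ... edit the IR through rewriter; the analysis stays consistent ...
}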

But what is the expected use case for attaching a listener to the operations?

Also, it is possible to edit IR without going through a rewriter (which I dislike). Is this intentional, or should we get rid of that?

Well, this is kind of a major part of the issue: the Rewriter is not the main way of editing IR; instead, the Rewriter is a temporary object that is right now set up mostly by a pattern driver and passed into the pattern application logic.

That may have been a tangent I followed…

I’m talking about the usage of your proposal, not the proposal itself. Your proposal is self-contained and just a way to register listeners, nothing mandates what people do with those. That’s fine.

But usually, the more generic the infrastructure, the more it leads to unintended uses. If people use it to print stuff, that’s simple. If they use it to keep semantics across different passes (which is what I would use it for), then it gets more complicated.

If those frameworks (built on top of listeners) become official in MLIR, for example to have a more resistant form of metadata, then what the listener does when encountering a change will directly impact the quality of such metadata. This is what I meant by being “diligent”.

Metadata is optional and passes can drop it precisely to avoid requiring everyone to be diligent about everyone else’s random information, and listeners don’t change that. So my question is: for this particular use case, why would it be better than metadata?

But that was a massive tangent, so it got lost. It may not be an important use-case, or even a potential one, so ignore if it’s not relevant.

I agree, but it still would be good to dig a bit deeper, like you did with the pass manager, to make sure there is consensus that this is the best way. I don’t really like that the MLIR pass manager does very similar things to the LLVM one, yet is a whole new implementation. The more of those things we have in MLIR, the more implementation we’ll have to duplicate across projects in the future.

Got it, I think: you used “producer” because these listeners are “producing” updated “metadata”, right? (You were a step further than me: for me, the listener is on the “receiving” end of an IR modification.)

Correct! :slight_smile:

Related work in LLVM: [RFC] Lightweight LLVM IR Checkpointing

I think the concept of an IR listener is needed, but I don’t really like the fact that it is local to every operation.

Have you considered that the listener could be on every op that is IsolatedFromAbove? When an operation is modified, it would notify the listener on the closest parent that is IsolatedFromAbove.

No, I haven’t considered this; I’m a bit worried about the cost though: now, whether there is a listener or not, you need to walk back to the closest isolated parent.
Every op->getParentOp() is 3 pointers to chase, I believe: getParentBlock()->getRegion()->getParent(), and you may have to go more than one level! Considering the granularity of the tracking (an operand swap, for example), we need something that is “free” to check.
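Concretely, the check being discussed would look something like the sketch below (getParentOp(), hasTrait, and OpTrait::IsIsolatedFromAbove are existing MLIR APIs; the helper itself is only illustrating the cost, not proposed code):

Operation *findListenerAnchor(Operation *op) {
  // op -> owning block -> owning region -> parent op: roughly 3 pointers per hop
  Operation *parent = op->getParentOp();
  while (parent && !parent->hasTrait<OpTrait::IsIsolatedFromAbove>())
    parent = parent->getParentOp();              // possibly several levels up
  return parent;
}
// This walk would have to run on every notification, whether or not any
// listener is actually registered.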

“Thinking out loud”: we could add more machinery and keep a bit on the operation (“does my isolated parent have a listener?”); that may be a good tradeoff.

No, I haven’t considered this; I’m a bit worried about the cost though: now, whether there is a listener or not, you need to walk back to the closest isolated parent.
Every op->getParentOp() is 3 pointers to chase, I believe: getParentBlock()->getRegion()->getParent(), and you may have to go more than one level! Considering the granularity of the tracking (an operand swap, for example), we need something that is “free” to check.

You are right that it could be too expensive.

“Thinking out loud”: we could add more machinery and keep a bit on the operation (“does my isolated parent have a listener?”); that may be a good tradeoff.

An alternative is to have a thread_local pointer to the current listener.

Can you elaborate on this? Do you mean a global variable? It seems problematic to me because there is no strong correlation between IR and threads in general (we use a thread pool); also, that means there is a single listener for the entire thread regardless of the IR?
(And I’m always very wary of mutable global state; it doesn’t mesh well with library uses where multiple users of MLIR may exist in the same process space.)

Can you elaborate on this? Do you mean a global variable? It seems problematic to me because there is no strong correlation between IR and threads in general (we use a thread pool); also, that means there is a single listener for the entire thread regardless of the IR?

Yes, I mean a global variable.
If listeners don’t outlive passes, it is not an issue.
Inside one pass, the given thread will only touch the IR the pass was provided.
It is possible for a pass to set up a listener for its own scope with an RAII object that sets the listener back to null after it is done, and the pass manager can assert that the listener is null at the end of every pass.
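A rough sketch of that RAII pattern, reusing the hypothetical IRListener interface from the earlier sketch (neither currentListener nor ScopedListener exist in MLIR; this only illustrates the suggestion):

static thread_local IRListener *currentListener = nullptr;

class ScopedListener {
public:
  explicit ScopedListener(IRListener *listener) : previous(currentListener) {
    currentListener = listener;                      // install for the pass’s scope
  }
  ~ScopedListener() { currentListener = previous; }  // restore on scope exit

private:
  IRListener *previous;
};
// The pass manager could then assert(currentListener == nullptr) after each
// pass, catching listeners that leak past the pass that installed them.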

(And I’m always very wary of mutable global state; it doesn’t mesh well with library uses where multiple users of MLIR may exist in the same process space.)

I understand the concern, but it can be greatly mitigated with proper abstractions that enforce the rules.

If the intent is to make listeners last longer than a single pass, the pass manager needs to cooperate by setting the listener attached to some IR unit before a pass operates on it.

@mehdi_amini Thank you for making this proposal.

I am currently exploring ways to develop a debug utility that can be used to derive end-to-end traceability of the journey (transformation history) of an MLIR Op/Module from, say, the TF dialect to the TFL dialect or any downstream dialect. This traceability of op provenance is going to be particularly useful to graphically visualize the transformation history of a TF model in a compiler pipeline like the TFLite Converter.

As pointed out by other members earlier in the thread, the information available from listeners is useful, but it is going to be difficult to build the context knowing only what changed.

I have been experimenting with the available MLIR source location information. I think this information, paired with LocationSnapshot, is useful to derive some context for the changes that occurred after a given MLIR pass.

I created a prototype to demonstrate this here: Utility to create a list MLIR Mutations after a transform. I am interested in getting some early feedback from the community and can send a more formal proposal if there is interest.

Being able to trace back from any level of the IR (and ultimately the codegen) to every decision taken by the compiler during the compilation is definitely an important use case I had in mind behind the IRListener concept: there are many applications in security, obfuscation, etc., but also in debugging: if you have a miscompute you may want to trace back to how you got there. In the current framework this means print-ir-after-all and manually inspecting the IR before/after every pass (which is tedious and has a granularity issue; see the comment on locations below).
Beyond that, just being able to debug and understand the compiler’s decisions during development seems like a very useful thing to have.

Tracing isn’t the only thing though: in a modular and composable design, decoupling is key, and being able to invoke other components without invalidating an analysis completely (avoiding dangling pointers in maps, for example) is necessary (this is how ValueHandle is used today in LLVM).

Locations are an attractive, easy reach to try a “best effort” solution for tracing the IR, but they can’t really be an infrastructure to build upon, for various reasons. On top of which: they rely on the transformations to update them, and they just leave a “paper trail” that is quite imperfect in how you can reconstruct the changes. Also, they do not capture any intermediate steps: if you look at the location after “canonicalization” for example, you may be able to infer some correlation between the resulting IR and the producer IR, but you won’t know how you got there and you won’t have much info on what the sequence of transformations that got you there was (even though Actions will give some finer granularity there).

Location is likely “good enough” to get 90% there for multiple specific use cases and workflows, but I don’t see it as a basic block on which to build infrastructure that we can scale in MLIR generically.


@mehdi_amini Thank you for the reply! Really appreciate your time.

Can you provide your thoughts on how we could use the information from IR listeners to build IR-level op traceability? I agree, listeners provide a lot of useful and fine-grained information. But something I don’t see available is the semantic connection between ops (IRUnits) before and after a transform, just with the data available from listeners and without using any location info.

Thinking out loud, one way to go about this is: all OpBuilder- and PatternRewriter-level mutations are performed at an [InsertPoint](https://mlir.llvm.org/doxygen/classmlir_1_1OpBuilder_1_1InsertPoint.html). An InsertPoint is the location of an existing op, so if I am able to listen to all the op-level changes (insert, delete, move, etc.) at an InsertPoint, is that probably enough information to construct the connections?

Also, they do not capture any intermediate steps: if you look at the location after “canonicalization” for example.

Can you please elaborate on this? What are the types of intermediate steps you have in mind? One example I looked at is constant folding happening via rewriter.create. Depending on the situation, the information from a constant folding step could end up becoming too much unnecessary data. For example, consider the following TFL optimization that tries to fuse tfl.fully_connected with a tfl.add:
Before:

%cst = arith.constant dense<2.000000e+00> : tensor<1xf32> loc(#loc5)
%2 = "tfl.no_value"() {value} : () -> none loc(#loc8)
%3 = "tfl.fully_connected"(%1, %0, %2) {fused_activation_function = "NONE", keep_num_dims = false, weights_format = "DEFAULT"} : (tensor<128x2xf32>, tensor<*xf32>, none) -> tensor<128x1xf32> loc(#loc9)
%4 = tfl.add(%3, %cst) {fused_activation_function = "NONE"} : (tensor<128x1xf32>, tensor<1xf32>) -> tensor<128x1xf32> loc(#loc10)

After:

%cst = arith.constant dense<2.000000e+00> : tensor<1xf32> loc(#loc5)
%2 = "tfl.fully_connected"(%1, %0, %cst) {fused_activation_function = "NONE", keep_num_dims = false, weights_format = "DEFAULT"} : (tensor<128x2xf32>, tensor<*xf32>, tensor<1xf32>) -> tensor<128x1xf32> loc(fused[#loc8, #loc9, #loc10])

In this optimization, tfl.add(%2, %cst) was inserted and folded away as part of an intermediate constant folding. From a traceability standpoint, the information available via locations is probably enough, although one might argue that it’s useful to know the full picture of the events that took place for debugging.

Yes, I 100% agree that the fact that location information relies on transforms to update it correctly makes the approach and its output misleading.

Can you please provide us with some advice on how to proceed further?

Can you provide your thoughts on how we could use the information from IR listeners to build IR-level op traceability? I agree, listeners provide a lot of useful and fine-grained information. But something I don’t see available is the semantic connection between ops (IRUnits) before and after a transform, just with the data available from listeners and without using any location info.

I’m not sure what you’re looking for exactly? Can you provide an example of what you would want to see in the end?

The API has notifyOpInserted, notifyOpMoved, etc.; this should be what you need, right?
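For illustration, a provenance tracker along the lines of what you describe could be a listener like the hypothetical sketch below (the notification names follow the ones mentioned above, but the exact signatures are whatever the proposal ends up with; DenseMap and SmallVector are the usual LLVM containers):

struct ProvenanceTracker : public IRListener {   // hypothetical interface again
  // Maps a newly created op to the ops the active rewrite started from.
  DenseMap<Operation *, SmallVector<Operation *, 2>> sources;
  SmallVector<Operation *, 2> currentRoots;      // set when a rewrite begins

  void notifyOpInserted(Operation *newOp) override {
    sources[newOp].append(currentRoots.begin(), currentRoots.end());
  }
  void notifyOpErased(Operation *oldOp) override {
    sources.erase(oldOp);   // or keep it around to chain provenance further back
  }
};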

“Application of pattern XYZ” or “fuse these two loops” are examples of intermediate steps.

For next steps here, I plan to try the alternative encoding that I thought of here: [RFC] Introduce the concept of IR listeners in MLIR - #22 by mehdi_amini (that is, storing a single bit per non-isolated operation instead of a full pointer).
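For reference, the “single bit” alternative would amount to something like this hypothetical check before the more expensive walk (hasAncestorListenerBit and notifyListenerOn do not exist; findListenerAnchor is the illustrative walk sketched earlier in the thread):

void maybeNotify(Operation *op) {
  // Hypothetical flag, kept up to date when a listener is (un)registered on
  // the enclosing isolated-from-above ancestor.
  if (!op->hasAncestorListenerBit())
    return;                              // near-free check in the common case
  if (Operation *anchor = findListenerAnchor(op))
    notifyListenerOn(anchor, op);        // hypothetical dispatch to the listener
}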

@mehdi_amini Thank you for the response and your help with this!

My goal is to programmatically know, after a pass, the source of an op in the module. By source, I mean an op (or IRUnit, or group of ops) from the version of the module before the pass.

For example, see the following list of mutations:

  1. conversion => (tf.A) → (tfl.B) => source of tfl.B is tf.A
  2. fusion => (tfl.C, tfl.D) → (tfl.E) => source of tfl.E is [tfl.C, tfl.D]
  3. unfuse => (tf.BatchMatMul) → (tf.split, tf.FullyConnected, tf.FullyConnected, tf.pack) => source of tf.split is tf.BatchMatMul

It is possible that in 3 there were a few intermediate changes that were not captured due to the execution of multiple rewriters in the same pass. My understanding is that MLIR Actions will enable instrumentation to achieve that granularity (thank you for that proposal as well!).

I don’t have much to add to this discussion, just a few thoughts:

I cleaned up around 15 cases of incorrect API usage in rewrite patterns of multiple MLIR dialects over the last few days (see the referenced revisions in D144552 if you’re curious), and a few more in downstream projects (IREE). Some of these have led to (hard-to-debug) issues in the GreedyPatternRewriteDriver and the transform dialect, which is what prompted this investigation.

None of these bugs would have happened if we had IR listeners. IR listeners could also make the RewritePattern API easier to use for users:

  • No more LogicalResult return value for matchAndRewrite
  • No more updateRootInPlace etc.

Three MLIR components come to mind that would immediately benefit from IR listeners: the GreedyPatternRewriteDriver, dialect conversion, and the transform dialect.
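For context, here is roughly the boilerplate those two bullets refer to in the current API (matchAndRewrite’s LogicalResult and updateRootInPlace are real, existing requirements; MyOp and isSimplifiable are placeholders):

struct SimplifyMyOp : public OpRewritePattern<MyOp> {
  using OpRewritePattern<MyOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(MyOp op, PatternRewriter &rewriter) const override {
    if (!isSimplifiable(op))             // placeholder predicate
      return failure();
    // Every in-place change has to be wrapped so the driver is notified.
    rewriter.updateRootInPlace(op, [&] {
      op->setAttr("simplified", rewriter.getUnitAttr());
    });
    return success();
  }
};

With IR-level listeners the driver would observe the mutations directly, which is what would make dropping that bookkeeping from the pattern API conceivable.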


Hello everybody,

What is the status of this work? I can see the patch under code review (⚙ D142291 Introduce the concept of IR listeners in MLIR) but the last event dates back to January.

Regards,
Dumitru

I haven’t pursued this further for now; I still think it is interesting, but since then I also landed the work on properties, which makes it impossible to intercept every mutation.