For fine-grained performance tracing and for industrial code traceability, it would be useful to be able to connect the components of the generated code (e.g. dispatches in the IREE VM, or top-level loops in a piece of sequential code) to the operations of the input specification (e.g. linalg.matmul).
The connection does not need to be 1-to-1. For example, loop fusion would typically associate multiple input operations with a single loop in the final code. Similarly, only the top-level affine.for generated for a linalg.matmul would be associated with the initial operation.
Is there some support in MLIR for this?
If not, are there other people here interested in traceability?
One immediate application would be fine-grained performance tracing.
I have found IREE's dispatch_profiler, but (I may be wrong) it seems to take only single operations, such as linalg.matmul, as input. My objective is to profile full applications.
For approximate tracing like what you seem to be looking for, we use “debug locations”: every operation has one, and they flow through the pipeline. Ultimately, when lowering through LLVM, they can end up in DWARF.
Yes, MLIR tracks source locations step by step, which lets you map back from compiled code to the input program (or anywhere in between). As operations are created, they can either choose to reuse a source location from a single predecessor, create a new “fused” location from multiple source locations, or create a new source location.
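To make the three choices above concrete, here is a minimal sketch in plain Python. It is a toy model of the idea, not the MLIR API: the names FileLoc, FusedLoc, fuse and origins, and the file name model.mlir, are all made up for illustration. In real MLIR IR the corresponding textual forms look like loc("model.mlir":12:5) and loc(fused[...]).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class FileLoc:
    """Toy stand-in for a file/line/column location (hypothetical type)."""
    file: str
    line: int
    col: int


@dataclass(frozen=True)
class FusedLoc:
    """Toy stand-in for a fused location: one op with several origins."""
    parts: Tuple["Loc", ...]


Loc = Union[FileLoc, FusedLoc]


def origins(loc: Loc) -> list:
    """Return the leaf source locations that a location traces back to."""
    if isinstance(loc, FileLoc):
        return [loc]
    result = []
    for part in loc.parts:
        result.extend(origins(part))
    return result


def fuse(*locs: Loc) -> Loc:
    """Combine locations, flattening nested fusions and dropping duplicates."""
    flat = []
    for loc in locs:
        for leaf in origins(loc):
            if leaf not in flat:
                flat.append(leaf)
    # A single distinct origin does not need a fused wrapper.
    return flat[0] if len(flat) == 1 else FusedLoc(tuple(flat))


# Two input operations, e.g. a matmul followed by an elementwise add.
matmul = FileLoc("model.mlir", 12, 5)
add = FileLoc("model.mlir", 13, 5)

# Lowering the matmul alone: the top-level loop reuses the op's location.
outer_for = matmul

# Loop fusion: one loop in the output maps back to both input operations.
fused_loop = fuse(matmul, add)

for leaf in origins(fused_loop):
    print(f"{leaf.file}:{leaf.line}:{leaf.col}")
```

This is the many-to-one mapping asked about in the question: a profiler that reports time per generated loop could attribute it to every leaf returned by origins. To see the actual locations on your own IR, mlir-opt can print them when run with the -mlir-print-debuginfo flag.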
Since you asked about IREE: we do most of our in-depth performance analysis of the IREE compiler and IREE runtime using Tracy (see the IREE documentation on profiling with Tracy).