I have a use case where a transformation could use DominanceInfo during the transformation itself. I have typically stayed away from this, to mitigate the cost of recomputing the DominanceInfo. In my specific use case I can guarantee that my transformations do not change the control dependence of the program, so the only recomputation cost comes down to the cost of isBeforeInBlock. That method has logic to recompute the ordering if the operations in the block change, and since my transformations will add and remove operations from the IR, this recomputation will be needed. I am trying to get an idea of what this cost is, and whether there are strategies I can use to mitigate it. My answer so far has been: this cost is too high, so use dominator information to analyse the program before making any transformations, and do not use dominator information during transformation. Having dominator information during transformation, though, would be extremely useful.
Thoughts @mehdi_amini @River707 @Mogball ?
Recomputation isn’t needed if something gets removed, only if something gets added (and there isn’t space for it). The computation itself (if it wasn’t clear) is O(N) in the number of ops in the block.
If the computation is expensive, that generally indicates you are running into the edge case that requires recomputation, i.e. adding N operations in the middle of the block (where N is currently 5, given that we buffer for a certain number of operations being inserted). It’s tricky to give specific suggestions without seeing the code, except for:
a) it’d be nice to understand if you’ve benchmarked how expensive it is, just to make sure this recomputation is the hot part (though I think you have?)
b) be careful about insert/check/insert/check loops (batching/caching is nice, but that’s general advice)
c) Try bumping the size of the stride that we buffer between operations when renumbering; it’s currently 5, but something slightly higher (like 10) might be better for some cases. We might also want to consider more “interesting” ways to avoid recomputing the whole block, but only if really necessary.
Maybe not helpful, but the computation being hot has a direct correlation with some loop of adding lots of ops in a row -> checking dominance -> adding lots of ops in a row -> etc.
Thanks River!
I don’t have profiling numbers, because I looked at the O(N) recomputation time and just avoided it to start with, so I don’t have an implementation that uses dominators to evaluate the effect on compile time. It’s great to see there are some strategies to mitigate the cost. I am looking at fuseElementwiseOps, and seeing if using dominance info can increase the scope of the transformation. It wouldn’t be very useful for you to look at the transformation itself as it is today, but it does insert a linalg.generic op per pattern application, and also ends up cloning the regions of these ops. I am trying to see if I can address the cloning part.
So it would end up in the loop you talked about here: