Thanks for starting the discussion.
I cannot follow this: in my opinion there is no way to define what “do no harm” means. There is almost always a path or a target that will prefer one form or the other. This is the reason why Dan’s post does not emphasize this; on the contrary, I believe it explains why it isn’t a good idea to emphasize this in the “Yeah so what about x * 2…” section: it can be seen as “harmful” to turn a fast addition into a slow multiplication, but canonicalization isn’t about this! That’s missing the whole point.
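To make the “x * 2” case concrete, here is a sketch in MLIR’s arith dialect (the direction of the rewrite is illustrative, not a claim about what upstream canonicalization does today): several equivalent forms converge on one canonical form, and whether the multiply is “slower” is irrelevant because lowering can trivially recover the cheap form per target.

```mlir
// These all compute the same value:
%c1 = arith.constant 1 : i32
%c2 = arith.constant 2 : i32
%a = arith.addi %x, %x : i32   // x + x
%b = arith.shli %x, %c1 : i32  // x << 1

// A canonicalizer could funnel all of them into a single form, e.g.:
%m = arith.muli %x, %c2 : i32  // x * 2
// The "slow multiply" doesn't matter here: a backend can always lower
// this back to an add or a shift when that is cheaper.
```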
“do no harm” requires a cost model and complex analysis, which are beyond what you can expect from canonicalization. However, the “do not drop information” and “can easily recover” parts of the principle should already be enough: if you don’t lose information, how can you do harm, since a later optimization can always recover the other form?
This is overly ambiguous and not very principled IMO.
I don’t understand this: there is no clear definition of “simpler”, and that misses the point of canonicalization entirely: subsequent analyses and transformations should have an easier time finding patterns not because the program is simpler, but because it is in a pre-defined form!
According to this principle, we should stop pushing constants to the right of commutative operations, for example…
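As a reminder, this is what that canonicalization looks like (a before/after sketch in the arith dialect):

```mlir
// Before: constant on the left of a commutative op.
%c5 = arith.constant 5 : i32
%0 = arith.addi %c5, %x : i32

// After canonicalization: the constant is pushed to the right, so every
// downstream pattern only has to match the (value, constant) operand order.
%1 = arith.addi %x, %c5 : i32
```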
The fundamental idea of canonicalization is right there: at a given abstraction level, two semantically equivalent programs should be transformed into the same form as much as possible. If you don’t subscribe to this, then why use canonicalization at all? In particular: it’s not an “optimization”.
Here is the basic set of principles I set for canonicalization:
- Cannot lose information. The information should be recoverable without “heroics” or “raising” by subsequent analyses and transformations: we want these to simply pattern-match the canonical form. As such, canonicalization operates at one level of abstraction (that does not mean “inside a dialect”: this is unfortunately ambiguous, as some dialects span two levels of abstraction, and one level of abstraction can span multiple dialects…).
- Should be fast enough: canonicalization is iterative, and in particular matching can’t be overly complex.
Do you mean “break” in the abstract, absolute sense, or break the “current implementation of some analysis”? The former is a good argument against a canonicalization; the latter is just the usual process of landing changes in the right order (any analysis should be updated to match the canonical form…).
Actually, without canonicalization you are forcing every analysis to account for these forms being equivalent. This is exactly what canonicalization solves: the optimizations do not need to know about the variants and can match only one form.
I don’t quite understand the argument here: for every single canonicalization you could say that “someone could add an infinite loop”.
This is the case with almost all canonicalizations; back to Dan’s article, which touches on all of this ambiguity.
Right, and if I remember correctly someone (@sogartar?) argued this point convincingly, because the “don’t drop semantics” part is a fundamental principle to follow.
I agree in principle, but this is a difficult one in practice, because most of the time it requires adding a cast back to the dynamic type. This is because you can’t just update the result type of an op without also updating all of its users; otherwise you’ll break a lot of verifiers.
For example, the discussion in RFC: remove arith/math ops on tensors - #63 by kuhar reached a consensus that people didn’t want to support `arith.extsi %3 : tensor<12xi16> to tensor<?xi64>`. However, that means that if the original form is `arith.extsi %3 : tensor<?xi16> to tensor<?xi64>`, then inferring the shape on the operation that produces `%3` can’t be a local decision, otherwise it would produce the invalid `arith.extsi` above.
It can still work, but all the “shape inference patterns” then need to look through the newly inserted cast, and casts can be left behind, which isn’t clearly desirable (though it can be argued that a static->dynamic conversion in the type system is fine as a separate cast).
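To illustrate the mechanics, here is a hedged sketch of what a local shape-inference update would have to produce to keep the `arith.extsi` user valid (the `"test.producer"` op is hypothetical, just a stand-in for whatever defines `%3`):

```mlir
// Before: everything is dynamically shaped.
%3 = "test.producer"() : () -> tensor<?xi16>
%4 = arith.extsi %3 : tensor<?xi16> to tensor<?xi64>

// After locally refining the producer's result type: the user still
// expects tensor<?xi16>, so a tensor.cast back to the dynamic type is
// inserted rather than rewriting the (now invalid) arith.extsi in place.
%3 = "test.producer"() : () -> tensor<12xi16>
%c = tensor.cast %3 : tensor<12xi16> to tensor<?xi16>
%4 = arith.extsi %c : tensor<?xi16> to tensor<?xi64>
```

The leftover `tensor.cast` is exactly the residue discussed above: either every shape-inference pattern learns to look through it, or something else has to clean it up later.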