On Improving Arm SME Lowering Resilience in MLIR

Thanks for the post! Great to see more people using SME and scalable vectors!

If I’m parsing the IR correctly, I can see that you are vectorizing K (the reduction dimension) with size 1, which means we are only reducing a single element, making the reduction redundant. I would expect canonicalization to remove such a reduction, in the same way it gets rid of other trivial or redundant computation. If you make the K vector size greater than 1, no canonicalization should happen. This is the intended behavior, IMO.
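For illustration, here is a hypothetical unit-K reduction (the shapes are made up, not taken from your IR) and the kind of trivial form canonicalization is expected to fold it into:

```mlir
// Reducing dim 1 of a 4x1 vector combines a single element with the
// accumulator, so the reduction is trivially foldable.
%red = vector.multi_reduction <add>, %v, %acc [1]
    : vector<4x1xf32> to vector<4xf32>

// Canonicalization can rewrite this as a shape_cast plus an elementwise
// add; the exact folded form depends on the patterns in your build.
%cast = vector.shape_cast %v : vector<4x1xf32> to vector<4xf32>
%red2 = arith.addf %cast, %acc : vector<4xf32>
```

With a K vector size greater than 1, the reduction does real work and no such fold applies.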

Perhaps the problem lies in expecting a “low-level” canonical form for a matmul computation once we have lost the structural information that higher-level ops provide (e.g., linalg.matmul, vector.contract…), which points to a deeper issue:


This lowering path, which exists for historical reasons, has served our goals so far, but it’s not the first time it has caused problems. As mentioned multiple times in the past, the vector.multi_reduction -> vector.contract step should be removed in favor of a direct linalg.matmul -> vector.contract lowering. The current step is not a lowering but a raising of the abstraction level and, as such, a certain amount of complexity in the pattern matching is expected. This is the actual problem that should be fixed, IMO.
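To make the two forms concrete (shapes and indexing maps below are illustrative, not from your IR): today vectorization emits an elementwise multiply plus a vector.multi_reduction, which later has to be *recognized* as the contraction it came from, instead of emitting the contraction directly:

```mlir
// Current path: matmul structure is lost and must be pattern-matched back.
%mul = arith.mulf %lhs_b, %rhs_b : vector<4x4x4xf32>
%red = vector.multi_reduction <add>, %mul, %acc [2]
    : vector<4x4x4xf32> to vector<4x4xf32>

// Proposed path: keep the structure by lowering linalg.matmul straight to
// a contraction, which encodes M/N/K roles in its indexing maps.
%res = vector.contract {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                     affine_map<(d0, d1, d2) -> (d2, d1)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"],
    kind = #vector.kind<add>}
    %a, %b, %acc : vector<4x4xf32>, vector<4x4xf32> into vector<4x4xf32>
```

The second form carries the structural information downstream passes need, so no raising step is required.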

This looks like an easy-to-fix bug, and you pointed to the solution: the vectorizer shouldn’t try to vectorize an op that is already vectorized. Both the mask op and the masked op should have vector types, so they shouldn’t pass the vectorizer’s preconditions. I’m also wondering which vectorization path such a masked operation is taking, as we mostly vectorize linalg ops, and the masked operation shouldn’t be a linalg op at that point since it has already been vectorized. In any case, it should be a matter of bailing out of vectorization if any of the operand/result types of the candidate op is a vector type. It would be good to open a GitHub issue with a concrete reproducer and intermediate IR dumps.
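As a made-up sketch of what the precondition should reject (shapes are illustrative, using a scalable dim as in the SME case):

```mlir
// Already-vectorized masked op: every operand and result here carries a
// vector type, so a second vectorization attempt should fail the
// preconditions and bail out rather than try to vectorize it again.
%res = vector.mask %mask {
  vector.multi_reduction <add>, %mul, %acc [1]
      : vector<4x[4]xf32> to vector<4xf32>
} : vector<4x[4]xi1> -> vector<4xf32>
```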

This is one of the steps we planned to implement as part of the dynamic vector proposal, so I would support this if it’s generic/reusable enough :slight_smile: