I’d like to ask if there are any thoughts on creating an AArch64-specific pass similar to LoopIdiomRecognize, specifically for loops with early exits, which SLP and loop vectorisation currently cannot handle.
We have identified some loops where we know at least a few workloads would benefit if we were able to use SVE’s unique predication features. One example is the following loop:
while (i != max_len) { if (a[i] != b[i]) break; ++i; }
This is similar to a memcmp, but is slightly different because instead of returning the difference between the values of the first pair of bytes which did not match, it returns the index of the first mismatch. We observe a significant performance improvement by replacing this with a specialised predicated SVE loop. We believe this would generally be beneficial; this pattern appears several times in the xz benchmark and in the LLVM test suite (7zip).
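To make the semantics concrete, here is a minimal scalar sketch of the loop written out as a complete function, with the index increment made explicit. The function name `first_mismatch`, the `unsigned char` element type, and the parameter order are assumptions for illustration only; the loop as described only fixes the comparison and the early exit.

```cpp
#include <cstddef>

// Hypothetical helper, not an existing LLVM or libc function.
// Starting at index i, returns the index of the first position where
// a and b differ, or max_len if no mismatch is found before the limit.
// Precondition (assumed): i <= max_len and both arrays hold max_len bytes.
std::size_t first_mismatch(const unsigned char *a, const unsigned char *b,
                           std::size_t i, std::size_t max_len) {
  while (i != max_len) {
    if (a[i] != b[i])
      break; // early exit: the part the vectorisers cannot currently handle
    ++i;
  }
  return i;
}
```

Unlike memcmp, whose result only tells the caller how the first differing bytes compare, the value returned here is the position of the mismatch, which is why the loop cannot simply be lowered to a memcmp call.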
The LoopIdiomRecognize pass seems like a good candidate for matching the loop. However, as the transform depends on the target having the right features to optimise this efficiently, I don’t think it would be an acceptable place to implement this. We also would not be able to lower to a memcmp call, as mentioned above. As there is already a HexagonLoopIdiomRecognition pass, we are considering adding an AArch64-specific pass, which we could also extend to work for similar loops such as std::find, which is a more common idiom and which we know is used heavily by the xalancbmk benchmark.
I’d like to check first if there are any concerns with this approach and any comments would be appreciated.