Looking at `foldSelectCttzCtlz`, I noticed that the optimization fails to apply in cases where it should. E.g. it triggers for:

```c
int f1(uint64_t n) { return n == 0 ? 64 : __builtin_ctzll(n); }
```

but not for:

```c
int f2(uint64_t n) { uint64_t cn = ~n; return cn == 0 ? 64 : __builtin_ctzll(cn); }
```
And then, looking further, I noticed that the same problem affects many of the other (maybe even most?) optimizations for `select` in InstCombine. E.g. `foldSelectZeroOrMul` should trigger on:

```c
int f3(int n, int m) { int cn = ~n; return cn == 0 ? 0 : cn * m; }
```

but doesn’t, for the same reason.
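For reference, the f3 fold is sound because `cn == 0` already implies `cn * m == 0`, so the select is redundant. A minimal sketch of the result I'd expect (hypothetical function name, not actual compiler output; at the IR level the transform may also need a `freeze` on `m` to stay poison-safe):

```cpp
// `cn == 0 ? 0 : cn * m` is just `cn * m`, since multiplying by zero
// yields zero anyway, so f3 should reduce to the equivalent of:
int f3_folded(int n, int m) {
  int cn = ~n;
  return cn * m; // select eliminated
}
```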
The reason this doesn’t work seems to be a combination of inflexible pattern-matching and the order in which instructions are folded.
Going back to `f2`, before InstCombine the IR looks like:

```llvm
%neg = xor i64 %n, -1
%cmp = icmp eq i64 %neg, 0
%0 = call i64 @llvm.cttz.i64(i64 %neg, i1 false)
%cast = trunc i64 %0 to i32
%cond = select i1 %cmp, i32 64, i32 %cast
```
And, as written, this should trigger the optimization. However, InstCombine visits the `icmp` instruction *first*, before it gets to the `select` (where the cttz optimization triggers), and folds the `icmp` into the equivalent instruction:

```llvm
%cmp = icmp eq i64 %n, -1
```
Then this no longer matches the required pattern, because `%neg` is not the first arg of the `icmp` (to match the first arg of `@llvm.cttz`), and the constant arg is `-1` rather than the expected `0`.
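To make the failure mode concrete, here is a much-simplified sketch of the shape the matcher effectively requires (my paraphrase, not the actual LLVM source; it ignores the `trunc` between the intrinsic and the `select`, and the `ne`/swapped-arm variants):

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/PatternMatch.h"

// Much-simplified shape of the cttz/ctlz select pattern (eq case only):
//   select (icmp eq X, 0), BitWidth, cttz(X)
// The key constraint: the icmp must compare the *same* value X that
// feeds the intrinsic, against the constant zero.
static bool matchesCttzSelectShape(llvm::SelectInst &SI) {
  using namespace llvm;
  using namespace llvm::PatternMatch;
  ICmpInst::Predicate Pred;
  Value *X;
  if (!match(SI.getCondition(), m_ICmp(Pred, m_Value(X), m_Zero())) ||
      Pred != ICmpInst::ICMP_EQ)
    return false;
  // Once the icmp is canonicalized to `icmp eq i64 %n, -1`, m_Zero()
  // fails, and X would bind to %n rather than %neg, so m_Specific(X)
  // could not match the intrinsic operand either.
  return match(SI.getFalseValue(),
               m_Intrinsic<Intrinsic::cttz>(m_Specific(X)));
}
```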
Is this problem already being tracked? I didn’t find a pre-existing report, but it’s hard to search for. Is there some good way to solve this issue in general? E.g. to allow matching against an arbitrarily-folded `icmp`, or to defer `icmp` folding until after running the `select` pattern matchers, or…something?
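To make the first option concrete, here is the kind of shape I imagine (a purely hypothetical helper, names invented; it recognizes that `icmp eq %n, -1` is the canonicalized form of `icmp eq (xor %n, -1), 0` when the intrinsic operand is `xor %n, -1`):

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"

// Hypothetical: treat `icmp eq N, -1` as a zero test of `xor N, -1`,
// so the matcher would also fire after the icmp has been canonicalized.
static bool isZeroTestOfIntrinsicOp(llvm::ICmpInst &Cmp,
                                    llvm::Value *IntrinsicOp) {
  using namespace llvm;
  using namespace llvm::PatternMatch;
  if (Cmp.getPredicate() != ICmpInst::ICMP_EQ)
    return false;
  // Direct form: icmp eq IntrinsicOp, 0.
  if (Cmp.getOperand(0) == IntrinsicOp && match(Cmp.getOperand(1), m_Zero()))
    return true;
  // Folded form: icmp eq N, -1, where the intrinsic operand is ~N.
  return match(Cmp.getOperand(1), m_AllOnes()) &&
         match(IntrinsicOp, m_Not(m_Specific(Cmp.getOperand(0))));
}
```

But handling folded forms case by case like this obviously doesn’t scale across all the affected select folds, which is why deferring the `icmp` canonicalization (or some generic way of matching through it) seems more attractive to me.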