You understood correctly.
I have seen in the "instcombine" module that std::swap is often used for
exactly this.
But it becomes very repetitive, and somewhat irritating, to keep doing
this for every pattern.
Shouldn't we do this in a more generic way, so that we do not have to
worry about swapping operands at all?
(How to do it, I don't know yet. I looked at the matcher implementation
for possible modification/improvement, but didn't find any clue there.)
> It seems to me that we could rejigger Reassociate to reduce the number
> of possibilities that InstCombine could expect.
I agree with David that it should in fact be the Reassociate pass that
handles this.
But the following is something interesting, and rather worrying, that I
have found
(I have omitted unimportant code and highlighted the important parts in
the following example):
*e.g.*
((~A & B) | A) --> (A | B) ; Code is implemented for this already
*C code :*
suyog@suyog-Inspiron-N5010:~$ cat 1.c
#include <stdio.h>
int cal(int a, int b) {
    *return ((~a & b) | a);*
}
int main() {
    int a, b;
    scanf("%d %d", &a, &b);
    return cal(a, b);
}
*LLVM IR at O0:*
suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O0 -emit-llvm 1.c
; Function Attrs: nounwind
define i32 @cal(i32 %a, i32 %b) #0 {
  %1 = xor i32 %a, -1
  %2 = and i32 %1, %b
  %3 = or i32 %2, %a
  ret i32 %3
}
; Function Attrs: nounwind
define i32 @main() #0 {
..
..
%4 = call i32 @cal(i32 %2, i32 %3)
ret i32 %4
}
*LLVM IR at instcombine :*
suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -instcombine 1.ll
; Function Attrs: nounwind
define i32 @cal(i32 %a, i32 %b) #0 {
  %1 = or i32 %a, %b
  ret i32 %1
}
..
..
..
which is OK as per expected transform.
*LLVM IR at reassociate and instcombine:*
suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -reassociate -instcombine 2.ll
; Function Attrs: nounwind
define i32 @cal(i32 %a, i32 %b) #0 {
  %1 = xor i32 %a, -1
  %2 = and i32 %b, %1
  %3 = or i32 %2, %a
  ret i32 %3
}
..
..
Reassociate converted (~a & b) to (b & ~a), and InstCombine then failed to
optimize.
As is evident, the Reassociate pass can sometimes actually harm an
optimization opportunity!
Some more interesting findings on how this can affect the final machine
code:
*Test case:*
1.c:
#include <stdio.h>
int cal(int a, int b) {
    *return ((~a & b) | a);*
}
int main() {
    int a, b;
    scanf("%d %d", &a, &b);
    return cal(a, b);
}
*X86 .s file with clang at O2 for above program :*
suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c
main: # @main
# BB#0:
subl $28, %esp
leal 20(%esp), %eax
movl %eax, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl $.L.str, (%esp)
calll __isoc99_scanf
movl 20(%esp), %eax
* orl 24(%esp), %eax*
addl $28, %esp
retl
As seen, the optimization happened at the IR level itself and is reflected in the .s file.
*GCC output for the same:*
suyog@suyog-Inspiron-N5010:~$ gcc -S -O2 1.c
main:
.LFB23:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
leal 28(%esp), %eax
movl %eax, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call __isoc99_scanf
movl 24(%esp), %eax
* orl 28(%esp), %eax*
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
GCC also did the optimization.
Now we just *slightly flip* the test case:
*1.c Test case:*
#include <stdio.h>
int cal(int a, int b) {
    *return ((b & ~a) | a);*
}
int main() {
    int a, b;
    scanf("%d %d", &a, &b);
    return cal(a, b);
}
*X86 .s file with clang at O2 for above program :*
suyog@suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c
main: # @main
# BB#0:
subl $28, %esp
leal 20(%esp), %eax
movl %eax, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl $.L.str, (%esp)
calll __isoc99_scanf
movl 24(%esp), %ecx
movl %ecx, %eax
	notl	%eax
	andl	20(%esp), %eax
	orl	%ecx, %eax
addl $28, %esp
retl
We lost the optimization opportunity here!
*GCC output for the same:*
suyog@suyog-Inspiron-N5010:~$ gcc -S -O2 1.c
main:
.LFB23:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
leal 28(%esp), %eax
movl %eax, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call __isoc99_scanf
movl 24(%esp), %eax
* orl 28(%esp), %eax*
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
GCC still optimized it !!
It is clearly evident that LLVM is losing here because of its naive
approach to pattern matching.
How can we improve this so that LLVM also recognizes
commutative/associative patterns efficiently with a single 'matchers'
call? I don't have any ideas so far.
I will think more on it. Any help would be much appreciated.
Any suggestions/comments/corrections are most welcome.