I couldn’t find a specific XOR (OR and AND) optimization on llvm, and therefore I am about to implement it.
But first I would like to check with you guys that it really does not exist.
For a simple loop like this:
nbits = 128;
bit_addr = 0;
bindex=bit_addr>>5; /* Index is number /32 /
bitnumb=bit_addr % 32; / Bit number in longword */
The -O3 set of optimizations generates a code like this:
br label %while.body
while.body: ; preds = %while.body, %entry
%0 = phi i32 [ 0, %entry ], [ %inc.3, %while.body ]
%shr = lshr i32 %0, 5
%rem = and i32 %0, 28
%shl = shl i32 1, %rem
%arrayidx = getelementptr inbounds i32* %bitmap, i32 %shr
%tmp6 = load i32* %arrayidx, align 4
%xor = xor i32 %tmp6, %shl
%rem.1 = or i32 %rem, 1
%shl.1 = shl i32 1, %rem.1
%xor.1 = xor i32 %xor, %shl.1
%rem.2 = or i32 %rem, 2
%shl.2 = shl i32 1, %rem.2
%xor.2 = xor i32 %xor.1, %shl.2
%rem.3 = or i32 %rem, 3
%shl.3 = shl i32 1, %rem.3
%xor.3 = xor i32 %xor.2, %shl.3
store i32 %xor.3, i32* %arrayidx, align 4
%inc.3 = add i32 %0, 4
%exitcond.3 = icmp eq i32 %inc.3, 128
br i1 %exitcond.3, label %while.end, label %while.body
while.end: ; preds = %while.body
It is clear that we are able to fold all XORs into a single XOR, and the same happens to all SHLs and ORs.
I am using -O3, but the code is not optimized, so I am assuming there is no optimization for this case. Am I correct?
If yes, I have a few other questions:
- Do you know of any other similar optimization that could do something here but is not being triggered for some reason??
- Do you know why a OR instruction is used for increments? instead of using a INC or ADD?
- Is there a straight forward way to know if an instruction belongs to a loop? (just curiosity)
Thanks very much