XOR optimization

Hi folks,

I couldn’t find a specific XOR (OR and AND) optimization on llvm, and therefore I am about to implement it.
But first I would like to check with you guys that it really does not exist.

For a simple loop like this:

nbits = 128;
bit_addr = 0;
while(nbits–)
{
bindex=bit_addr>>5; /* Index is number /32 /
bitnumb=bit_addr % 32; /
Bit number in longword */
bitmap[bindex]^=(1L<<bitnumb);
bit_addr++;
}

The -O3 set of optimizations generates a code like this:

entry:
br label %while.body

while.body: ; preds = %while.body, %entry
%0 = phi i32 [ 0, %entry ], [ %inc.3, %while.body ]
%shr = lshr i32 %0, 5
%rem = and i32 %0, 28
%shl = shl i32 1, %rem
%arrayidx = getelementptr inbounds i32* %bitmap, i32 %shr
%tmp6 = load i32* %arrayidx, align 4
%xor = xor i32 %tmp6, %shl
%rem.1 = or i32 %rem, 1
%shl.1 = shl i32 1, %rem.1
%xor.1 = xor i32 %xor, %shl.1
%rem.2 = or i32 %rem, 2
%shl.2 = shl i32 1, %rem.2
%xor.2 = xor i32 %xor.1, %shl.2
%rem.3 = or i32 %rem, 3
%shl.3 = shl i32 1, %rem.3
%xor.3 = xor i32 %xor.2, %shl.3
store i32 %xor.3, i32* %arrayidx, align 4
%inc.3 = add i32 %0, 4
%exitcond.3 = icmp eq i32 %inc.3, 128
br i1 %exitcond.3, label %while.end, label %while.body

while.end: ; preds = %while.body
ret void

It is clear that we are able to fold all XORs into a single XOR, and the same happens to all SHLs and ORs.
I am using -O3, but the code is not optimized, so I am assuming there is no optimization for this case. Am I correct?

If yes, I have a few other questions:

  • Do you know of any other similar optimization that could do something here but is not being triggered for some reason??
  • Do you know why a OR instruction is used for increments? instead of using a INC or ADD?
  • Is there a straight forward way to know if an instruction belongs to a loop? (just curiosity)

Thanks very much

Daniel Nicacio

You mean "bit_addr++;" ?

It's add'ing (+4 due to loop unrolling i guess)

   %inc.3 = add i32 %0, 4

I think he mean the intermediate steps:

%rem.1 = or i32 %rem, 1
%rem.2 = or i32 %rem, 2
%rem.3 = or i32 %rem, 3

When calculating bit_addr % 32. Inside the loop, %rem should be incremented because of the loop unrolling, but instead, the optimizer thought it’s easier to OR it. This gives the same result, but it’s weird.

2011/7/26 FlyLanguage <flylanguage@gmail.com>