Hi folks,

I couldn’t find a specific XOR (OR and AND) optimization on llvm, and therefore I am about to implement it.

But first I would like to check with you guys that it really does not exist.

For a simple loop like this:

nbits = 128;

bit_addr = 0;

while(nbits–)

{

bindex=bit_addr>>5; /* Index is number /32 */
bitnumb=bit_addr % 32; /* Bit number in longword */

bitmap[bindex]^=(1L<<bitnumb);

bit_addr++;

}

The -O3 set of optimizations generates a code like this:

entry:

br label %while.body

while.body: ; preds = %while.body, %entry

%0 = phi i32 [ 0, %entry ], [ %inc.3, %while.body ]

%shr = lshr i32 %0, 5

%rem = and i32 %0, 28

%shl = shl i32 1, %rem

%arrayidx = getelementptr inbounds i32* %bitmap, i32 %shr

%tmp6 = load i32* %arrayidx, align 4

%xor = xor i32 %tmp6, %shl

%rem.1 = or i32 %rem, 1

%shl.1 = shl i32 1, %rem.1

%xor.1 = xor i32 %xor, %shl.1

%rem.2 = or i32 %rem, 2

%shl.2 = shl i32 1, %rem.2

%xor.2 = xor i32 %xor.1, %shl.2

%rem.3 = or i32 %rem, 3

%shl.3 = shl i32 1, %rem.3

%xor.3 = xor i32 %xor.2, %shl.3

store i32 %xor.3, i32* %arrayidx, align 4

%inc.3 = add i32 %0, 4

%exitcond.3 = icmp eq i32 %inc.3, 128

br i1 %exitcond.3, label %while.end, label %while.body

while.end: ; preds = %while.body

ret void

It is clear that we are able to fold all XORs into a single XOR, and the same happens to all SHLs and ORs.

I am using -O3, but the code is not optimized, so I am assuming there is no optimization for this case. Am I correct?

If yes, I have a few other questions:

- Do you know of any other similar optimization that could do something here but is not being triggered for some reason??
- Do you know why a OR instruction is used for increments? instead of using a INC or ADD?
- Is there a straight forward way to know if an instruction belongs to a loop? (just curiosity)

Thanks very much

Daniel Nicacio