[PATCH] Emit rbit, clz on ARM for __builtin_ctz

Hi,

On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction sequence it is now.

I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to llvm ir, which itself might not be the best option since bitreverse probably isn't too common.

Other targets that I know of that could potentially benefit from this optimization being global (that have a clz and bitreverse instruction but not ctz) are AVR32 and C64x, neither of which llvm has backends for yet.

llvm-ctz-arm.diff (5.04 KB)

Hi,

On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction sequence it is now.

I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to llvm ir, which itself might not be the best option since bitreverse probably isn't too common.

I haven't looked at the patch in detail, but this approach makes sense to me.

Other targets that I know of that could potentially benefit from this optimization being global (that have a clz and bitreverse instruction but not ctz) are AVR32 and C64x, neither of which llvm has backends for yet.

When/if another target wants this, we could add a ISD::RBIT operation, it doesn't need to be added at the llvm ir level,

-Chris

The XCore also has ctlz and bitreverse instructions and not cttz. At the moment in the XCore backend cttz is marked as legal and expanded to this pair of instructions in a pattern in the InstrInfo.td.

Bit reversal turns up in most FFT algorithms, so it wouldn't hurt to
be able to add an instcombine that recognizes it, etc.

deep

In that case, perhaps it makes sense to add it as an ISD::RBIT operation straight away.

The rest of the patch looks good to me.

-Jim

Hi,

On ARMv6T2 this turns cttz into rbit, clz instead of the 4
instruction sequence it is now.

I'm not sure if adding RBIT to ARMISD and doing this optimization in
the legalize pass is the best option, but the only better way I
could think of doing it was to add a bitreverse intrinsic to llvm
ir, which itself might not be the best option since bitreverse
probably isn't too common.

I haven't looked at the patch in detail, but this approach makes sense
to me.

Other targets that I know of that could potentially benefit from
this optimization being global (that have a clz and bitreverse
instruction but not ctz) are AVR32 and C64x, neither of which llvm
has backends for yet.

When/if another target wants this, we could add a ISD::RBIT operation,
it doesn't need to be added at the llvm ir level,

Bit reversal turns up in most FFT algorithms, so it wouldn't hurt to
be able to add an instcombine that recognizes it, etc.

I agree with Chris it doesn't make sense to add a llvm instruction for this since it's rare. But it's something that can be recognized in dag combine / isel. Can you attach some examples?

Evan

Other targets that I know of that could potentially benefit from
this optimization being global (that have a clz and bitreverse
instruction but not ctz) are AVR32 and C64x, neither of which llvm
has backends for yet.

When/if another target wants this, we could add a ISD::RBIT
operation,
it doesn't need to be added at the llvm ir level,

The XCore also has ctlz and bitreverse instructions and not cttz. At
the moment in the XCore backend cttz is marked as legal and expanded
to this pair of instructions in a pattern in the InstrInfo.td.

In that case, perhaps it makes sense to add it as an ISD::RBIT
operation straight away.

Since only a couple of targets can use this, it shouldn't block this patch from going in. Jim, can you commit this?

Thanks,

Evan

Works for me. Done in r93758.

Thanks for doing this, David.

-Jim

Blackfin can add with backwards carry, essentially doing

(rbit (add (rbit a), (rbit b)))

This is used for FFTs.

I wasn't hoping to be able to pattern-match something so complicated.

Feel free to add target intrinsics where appropriate...

-Eli