rotate

in C or C++, how can I get clang/llvm to try and do a "rotate".

(want to test this code in the mips16 port)

i.e. emit rotr node.

tia.

reed

I can get clang/llvm to emit a rotate instruction on x86-64 when compiling C by just using -Os and the rotate from Hacker's Delight i.e.,

Nice!

Clever compiler..

*NOTE* IIRC compiling this with -O0 on x86-64 can yield the wrong result since clang will emit shifts and on intel shifts are mod the register size:

*NOTE* IIRC compiling this with -O0 on x86-64 can yield the wrong result
since clang will emit shifts and on intel shifts are mod the register
size [...snip...]

I remember finding the same thing (although I haven't tried it on a recent
clang version) and what I wondered was whether there was mileage in having
an explicit intrinsic for rotation (like there is for bit counting, as in
__builtin_clz and __builtin_ctz and so on). Microsoft's compiler has an
explicit intrinsic in the form of _rotl8 and _rotl16 (IIRC -- this is from
memory!). It would be nice to have a __builtin_rotl family in clang, in
my opinion, but it would need back-end support from llvm. I would expect
it to find use in code relating to hashing and cryptology. I know that
the compiler *should* optimise

uint32_t ror(uint32_t input, size_t rot_bits) {
  return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

and such macros / inline functions are common, but an intrinsic specifies
intent better and could provide for better code optimisation.

Would this be worthwhile pursuing?

Cheers
Andy

After some reflection, I believe the problem was actually with shld/shrd and a similar
bit of C code.

Hey Andy,

I proposed a similar patch to LLVM (left circular shift) around 10/2011. Parts of my patch did make it into trunk about a year after, but others did not.

At that time, my solution was to add a binary operator to the IRBuilder, since LCS fits in nicely with the other shift operators. But, it is quite cumbersome to merge :*(. I would be happy to resend the original patch if you’d like.

-Cameron

Yes, I would be interested. Thank you! I don't know if was rejected before
that I'll have any better luck this time, but I can try...

Andy

Oh, no. I should have been more clear. The patch was not rejected, just lost in the daily shuffle.

I already have my employer’s approval to send this upstream, so I will prepare a patch against trunk this morning.

I proposed a similar patch to LLVM (left circular shift) around 10/2011.
Parts of my patch did make it into trunk about a year after, but others
did not.

And now that I reread this… it should have been “month after”, not “year after”.

The original code is depending on UB, if it doesn't mask.

Joerg

Andy,

Here is the left circular shift operator patch. I apologize to the reviewer in advance. The patch has a good bit of fine detail. Any comments/criticisms?

Some caveats…

  1. This is just the bare minimum needed to make the left circular shift operator work (e.g. no instruction combining).

  2. I tried my best to select operator names in the existing style; please feel free to change them as appropriate.

Ty,
Cameron

CShl.txt (14.6 KB)

We intentionally haven't included a rotate instruction in LLVM in the
past; the justification is that it's generally straightforward for the
backend to form rotate operations, and making the optimizer
effectively handle the new rotation instruction adds a substantial
amount of complexity. You're going to need to make a strong argument
that the current approach is insufficient if you want to commit a
patch like this.

-Eli

Well, we like the operator because it maps cleanly to Fortran’s ISHFTC intrinsic. There must also be other compilers out there, maybe catering to cryptos, that have their own intrinsic for circular shift in other languages. It seems wasteful for an optimizer to break apart an intrinsic into its elemental pieces in order for LLVM to put them back together. This was done in our compiler for some time and it cluttered up the interface code.

Just curious… what kind of optimizations are done on ISD::ROTL/ROTR? We’re able to preform certain InstCombines and other peeps when we see a binary operator. I do not have any experience trying to optimize ISD::ROTL.

This is possibly inconsequent, but Microsoft's compiler for one has rotate
intrinsics... (just my 2 cents!)

Andy

Well,

I believe something is currently broken with respect to forming rotate
instructions :

For example, using a recent clang/llvm on linux/x86_64 :

uint32_t ror32(uint32_t input, size_t rot_bits) {
  return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

uint32_t rol32(uint32_t input, size_t rot_bits) {
  return (input << rot_bits) | (input >> ((sizeof(input) << 3) - rot_bits));
}

gives the expected ror and rol instructions, but their 16bits counter
parts :

uint16_t ror16(uint16_t input, size_t rot_bits) {
  return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

uint16_t rol16(uint16_t input, size_t rot_bits) {
  return (input << rot_bits) | (input >> ((sizeof(input) << 3) - rot_bits));
}

fail miserably :

        .globl ror16
        .align 16, 0x90
        .type ror16,@function
ror16: # @ror16
        .cfi_startproc
# BB#0: # %entry
        movb %sil, %cl
        movl %edi, %eax
        shrl %cl, %eax
        movl $16, %ecx
        subl %esi, %ecx
                                        # kill: CL<def> CL<kill> ECX<kill>
        shll %cl, %edi
        orl %eax, %edi
        movzwl %di, %eax
        ret
.Ltmp2:
        .size ror16, .Ltmp2-ror16
        .cfi_endproc

        .globl rol16
        .align 16, 0x90
        .type rol16,@function
rol16: # @rol16
        .cfi_startproc
# BB#0: # %entry
        movb %sil, %cl
        movl %edi, %eax
        shll %cl, %eax
        movl $16, %ecx
        subl %esi, %ecx
                                        # kill: CL<def> CL<kill> ECX<kill>
        shrl %cl, %edi
        orl %eax, %edi
        movzwl %di, %eax
        ret
.Ltmp3:
        .size rol16, .Ltmp3-rol16
        .cfi_endproc

At a quick first glance, this seems to be related to the values being
promoted from i16 to i32 in the IR optimization passes, but this may not
be the only reason.

Why do we need a new builtin for rotates? What problem does it solve?

-Chris

The initial inspiration for my local mod was to create a way for our optimizer to get at ISD::ROTL directly. This would allow us to always produce an x86 rol instruction for a user intrinsic call, even at -O0.

I suppose an LLVM intrinsic would do the same thing. But, it is nice to have an operator so LLVM can optimize common cases, like cshift by a constant. I personally feel that it fits in with LSHL, LSHR, and ASHR.

That being said, I’d be just as happy with any other way to convey the same functionality. It just seems less nice to have to unfold a circular shift intrinsic into the Hacker’s Delight algorithm (5 or more operations) at our opt/LLVM interface, just to have LLVM try to put it back together.