Shifts that use only 5 LSBs.

I'm working on a Target that only uses the 5 lsbs of the shift amount.

I only have 32-bit registers, no 64-bit ones, so 64-bit math is emulated, with LLVM doing the transformations whenever I can get it to.

I think I'm seeing a case where a standard multiword shift (e.g. from Hacker's Delight) is being expanded inline in a way that assumes at least 6 bits of the shift amount are honored (i.e. it looks like it assumes x >> 32 == 0 if x is 32 bits, but for me x >> 32 == x).

(A) What does LLVM assume about "shift width"?
(B) Is there a way I can tell it that I've only got 5 bits?

Thanks,

Dan

I'm working on a Target that only uses the 5 lsbs of the shift amount.

Okay, that's quite common... x86 is the same.

I only have 32 bit registers, no 64 bit, so 64 bit math is emulated,
LLVM doing the transformations whenever I can get it to.

x86 is the same.

I think I'm seeing a case where it ultimately looks like a standard
multiword shift (from e.g. Hacker's Delight) is being inline expanded
that assumes at least 6 bits of the shift is paid attention to (i.e.
it looks like it assumes x >> 32 == 0 if x is 32 bits, but for me x >>
32 == x).

(A) What does LLVM assume about "shift width"?

"If op2 is (statically or dynamically) negative or equal to or larger
than the number of bits in op1, the result is undefined." See
http://llvm.org/docs/LangRef.html#i_shl. Roughly, that means that
"lshr i32 %x, 32" is allowed to return any arbitrary value. If you
need consistent behavior, you should explicitly mask the shift amount
in the front-end.
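
As a rough illustration of what masking the shift amount could look like at the C level, here is a minimal sketch. The helper names shl32_masked and lshr32_masked are hypothetical and are not part of LLVM or clang; the sketch only assumes a 32-bit operand, so the amount is reduced to its 5 LSBs before shifting and is always in range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift left using only the 5 LSBs of the amount.
   After masking, the amount is always in 0..31, so the C shift is
   well defined for a 32-bit operand. */
static uint32_t shl32_masked(uint32_t x, unsigned amt)
{
    return x << (amt & 31u);
}

/* Hypothetical helper: logical shift right with the same masking. */
static uint32_t lshr32_masked(uint32_t x, unsigned amt)
{
    return x >> (amt & 31u);
}

/* Self-check: an amount of 32 behaves like an amount of 0. */
void check_masked_shifts(void)
{
    assert(lshr32_masked(0x12345678u, 32) == 0x12345678u);
    assert(shl32_masked(1u, 33) == 2u);
}
```

With the mask in place, a shift by 32 consistently acts as a shift by 0, which matches a shifter that looks only at 5 bits.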

-Eli

I’m working on a Target that only uses the 5 lsbs of the shift amount.

Okay, that’s quite common… x86 is the same.

Thanks - yes, I’d heard rumors that x86 operates the same way.

I only have 32 bit registers, no 64 bit, so 64 bit math is emulated,
LLVM doing the transformations whenever I can get it to.

x86 is the same.

Ah, maybe I should try my test below on x86, and see what happens. It’ll take me a bit as I’m not familiar with x86 assembly code.

I think I’m seeing a case where it ultimately looks like a standard
multiword shift (from e.g. Hacker’s Delight) is being inline expanded
that assumes at least 6 bits of the shift is paid attention to (i.e.
it looks like it assumes x >> 32 == 0 if x is 32 bits, but for me x >>
32 == x).

(A) What does LLVM assume about “shift width”?

“If op2 is (statically or dynamically) negative or equal to or larger
than the number of bits in op1, the result is undefined.” See
http://llvm.org/docs/LangRef.html#i_shl. Roughly, that means that
“lshr i32 %x, 32” is allowed to return any arbitrary value. If you
need consistent behavior, you should explicitly mask the shift amount
in the front-end.

The problem here is that LLVM itself appears to be introducing an expansion that assumes 32-bit shifts honor more than 5 bits of the shift amount.

I created a simple test function:

u64 mebbe_shift( u64 x, int test )
{
    if( test )
        x <<= 2;

    return x;
}

I compile using clang, opt, and llc.

I get something that, converted from my assembler to hasty pseudo-C, looks like:

u64 mebbe_shift( u64 x, int test )
{
    int amt = test ? 2 : 0;

    x.hi = x.hi << amt | x.lo >> (32 - amt);
    x.lo <<= amt;

    return x;
}
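
To make the failure concrete, here is a minimal sketch that runs the expansion above on a modeled shifter. The names tgt_shl, tgt_lshr, u64_pair and expanded_shl are hypothetical, and the model assumes only what was described earlier: the shifter looks at the 5 LSBs of the amount. With amt == 0 the term x.lo >> (32 - amt) becomes x.lo >> 32, which the modeled shifter treats as x.lo >> 0, so the low word is ORed into the high word.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a target shifter that looks only at the 5 LSBs of the amount
   (an assumption taken from the description of the target). */
static uint32_t tgt_shl(uint32_t x, uint32_t amt)  { return x << (amt & 31u); }
static uint32_t tgt_lshr(uint32_t x, uint32_t amt) { return x >> (amt & 31u); }

typedef struct { uint32_t lo, hi; } u64_pair;  /* hypothetical two-word value */

/* The expansion from the pseudo-C, run on the modeled shifter. */
static u64_pair expanded_shl(u64_pair x, uint32_t amt)
{
    u64_pair r;
    r.hi = tgt_shl(x.hi, amt) | tgt_lshr(x.lo, 32u - amt);
    r.lo = tgt_shl(x.lo, amt);
    return r;
}

/* Self-check: amt == 2 is fine, amt == 0 corrupts the high word. */
void check_expanded_shl(void)
{
    u64_pair x = { 0x80000001u, 0x00000010u };  /* lo, hi */

    u64_pair a = expanded_shl(x, 2);
    assert(a.hi == 0x00000042u && a.lo == 0x00000004u);

    u64_pair b = expanded_shl(x, 0);
    assert(b.lo == x.lo);
    assert(b.hi == (x.hi | x.lo));  /* a correct result would be x.hi */
}
```

The test == 0 path is exactly the amt == 0 case, so the function returns a corrupted value whenever no shift was requested.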

My Target doesn’t explicitly do any of these kinds of expansions or transformations, so it seems to me it’s somewhere in LLVM that this is happening.

I’ll investigate further.

Thanks,

Dan

P.S. What I’d like to get, for performance on my target, is something like:

u64 mebbe_shift( u64 x, int test )
{
    u64 y;

    y.hi = x.hi << 2 | x.lo >> 30;
    y.lo = x.lo << 2;

    x.hi = test ? y.hi : x.hi;
    x.lo = test ? y.lo : x.lo;

    return x;
}

Shifts are expensive but selects are cheap.
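
For reference, the select-based form can be written as compilable C and compared against a native 64-bit shift. This is only a sketch: u64_pair, mebbe_shift_select and mebbe_shift_ref are hypothetical names, and the reference assumes a host compiler with a real 64-bit integer type. Both shift amounts (2 and 30) are constants in the range 0..31, so a 5-bit shifter handles them correctly.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } u64_pair;  /* hypothetical two-word value */

/* Shift by a constant 2 unconditionally, then select per word.
   Both shift amounts (2 and 30) are in range for a 5-bit shifter. */
static u64_pair mebbe_shift_select(u64_pair x, int test)
{
    u64_pair y;
    y.hi = (x.hi << 2) | (x.lo >> 30);
    y.lo = x.lo << 2;

    x.hi = test ? y.hi : x.hi;
    x.lo = test ? y.lo : x.lo;
    return x;
}

/* Reference using the host's native 64-bit arithmetic. */
static uint64_t mebbe_shift_ref(uint64_t x, int test)
{
    if (test)
        x <<= 2;
    return x;
}

/* Self-check: the two-word version matches the reference for both paths. */
void check_mebbe_shift_select(void)
{
    const uint64_t v = 0x0000001080000001ull;
    for (int test = 0; test <= 1; ++test) {
        u64_pair in = { (uint32_t)v, (uint32_t)(v >> 32) };  /* lo, hi */
        u64_pair out = mebbe_shift_select(in, test);
        uint64_t got = ((uint64_t)out.hi << 32) | out.lo;
        assert(got == mebbe_shift_ref(v, test));
    }
}
```

Because the shift amounts never depend on test, no out-of-range shift can occur on either path.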

But I’d be happy to just understand where things are going wrong for now.

Ouch, that's nasty... I just filed bug 3225, "Wrong code legalizing 64-bit shift on x86".

-Eli

I can't find the bug you refer to.

Also, it doesn't have this problem on x86: it uses the shldl instruction.

PPC32, interestingly enough, generates something similar, but it looks like it has extra instructions to OR in a value that's guaranteed to be 0.

I'm still reminding myself of PPC assembler, though, so I'm not 100% sure.

Thanks,

Dan

I can't find the bug you refer to.

Did the link not work? I'll try pasting it in again. In any case, I
checked in a fix disabling the broken optimization; try updating to
current SVN.

http://llvm.org/bugs/show_bug.cgi?id=3225

Also, it doesn't have this problem in x86: it uses the shldl
instruction.

Well, sort of... it turned out that it wasn't that hard to construct a
case that broke on x86.

-Eli

Got the update to LLVM.

Thanks,

Dan