llvm-gcc miscompilation, or is it GCC's rule?

Hi,

Here is a C function:

uint64_t mul(uint32_t x, uint32_t y) {
  return x * y;
}

Current llvm-gcc-4.0 with -O3 compiles it to:

define i64 @mul(i32 %x, i32 %y) nounwind {
entry:
    %tmp3 = mul i32 %y, %x ; <i32> [#uses=1]
    %tmp34 = zext i32 %tmp3 to i64 ; <i64> [#uses=1]
    ret i64 %tmp34
}

This seems incorrect. I think it should extend %x and %y to i64 first and then do the multiplication.
Otherwise the high bits of the result are lost when %x and %y are large.

GCC seems to have the same issue. Is this a bug, or just GCC's rule?

Thanks,
Sheng.

I don't think C has an operator that directly expresses a 32-bit x 32-bit -> 64-bit multiply, even though there is (on x86 anyway) a hardware instruction that does it.

The type of your expression (x * y) is still uint32_t. The implicit conversion to uint64_t in the return statement happens after the multiplication, so it doesn't change this.
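
To make this concrete, here is a minimal sketch. The names mul_narrow and mul_wide are made up for illustration, and it assumes a platform where int is 32 bits, so the uint32_t operands are not promoted to a wider type:

```c
#include <stdint.h>

/* Hypothetical name: same body as the function in the original post.
   x * y is computed in 32 bits (assuming 32-bit int), so it wraps
   modulo 2^32 before the return statement widens it. */
uint64_t mul_narrow(uint32_t x, uint32_t y) {
  return x * y;
}

/* Hypothetical name: one operand is widened first, so the
   multiplication itself is done in 64 bits. */
uint64_t mul_wide(uint32_t x, uint32_t y) {
  return (uint64_t)x * y;
}
```

With x = 0xFFFFFFFF and y = 2 the full product is 0x1FFFFFFFE. Under the assumption above, mul_narrow returns 0xFFFFFFFE and mul_wide returns 0x1FFFFFFFE.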

Right. llvm-gcc and GCC are correct. If you want the full 64-bit product, use:

uint64_t mul(uint32_t x, uint32_t y) {
  return (uint64_t)x * y;
}

Your code generator should optimize this to a single widening multiply if your target has such an instruction. For example, the X86 backend produces:

_mul:
  movl 8(%esp), %eax
  mull 4(%esp)
  ret
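
Note that the cast has to apply to an operand, not to the product. A small sketch of the difference follows; the function names are hypothetical and 32-bit int is assumed. Casting x makes the usual arithmetic conversions convert y to uint64_t as well, whereas casting the parenthesized product only widens a value that has already wrapped:

```c
#include <stdint.h>

/* Cast applied to an operand: y is converted to uint64_t too,
   so the multiplication is performed in 64 bits. */
uint64_t mul_cast_operand(uint32_t x, uint32_t y) {
  return (uint64_t)x * y;
}

/* Cast applied to the product: the multiplication is still done
   in 32 bits (assuming 32-bit int) and wraps before the cast. */
uint64_t mul_cast_result(uint32_t x, uint32_t y) {
  return (uint64_t)(x * y);
}
```

For x = y = 0x10000, the first function returns 0x100000000 and the second returns 0 under the stated assumption.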

-Chris

Zhou Sheng wrote:

Hi,

Here is a C function:

uint64_t mul(uint32_t x, uint32_t y) {
  return x * y;
}

Current llvm-gcc-4.0 with -O3 compiles it to:

define i64 @mul(i32 %x, i32 %y) nounwind {
entry:
    %tmp3 = mul i32 %y, %x ; <i32> [#uses=1]
    %tmp34 = zext i32 %tmp3 to i64 ; <i64> [#uses=1]
    ret i64 %tmp34
}

This seems incorrect. I think it should extend %x and %y to i64 first and then do the multiplication.
Otherwise the high bits of the result are lost when %x and %y are large.

GCC seems to have the same issue. Is this a bug, or just GCC's rule?

This is not a bug; it follows the C standard. The two uint32_t values are multiplied, producing a uint32_t result, and that result is then converted to uint64_t when it is returned from the function. Your code is equivalent to this:

uint64_t mul(uint32_t x, uint32_t y) {
   uint32_t result = x * y;
   return (uint64_t) result;
}

while what you wanted was this:

uint64_t mul(uint32_t x, uint32_t y) {
   return (uint64_t)x * (uint64_t)y;
}
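
One way to see how the two versions relate, sketched with made-up helper names and assuming 32-bit int: the 32-bit multiply yields exactly the low half of the 64-bit product, and the high half is what gets dropped.

```c
#include <stdint.h>

/* Hypothetical helper: the 32-bit multiply from the original code. */
uint32_t product_narrow(uint32_t x, uint32_t y) {
  return x * y;
}

/* Hypothetical helper: low 32 bits of the full 64-bit product. */
uint32_t product_low(uint32_t x, uint32_t y) {
  return (uint32_t)((uint64_t)x * (uint64_t)y);
}

/* Hypothetical helper: high 32 bits of the full 64-bit product,
   i.e. the part the original function loses. */
uint32_t product_high(uint32_t x, uint32_t y) {
  return (uint32_t)(((uint64_t)x * (uint64_t)y) >> 32);
}
```

For x = y = 0xFFFFFFFF the full product is 0xFFFFFFFE00000001, so product_narrow and product_low both give 1 while product_high gives 0xFFFFFFFE.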

m.