How to handle UMULO?

Hi All,

While compiling libgcc, I find I have to deal with UMULO (overflow-aware unsigned multiplication) SDNode. UMULO returns the result of multiplication, and a boolean indicating overflow occurred or not. Our target’s multiply instruction doesn’t care (detect) overflow. I am wondering if I can always set the boolean to false. I am not sure about this as I see AArch64 [1] seems trying to emulate the overflow behavior.

Thanks.

[1] https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AArch64/AArch64ISelLowering.cpp

​Regards,
chenwj​

I think your users will be very upset if you don’t set the boolean return value correctly :slight_smile:

Whatever work it takes to determine the correct value for it, if the user code doesn’t need/use that value then the dead code will be eliminated later. But if they need that return flag then they will want it to be correct!

You may need to use a multiply instruction that returns a double-register result, or an instruction that returns only the upper half of the result. Or you might need to widen the operands and do a full double-width multiply. Or you might need to narrow the operands into two halves, do four (or three) multiplies and some shifts and adds (again detecting carry/overflow, but that’s easier for addition).

It all depends on what instructions your target has.

If your target has a cheap count-leading-zeros instruction, you’d be able to determine whether an unsigned multiply will overflow or not, in most cases, without doing the long version of the multiplication. This is because an N-bit number times an M-bit number will produce a result that is either N+M bits wide or N+M-1 bits wide. If an N+M bit result will fit in your result type, you are guaranteed the boolean result is false; if an N+M-1 bit result does not fit in your result type, you are guaranteed the boolean result is true.

You still need to do the more complicated check in the boundary cases, I think.

–paulr

Even if you do a long multiply, you can still determine whether or not it overflows using at most two multiplies plus a little bit of masking, shifting, and adding.

For example, if you want to know if a uint_16 * unit_16 will overflow, and you don’t have an overflow flag or any form of 16x16 → 32 or 16x16 → hi(16) multiply…

// program to check an algorithm to detect overflow in unsigned multiply

#include <stdio.h>
#include <stdint.h>

bool umulo(uint16_t a, uint16_t b){
uint8_t ah = a>>8, al = a & 0xff, bh = b>>8, bl = b & 0xff, mid;
if (ah == 0){
if (bh == 0) return false;
uint16_t albh = al * bh;
mid = albh & 0xff;
if (mid != albh) return true;
} else { // ah != 0
if (bh != 0) return true;
uint16_t ahbl = ah * bl;
mid = ahbl & 0xff;
if (mid != ahbl) return true;
}
uint16_t alblh = (al * bl) >> 8;
return (mid + alblh) > 0xff;
}

int main(){
int failures = 0;
for (uint32_t a=0; a<0x10000; ++a){
for (uint32_t b=0; b<0x10000; ++b){
bool ov = (a * b) > 0xffff;
if (ov != umulo(a, b)){
printf(“Error for %d * %d\n”, a, b);
+failures;
}
}
}
printf("%d failures\n", failures);
return 0;
}

That runs an exhaustive test with no failures in 4.25 seconds with -O3 on my machine. Or about 9 seconds with -0, -0s, or -02