llvm-gcc promotes i32 mul to i64 inside __muldi3

I'm building a tool-chain for a processor without an integer MUL instruction,
so I've defined __mulsi3 for 32-bit integer multiplication.

Now I've got a problem with int64 multiplication, which is implemented
in libgcc2.c: a segfault due to infinite recursion in the i64 soft
multiplication (libgcc2, __muldi3).

LLVM-GCC (for my target) misoptimizes the code when -O2 is passed:
it promotes the i32 multiplication to an i64 multiplication, and as a
result my back-end generates a __muldi3 call.

I would appreciate it if someone could point out where this promotion
happens, and how I can disable it. Thanks in advance.

Some code examples. The relevant line of __muldi3 (defined in
llvm-gcc/gcc/libgcc2.c):

w.s.high += ((UWtype) uu.s.low * (UWtype) vv.s.high
           + (UWtype) uu.s.high * (UWtype) vv.s.low);
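To see why the recursion can arise, here is a minimal sketch of the strategy that line implements: a 64x64 multiply built from 32-bit halves. The names (soft_mul64, ul, uh, ...) are illustrative, not libgcc's actual DWunion/UWtype code:

```c
#include <stdint.h>

/* Illustrative sketch of __muldi3's decomposition; names are
 * hypothetical, not libgcc's. */
uint64_t soft_mul64(uint64_t u, uint64_t v)
{
    uint32_t ul = (uint32_t)u, uh = (uint32_t)(u >> 32);
    uint32_t vl = (uint32_t)v, vh = (uint32_t)(v >> 32);

    /* Full widening low*low product; on the real target this, too,
     * must ultimately come from 32-bit pieces (umul_ppmm in libgcc). */
    uint64_t w = (uint64_t)ul * vl;

    /* The cross terms only affect the high word, so they are plain
     * 32-bit truncating multiplies -- exactly the quoted line above.
     * If the optimizer promotes these i32 muls to one i64 mul, the
     * back-end emits a call back into __muldi3 and the recursion
     * never terminates. */
    uint32_t cross = ul * vh + uh * vl;

    return w + ((uint64_t)cross << 32);
}
```

The identity is exact modulo 2^64: u*v = ul*vl + 2^32*(ul*vh + uh*vl), since the uh*vh term is shifted entirely out of range.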

LLVM assembly compiled with llvm-gcc -O2:
%20 = zext i32 %19 to i64 ; <i64> #uses=1
%21 = zext i32 %16 to i64 ; <i64> #uses=1
%22 = mul i64 %sroa.store.elt3, %u ; <i64> #uses=1
%23 = mul i64 %sroa.store.elt, %v ; <i64> #uses=1
%24 = add i64 %22, %23 ; <i64> #uses=1
%25 = add i64 %24, %21 ; <i64> #uses=1
%26 = shl i64 %25, 32 ; <i64> #uses=1

Regards,
Sergey Y.

See http://llvm.org/bugs/show_bug.cgi?id=3101 which is essentially the
same issue.

-Eli

Alpha has the same problem with 128-bit ints, in about the same functions.

Thanks, yes, I'm facing the same issue.

Hm... it seems there are no simple fixes.
I'll have to write one more i64 mul implementation to work around the
aggressive optimizations.
Is that correct? Is this the only way?

Can I disable just the particular pass that does this promotion from
i32 to i64, using some LLVM-GCC option?

Are there other libgcc functions affected by this optimization?

Regards,
Sergey Y.

> Thanks, yes, I'm facing the same issue.
>
> Hm... it seems there are no simple fixes.
> I'll have to write one more i64 mul implementation to work around the
> aggressive optimizations.
> Is that correct? Is this the only way?

This shouldn't be necessary, IMO. If you were going to implement it,
then the correct thing to do would be to add generic SelectionDAG
lowering of large multiplies, which would render the library mostly
useless.

> Can I disable just the particular pass that does this promotion from
> i32 to i64, using some LLVM-GCC option?

Not easily as far as I know.

> Are there other libgcc functions affected by this optimization?

Any soft int stuff that lowering hasn't already implemented.

Andrew

> This shouldn't be necessary, IMO. If you were going to implement it,
> then the correct thing to do would be to add generic SelectionDAG
> lowering of large multiplies, which would render the library mostly
> useless.

In fact, I would prefer to avoid custom lowering for operations on large types.
i64 will be rare in my case (embedded), and its performance is not an issue.
I need basic i64 support only for functional correctness.

> Any soft int stuff that lowering hasn't already implemented.

Taking into account the known issues with soft int stuff... why isn't
expansion of large MULs/etc. into smaller ones the default behavior of
ExpandNode (LegalizeDAG)?
(It seems meaningful whenever the i32 operation is Legal or Custom.)
Are there any plans to add that?

Regards,
Sergey Y.

Hi,

LLVM mis-compiles the soft int64 mul '__muldi3' (from either libgcc or
compiler-rt) unless specific efforts are taken in the back-end to
custom lower i64 operations back to i32.

The issue also appears on CellSPU/Alpha, and workarounds exist there
that use custom lowering to vector instructions.

My case is different.

Deeply embedded processors have an optional multiplier unit (e.g. one
disabled at design time). Thus an efficient and compact MUL
implementation is not available, and i32 mul is implemented as __mulsi3
in libgcc.

It seems a retargetable back-end requires additional effort in this
case: either rewrite the __muldi3 source code to avoid the mul
promotion, or custom lower i64 mul.
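The first option can be sketched as follows, under the assumption that calls to an out-of-line 32-bit multiply helper stay opaque to the optimizer. The helper name mulsi3_call and the GCC-specific noinline attribute are stand-ins for illustration; on the real target the routine would be __mulsi3 itself:

```c
#include <stdint.h>

/* Hypothetical stand-in for the target's soft 32-bit multiply
 * (__mulsi3 in libgcc).  noinline keeps the call opaque, so the
 * optimizer cannot recognize a zext+mul pattern and re-fuse the
 * 32-bit multiplies into one i64 mul (which the back-end would turn
 * into a call to __muldi3 -- the recursion described earlier). */
__attribute__((noinline))
static uint32_t mulsi3_call(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Rewritten 64-bit soft multiply that routes every 32-bit product
 * through the opaque helper, so no widening multiply appears in the
 * source at all. */
uint64_t muldi3_safe(uint64_t u, uint64_t v)
{
    uint32_t ul = (uint32_t)u, uh = (uint32_t)(u >> 32);
    uint32_t vl = (uint32_t)v, vh = (uint32_t)(v >> 32);

    /* 32x32->64 low product built from 16-bit halves; each partial
     * product fits in 32 bits, so only 32-bit multiplies are needed. */
    uint32_t ull = ul & 0xFFFF, ulh = ul >> 16;
    uint32_t vll = vl & 0xFFFF, vlh = vl >> 16;
    uint64_t low = (uint64_t)mulsi3_call(ull, vll)
                 + ((uint64_t)mulsi3_call(ull, vlh) << 16)
                 + ((uint64_t)mulsi3_call(ulh, vll) << 16)
                 + ((uint64_t)mulsi3_call(ulh, vlh) << 32);

    /* Cross terms contribute only to the high word, as in __muldi3. */
    uint32_t cross = mulsi3_call(ul, vh) + mulsi3_call(uh, vl);
    return low + ((uint64_t)cross << 32);
}
```

Whether the optimizer actually leaves the calls alone depends on the compiler and flags, so this is a workaround sketch rather than a guaranteed fix; the robust alternative remains custom lowering in the back-end.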

Am I missing something?

Does this problem affect existing back-ends like PIC16?

Could you please help with some hints: how does one implement lowering
of mul i64 to mul i32 (which must then be replaced with __mulsi3)? Are
there any examples in existing back-ends?

Thanks,
Sergey Y.