Improving i128 division on x86_64

Hi LLVM,

I think the potential of the divq instruction is not fully exploited by the X86 target.

The following IR can be lowered to one divq instruction:

define i64 @div128by64lo(i128 %d, i64 %n) nounwind readnone {
  %m = zext i64 %n to i128
  %q = udiv i128 %d, %m
  %q.l = trunc i128 %q to i64
  ret i64 %q.l
}

And this one can be lowered to two divq instructions:

define i128 @div128by64full(i128 %d, i64 %n) nounwind readnone {
  %m = zext i64 %n to i128
  %q = udiv i128 %d, %m
  ret i128 %q
}
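
To make the two-divq idea concrete, here is a rough sketch of the decomposition, written in Rust only so that it can be checked against native u128 division. The helper divq_model is a hypothetical stand-in for the instruction (not an LLVM or hardware API), and the function names simply mirror the IR above. One caveat: divq faults when the quotient does not fit in 64 bits, so the first step is what guarantees that the second one is safe, and a single divq is enough only when the high half of %d is already known to be smaller than %n.

```rust
/// Hypothetical model of x86-64 `divq`: divides the 128-bit value hi:lo
/// by n and returns (quotient, remainder). The real instruction raises
/// #DE unless hi < n (the quotient must fit in 64 bits), so the model
/// asserts that precondition.
fn divq_model(hi: u64, lo: u64, n: u64) -> (u64, u64) {
    assert!(hi < n, "divq would fault: quotient does not fit in 64 bits");
    let d = ((hi as u128) << 64) | (lo as u128);
    ((d / n as u128) as u64, (d % n as u128) as u64)
}

/// Full 128-by-64 unsigned division built from two divq-style steps.
fn div128by64full(d: u128, n: u64) -> u128 {
    assert!(n != 0, "division by zero");
    let hi = (d >> 64) as u64;
    let lo = d as u64;
    // Step 1: divide the high limb (0:hi / n). The upper half is 0 < n,
    // so this step can never overflow.
    let (q_hi, r) = divq_model(0, hi, n);
    // Step 2: r < n is guaranteed by step 1, so r:lo / n fits in 64 bits.
    let (q_lo, _rem) = divq_model(r, lo, n);
    ((q_hi as u128) << 64) | (q_lo as u128)
}

/// Low 64 bits of the quotient, as in @div128by64lo. In the general
/// case the high limb still has to be reduced modulo n first.
fn div128by64lo(d: u128, n: u64) -> u64 {
    div128by64full(d, n) as u64
}
```

For any nonzero n, div128by64full(d, n) should agree with d / (n as u128).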

In the current implementation, wherever an i128 division shows up, codegen emits a call to the __udivti3 builtin function.

Am I missing something?

- Paweł

I also found a GCC bug report about the same issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58897