[RFC] Add support for division of large _BitInt (builtins, SelectionDAG/GlobalISel, clang)

Hi!

We have patches to add support for large _BitInt divisions, so that

_BitInt(129) square(_BitInt(129) a, _BitInt(129) b) {
    return a/b;
}

compiles succesfully. Compiling this with LLVM main currently
fails in SelectionDAG because it cannot find a matching library function in ExpandIntRes_SDIV,
see Integer division of types larger than 128-bits reports a fatal error in the DAG legalizer · Issue #44994 · llvm/llvm-project · GitHub.

This is achieved by

  • adding new library functions into compiler-rt
  • emitting calls to those in SelectionDAG and GlobalISel
  • providing an new library, ‘bitint’, to provide those also when using libgcc
  • unconditionally linking to the ‘bitint’ library in the clang driver

The new library functions are

void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b,
               unsigned int bits);

void __umodei4(unsigned int *rem, unsigned int *a, unsigned int *b,
               unsigned int bits);

void __divei4(unsigned int *quo, unsigned int *a, unsigned int *b,
              unsigned int bits);

void __modei4(unsigned int *rem, unsigned int *a, unsigned int *b,
              unsigned int bits);

This also lifts the bit size limit on _BitInt (currently 128 bits),
which was introduced because division did not work for larger _BitInts.

The implementation of the division and remainder are not tuned for performance;
they use Knuth’s division algorithm which is also used in Support/APInt.cpp.

We would like to ask for your feedback on this approach in general
and on the following design choices in particular:

  1. The new library functions are added to the builtins library.
    We also add them with weak linkage in a separate library
    (‘bitint’) to make them available even when using libgcc (which is often the default).
    Once those functions are widely available in libgcc and compiler-rt,
    the bitint library can be removed again.

  2. How do new library functions usually get synchronized between libgcc and
    compiler-rt?

  3. The clang driver links to the bitint library automatically
    unless -nodefaultlibs is specified, and independent on whether compiler-rt
    or libgcc are used as runtime.

  4. The clang driver only links to the bitint library in the GNU and MinGW
    toolchains. We don’t have much experience with the other toolchains.

  5. The implementation of the division uses malloc() to allocate scratch space.
    There is precedence of using malloc in builtins/emutls.c, so this seems
    acceptable. As said before, our main focus is to make the code compile and not
    yet on optimizing the runtime performance.

  6. clang currently enforces maximum number of bits on a _BitInt of 128, which
    was motivated by the lack of larger divisions in the backend. Now that those
    are available, we wonder what a good upper limit would be. I tested a few
    cases and I have not seen any errors in the backend. But large bitwidths take
    a long time to compile: A “udiv i4096000 %a, %b” takes ~ 1h in SelectionDAG.

Best wishes,
Matthias Gehre

Senior Staff Software Engineer| AMD
Adaptive Computing Tools

2 Likes

Thank you so much for doing this!

Regarding #6: My first blush is to remove the limit back to the llvm-mandated max, but document that the time to compile is very large (and likely runtime). I’d imagine that if a person really needs that many bits that they’d rather spend a long time compiling rather than not being able to accomplish their task.

As far as the interface: On 64 bit machines, would it be more efficient to use unsigned long-long to do the math at a larger width, or is there reason to use int?

Nice! I was also looking into this issue but didn’t arrive at a final proposal. My implementing direction and the lib API are very similar to yours. An example library function support would be like this: Compiler Explorer. It is based on a more weak but simple shift-sub algorithm. I’m more interested in what the backend codegen can do or improve in this issue. Are you going to upload the patches to https://reviews.llvm.org? I can help to do some review if need. Thank you again for driving this issue forward!

Thank you a lot for the feedback!
I’ll add you as reviewers when I upload the patches.

The first review for the compiler-rt changes is now here: ⚙ D120327 compiler-rt: Add udivmodei5 to builtins and add bitint library

SelectionDAG changes are in :gear: D120329 [SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers (llvm.org)

si di ti suffixes in libgcc.a and libgcc_s.so.1 function names are related to Machine Modes (GNU Compiler Collection (GCC) Internals)
I am concerned whether GCC may reserve ei for other things, given that many letters from the [a-z] range have been taken.

Hope @urnathan knows how to talk to GCC for the namespace issue :slight_smile:

@MaskRay, any idea who I could ping to figure out how those names are reserved for gcc? Or should we alternatively take a name that is not in GCCs “namespace”? How could a name look then?

I have left a comment on the Phabricator patch that you can contact marxin (but not on Discourse).

I added a message to the gcc mailing list (CC’ing marxin) here: [RFC] Adding division/modulo on arbitrary precision integers to libgcc (gnu.org)