Hi!
We have patches to add support for large _BitInt
divisions, so that
_BitInt(129) div(_BitInt(129) a, _BitInt(129) b) {
  return a / b;
}
compiles successfully. Compiling this with LLVM main currently
fails in SelectionDAG because it cannot find a matching library function in ExpandIntRes_SDIV;
see llvm/llvm-project issue #44994, “Integer division of types larger than 128-bits reports a fatal error in the DAG legalizer”.
This is achieved by
- adding new library functions into compiler-rt
- emitting calls to those in SelectionDAG and GlobalISel
- providing a new library, ‘bitint’, that supplies those functions also when libgcc is used
- unconditionally linking to the ‘bitint’ library in the clang driver
The new library functions are
void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b,
               unsigned int bits);
void __umodei4(unsigned int *rem, unsigned int *a, unsigned int *b,
               unsigned int bits);
void __divei4(unsigned int *quo, unsigned int *a, unsigned int *b,
              unsigned int bits);
void __modei4(unsigned int *rem, unsigned int *a, unsigned int *b,
              unsigned int bits);
This also lifts the bit size limit on _BitInt (currently 128 bits),
which was introduced because division did not work for larger _BitInts.
The implementations of division and remainder are not tuned for performance;
they use Knuth’s division algorithm, which is also used in Support/APInt.cpp.
We would like to ask for your feedback on this approach in general
and on the following design choices in particular:
- The new library functions are added to the builtins library.
We also add them with weak linkage in a separate library
(‘bitint’) to make them available even when using libgcc (which is often the default).
Once those functions are widely available in libgcc and compiler-rt,
the bitint library can be removed again.
- How do new library functions usually get synchronized between libgcc and
compiler-rt?
- The clang driver links to the bitint library automatically
unless -nodefaultlibs is specified, independently of whether compiler-rt
or libgcc is used as the runtime.
- The clang driver only links to the bitint library in the GNU and MinGW
toolchains. We don’t have much experience with the other toolchains.
- The implementation of the division uses malloc() to allocate scratch space.
There is precedent for using malloc() in builtins/emutls.c, so this seems
acceptable. As said before, our main focus is on making the code compile, not
yet on optimizing runtime performance.
- clang currently enforces a maximum _BitInt width of 128 bits, which
was motivated by the lack of larger divisions in the backend. Now that those
are available, we wonder what a good upper limit would be. I tested a few
cases and have not seen any errors in the backend. But large bit widths take
a long time to compile: a “udiv i4096000 %a, %b” takes ~1 hour in SelectionDAG.
Best wishes,
Matthias Gehre
Senior Staff Software Engineer | AMD
Adaptive Computing Tools