Hello all!
I'm working on a library with bignum support, and I wanted to try LLVM as an apparently simpler and more portable alternative to my current design (a Haskell script that spits out mixed C and assembly). Porting the script to use the LLVM bindings instead of the current hack was pretty easy, but I have a few remaining questions:
(1) Are LLVM's large integer types (such as i256) exposed to any higher-level language? It would be nice to write all the code in C and compile it with Clang, but I need 256-bit integer support.
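For comparison, 256-bit arithmetic can already be written in plain C or C++ by storing the value as four 64-bit limbs and propagating carries through the `unsigned __int128` extension that GCC and Clang provide on 64-bit targets. A minimal sketch under that assumption; `u256` and `add256` are hypothetical names, not part of any library:

```cpp
#include <cstdint>

// Hypothetical 256-bit unsigned integer: four 64-bit limbs, least significant first.
struct u256 {
    uint64_t limb[4];
};

// r = x + y mod 2^256; returns the carry out of the top limb (0 or 1).
// Assumes a 64-bit GCC/Clang target that provides unsigned __int128.
// r may alias x or y: each limb is read before the same limb is written.
inline uint64_t add256(u256 &r, const u256 &x, const u256 &y) {
    unsigned __int128 t = 0;
    for (int k = 0; k < 4; k++) {
        t += (unsigned __int128)x.limb[k] + y.limb[k];
        r.limb[k] = (uint64_t)t;  // low 64 bits of the column sum
        t >>= 64;                 // carry into the next limb
    }
    return (uint64_t)t;
}
```

The same limb-plus-carry pattern extends to subtraction and to the multiply-accumulate discussed below; it only relies on the compiler handling 128-bit intermediates, not on any 256-bit type.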
(2) Is there a way to convince LLVM's register allocator to do the right thing on x86? I'm getting swaths of code like:
movq 88(%rsp), %rax
mulq 112(%rsp)
movq %rax, %r15
addq %r11, %r15
movq %rdx, %r14
adcq %rcx, %r14
adcq $0, %r9
(That's a 64x64 -> 128-bit multiply with a 192-bit accumulate.) The problem is that %r11 and %rcx are dead after this sequence, so it should have added %rax and %rdx directly into them instead of first copying the product into %r15 and %r14. The extra copies mean more movs, more spills, more code and lower performance.
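The multiply-accumulate that the asm above performs can also be expressed with `unsigned __int128` instead of inline asm, which at least leaves the choice of destination registers to the compiler. A minimal sketch, assuming a 64-bit GCC/Clang target; `acc192`, `mac` and `shift` are hypothetical names modeled on the `accumulator` class in question (3), and whether a particular compiler version turns `mac` into a single mulq/add/adc/adc sequence is not guaranteed:

```cpp
#include <cstdint>

// Hypothetical portable 192-bit accumulator c2:c1:c0, assuming a 64-bit
// GCC/Clang target that provides unsigned __int128.
struct acc192 {
    uint64_t c0 = 0, c1 = 0, c2 = 0;

    // c2:c1:c0 += x * y  (64x64 -> 128-bit multiply, 192-bit accumulate)
    void mac(uint64_t x, uint64_t y) {
        unsigned __int128 p = (unsigned __int128)x * y;             // full 128-bit product
        unsigned __int128 t = (unsigned __int128)c0 + (uint64_t)p;  // add low half
        c0 = (uint64_t)t;
        t = (t >> 64) + c1 + (uint64_t)(p >> 64);                   // high half plus carry
        c1 = (uint64_t)t;
        c2 += (uint64_t)(t >> 64);                                  // carry into the top word
    }

    // Return the low word and shift the accumulator right by 64 bits.
    uint64_t shift() {
        uint64_t out = c0;
        c0 = c1;
        c1 = c2;
        c2 = 0;
        return out;
    }
};
```

`shift()` hands back one finished output word and moves the running sum down by 64 bits, which is what product-scanning multiplication needs at the end of each column.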
(3) Is there a way to avoid this whole mess? I generate this ugly code with a script largely because GCC and Clang both refuse to fully unroll loops like:
int i, j;
// accumulator uses inline asm to do the 64x64->128->192-bit accumulate above
accumulator acc(a[0], b[0]);
tmp[0] = acc.shift();
for (j = 1; j < n; j++) {
    for (i = 0; i <= j; i++)
        acc.mac(a[i], b[j-i]);
    tmp[j] = acc.shift();
}
where n is a template parameter and is therefore known at compile time. Is there some Clang pass that will unroll this properly? -O3 -funroll-loops doesn't seem to; in fact, -funroll-loops makes it worse, because it unrolls the loops by a factor of 4 instead of unrolling them completely. Is there some opt pass that can fix it? This will matter more once Clang does sane things with the registers (right now its code runs at half the speed of GCC's), but it would be nice to know.
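One way to sidestep the unroller entirely is to make both loop bounds template parameters and let template expansion generate the straight-line code, so nothing depends on the optimizer's unrolling heuristics. A sketch under stated assumptions: it needs C++17 (fold expressions, `std::index_sequence`) and the `unsigned __int128` extension, and `acc192`, `column`, `mul_low_columns` and `mul_low` are hypothetical names. It computes the same low N words of a*b as the loop above:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical 192-bit accumulator c2:c1:c0, standing in for the asm version.
struct acc192 {
    uint64_t c0 = 0, c1 = 0, c2 = 0;

    void mac(uint64_t x, uint64_t y) {  // c2:c1:c0 += x * y
        unsigned __int128 p = (unsigned __int128)x * y;
        unsigned __int128 t = (unsigned __int128)c0 + (uint64_t)p;
        c0 = (uint64_t)t;
        t = (t >> 64) + c1 + (uint64_t)(p >> 64);
        c1 = (uint64_t)t;
        c2 += (uint64_t)(t >> 64);
    }

    uint64_t shift() {  // return low word, shift right by 64 bits
        uint64_t out = c0;
        c0 = c1; c1 = c2; c2 = 0;
        return out;
    }
};

// Column J: acc += a[I] * b[J - I] for every I in 0..J, expanded at compile time.
template <std::size_t J, std::size_t... I>
inline void column(acc192 &acc, const uint64_t *a, const uint64_t *b,
                   std::index_sequence<I...>) {
    (acc.mac(a[I], b[J - I]), ...);
}

// Every column J, each followed by shifting one finished word into tmp[J].
template <std::size_t... J>
inline void mul_low_columns(uint64_t *tmp, const uint64_t *a, const uint64_t *b,
                            std::index_sequence<J...>) {
    acc192 acc;
    ((column<J>(acc, a, b, std::make_index_sequence<J + 1>{}),
      tmp[J] = acc.shift()), ...);
}

// tmp[0..N-1] = low N words of a * b (a and b are N-word, least significant first).
template <std::size_t N>
inline void mul_low(uint64_t *tmp, const uint64_t *a, const uint64_t *b) {
    mul_low_columns(tmp, a, b, std::make_index_sequence<N>{});
}
```

The comma folds are sequenced left to right, so the columns run in the same order as the original loops. Later Clang releases also accept `#pragma clang loop unroll(full)` on a loop with a constant trip count, but the template version does not rely on that.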
Thanks for your time!
-- Mike Hamburg