Large integers as first-class values

LLVM supports integers up to about 8 million bits. This is a wonderful
feature that I would like to expose in the language I'm designing: if
you were, say, implementing SHA-512, you could write the code in terms
of variables of type [int 512] and have it all work with near-optimal
efficiency, with the size known at compile time and (unlike with an
arbitrary-precision integer class) no heap allocation.
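As a concrete illustration of the idea: this is roughly what C23's
_BitInt extension exposes today, which recent Clang lowers to exactly
these LLVM integer types. A minimal sketch, assuming a compiler with
_BitInt support (the feature postdates this thread; the type name u512
is mine):

    #include <stdio.h>

    /* Fixed-width 512-bit integer: size known at compile time,
       no heap allocation, much like the proposed [int 512]. */
    typedef unsigned _BitInt(512) u512;

    int main(void) {
        u512 a = 1;
        u512 b = a << 500;   /* constant shift: lowers to plain word ops */
        u512 c = a + b;      /* addition: an add-with-carry chain        */
        /* printf has no _BitInt conversion; truncate to the low word. */
        printf("low 64 bits: %llu\n", (unsigned long long)c);
        return 0;
    }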

So my question is: is it okay to go ahead and do this, or are there
any caveats in terms of efficiency or correctness? In particular, I
remember reading something about problems with returning integers
larger than two machine words, but I can't find it again; is there
currently any such problem, or, if there was, has it since been fixed?

In terms of correctness, it should work except for the fact that the
LLVM code generators don't implement more complicated operations on
such integers, like multiplication, division, and variable-width
shifts. The issues with returning large integers are fixed, at least
on x86.

In terms of efficiency, the generated code is likely to be less than
ideal; juggling 512-bit numbers takes a lot of registers, and
everything will be unrolled. This might be okay for a 512-bit number,
but it would be a complete mess for a 2048-bit number.
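To make "unrolled" concrete, here is roughly the add-with-carry chain
that a single 512-bit addition legalizes into, written out in C over
64-bit limbs (a sketch of the shape of the generated code, not actual
backend output; add512 is a made-up name):

    #include <stdint.h>

    void add512(uint64_t r[8], const uint64_t a[8], const uint64_t b[8]) {
        uint64_t carry = 0;
        for (int i = 0; i < 8; i++) {      /* the backend fully unrolls this */
            uint64_t t = a[i] + carry;
            uint64_t c1 = t < carry;       /* carry out of the first add  */
            r[i] = t + b[i];
            carry = c1 | (r[i] < t);       /* carry out of the second add */
        }
    }

Eight limbs per operand have to be live at once, which is where the
register pressure comes from.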

Overall, for arbitrary uses, you're probably better off using a more
conventional bignum library.

-Eli

But not on other platforms?

If I recall correctly, there was some platform-specific work involved,
and I'm not sure it got done on all platforms.

What's the largest integer such that something like 'return ((a * b) /
c) >> d' works correctly on all major platforms?

Twice the size of a pointer, i.e. 64 bits on 32-bit platforms and 128
bits on 64-bit platforms.

-Eli
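To make that limit concrete: on a 64-bit target, GCC and Clang expose
exactly this double-pointer-width type as __int128, so the expression
from the question can be written as (a sketch; the function name is
mine):

    #include <stdint.h>

    uint64_t f(unsigned __int128 a, unsigned __int128 b,
               unsigned __int128 c, unsigned d) {
        /* 128-bit multiply, divide, and variable shift all legalize
           correctly at this width; wider than this, they may not. */
        return (uint64_t)(((a * b) / c) >> d);
    }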

Okay, thanks. Do I understand correctly that this is likely to
continue to be the case, so language support for large integers will
need to be implemented by other means?

Yes; there are no plans to change this.

-Eli

Maybe it would be worth adding "iInf" to LLVM to use all the
pre-existing optimizations, then have passes to lower it to GMP or
other implementations...
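For example, such a pass might turn 'return ((a * b) / c) >> d' at
type iInf into calls against GMP's mpz_* API, roughly like this (the
mpz_* functions are GMP's real interface, but the lowering itself is
hypothetical, not anything LLVM actually does):

    #include <gmp.h>

    /* result must be initialized by the caller, per GMP convention. */
    void expr(mpz_t result, const mpz_t a, const mpz_t b,
              const mpz_t c, unsigned long d) {
        mpz_t tmp;
        mpz_init(tmp);
        mpz_mul(tmp, a, b);               /* tmp = a * b       */
        mpz_tdiv_q(tmp, tmp, c);          /* tmp = (a * b) / c */
        mpz_tdiv_q_2exp(result, tmp, d);  /* result = tmp >> d */
        mpz_clear(tmp);
    }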

I see where you're coming from, but I don't think that would be
useful. There are applications where a dependency on GMP is okay, but
not across the board for a core language feature. I think I'm just
going to have to bite the bullet and go ahead and implement arbitrary
precision integers as part of my standard library.
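Such a library-level bignum would pair a heap-allocated limb array
with the kind of carry-chain arithmetic sketched earlier; a minimal
sketch of the representation, in contrast to the fixed-size,
stack-allocated [int 512] above (all names hypothetical):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        size_t    nlimbs;  /* number of 64-bit limbs in use           */
        int       sign;    /* -1, 0, or +1                            */
        uint64_t *limbs;   /* heap-allocated, least-significant first */
    } bigint;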