An unlimited-width integer type

Hi, I’m considering writing an LLVM backend to target an abstract machine, encoding to a bytecode ISA.

This abstract machine has integers, but not in the sense of having fixed-size machine words; rather, each abstract-machine register and memory location can losslessly hold onto a distinct (non-aliasing) unlimited bit-width value, and math/bitwise ops will operate losslessly on these values to produce new values of unlimited bit-width, without any need for overflow into neighbouring registers/memory locations.

A VM implementing this abstract machine would, of course, use some kind of bignum library to implement these semantics, with each such integer value actually being a pointer to a VM-heap-allocated bignum; or, perhaps, the VM would do load-time optimization to strength-reduce some of the bignums to regular host-CPU integer types.

But, from the perspective of a compiler targeting the abstract machine itself, the ISA just “has” registers and ops of an ‘iUNLIMITED’ type.

Does LLVM, and specifically the LLVM IR type system, support anything like this? If not, would it be the work of days, or of months, to make it happen?

Hi, I'm considering writing an LLVM backend to target an abstract machine, encoding to a bytecode ISA.

This abstract machine has integers, but not in the sense of having fixed-size machine words; rather, each abstract-machine register and memory location can losslessly hold onto a distinct (non-aliasing) unlimited bit-width value, and math/bitwise ops will operate losslessly on these values to produce new values of unlimited bit-width, without any need for overflow into neighbouring registers/memory locations.

A VM implementing this abstract machine would, of course, use some kind of bignum library to implement these semantics, with each such integer value actually being a pointer to a VM-heap-allocated bignum; or, perhaps, the VM would do load-time optimization to strength-reduce some of the bignums to regular host-CPU integer types.

But, from the perspective of a compiler targeting the abstract machine itself, the ISA just "has" registers and ops of an 'iUNLIMITED' type.

Does LLVM, and specifically the LLVM IR type system, support anything like this? If not, would it be the work of days, or of months, to make it happen?

The real question is: what do you expect LLVM to do with these integers? Do you expect LLVM to optimize operations on them? Perform constant folding? If you represent them as pointers that you happen to pass to particular functions/intrinsics representing operations on them, then getting things working will be easy. Depending on the attributes you put on the functions/intrinsics, you can even get some amount of CSE. If you want more, it will likely be more work.

-Hal

Hi,

This is roughly the model that Smalltalk provides for integers, and I did implement a Smalltalk compiler using LLVM some time ago...

A few thoughts come to mind:

1. C and related languages encode fixed-width types in their abstract machine (in C, the width of types is baked by the time the preprocessor has run for a lot of code, which is why the EFI bytecode design was such a bad idea). Your VM wouldn't be useable from any existing LLVM front ends without some significant changes, which is not necessarily a reason *not* to do this, but may limit the utility.

2. LLVM has fixed-width types and intrinsics for checking overflow. If you expect that *most* of the operations in your source language(s) will fit into an LLVM fixed-width integer type that corresponds to a machine register, then you may find that LLVM does quite a good job of optimising your fast paths (though leaves your slow paths untouched) if you implement a compiler for your abstract machine *to* LLVM. Note that deferring overflow checks for as long as possible and failing back to the fast path to redo calculations will generate better code for this.

3. If you're willing to do some extra work, you could try introducing an iUnlimited type and see how invasive the change is. There are a number of places where optimisations query bit widths and other places where APInt and friends encode a fixed width for a particular compile-time evaluation. There are probably other places where known-bits analyses could allow you to transform iUnlimited to a fixed-width type. You could then provide a default lowering that mapped them to pointers and calls to a bignum library, so existing back ends could use them. I can imagine that this would be useful to some front ends that would prefer to defer lowering of variable-width integers until after some optimisations. I can't guarantee that this would be something that upstream would accept, but I'd love to see someone try the experiment.

4. You may find that doing this in MLIR works better. MLIR has an open type system, so you can add a new type for your integers quite easily, though then you won't benefit from any LLVM optimisations. It's not clear that you could necessarily use the generic LLVM backend infrastructure though, so a direct translation from MLIR to your bytecode may preferable.

David

Fair warning, I’m answering a bit outside my comfort zone, so some of this might be wrong…

If you’re wanting to have an LLVM backend that targets this abstract machine, then lowering fixed precision integers into infinite precision integers is fairly straight forward. You just need to manually implement the twos complement wrapping behaviors in the integer space w/branching. That probably won’t be the fastest thing in the world, but if you’re targeting anything other than an mature VM, you shouldn’t care.

Your challenge will be representing the infinite integer type in the backend. You shouldn’t need to change IR at all.

The simplest, and slowest, approach would be to custom lower to a set of intrinsic calls. If you type your “registers” as oddly sized pointers into a special address space, you can probably get that working. Your real challenge will be representing your “integer store” memory since it is by definition not byte addressable. I don’t really have any good suggestions there. I’d strongly suggest figuring this piece out before moving on to anything else.

The harder approach to the lowering would be introducing an integer type in into the backend. I have no input on difficulty of that, other than to say it’s probably not worth doing unless you’re pushing performance fairly hard. (I doubt you are.)

Philip