Code generation option for wide integers on x86_64?

Is there an existing option in the x86_64 target code generator to emit a loop for the following code:

define i4096 @add(i4096 %a, i4096 %b) alwaysinline {
  %c = add i4096 %a, %b
  ret i4096 %c
}

instead of:

movq %rdi, %rax
addq 96(%rsp), %rsi
adcq 104(%rsp), %rdx
movq %rdx, 8(%rdi)
movq %rsi, (%rdi)
adcq 112(%rsp), %rcx
movq %rcx, 16(%rdi)
adcq 120(%rsp), %r8
movq %r8, 24(%rdi)
adcq 128(%rsp), %r9
movq %r9, 32(%rdi)
movq 8(%rsp), %rcx
adcq 136(%rsp), %rcx
movq %rcx, 40(%rdi)
movq 16(%rsp), %rcx
:
:
:

What is the best strategy for lowering wide integer types and operations on x86_64 without blowing up code size? Should we run the code through a custom pass that replaces wide operations with library function calls (or, alternatively, with a loop) before code generation? Is there any existing code that can be reused?
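Concretely, the loop asked for above walks the operands one 64-bit limb at a time and threads the carry between iterations; the addq/adcq chain in the listing is the same computation fully unrolled. The Python sketch below models that loop under assumed conventions (64 little-endian 64-bit limbs per i4096 value). The helper names are hypothetical, and it illustrates the arithmetic only, not an existing LLVM option or pass.

```python
MASK64 = (1 << 64) - 1
LIMBS = 4096 // 64  # assumed layout: 64 limbs of 64 bits each


def to_limbs(x: int, n: int = LIMBS) -> list[int]:
    """Split x into n 64-bit limbs, least significant first."""
    return [(x >> (64 * i)) & MASK64 for i in range(n)]


def from_limbs(limbs: list[int]) -> int:
    """Reassemble limbs (least significant first) into an int."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def add_limbs(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Add two equal-length limb vectors with an explicit carry.

    Each iteration mirrors one add/adc step: a 64-bit sum plus the
    carry from the previous limb. Returns the result limbs and the
    carry out of the top limb (which an i4096 add simply discards).
    """
    result = []
    carry = 0
    for x, y in zip(a, b):
        s = x + y + carry
        result.append(s & MASK64)  # keep the low 64 bits of this limb
        carry = s >> 64            # 0 or 1, fed into the next limb
    return result, carry
```

Because the result has exactly 64 limbs and the final carry is returned separately, `from_limbs(result)` equals `(a + b) % 2**4096`, matching the wrapping semantics of `add i4096`.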

Is there any documentation that describes strategies for lowering from languages that support arbitrarily wide integers?

If you’re dealing with integers that wide, I’d recommend avoiding the built-in IR integers. There are dedicated libraries for wide integers; the most well-known is probably GMP.

-Eli

I’ll comment that I’d love to see someone improve LLVM’s lowering for wide integers, but this definitely falls into the “interesting project” camp, not the “fastest way to make progress” camp.

Philip

I do not mind using dedicated libraries for wide integers (in fact, I mentioned that in my email below).

But the question is whether I should do that when lowering from the front-end representation to LLVM IR, or instead first use the LLVM IR integer representation, run the machine-independent transformation passes, and then, before the code generation passes run, convert the wide integers to an alternative representation and turn all operations on them into library calls.

If the latter, how would I go about doing that? Create an ‘alloca’ for each wide integer, bitcast and store the values to the stack, and then call the library functions? I don’t know whether there is a way to get rid of the unnecessary allocas. For example, if these wide integers are already in memory, it is wasteful to load them and copy them to the stack; the address of the original location could be passed to the library instead.
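The copying trade-off described above can be modelled with a small Python sketch. It assumes a hypothetical library routine `wide_add` that takes its operands by address (an index into a flat `mem` list stands in for a pointer). When the operands already live in memory, the caller passes their addresses and nothing is copied; operands held only as values must first be stored to temporary slots, the analogue of the alloca-and-store sequence. The names and the memory model are illustrative assumptions, not an LLVM or GMP API.

```python
MASK64 = (1 << 64) - 1


def wide_add(mem: list[int], dst: int, a: int, b: int, n: int) -> None:
    """Hypothetical library routine: mem[dst:dst+n] = mem[a:a+n] + mem[b:b+n].

    Operands are passed by address (an index into mem), so the routine
    never needs its own copy of the inputs.
    """
    carry = 0
    for i in range(n):
        s = mem[a + i] + mem[b + i] + carry
        mem[dst + i] = s & MASK64
        carry = s >> 64


def add_values(mem: list[int], x: int, y: int, n: int) -> int:
    """Operands held as values (SSA registers in IR terms): spill them to
    fresh temporary slots first (the alloca + store analogue), call the
    routine, then load the result back."""
    base = len(mem)
    mem.extend([0] * (3 * n))  # temporary slots for x, y and the result
    for i in range(n):         # store x and y limb by limb
        mem[base + i] = (x >> (64 * i)) & MASK64
        mem[base + n + i] = (y >> (64 * i)) & MASK64
    wide_add(mem, base + 2 * n, base, base + n, n)
    return sum(mem[base + 2 * n + i] << (64 * i) for i in range(n))
```

With operands already stored at offsets `p` and `q`, a call such as `wide_add(mem, r, p, q, n)` needs no temporaries, whereas `add_values` grows `mem` by three operand-sized slots per call; that difference is the overhead the question is about.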

Or is lowering to library calls directly from the front-end representation the better alternative?

/Riyaz