General modular and multiprecision arithmetic

Hi,

I know there's been some talk about bignums already, this is similar to
it, but not exactly the same.
I'm currently using LLVM for my master thesis. The goal is to make a
compiler for zero-knowledge proofs of knowledge protocols. This compiler
should target embedded devices. There's a language called the protocol
implementation language in which these protocols should be implemented.
Here's an excerpt from a simple sample of it:

Common (
Z SZKParameter = 80;
Prime(1024) p = 17;
Prime(160) q = 1;
Zmod*(p) y = 1, g=3
) {
}

Prover(Zmod+(q) x) {
Zmod+(q) _s_1=1, _r_1=4;

Def (Void): Round0(Void) {
}

Def (Zmod*(p) _t_1): Round1(Void) {
_r_1 := Random(Zmod+(q));
_t_1 := (g^_r_1);
}

Def (_s_1): Round2(_C=Int(80) _c) {
_s_1 := (_r_1+(x*_c));
}
}

I have already written a parser and an LLVM front-end for it. The
approach I've used so far was to have external functions modexp1024 (and
cast all inputs to 1024 - the maximal bitsize allowed for my simple
cases), modmul1024 and modadd1024, doing exponentiation, multiplication

So far I've tried to avoid extending LLVM with group types by using
group types only during code generation and backing them with LLVM's
arbitrary ints. This required inferring the resulting type (and its
modulus) and just plugging it in to modexp1024, modmul...

Here's an excerpt of the generated LLVM IR from the frontend:

; ModuleID = 'Prover'

@_s_1 = external global i1024
@_r_1 = external global i1024

declare i1024 @Random()

declare i1024 @modsub1024(i1024, i1024, i1024)

declare i1024 @modmul1024(i1024, i1024, i1024)

declare i1024 @modexp1024(i1024, i1024, i1024)

declare i1 @Verify(i1)

define void @Round0() {
entry:
ret void
}

define i1024 @Round1() {
entry:
%calltmp = call i1024 @Random()
store i1024 %calltmp, i1024* @_r_1
%_t_1 = call i1024 @modexp1024(i1024 3, i1024 %_r_1, i1024 17)
ret i1024 %_t_1
}

I want this IR to be transformable to multiple architectures (x86, ARM,
even 8051 so C++ target code generation is a no-no) with or without a
support for a custom coprocessor for modular arithmetic.

There are multiple problems I'm facing in the end:
1. C backend will not work with anything bigger than 128 bits
2. x86 backend just pushes everything on the stack (naturally) even if I
use i1024* -> I want references
3. all the expressions in the parser are assumed to use mod... for doing
the operations
-> constant folding is made hard, uniformity is lost (you really can't
pass non-modular things) -> can't have constant expressions

What I need is:
1. Uniform treatment of whatever I plug in via LLVM IR's add (to reuse
existing constant folding and just
define my own for modular arithmetic)

What I think I should do:
1. Add support from group types in LLVM (mostly done)