Generics

Hi,

I’m about designing the compiler backend of my script language and I have generic types in the language. An example is the array. An array#int means array of integers. You can compare it to Java’s generics. Whenever i have a variable of the type “generic”, the type should be replaced by the really used type.
Now the question is: how do i represent them in llvm without losing too much memory or CPU cycles?
I figured out three ways to do it. Which do you think is the best?

  1. Duplicating the implementations for each used type
    very hard to implement because it needs a lot of restructuring the backend

  2. Defining data type “generic” as a record of all possible types
    would be fast but memory intensive; would need an optimization pass that removes unused variables from a struct (is there such pass?)

  3. Defining data type “generic” as a union of all possible types
    would safe a lot of memory, but would destroy all optimizations because of the bitcasts; i dont know how to implement unions in llvm

3 is the way it is done in the interpreter, so i would prefer a way that is very close to 3.
Suggestions?

Hi,

I'm about designing the compiler backend of my script language and I
have generic types in the language. An example is the array. An
array#int means array of integers. You can compare it to Java's
generics. Whenever i have a variable of the type "generic", the type
should be replaced by the really used type.
Now the question is: how do i represent them in llvm without losing too
much memory or CPU cycles?
I figured out three ways to do it. Which do you think is the best?

1. Duplicating the implementations for each used type
very hard to implement because it needs a lot of restructuring the backend

2. Defining data type "generic" as a record of all possible types
would be fast but memory intensive; would need an optimization pass that
removes unused variables from a struct (is there such pass?)

3. Defining data type "generic" as a union of all possible types
would safe a lot of memory, but would destroy all optimizations because
of the bitcasts; i dont know how to implement unions in llvm

3 is the way it is done in the interpreter, so i would prefer a way that
is very close to 3.
Suggestions?

Do type erasure before emitting llvm ir.

I assume that is what he means with his third possibility. It depends on the objectives, though.

Duplicating the code for special versions is better for performance, because the optimizations can use specific types. This is effectively, what happens with C++ templates.

On the other hand, type erasure is what happens in javac and yields you a smaller code size, because nothing is duplicated.