Union support in LLVM

Hi,

I’m working on developing a programming language using LLVM as a backend, and it’d be very handy for me if LLVM had union support. I’ve been looking into getting the previous union implementation working properly for the last week or so, but I’m entirely new to the LLVM codebase so I thought I’d ask whether I’m barking up the wrong tree before doing a full-blown implementation. At the moment it seems like the best approach to get unions working is to treat them as a byte array and then have the insertvalue/extractvalue instructions automatically perform conversions to and from other types by bitcasting to/from an equivalent size i8 vector where the bytes can be got at individually.

This approach seems to have a few problems. It gets vector instructions involved without any really good reason (I’m looking at the assembler output). It also seems to violate the ABI - my test function is trying to return the result in memory where I think it should be using registers (I’m JITing a function and calling it from GCC-compiled code). The x86-64 ABI isn’t very clear to me though.

Anyway, please let me know if anyone sees major problems with this approach, or has any thoughts on the ABI issues.

Regards,

James

An alternate approach would be to not define a union type as such, but
to introduce metadata (using the LLVM metadata support) that marks a
memory object as a discriminated union with a particular discriminator
value. Then optimizers could be taught to make assumptions such as
"the discriminator value has changed if and only if the type of the
object's data has changed". If your language lends itself to having
all your discriminated unions live in memory until mem2reg time, that
should make it work well without having to rework every layer of LLVM.
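For illustration, here is a minimal C sketch of the kind of in-memory discriminated union this suggestion has in mind. It does not show any LLVM metadata; it only shows the invariant that such metadata would let optimizers rely on, namely that the discriminator and the payload type always change together. All names (value_tag, tagged_value, set_int and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical discriminator: one value per payload type. */
typedef enum { TAG_INT, TAG_FLOAT } value_tag;

/* The object lives in memory as a discriminator plus overlapping storage. */
typedef struct {
    value_tag tag;
    union {
        int32_t i;
        float f;
    } payload;
} tagged_value;

/* Every write updates the tag and the payload together, so the tag changes
   exactly when the type of the stored data changes. */
static void set_int(tagged_value *v, int32_t i) {
    v->tag = TAG_INT;
    v->payload.i = i;
}

static void set_float(tagged_value *v, float f) {
    v->tag = TAG_FLOAT;
    v->payload.f = f;
}

/* Reads check the discriminator before touching the payload. */
static int32_t get_int(const tagged_value *v) {
    assert(v->tag == TAG_INT);
    return v->payload.i;
}

static float get_float(const tagged_value *v) {
    assert(v->tag == TAG_FLOAT);
    return v->payload.f;
}
```

A front-end that only ever emits stores through helpers like these keeps the discriminator trustworthy, which is the property the metadata would be describing.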

Hi James,

I'm working on developing a programming language using LLVM as a backend, and
it'd be very handy for me if LLVM had union support. I've been looking into
getting the previous union implementation working properly for the last week or
so, but I'm entirely new to the LLVM codebase so I thought I'd ask whether I'm
barking up the wrong tree before doing a full-blown implementation. At the
moment it seems like the best approach to get unions working is to treat them as
a byte array and then have the insertvalue/extractvalue instructions
automatically perform conversions to and from other types by bitcasting to/from
an equivalent size i8 vector where the bytes can be got at individually.

You would do better to use arrays rather than vectors, and access them via
memory. As a general rule you shouldn't try to hold aggregate values in
registers - it is supported but only efficient for small aggregates like
complex numbers. The llvm-gcc front-end represents a union as a struct
containing one field with type equal to the type of the largest member of
the union. Unions are accessed from memory (rather than placed in registers)
and the bitcast instruction is used to turn a pointer to the field into a
pointer to one of the other types making up the union.
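As a rough C analogue of that lowering (a sketch, not what llvm-gcc literally emits): a hypothetical union of an int32_t, a float and a double becomes a struct with a single double field, and each member access goes through memory at the address of that field. The memcpy calls stand in for the load or store through the bitcast pointer and keep the C well-defined; the names lowered_union, store_i32 and so on are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical source-level type: union { int32_t i; float f; double d; }.
   Lowered form: a struct whose only field has the type of the largest member. */
typedef struct {
    double storage;
} lowered_union;

/* Store an int32_t member: write its bytes at the address of the field,
   as a store through a pointer cast to int32_t* would. */
static void store_i32(lowered_union *u, int32_t v) {
    memcpy(&u->storage, &v, sizeof v);
}

/* Load the int32_t member back from the same address. */
static int32_t load_i32(const lowered_union *u) {
    int32_t v;
    memcpy(&v, &u->storage, sizeof v);
    return v;
}

/* The float member shares the same starting address. */
static void store_f32(lowered_union *u, float v) {
    memcpy(&u->storage, &v, sizeof v);
}

static float load_f32(const lowered_union *u) {
    float v;
    memcpy(&v, &u->storage, sizeof v);
    return v;
}
```

Because every access is a plain load or store of a scalar type, no vector types appear anywhere, and the usual memory optimizations can still promote the value to a register when only one member type is used.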

This approach seems to have a few problems. It gets vector instructions involved
without any really good reason (I'm looking at the assembler output).

You used vectors, so you get vector instructions.

It also seems to violate the ABI - my test function is trying to return the result in
memory where I think it should be using registers (I'm JITing a function and
calling it using GCC compiled code).

Sadly, it is up to front-ends to take care of getting the ABI right. This is because
there is not enough information in the LLVM IR for it to handle all ABI details
for you automagically.
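To make that concrete, here is a hedged C sketch of the kind of coercion a front-end might perform. It assumes the System V x86-64 convention, under which (as far as I understand the classification rules) a 4-byte union of an int and a float is returned in an integer register rather than in memory. The front-end therefore declares the function as returning a plain integer holding the union's bytes, and unpacks it at the call site. The names small_union, make_float_coerced and unpack are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical source-level union, 4 bytes on typical targets. */
typedef union {
    int32_t i;
    float f;
} small_union;

/* What the source program means: return the union by value. */
static small_union make_float(float f) {
    small_union u;
    u.f = f;
    return u;
}

/* What a front-end might emit instead: the same bytes returned as an
   integer, so the value is passed back in an integer register. */
static uint32_t make_float_coerced(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Caller side: rebuild the union from the returned integer. */
static small_union unpack(uint32_t bits) {
    small_union u;
    memcpy(&u, &bits, sizeof u);
    return u;
}
```

The point is only that the choice between "return in a register" and "return in memory" is encoded in the function signature the front-end generates; other targets and larger unions need different coercions.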

The x86-64 ABI isn't very clear to me though.

Yes, it's extremely complicated.

Anyway, please let me know if anyone sees major problems with this approach, or
has any thoughts on the ABI issues.

Take a look at http://llvm.org/bugs/show_bug.cgi?id=4246

Ciao, Duncan.