Union type, is it really used or necessary?

In the LLVM type system, the union type is similar to the one in C/C++. In the llvm-gcc implementation, a C union type is converted to a struct with one field that can hold all possible values of the union type, and type casts are used to make the code manipulating the union type “well typed”. This approach seems to work very well. Is there really a need to keep the union type in LLVM? Is there a front-end emitting union types?

Thanks,
Neal

> used to make the code manipulating the union type "well typed". This
> approach seems work very well, is there really a need to keep union type in
> LLVM?

I think that in their current state unions should be removed from LLVM IR
in the next release. They are pretty much unfinished and no one is willing to
work on them.

> Is there a front-end emitting union type?

Not that I'm aware of.

I agree.

-Chris

>>> used to make the code manipulating the union type “well typed”. This
>>> approach seems work very well, is there really a need to keep union type in
>>> LLVM?
>>
>> I think in its current state the unions should be removed from LLVM IR
>> in next release. It’s pretty much unfinished and noone is willing to
>> work on them.
>
> I agree.

Unfortunately I wasn’t able to take the union stuff much farther than I did. Partly that was because my LLVM-related work has been on hiatus for the last 4 months or so due to various issues going on in my personal life. But it was also partly because I had reached the limit of my knowledge in this area: I wasn’t able to delve deeply enough into the code generation side of LLVM to really understand what needed to be done to support unions.

As far as converting a union into a C struct that is large enough to hold all possible types of the union, there are two minor problems associated with this approach:

  1. For frontends that generate target-agnostic code, it is difficult to calculate how large this struct should be. (Which is larger, 3 int32s or two pointers? You don’t know unless your frontend knows the size of a pointer.) In my case, I finally decided to abandon my goal of making my frontend completely target-neutral. While it’s relatively easy to write a frontend that is 99% target-neutral with LLVM, that last 1% cannot be eliminated.

  2. Extracting the values from the union requires pointer casting, which means that the union cannot be an SSA value - it has to have an address. This probably isn’t a big issue in languages like C++, which use unions infrequently, but other languages that use algebraic type systems might suffer a loss of performance due to the need to store union types in memory.
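To make the second point concrete, here is a minimal sketch in plain C++ (no LLVM involved) of a union lowered to raw storage. The names `ThreeInts`, `TwoPtrs`, `Storage`, `Lowered`, `makeInts` and `loadInts` are made up for illustration and are not part of any LLVM API. The thing to notice is that every read or write of a member goes through the bytes of an addressable object, which is the C++ analogue of the pointer cast described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical union members: three 32-bit ints, or two pointers.
struct ThreeInts { std::int32_t a, b, c; };
struct TwoPtrs  { void *p, *q; };

// Stand-in for the lowered union: raw bytes large and aligned enough for
// either member. The size is fixed by the compiler for ONE target, which is
// exactly the target dependence described in point 1.
struct Storage {
    alignas(ThreeInts) alignas(TwoPtrs)
    unsigned char bytes[sizeof(ThreeInts) > sizeof(TwoPtrs) ? sizeof(ThreeInts)
                                                             : sizeof(TwoPtrs)];
};

// A tag plus payload, as a frontend for an algebraic data type might emit.
enum class Tag { Ints, Ptrs };
struct Lowered {
    Tag tag;
    Storage payload;
};

// Writing a member means copying its bytes into the storage.
Lowered makeInts(const ThreeInts &v) {
    Lowered u{};
    u.tag = Tag::Ints;
    std::memcpy(u.payload.bytes, &v, sizeof v);
    return u;
}

// Reading a member means copying the bytes back out; the value has to live
// in memory for this to work.
ThreeInts loadInts(const Lowered &u) {
    assert(u.tag == Tag::Ints && "payload does not hold ThreeInts");
    ThreeInts v;
    std::memcpy(&v, u.payload.bytes, sizeof v);
    return v;
}
```

Calling `loadInts(makeInts(ThreeInts{1, 2, 3}))` gives back the same three values, but only by way of the byte buffer.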

> 1. For frontends that generate target-agnostic code, it is difficult to calculate how large this struct should be. (Which is larger, 3 int32s or two pointers? You don’t know unless your frontend knows the size of a pointer.) In my case, I finally decided to abandon my goal of making my frontend completely target-neutral. While it’s relatively easy to write a frontend that is 99% target-neutral with LLVM, that last 1% cannot be eliminated.

This is indeed a problem if a front-end or any pass has to compute the size of a type. For example, I sometimes need to find out the size of a type in my pass, so I call TargetData.getTypeStorageSize() to get it. This practice introduces architecture-dependent LLVM code. IMHO, LLVM cannot avoid this problem anyway, unless such a function is removed or returns a ConstantExpr. LLVM probably has a function that returns the type size as a ConstantExpr; I’m just ignorant in this respect.

Another thought: can you delay computing the maximum storage of a union type by using a max operator?
Your example can be represented as “struct { max([3xi32], [2xi8*],…) }”. This approach avoids deciding the size in front-ends. But again, allowing TargetData.getTypeStorageSize() can compromise the architecture-neutrality goal.

> 2. Extracting the values from the union require pointer casting, which means that the union cannot be an SSA value - it has to have an address. This probably isn’t a big issue in languages like C++ which use unions infrequently, but other languages which use algebraic type systems might suffer a loss of performance due to the need to store union types in memory.

Can mem2reg alleviate this problem?

Cheers,
Neal

+1 for more features that make it easier to generate target-agnostic
IR, despite its difficulty.

Speaking of incomplete features, most LLVM frontends do not use the
va_arg intrinsics, but they have not been cut. Presumably they are
useful for the same reason.

Reid

Neal N. Wang wrote:

> This is indeed a problem if a front-end or any pass has to compute the
> size of a type. For example, Sometimes I need to find out the size of a
> type in my pass, I then call TargetData.getTypeStorageSize() to get the
> size of a particular type. This practice will introduce
> architecture-dependent LLVM code. IMHO, LLVM cannot avoid this problem
> anyway, unless such function is removed or returns a ConstantExpr.
> Probably, LLVM has a function that returns a ConstantExpr type size, I'm
> just ignorant in this aspect.

:) It's ConstantExpr::getSizeOf(Ty).

You can then pass that into an alloca and allocate that number of bytes.

> Another thought is can you delay the computing of the maximum storage of
> a union type by using a max operator?

Sure, but that's annoying. The max(%X, %Y) becomes 'select i1 (icmp ugt %X, %Y), %X, %Y', or in code:
   // Sizes as target-independent constant expressions.
   Constant *SizeX = ConstantExpr::getSizeOf(Ty1);
   Constant *SizeY = ConstantExpr::getSizeOf(Ty2);
   // max(SizeX, SizeY): unsigned compare, then select the larger one.
   Constant *GT = ConstantExpr::getICmp(ICmpInst::ICMP_UGT, SizeX, SizeY);
   Constant *Max = ConstantExpr::getSelect(GT, SizeX, SizeY);
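The compare-and-select above can also be modeled outside LLVM to see how the result changes with the target. The following is only a sketch in standard C++: `Target`, `Member`, `memberBytes`, `selectMax` and `unionStorageBytes` are hypothetical names, and padding is ignored for simplicity. It evaluates eagerly what the constant expression would resolve to once the pointer size is known.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a target: only the pointer width matters here.
struct Target {
    std::uint64_t pointerBytes;  // 4 on a 32-bit target, 8 on a 64-bit one
};

// Hypothetical description of a union member as "N i32s plus M pointers".
struct Member {
    std::uint64_t int32Count;
    std::uint64_t pointerCount;
};

// Size of one member in bytes (padding ignored in this sketch).
std::uint64_t memberBytes(const Member &m, const Target &t) {
    return m.int32Count * 4 + m.pointerCount * t.pointerBytes;
}

// max(x, y) written as compare-then-select, mirroring icmp ugt + select.
std::uint64_t selectMax(std::uint64_t x, std::uint64_t y) {
    bool gt = x > y;
    return gt ? x : y;
}

// Storage needed for the union: fold selectMax over all member sizes.
std::uint64_t unionStorageBytes(const std::vector<Member> &members,
                                const Target &t) {
    assert(!members.empty() && "a union needs at least one member");
    std::uint64_t size = 0;
    for (const Member &m : members)
        size = selectMax(size, memberBytes(m, t));
    return size;
}
```

For the "3 int32s or two pointers" example, `unionStorageBytes` gives 12 with `pointerBytes` set to 4 and 16 with `pointerBytes` set to 8, so the answer flips depending on the target.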

> Your example can be represented as "struct { max([3xi32], [2xi8*],...)
> }", this approach will avoid deciding the size in front-ends. But again
> allowing TargetData.getTypeStorageSize() can compromise the
> architecture-neutrality goal.

>> 2) Extracting the values from the union require pointer casting,
>> which means that the union cannot be an SSA value - it has to have
>> an address. This probably isn't a big issue in languages like C++
>> which use unions infrequently, but other languages which use
>> algebraic type systems might suffer a loss of performance due to the
>> need to store union types in memory.
>
> Can mem2reg alleviate this problem?

If the memory is alloca'd then mem2reg should take care of it, yes. Note that the constant expression needs to be resolved to a concrete number at some point for this to take place, which in practise means that the TargetData will need to be added and an instcombine run will need to take place before mem2reg can do its work.

Nick

> Speaking of incomplete features, most LLVM frontends do not use the
> va_arg intrinsics, but they have not been cut. Presumably they are
> useful for the same reason.

They are, but you still have the issue of va_list not being the same
type everywhere.

Reid

Cheers,

Reid Kleckner <reid.kleckner@gmail.com> writes:

I removed unions from mainline in r112356.

-Chris

Chris Lattner wrote:

> I removed unions from mainline in r112356.

Sorry for reviving this old thread, but I think the removal of
unions is a real pity.

I use Haskell to generate LLVM code using David Terei's LLVM
code from the GHC compiler (the compiler I'm working on is also
written in Haskell). Once I've generated LLVM IR code I use llc
to generate object code.

I'm currently using llvm-2.7 and have been using unions, not being
aware that they were going to be removed. The use case is
forcing field alignments in a packed struct to be correct for both
32 and 64 bits. In particular I have a struct with an i32 tag
field followed by a pointer.

When generating 32 bit code the struct looks like:

     <{ i32, pointer }>

and for 64 bit code:

     <{ union { i32, i64 }, pointer }>

The nice thing about this is that in my LLVM code generator,
I have a function offsetOf which can get me the byte offset of
an element in a struct. In the case above,

    offsetOf (1)

returns 4 when generating 32 bit code and 8 when generating 64
bit code.
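As a rough illustration of why this works, here is a sketch in plain C++. It is not the Haskell code generator itself, and `offsetOf` and `taggedPointerLayout` are hypothetical stand-ins. In a packed struct there is no padding, so the offset of a field is just the sum of the sizes of the fields before it, and widening the tag slot to 8 bytes is what moves the pointer to offset 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offset of field `index` in a packed struct: with no padding, it is
// the sum of the sizes of all preceding fields.
std::uint64_t offsetOf(const std::vector<std::uint64_t> &fieldBytes,
                       std::size_t index) {
    assert(index < fieldBytes.size() && "field index out of range");
    std::uint64_t offset = 0;
    for (std::size_t i = 0; i < index; ++i)
        offset += fieldBytes[i];
    return offset;
}

// Field sizes for the two layouts from the message above:
//   32-bit: <{ i32, pointer }>
//   64-bit: <{ union { i32, i64 }, pointer }>
std::vector<std::uint64_t> taggedPointerLayout(unsigned pointerBits) {
    assert(pointerBits == 32 || pointerBits == 64);
    // union { i32, i64 } occupies 8 bytes; a bare i32 occupies 4.
    std::uint64_t tagBytes = (pointerBits == 64) ? 8 : 4;
    std::uint64_t pointerBytes = pointerBits / 8;
    return std::vector<std::uint64_t>{tagBytes, pointerBytes};
}
```

With these definitions, `offsetOf(taggedPointerLayout(32), 1)` is 4 and `offsetOf(taggedPointerLayout(64), 1)` is 8, matching the values quoted above.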

If there's another way of guaranteeing struct alignment, as well as
an easy way to get struct field offsets, I'd like to hear of it.
Otherwise, I'd like to know what needs to be done to get unions
back in LLVM.

Cheers,
Erik

Hello, Erik

> Otherwise, I'd like to know what needs to be done to get unions
> back in LLVM.

Well, the answer is pretty easy: someone should "fix" them to be
supported throughout the whole set of libraries and become a
"maintainer".
Otherwise the feature, being unused, will quickly become broken.

It was already broken for ages... :-/

Even if you're not using the backends (or MC), having it in front-end
only will only confuse new users that will try to use it and hope it
just works (my case, a few months ago).

If there is nothing, you just work around it (by adding new features
to structs, if necessary) or re-create unions, depending on your
commitment to the union problem. Although having a union type would
be quite an improvement to IR readability, I really don't need it
badly enough to write the whole back-end for it.

It's just a matter of priorities, unfortunately... :(

Here’s a suggestion - can we make the “union patch” (the inverse of the patch that removed unions) as a downloadable file so that people who are interested in finishing the work can do so?

The patch would degenerate quickly and become useless after a few
commits/releases.

Maybe high-level docs would be best, with the basic points like the
one on how to create a FunctionPass, but focused on how to create a
new Type. And a special section on the rationale about unions (and their
intrinsic problems with type sizes in front-end/back-end code
generation).

> Here’s a suggestion - can we make the “union patch” (the inverse of the patch that removed unions) as a downloadable file so that people who are interested in finishing the work can do so?

It already is. I reverted it with one commit, so you can obtain that patch with ‘svn diff’.

-Chris

Anyone who's really interested in working on it can just use "svn diff
-c 112356", and apply it with "patch -R".

-Eli

Eli Friedman wrote:

>> Here's a suggestion - can we make the "union patch" (the inverse of the
>> patch that removed unions) as a downloadable file so that people who are
>> interested in finishing the work can do so?
>
> Anyone who's really interested in working on it can just use "svn diff
> -c 112356", and apply it with "patch -R".

Well I tried that, the patch fails to reverse apply. Out of the
34 files touched by the patch, not a single hunk actually manages
to reverse apply.

Assuming I was to decide to embark on the effort of getting unions
back into LLVM:

  a) What is required for them to be accepted back in?

  b) What are the chances of getting them in the 2.8 release?

Erik

> a) What is required for them to be accepted back in?

It needs to work. When reverted, it was broken in almost all cases.

> b) What are the chances of getting them in the 2.8 release?

Zero.

-Chris

Chris Lattner wrote:

> a) What is required for them to be accepted back in?

> It needs to work. When reverted, it was broken in almost all cases.

'It needs to work' and 'it was broken' don't really give me an
idea of what specifically is required.

Specifically, what I am interested in is using unions within
packed structs to force alignment. Using unions like this was
the easiest and most reliable way of forcing specific alignment.
It made it really easy to calculate offsets in high level code
allowing me to completely ignore whether I was generating code
for 32 or 64 bits.

> b) What are the chances of getting them in the 2.8 release?

> Zero.

So a feature, of which a subset was actually working (I know
this because I am using unions successfully in the compiler
I'm working on) in the 2.7 release and was documented on the
web site

    LLVM Assembly Language Reference Manual

just gets yanked?

Are you really trying to tell me that anyone using LLVM in anger
needs to be running SVN HEAD and keep an eye on the mailing list
to make sure that features they use aren't going to get arbitrarily
yanked?

Erik