Union type, is it really used or necessary?

Chris Lattner wrote:

a) What is required for them to be accepted back in?

It needs to work. When reverted, it was broken in almost all cases.

'It needs work' and 'it was broken' doesn't really give me an
idea of what specifically is required.

There were numerous problems, basically any time someone tried to use it, it broke. I don't think codegen supported it at all for example.

Specifically, what I am interested in is using unions within
packed structs to force alignment. Using unions like this was
the easiest and most reliable way of forcing specific alignment.
It made it really easy to calculate offsets in high level code
allowing me to completely ignore whether I was generating code
for 32 or 64 bits.

Lots of people agree it would be a useful feature, we are lacking a useful implementation :slight_smile:

b) What are the chances of getting them in the 2.8 release?

Zero.

So a feature, of which a subset was actually working (I know
this because I am using unions successfully in the compiler
I'm working on) in the 2.7 release and was documented on the
web site

   LLVM Assembly Language Reference Manual

just gets yanked?

The 2.7 release notes:
http://llvm.org/releases/2.7/docs/ReleaseNotes.html

contained:

"LLVM 2.7 has pre-alpha support for unions in LLVM IR. Unfortunately, this support is not really usable in 2.7, so if you're interested in pushing it forward, please help contribute to LLVM mainline."

Several other features that were in 2.7, but unmaintained, also got removed.

2.8 has already branched for its release, so it is too late for new features. Bill's email made this very clear.

-Chris

If you want to make sure the pointer field is properly aligned, why not
just use a non-packed struct: { i32, pointer } ? Or if you really want
a packed struct, can you use <{ i32, i32, pointer }>, since you're
already emitting target-dependent IR anyway?

If you're computing the offset in order to use as a value
within the program, you can use ConstantExpr::getOffsetOf.
(If this isn't exposed in whatever bindings you're using,
that can be fixed.)

Dan

Dan Gohman wrote:

If you want to make sure the pointer field is properly aligned, why not
just use a non-packed struct: { i32, pointer } ? Or if you really want
a packed struct, can you use <{ i32, i32, pointer }>, since you're
already emitting target-dependent IR anyway?

If you're computing the offset in order to use as a value
within the program, you can use ConstantExpr::getOffsetOf.
(If this isn't exposed in whatever bindings you're using,
that can be fixed.)

The thing is, I am not using the LLVM library, I generate IR
code directly from Haskell (not using the library because it
makes bootstrapping the compiler I'm working on more difficult).

Secondly, my code, which currently works with llvm-2.7, assumes
packed structs and unions. Using

     <{ i32, i32, pointer }>

won't work for me because the 32 bit version looks like:

     <{ i32, pointer }>

and that means the the pointer element will have different struct
indices depending on whether I'm compiling for 32 vs 64 bit
targets.

Another possible solution is using:

     <{ i64, pointer }> ; 64 bit version
     <{ i32, pointer }> ; 32 bit version

but then the first elements are of different types.

Generating IR code without the benefit of unions, makes both
me generating and generated code far less readable. I'm going
to have to stick with version 2.7 and work on getting unions
back in 2.9.

Erik

Great! I'm sure many people would really appreciate working union support in 2.9, thanks!

-Chris

Search for "union" in the list and you'll find lots of problems and
places to fix.

I remember that Constant doesn't support zero-initialize for unions,
so even before codegen. Also, there was no support at all in the
generic codegen, nor target specific (AFAIK), which is a major factor
for unions, since type sizes, alignment and ABIs play an important
role in how you'll represent unions in code. Also, given it's dubious
status, I doubt there is anything related to that in MC.

I agree that the decision to remove it could be taken a bit less
lightly (maybe asking around first), but if you don't monitor the list
nor the change log (which said it would probably be removed), there
isn't much more to do.

If you want to support unions, you'll have to revert the commit,
extend support in IR (ex. implementing zero-initialize), and implement
the basic support in MC with at least one target (x86?). Even if you
don't use union types for generating code, people would come and think
there is support and report bugs on something that has zero
maintenance. In the end, if you're not willing to support "unions" as
a whole feature, it'll end up the same way as the previous
implementation.

I am one of the advocates we must have unions and it's up to the
codegen to figure out (based on data layout + architecture + whatever
else) what to do with it. But I also have to be realist and accept the
fact that if I can't spare the time to implement it decently, I should
wait until someone does (I do hope that's you!). :wink:

[...]

When generating 32 bit code the struct looks like:

     <{ i32, pointer }>

and for 64 bit code:

     <{ union { i32, i64 }, pointer }>

Surely LLVM will cause the first structure to be correctly aligned on
64-bit platforms by automatically inserting padding? Is explicit
alignment by the user really necessary?

No, because it is declared as "packed" (the angle brackets).

Jonas

David Given wrote:

[...]
> When generating 32 bit code the struct looks like:
>
> <{ i32, pointer }>
>
> and for 64 bit code:
>
> <{ union { i32, i64 }, pointer }>

Surely LLVM will cause the first structure to be correctly aligned on
64-bit platforms by automatically inserting padding? Is explicit
alignment by the user really necessary?

You missed the point. a struct defined with just parentheses like

     { i32, pointer }

may be padded. A struct defined with angle brackets and parentheses

     <{ i32, pointer }>

is a packed struct, guaranteed not to have padding.

See:

    LLVM Language Reference Manual — LLVM 18.0.0git documentation

Erik

[...]

Surely LLVM will cause the first structure to be correctly aligned on
64-bit platforms by automatically inserting padding?

No, because it is declared as "packed" (the angle brackets).

Ah, I wasn't aware of that. Ta.

would be a much better approach than packed structs with unions,
even in LLVM versions which support unions. The output would be
more readable and you'd get better optimization.

I don't know how much work it would involve for you to change
whatever assumptions you have in your code about packed structs, but
if you're considering doing work in LLVM to re-introduce and finish
unions, it's something to consider.

Dan