Neal N. Wang wrote:
>> used to make the code manipulating the union type "well typed". This
>> approach seems to work very well; is there really a need to keep union
>> type in LLVM?
> I think in its current state the unions should be removed from LLVM IR
> in next release. It's pretty much unfinished and no one is willing to
> work on them.
I agree.
Unfortunately I wasn't able to take the union stuff much farther
than I did. Partly that was because my LLVM-related work has been on
hiatus for the last 4 months or so due to various issues going on in
my personal life. But it was also partly because I had reached the
limit of my knowledge in this area: I wasn't able to delve deeply
enough into the code generation side of LLVM to really understand
what needed to be done to support unions.
As far as converting a union into a C struct that is large enough to
hold all possible types of the union, there are two minor problems
associated with this approach:
1) For frontends that generate target-agnostic code, it is difficult
to calculate how large this struct should be. (Which is larger, 3
int32s or two pointers? You don't know unless your frontend knows
the size of a pointer.) In my case, I finally decided to abandon my
goal of making my frontend completely target-neutral. While it's
relatively easy to write a frontend that is 99% target-neutral with
LLVM, that last 1% cannot be eliminated.
This is indeed a problem if a front-end or any pass has to compute the
size of a type. For example, when I need the size of a type in my pass,
I call TargetData.getTypeStorageSize() to get it. This practice
introduces architecture-dependent LLVM code. IMHO, LLVM cannot avoid this
problem anyway, unless such a function is removed or returns a
ConstantExpr. LLVM probably has a function that returns a type's size as
a ConstantExpr; I'm just ignorant in this aspect.
It's ConstantExpr::getSizeOf(Ty).
You can then pass that into an alloca and allocate that number of bytes.
Another thought: can you delay computing the maximum storage of a union
type by using a max operator?
Sure, but that's annoying. The max(%X, %Y) becomes 'select i1 (icmp ugt %X, %Y), %X, %Y', or in code:
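// Sizes as target-independent constant expressions.
Constant *SizeX = ConstantExpr::getSizeOf(Ty1);
Constant *SizeY = ConstantExpr::getSizeOf(Ty2);
// max(SizeX, SizeY): unsigned greater-than compare feeding a select.
Constant *GT = ConstantExpr::getICmp(ICmpInst::ICMP_UGT, SizeX, SizeY);
Constant *Max = ConstantExpr::getSelect(GT, SizeX, SizeY);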
Your example can be represented as "struct { max([3xi32], [2xi8*],...)
}". This approach avoids deciding the size in front-ends. But again,
allowing TargetData.getTypeStorageSize() can compromise the
architecture-neutrality goal.
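For comparison, here is roughly what the "struct large enough to hold every member" idea looks like one level up, in plain C++ rather than LLVM IR. This is only a sketch: ThreeInts, TwoPtrs, maxSizeOf, maxAlignOf and UnionStorage are names made up for illustration, not anything in LLVM. The point is that the max is written symbolically and only the compiler for the chosen target turns it into a number.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the two union members from the example:
// three int32s versus two pointers.
using ThreeInts = std::int32_t[3];
using TwoPtrs   = void *[2];

// max over sizeof(T); the value is fixed only once a target is chosen.
template <typename... Ts>
constexpr std::size_t maxSizeOf() {
  return std::max({sizeof(Ts)...});
}

// Same idea for alignment, so the byte buffer can hold any member.
template <typename... Ts>
constexpr std::size_t maxAlignOf() {
  return std::max({alignof(Ts)...});
}

// Raw storage big enough (and aligned enough) for either member.
struct UnionStorage {
  alignas(maxAlignOf<ThreeInts, TwoPtrs>())
      unsigned char bytes[maxSizeOf<ThreeInts, TwoPtrs>()];
};

// With 4-byte pointers ThreeInts is the larger member (12 vs 8 bytes);
// with 8-byte pointers TwoPtrs is (16 vs 12 bytes).
static_assert(sizeof(UnionStorage) >= sizeof(ThreeInts), "must fit the ints");
static_assert(sizeof(UnionStorage) >= sizeof(TwoPtrs), "must fit the pointers");
```

The source text itself never mentions a pointer size, which is the property the select-of-sizeofs constant expression gives you at the IR level.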
2) Extracting the values from the union requires pointer casting,
which means that the union cannot be an SSA value - it has to have
an address. This probably isn't a big issue in languages like C++
which use unions infrequently, but other languages which use
algebraic type systems might suffer a loss of performance due to the
need to store union types in memory.
Can mem2reg alleviate this problem?
If the memory is alloca'd then mem2reg should take care of it, yes. Note that the constant expression needs to be resolved to a concrete number at some point for this to take place, which in practice means that the TargetData will need to be added and an instcombine run will need to take place before mem2reg can do its work.
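To make the "it has to have an address" point concrete, here is a minimal C++ sketch of a union lowered to a tag plus raw bytes, with values going in and out through memory. The Lowered type and the makeInt/getInt helpers are hypothetical, and memcpy stands in for the pointer cast plus load/store a frontend would emit. Whether the round trip through memory disappears is up to the optimizer, as described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical lowering of a two-member union: a tag plus raw payload bytes.
struct Lowered {
  std::uint8_t tag;                     // 0 = int32 payload, 1 = float payload
  alignas(4) unsigned char payload[4];  // large enough for either member
};

// Storing a member means writing its bytes into the addressable buffer.
inline Lowered makeInt(std::int32_t v) {
  Lowered u{};
  u.tag = 0;
  std::memcpy(u.payload, &v, sizeof v);
  return u;
}

// Extracting a member means reading the bytes back out of memory.
inline std::int32_t getInt(const Lowered &u) {
  assert(u.tag == 0 && "wrong active member");
  std::int32_t v;
  std::memcpy(&v, u.payload, sizeof v);
  return v;
}
```

For example, getInt(makeInt(42)) goes through the byte buffer and yields 42 again.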
Nick