ML types in LLVM

Good afternoon!

I'm trying to write an LLVM codegen for a Standard ML compiler
(MLton). So far things seem to match up quite nicely, but I have hit
two sticking points. I'm hoping LLVM experts might know how to handle
these two cases better.

1: In ML we have some types that are actually one of several possible
types. Expressed in C this might be thought of as a union. The codegen
only ever accesses these 'union types' via pointer. Before actually
indexing into the type, it always casts from the 'union pointer type'
to a specific pointer type.

As a concrete example: I have two types %a and %b. I want to express a
third type %c that is either %a* or %b*. Later I'll cast the %c to
either %a* or %b*.
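In C terms, the %a/%b/%c setup above looks roughly like the following sketch. The field layouts, names, and the shared leading tag are made up for illustration; they are not MLton's actual representation:

```c
#include <stdint.h>

/* Hypothetical ML variants, lowered as two distinct structs. */
struct a { uint32_t tag; uint32_t x; };
struct b { uint32_t tag; float y; };

/* The 'union pointer type' %c: only ever passed around, never
   dereferenced directly (currently lowered as i8*). */
typedef void *c_ptr;

/* Before indexing, the codegen casts from the union pointer type
   to a specific pointer type, as described above. */
uint32_t tag_of(c_ptr p) {
    return ((struct a *)p)->tag;
}
```

The cast-before-access discipline is what makes the i8* lowering workable at all, since LLVM never sees a load through the union type itself.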

Currently I just represent %c as i8*. I assume that this can have
consequences in terms of aliasing. I tried opaque*, but llvm-as didn't
like that. Is there any way to better represent the type %c to LLVM?

2: In the ML heap we have objects that are garbage collected. Objects
are preceded by a header that describes the object to the garbage
collector. However, pointers to the objects point past the header and
at the actual object. Sometimes, however, the program itself accesses
the header, for example to determine the length of an array (the
length is stored in the header). I output every type like this:

%opt_33 = type { i32, %opt_45*, float }

I could also create another type which includes the header, something like:
%opt_33_with_header = type { i32, %opt_33 }

Is there any way to express that a pointer is actually a pointer to an
interior element of a type? Something like %opt_33_in_heap =
%opt_33_with_header:1 ?

Currently when I want to read the header of an %opt_33, I cast it to a
i32* and then use getelementptr -1. Is there a better way?
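In C terms, that header access amounts to the following sketch. The header word and the allocation helper here are hypothetical stand-ins for MLton's real runtime layout:

```c
#include <stdint.h>
#include <stdlib.h>

/* Read the i32 header that sits immediately before the object:
   the equivalent of casting to i32* and using getelementptr -1. */
uint32_t read_header(void *obj) {
    return ((uint32_t *)obj)[-1];
}

/* Allocate `size` bytes preceded by a header word, returning a
   pointer past the header, as the ML heap does. */
void *alloc_with_header(size_t size, uint32_t header) {
    uint32_t *p = malloc(sizeof(uint32_t) + size);
    p[0] = header;
    return p + 1;
}
```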

Currently I just represent %c as i8*. I assume that this can have
consequences in terms of aliasing. I tried opaque*, but llvm-as didn’t
like that. Is there any way to better represent the type %c to LLVM?

I assume this is for tagged sums.

Logically, what you want is a distinct LLVM type for every ML union type
and each of its constructors. Unfortunately, LLVM does structural
uniquing of types, so that won’t work. What you can do is abuse address
spaces, giving every distinct type its own address space and casting
back and forth between address spaces as necessary.

Is there any way to express that a pointer is actually a pointer to an
interior element of a type? Something like %opt_33_in_heap =
%opt_33_with_header:1 ?

Something like an ungetelementptr? No, sorry. That would be a
pretty nice extension, though obviously unsound.

Currently when I want to read the header of an %opt_33, I cast it to a
i32* and then use getelementptr -1. Is there a better way?

I think it depends on (1) exactly how your runtime environment lays out
the header and (2) whether you’re trying to create portable IR (as opposed
to creating IR portably).

Personally, I would create a struct type (hereafter “HeaderType”) for the
entire GC header; when you want to access a header field, just cast the
base pointer to i8*, subtract the allocation size of HeaderType, cast the
result to HeaderType*, and getelementptr from there. That doesn’t
make portable IR, of course, but in the absence of an ungetelementptr
instruction, I’m not sure how you could do that better.

You can portably get the allocation size of a type using
TargetData::getTypeAllocSize().
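In C terms, this recipe is the classic container-of computation. A sketch, with a made-up two-field HeaderType (the real header layout depends on the runtime):

```c
#include <stdint.h>

/* Hypothetical GC header; the actual fields are runtime-specific. */
typedef struct {
    uint32_t counter;
    uint32_t tag;
} HeaderType;

/* Cast the base pointer to i8* (char* here), subtract the
   allocation size of HeaderType, and cast back to HeaderType*. */
HeaderType *header_of(void *obj) {
    return (HeaderType *)((char *)obj - sizeof(HeaderType));
}
```

In C, `sizeof` plays the role of the allocation size; in IR the subtraction would be done on an i8* with a target-specific constant.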

John.

Currently I just represent %c as i8*. I assume that this can have
consequences in terms of aliasing. I tried opaque*, but llvm-as didn't
like that. Is there any way to better represent the type %c to LLVM?

I assume this is for tagged sums.

Yes.

Logically, what you want is a distinct LLVM type for every ML union type
and each of its constructors. Unfortunately, LLVM does structural
uniquing of types, so that won't work.

Is there absolutely no way to generate a new type? Not even an 'opaque' one?

What you can do is abuse address
spaces, giving every distinct type its own address space and casting
back and forth between address spaces as necessary.

The manual indicates that only addresses in space 0 can have GC
intrinsics used on them. Also I get the impression that this would be
a pretty unsafe idea. ;-)

Is there any way to express that a pointer is actually a pointer to an
interior element of a type? Something like %opt_33_in_heap =
%opt_33_with_header:1 ?

Something like an ungetelementptr? No, sorry. That would be a
pretty nice extension, though obviously unsound.

Well, ungetelementptr could be nice, but I was hoping for something
even better: a way to refer to the whole object type (including the
header) even though my pointer doesn't point to the start of the
object. I.e., this is a pointer to 8 bytes past type X.

That way for normal access I punch down to the object part of the type
and do my business. For access to the header, I just punch into that
part of the type (which happens to involve a negative offset from the
address). However, it seems that LLVM pointers always have to point to
the start of an object.

Personally, I would create a struct type (hereafter "HeaderType") for the
entire GC header; when you want to access a header field, just cast the
base pointer to i8*, subtract the allocation size of HeaderType, cast the
result to HeaderType*, and getelementptr from there.

That's what I'm doing right now; the HeaderType happens to be i32. ;-)
I am assuming that casting in and out of i8* will cost me in terms of
the optimizations LLVM can apply...?

Also, I couldn't find a no-op instruction in LLVM. In some places it
would be convenient to say: '%x = %y'. For the moment I'm doing a
bitcast from the type back to itself, which is rather awkward.

You may be interested in my HLVM project, a VM built upon LLVM and designed
for MLs:

  http://hlvm.forge.ocamlcore.org

Wesley W. Terpstra wrote:

Currently I just represent %c as i8*. I assume that this can have
consequences in terms of aliasing. I tried opaque*, but llvm-as didn't
like that. Is there any way to better represent the type %c to LLVM?

I assume this is for tagged sums.

Yes.

Logically, what you want is a distinct LLVM type for every ML union type
and each of its constructors. Unfortunately, LLVM does structural
uniquing of types, so that won't work.

Is there absolutely no way to generate a new type? Not even an 'opaque' one?

Each time you say "opaque" in a .ll (or call OpaqueType::get in the C++ API) you get yourself a new distinct opaque type.

It's not clear to me at all why opaque didn't work for you in the first place. One thing you'll have to remember is that, because of the above, if you want to take an opaque* and pass it to another function that takes an opaque*, you'll get a type mismatch, since you said opaque twice. Use "%c = type opaque" in the global space, then %c* to get the same opaque type in multiple places. The other reason it might not have worked for you is that you might've tried to dereference your opaque*, thereby producing just 'opaque', which isn't allowed.

What you can do is abuse address
spaces, giving every distinct type its own address space and casting
back and forth between address spaces as necessary.

The manual indicates that only addresses in space 0 can have GC
intrinsics used on them. Also I get the impression that this would be
a pretty unsafe idea. ;-)

Is there any way to express that a pointer is actually a pointer to an
interior element of a type? Something like %opt_33_in_heap =
%opt_33_with_header:1 ?

Something like an ungetelementptr? No, sorry. That would be a
pretty nice extension, though obviously unsound.

Well, ungetelementptr could be nice, but I was hoping for something
even better: a way to refer to the whole object type (including the
header) even though my pointer doesn't point to the start of the
object. I.e., this is a pointer to 8 bytes past type X.

That way for normal access I punch down to the object part of the type
and do my business. For access to the header, I just punch into that
part of the type (which happens to involve a negative offset from the
address). However, it seems that LLVM pointers always have to point to
the start of an object.

Personally, I would create a struct type (hereafter "HeaderType") for the
entire GC header; when you want to access a header field, just cast the
base pointer to i8*, subtract the allocation size of HeaderType, cast the
result to HeaderType*, and getelementptr from there.

That's what I'm doing right now; the HeaderType happens to be i32. ;-)
I am assuming that casting in and out of i8* will cost me in terms of
the optimizations LLVM can apply...?

Also, I couldn't find a no-op instruction in LLVM. In some places it
would be convenient to say: '%x = %y'. For the moment I'm doing a
bitcast from the type back to itself, which is rather awkward.

There is none, using a bitcast is the workaround. LLVM's optimizers will fix it up.

Nick

* Wesley W. Terpstra:

Logically, what you want is a distinct LLVM type for every ML union type
and each of its constructors. Unfortunately, LLVM does structural
uniquing of types, so that won't work.

Is there absolutely no way to generate a new type? Not even an
'opaque' one?

Is this really a problem for MLton? I think you only get less precise
alias analysis, and that's it.

Wesley W. Terpstra wrote:

Is there absolutely no way to generate a new type? Not even an 'opaque' one?

Each time you say "opaque" in a .ll (or call OpaqueType::get in the C++
API) you get yourself a new distinct opaque type.

Ok. That's what I thought it did which is why I tried it in the first
place. I must have done something wrong. Thank you!

Also, I couldn't find a no-op instruction in LLVM. In some places it
would be convenient to say: '%x = %y'. For the moment I'm doing a
bitcast from the type back to itself, which is rather awkward.

There is none, using a bitcast is the workaround. LLVM's optimizers will
fix it up.

I'll keep doing what I'm doing then.

Correct. However, I want a fair comparison between LLVM performance
and the native x86 codegen. If I don't give LLVM the same information
the x86 codegen has, it's an unfair comparison.

Wesley W. Terpstra wrote:

Does this help :-

#include <cstdio>
#include <cstdint>

struct TaggedUnionType {
  enum Tag { Byte, Char, Int } Type;
  union {
    uint8_t b;
    char c;
    uint32_t i;
  };
};

int main(int argc, char *argv[])
{
  TaggedUnionType t;
  t.Type = TaggedUnionType::Int;
  t.i = 0xAA55;
  switch (t.Type)
  {
  case TaggedUnionType::Byte:
    printf("Byte = %02x\n", t.b);
    break;
  case TaggedUnionType::Char:
    printf("Char = %c\n", t.c);
    break;
  case TaggedUnionType::Int:
    printf("Int = %08x\n", t.i);
    break;
  }
  return 0;
}

Output from LLVM disassembler

; ModuleID = '/tmp/webcompile/_613_0.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i386-pc-linux-gnu"

%struct.TaggedUnionType = type { i32, %"struct.TaggedUnionType::._11" }
%"struct.TaggedUnionType::._11" = type { i32 }

@.str = internal constant [13 x i8] c"Byte = %0x2\0A\00" ; <[13 x i8]*> [#uses=1]
@.str1 = internal constant [11 x i8] c"Char = %c\0A\00" ; <[11 x i8]*> [#uses=1]
@.str2 = internal constant [12 x i8] c"Int = %0x8\0A\00" ; <[12 x i8]*> [#uses=1]

define i32 @main(i32 %argc, i8** %argv) {
entry:
  %argc_addr = alloca i32 ; <i32*> [#uses=1]
  %argv_addr = alloca i8** ; <i8***> [#uses=1]
  %retval = alloca i32 ; <i32*> [#uses=2]
  %t = alloca %struct.TaggedUnionType ; <%struct.TaggedUnionType*> [#uses=6]
  %0 = alloca i32 ; <i32*> [#uses=2]
  %"alloca point" = bitcast i32 0 to i32 ; [#uses=0]
  store i32 %argc, i32* %argc_addr
  store i8** %argv, i8*** %argv_addr
  %1 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 0 ; <i32*> [#uses=1]
  store i32 2, i32* %1, align 4
  %2 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 1 ; <%"struct.TaggedUnionType::._11"*> [#uses=1]
  %3 = getelementptr %"struct.TaggedUnionType::._11"* %2, i32 0, i32 0 ; <i32*> [#uses=1]
  store i32 43605, i32* %3, align 4
  %4 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 0 ; <i32*> [#uses=1]
  %5 = load i32* %4, align 4 ; [#uses=1]
  switch i32 %5, label %bb3 [
    i32 0, label %bb
    i32 1, label %bb1
    i32 2, label %bb2
  ]

bb: ; preds = %entry
  %6 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 1 ; <%"struct.TaggedUnionType::._11"*> [#uses=1]
  %7 = getelementptr %"struct.TaggedUnionType::._11"* %6, i32 0, i32 0 ; <i32*> [#uses=1]
  %8 = bitcast i32* %7 to i8* ; <i8*> [#uses=1]
  %9 = load i8* %8, align 4 ; [#uses=1]
  %10 = zext i8 %9 to i32 ; [#uses=1]
  %11 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([13 x i8]* @.str, i32 0, i32 0), i32 %10) ; [#uses=0]
  br label %bb3

bb1: ; preds = %entry
  %12 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 1 ; <%"struct.TaggedUnionType::._11"*> [#uses=1]
  %13 = getelementptr %"struct.TaggedUnionType::._11"* %12, i32 0, i32 0 ; <i32*> [#uses=1]
  %14 = bitcast i32* %13 to i8* ; <i8*> [#uses=1]
  %15 = load i8* %14, align 4 ; [#uses=1]
  %16 = sext i8 %15 to i32 ; [#uses=1]
  %17 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([11 x i8]* @.str1, i32 0, i32 0), i32 %16) ; [#uses=0]
  br label %bb3

bb2: ; preds = %entry
  %18 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 1 ; <%"struct.TaggedUnionType::._11"*> [#uses=1]
  %19 = getelementptr %"struct.TaggedUnionType::._11"* %18, i32 0, i32 0 ; <i32*> [#uses=1]
  %20 = load i32* %19, align 4 ; [#uses=1]
  %21 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([12 x i8]* @.str2, i32 0, i32 0), i32 %20) ; [#uses=0]
  br label %bb3

bb3: ; preds = %bb2, %bb1, %bb, %entry
  store i32 0, i32* %0, align 4
  %22 = load i32* %0, align 4 ; [#uses=1]
  store i32 %22, i32* %retval, align 4
  br label %return

return: ; preds = %bb3
  %retval4 = load i32* %retval ; [#uses=1]
  ret i32 %retval4
}

declare i32 @printf(i8* noalias, ...)

Use :-

http://llvm.org/demo/index.cgi

to convert the code, making sure optimization is turned off (the optimizer is good and would simplify this example away!).

The other approach is C++-style inheritance: have a base class containing a tag, with the subtypes as inheriting classes.

Hope this helps,

Aaron

2009/6/13 Wesley W. Terpstra <wesley@terpstra.ca>

Logically, what you want is a distinct LLVM type for every ML union type
and each of its constructors. Unfortunately, LLVM does structural
uniquing of types, so that won't work.

Is there absolutely no way to generate a new type? Not even an 'opaque' one?

As mentioned, you can generate new opaque types, but obviously that
won't work for, say, distinguishing between separate constructors that are
structured identically. If you're not planning to write any LLVM-level
language-specific optimizations, that probably doesn't matter at all.
On the other hand, you were talking about alias analysis, which generally
involves writing a pass to inject language-specific information.

What you can do is abuse address
spaces, giving every distinct type its own address space and casting
back and forth between address spaces as necessary.

The manual indicates that only addresses in space 0 can have GC
intrinsics used on them.

More casts! Although I'm curious why this limitation is in effect at all;
probably a consequence of some other overloaded use of address
spaces.

Also I get the impression that this would be a pretty unsafe idea. ;-)

Not particularly less safe than all the other unsafe casts you're planning
to use.

Is there any way to express that a pointer is actually a pointer to an
interior element of a type? Something like %opt_33_in_heap =
%opt_33_with_header:1 ?

Something like an ungetelementptr? No, sorry. That would be a
pretty nice extension, though obviously unsound.

Well, ungetelementptr could be nice, but I was hoping for something
even better: a way to refer to the whole object type (including the
header) even though my pointer doesn't point to the start of the
object. I.e., this is a pointer to 8 bytes past type X.

Okay. You are right, there is no way to express this in the type system,
and that is very unlikely to change.

Personally, I would create a struct type (hereafter "HeaderType") for the
entire GC header; when you want to access a header field, just cast the
base pointer to i8*, subtract the allocation size of HeaderType, cast the
result to HeaderType*, and getelementptr from there.

That's what I'm doing right now; the HeaderType happens to be i32. ;-)
I am assuming that casting in and out of i8* will cost me in terms of
the optimizations LLVM can apply...?

It would only really affect a type-based alias analysis, and there's no
cookie-cutter version of that; you would need to write your own AA
pass, which could then easily recognize the pattern of accessing the
header.

Also, I couldn't find a no-op instruction in LLVM. In some places it
would be convenient to say: '%x = %y'. For the moment I'm doing a
bitcast from the type back to itself, which is rather awkward.

The bitcast is a decent workaround, but the real question is why you need
a no-op at all; if you're doing it to provide a hook for optimizer
information, a call is probably a better idea.

John.

Even if this puts LLVM at an unfair disadvantage, I think you will find that
LLVM will thrash MLton's current x86 backend anyway.

I did some benchmarking on HLVM and found that it was often several times
faster than OCaml when the GC is not the bottleneck:

http://flyingfrogblog.blogspot.com/2009/03/performance-ocaml-vs-hlvm-beta-04.html

And, of course, OCaml is fast as ML compilers go...

For numerical and array tasks, yes, but your graphs show that LLVM is slower for data manipulation on lists. I think you need further benchmarks; is that at all possible for you to do? :-)

Interesting,

Aaron

The only benchmarks where HLVM does not thrash OCaml are GC intensive and that
is solely because HLVM has a very naive and inefficient GC implementation
(going via a hash table!). I assume Wesley intends to reuse MLton's existing
GC which does not have these issues so I do not believe those results are
relevant here. My point is simply that LLVM makes it easy to generate
extremely performant x86 (and x64) code, IME.

However, I would very much like to continue benchmarking and improving HLVM.
HLVM is currently on hold temporarily while we ship a new product but I'll
get back to it ASAP. I would also very much like to implement a concurrent GC
so Talin's recent news about Scarcity was very interesting to me.

Incidentally, Henderson described his shadow stack algorithm as a means of
implementing accurate garbage collection in an uncooperative environment back
in 2002:

I used a similar approach in HLVM in order to implement an accurate GC without
having to do any unsafe low-level LLVM hacking in C++. The difference is that
Henderson stored structs of references on the stack in order to keep them
contiguous and pushed only a single pointer to the whole struct onto the
shadow stack. In contrast, I literally stored every reference separately on
the shadow stack.
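Henderson's variant can be sketched in C as follows. The frame layout, field names, and fixed root count here are illustrative simplifications, not his exact scheme:

```c
#include <stddef.h>

/* Henderson-style: each function keeps its GC roots contiguous in
   one stack struct and links that struct onto a global chain. */
struct frame {
    struct frame *prev;  /* caller's frame */
    size_t num_roots;    /* how many roots this frame holds */
    void *roots[4];      /* the roots themselves, contiguous */
};

static struct frame *shadow_stack_top = NULL;

/* One push per function entry, regardless of how many roots. */
static void push_frame(struct frame *f) {
    f->prev = shadow_stack_top;
    shadow_stack_top = f;
}

static void pop_frame(void) {
    shadow_stack_top = shadow_stack_top->prev;
}

/* A collector walks the chain like this to visit every root. */
static size_t count_roots(void) {
    size_t n = 0;
    for (struct frame *f = shadow_stack_top; f != NULL; f = f->prev)
        n += f->num_roots;
    return n;
}
```

Pushing one frame per call, rather than one shadow-stack entry per reference, keeps the per-root cost down to a plain store into the frame struct; storing every reference separately trades that away for simplicity.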

Many people (particularly functional programmers, who are always whining ;-))
have expressed concerns about using shadow stacks as a workaround for
low-level LLVM hacking in order to implement an accurate GC. Consequently, I
also did a little study of the impact of having used a shadow stack in HLVM:

http://flyingfrogblog.blogspot.com/2009/03/current-shadow-stack-overheads-in-hlvm.html

Suffice to say, I found the performance of the shadow stack to be more than
adequate for my needs. However, I am somewhat unusual among functional
programmers in that I am pursuing the performance profile of F# rather than
OCaml or Haskell. Specifically, I want to be able to implement numerical
methods using the abstractions of a high-level language without sacrificing
the predictably-good performance of Fortran and, in particular, I have no
desire to be able to allocate lots of short-lived values in
performance-critical code.