ocaml+llvm

Thanks for the quick reply, Anton.

Hello, Gordon.

Just a quick thinko (that’s why I’m sending this e-mail personally):

No problem. I hope you don’t mind me redirecting back to the list, though…

maybe current infrastructure used for eh/debug info emission can be extended to handle ocaml data?

Perhaps. I’m pretty sure the code to generate this stuff doesn’t belong in LLVM proper like DWARF support does, though; this is not even a vague approximation of a standard. :slight_smile:

Now it emits just DWARF information. Probably, we can think about extending it to emit some “random” information. For example, there is a term “landing pad”, which corresponds to catch-block in C++. Internally, landing pad is just pair of labels plus some additional info. Labels appears from lowering of invoke inst, maybe something like this can be used to emit necessary information?

Oo, that sounds like the right track. Maybe I could tack an optional something onto the call/invoke instructions that says “export a return address label”, like this:

%tmp.1 = i8* call @_camlPervasives__print_endline_298(i8* %greeting.0) ret internal i8* @L103

And then adjust the code generators to insert the specified label. Afterwards, @L103 would be a normal constant and I could build the frame table as a plain old LLVM constant.

Some amount of intrinsics plus additional lowering code.

I can think of two ways to use an intrinsic toward this end, both of which seem unfortunately flawed:

; Annotation on the call instruction.
%tmp.1 = i8* call @_camlPervasives__print_endline_298(i8* %greeting.0)
call void @llvm.var.annotate(i8* %tmp.1, i8* “L103”, i8* null, i32 0)

The code generator can notice the annotation and insert the label. But… what if the callee had a void return type? There’s no register to annotate.

; A freestanding call immediately following the call instruction.
%tmp.1 = i8* call @_camlPervasives__print_endline_298(i8* %greeting.0)
call void @llvm.pcmarker(i32 103)

The marker can just be lowered to a label. But… what about caller-save calling convention? The PC marker won’t be on the right instruction. If I’m just imagining this problem, then this may be the solution to this problem.

Even more, maybe some present functionaly can be trivially extended, or annotation intrinsic used for this?

Hopefully Chris has something up his sleeve! I don’t think llvm.var.annotate works, though as pointed out above.

— Gordon

maybe current infrastructure used for eh/debug info emission can be extended to handle ocaml data?

Perhaps. I'm pretty sure the code to generate this stuff doesn't belong in LLVM proper like DWARF support does, though; this is not even a vague approximation of a standard. :slight_smile:

From your original email:

The biggest problem is a data structure called the frame table, a
simple structure for which LLVM seems ill-prepared. For each call
site in the program, ocaml emits an entry into this table:

    key : the return address of the call site
    value : the stack offset of every variable live after return

The garbage collector uses this when walking the stack to find
live objects.

I don't think you want to try to have the LLVM code generator build this table. The table is a contract between the specific codegen you're using and the GC runtime you're using. This contract is specific to the current ocaml code generator.

In the LLVM world, you should use the first-class support we already have for accurate GC: http://llvm.org/docs/GarbageCollection.html

Based on this, you annotate the variables with llvm.gcroot intrinsics, and then walk the stack with the llvm_cg_walk_gcroots function. Note that these are not well tested and probably have holes in them, but these are the interfaces we want to use :slight_smile:

-Chris

Hi Chris,

Chris Lattner wrote:

I don't think you want to try to have the LLVM code generator build this table. The table is a contract between the specific codegen you're using and the GC runtime you're using. This contract is specific to the current ocaml code generator.

In the LLVM world, you should use the first-class support we already have for accurate GC: http://llvm.org/docs/GarbageCollection.html

I was wondering recently: are there actually any projects that are using these facilities? Preferably ones that I could take a look at?

Based on this, you annotate the variables with llvm.gcroot intrinsics, and then walk the stack with the llvm_cg_walk_gcroots function. Note that these are not well tested and probably have holes in them, but these are the interfaces we want to use :slight_smile:

How is this function actually implemented? Is there special stack-walking code emitted or does LLVM keep an extra stack of roots?

Cheers,

Carl Friedrich Bolz

Chris Lattner wrote:

I don’t think you want to try to have the LLVM code generator build this table. The table is a contract between the specific codegen you’re using and the GC runtime you’re using. This contract is specific to the current ocaml code generator.

In the LLVM world, you should use the first-class support we already have for accurate GC: http://llvm.org/docs/GarbageCollection.html

I was wondering recently: are there actually any projects that are using these facilities? Preferably ones that I could take a look at?

Just the strawman implementation in lib/Transforms/Scalar/LowerGC.cpp and runtime/SemiSpace/semispace.c, which is not functional. llvm_gc_collect ends with ‘abort()’, among other problems. :slight_smile:

Based on this, you annotate the variables with llvm.gcroot intrinsics, and then walk the stack with the llvm_cg_walk_gcroots function. Note that these are not well tested and probably have holes in them, but these are the interfaces we want to use :slight_smile:

How is this function actually implemented? Is there special stack-walking code emitted or does LLVM keep an extra stack of roots?

The latter. LowerGC.cpp maintains a linked list of frames. The structure is even more obvious in semispace.c.

— Gordon

The biggest problem is a data structure called the frame table, aÊsimple structure for which LLVM seems ill-prepared. For each callÊsite in the program, ocaml emits an entry into this table:

Ê ÊÊkeyÊÊÊ: the return address of the call site
Ê ÊÊvalue : the stack offset of every variable live after return

The garbage collector uses this when walking the stack to findÊlive objects.

I don’t think you want to try to have the LLVM code generator build thisÊtable.

The table is a contract between the specific codegen you’re usingÊand the GC runtime you’re using.Ê This contract is specific to the currentÊocaml code generator.

Ocaml is compiled statically; this isn’t an ephemeral link from JIT to runtime as might be the case for a Java or Perl program. Changing these structures breaks binary compatibility (including C interop).

In the LLVM world, you should use the first-class support we already haveÊfor accurate GC: http://llvm.org/docs/GarbageCollection.html

Based on this, you annotate the variables with llvm.gcroot intrinsics,

I’m totally on board with that. The llvm.gc* intrinsics are well-designed as hooks for runtime-specific code generation. I just need to custom lower llvm.gcroot, too.

andÊthen walk the stack with the llvm_cg_walk_gcroots function.ÊNote thatÊthese are not well tested and probably have holes in them, but these areÊthe interfaces we want to use :slight_smile:

But here I have to disagree.ÊQuite by design, LLVM lacks an existing runtime to leverage: LLVM ­ CLR. In light of that, it is difficult to justify ignoring a mature runtime that does exist in order to avoid building a simple data structure.

Here are the respective negatives I perceive:

Ditch the frame table and write a new runtime:
-Êno need to write any interesting LLVM code! [1]
-Êbreaks binary compatibility (including C interop)
-Êhave to write new runtime; llvm’s GC is a strawman

  • makes an ugly patch, which makes for licensing problemsÊ[2]

Tread lightly:

  • generated .bc is incompatible with llc due to runtime-specific GC codegen

Ñ Gordon

[1] Yes, this is a downside. :slight_smile: Possibly the dominant one, since this is a weekend project.

[2] ocaml is licensed under the QPL [Trolltech/KDE], which has an onerous distribution clause which prohibits forks. My current work leaves the existing code generator in place, touching only a few files to integrate LLVM code generation as a runtime option; this improves the possibility that a patch would be accepted, and otherwise makes patch maintenance manageable.

In the LLVM world, you should use the first-class support we already have
for accurate GC: http://llvm.org/docs/GarbageCollection.html

I was wondering recently: are there actually any projects that are using
these facilities? Preferably ones that I could take a look at?

No, unfortunately not. The one that I was working on years ago (which gave me the excuse to build this infrastructure) was not allowed to be released.

Based on this, you annotate the variables with llvm.gcroot intrinsics, and
then walk the stack with the llvm_cg_walk_gcroots function. Note that
these are not well tested and probably have holes in them, but these are
the interfaces we want to use :slight_smile:

How is this function actually implemented? Is there special
stack-walking code emitted or does LLVM keep an extra stack of roots?

The current implementation is very inefficient. The algorithm is described here:

    "Accurate Garbage Collection in an Uncooperative Environment"
    Fergus Henderson, ISMM, 2002

In practice, going forward, I would like to see the code generators develop native support for this, using metadata for each function to locate the pointers on the stack.

-Chris

The table is a contract between the specific codegen you're using and the GC runtime you're using. This contract is specific to the current ocaml code generator.

Ocaml is compiled statically; this isn't an ephemeral link from JIT to runtime as might be the case for a Java or Perl program.

JITing vs native codegen is orthogonal to this problem.

Changing these structures breaks binary compatibility (including C interop).

If that is so, and if there is no way around this, then it makes sense to develop some compatibility mode. How does native C code generate these tables?

In the LLVM world, you should use the first-class support we already have for accurate GC: http://llvm.org/docs/GarbageCollection.html

Based on this, you annotate the variables with llvm.gcroot intrinsics,

I'm totally on board with that. The llvm.gc* intrinsics are well-designed as hooks for runtime-specific code generation. I just need to custom lower llvm.gcroot, too.

Yep.

and then walk the stack with the llvm_cg_walk_gcroots function. Note that these are not well tested and probably have holes in them, but these are the interfaces we want to use :slight_smile:

But here I have to disagree. Quite by design, LLVM lacks an existing runtime to leverage: LLVM ­ CLR. In light of that, it is difficult to justify ignoring a mature runtime that does exist in order to avoid building a simple data structure.

Sure. Given a lack of current implementation, if the ocaml implementation would work and meet our needs, I have no problem with adopting/stealing it :slight_smile:

Here are the respective negatives I perceive:

Ditch the frame table and write a new runtime:
- no need to write any interesting LLVM code! [1]
- breaks binary compatibility (including C interop)
- have to write new runtime; llvm's GC is a strawman
- makes an ugly patch, which makes for licensing problems [2]

Tread lightly:
- generated .bc is incompatible with llc due to runtime-specific GC codegen

I totally agree. Given that there is no real client for the LLVM GC hooks, you are free to break or redesign what we already have: go crazy! :slight_smile:

[2] ocaml is licensed under the QPL [Trolltech/KDE], which has an onerous distribution clause which prohibits forks. My current work leaves the existing code generator in place, touching only a few files to integrate LLVM code generation as a runtime option; this improves the possibility that a patch would be accepted, and otherwise makes patch maintenance manageable.

We can work around this on our end going forward. What this means in the short term is that we can't accept any Ocaml-project code into LLVM. This does not prevent the ocaml people from using LLVM code though.

-Chris

Changing these structures breaks binary compatibility (including C interop).

If that is so, and if there is no way around this, then it makes sense to develop some compatibility mode. How does native C code generate these tables?

I might’ve conflated two related points. The frametable isn’t really at issue with respect to C interop. Rather, the decision to get into the runtime business is.

To be specific, C interop doesn’t generate frame table entries. It anchors roots by explicitly registering/unregistering root pads with the runtime. Very similar to the paper you cited, although not a stack.

C interop does (from macros) use runtime information directly derived from the data/text bracketing symbols I also mentioned.

and then walk the stack with the llvm_cg_walk_gcroots function. Note that these are not well tested and probably have holes in them, but these are the interfaces we want to use :slight_smile:

But here I have to disagree. Quite by design, LLVM lacks an existing runtime to leverage: LLVM ­ CLR. In light of that, it is difficult to justify ignoring a mature runtime that does exist in order to avoid building a simple data structure.

Sure. Given a lack of current implementation, if the ocaml implementation would work and meet our needs, I have no problem with adopting/stealing it :slight_smile:

I’m not sure you want to incorporate ocaml’s runtime per se; its object model is very unique. :slight_smile: How do you envision LLVM GC support going?

My thinking was to merely extend LLVM to be a less “uncooperative environment” for GC (as the paper you referenced put it), while maintaining neutrality to the runtime and object model.

The two major problems I had really boil down to identifying GC points in machine code and statically identifying live roots at those GC points, both problems common to many collection techniques. Looking at the problem from that perspective makes the problem much more tractable, actually…

[2] ocaml is licensed under the QPL [Trolltech/KDE], which has an onerous distribution clause which prohibits forks. My current work leaves the existing code generator in place, touching only a few files to integrate LLVM code generation as a runtime option; this improves the possibility that a patch would be accepted, and otherwise makes patch maintenance manageable.

We can work around this on our end going forward.

How so?

What this means in the short term is that we can’t accept any Ocaml-project code into LLVM. This does not prevent the ocaml people from using LLVM code though.

Exactly. ocaml is written in itself, so there’s little likelihood of that becoming a problem. :slight_smile:

— Gordon

Chris,

This is much more generic than what I was originally thinking, but what do you think of providing a facility like this?

enum GCPointKind {
GCPSubroutineReturn,
GCPFunctionExit,
GCPSubroutineCall,
GCPBackBranch
};

class ??? {
public:
// Allows the code generator to avoid inserting unnecessary GC points.
bool RequiresGCPointsOfKind(GCPointKind Kind) const;

// If false, instructs the code generator to keep all GC roots on the
// stack at GC points.
bool SupportsRegisterRoots() const; // false for now…

// Visits a GC point, including information necessary to identify all
// live GC roots in the stack frame.
virtual bool VisitGCPoint(MachineInstruction *Point,
GCPointKind Kind,
intptr_t *StackRoots, size_t StackCount,
int *RegisterRoots, size_t RegisterCount);
};

That could enable several kinds of conservative collectors.

The code generator would need to cooperate by:

  1. Inserting nodes for GC points if required, recording the corresponding original Instruction*.
  2. Recording the physical location of each llvm.gcroot’d alloca.

Later, a MachineFunctionPass could walk the MachineFunction to lower these nodes. Live roots can be identified using domination information.

In addition to providing a stack and register map for building tables, such an interface offers a collector a chance to insert any necessary landing pads at exit/call/back-branch.

If the collector supports it, LLVM could be allowed to scalarize GC roots down the road.

— Gordon

Changing these structures breaks binary compatibility (including C interop).

If that is so, and if there is no way around this, then it makes sense to develop some compatibility mode. How does native C code generate these tables?

I might've conflated two related points. The frametable isn't really at issue with respect to C interop. Rather, the decision to get into the runtime business is.

Ok.

To be specific, C interop doesn't generate frame table entries. It anchors roots by explicitly registering/unregistering root pads with the runtime. Very similar to the paper you cited, although not a stack.

Ok, that seems easy to support. LLVM doesn't get in the runtime business either, the code generators just provide an opaque mechanism to walk pointers in the stack. The GC runtime library discussed on the accurate GC page is nice (if it existed :wink: but optional.

C interop does (from macros) use runtime information directly derived from the data/text bracketing symbols I also mentioned.

Ok.

and then walk the stack with the llvm_cg_walk_gcroots function. Note that these are not well tested and probably have holes in them, but these are the interfaces we want to use :slight_smile:

But here I have to disagree. Quite by design, LLVM lacks an existing runtime to leverage: LLVM CLR. In light of that, it is difficult to justify ignoring a mature runtime that does exist in order to avoid building a simple data structure.

Sure. Given a lack of current implementation, if the ocaml implementation would work and meet our needs, I have no problem with adopting/stealing it :slight_smile:

I'm not sure you want to incorporate ocaml's runtime per se; its object model is very unique. :slight_smile: How do you envision LLVM GC support going?

No idea - I know little about the ocaml GC. I assumed it was generic enough to be reused in other contexts. If not, then disregard my comments.

My thinking was to merely extend LLVM to be a less "uncooperative environment" for GC (as the paper you referenced put it), while maintaining neutrality to the runtime and object model.

Yep, sounds good.

The two major problems I had really boil down to identifying GC points in machine code and statically identifying live roots at those GC points, both problems common to many collection techniques. Looking at the problem from that perspective makes the problem much more tractable, actually…

Yep.

[2] ocaml is licensed under the QPL [Trolltech/KDE], which has an onerous distribution clause which prohibits forks. My current work leaves the existing code generator in place, touching only a few files to integrate LLVM code generation as a runtime option; this improves the possibility that a patch would be accepted, and otherwise makes patch maintenance manageable.

We can work around this on our end going forward.

How so?

If it becomes important, we can re-implement pieces that are needed.

What this means in the short term is that we can't accept any Ocaml-project code into LLVM. This does not prevent the ocaml people from using LLVM code though.

Exactly. ocaml is written in itself, so there's little likelihood of that becoming a problem. :slight_smile:

:slight_smile:

-Chris

The two major problems I had really boil down to identifying GC points in machine code and statically identifying live roots at those GC points, both problems common to many collection techniques. Looking at the problem from that perspective makes the problem much more tractable, actually…

Chris,

This is much more generic than what I was originally thinking, but what do you think of providing a facility like this?
That could enable several kinds of conservative collectors.

I'm not sure I follow what you mean.

The code generator would need to cooperate by:

1. Inserting nodes for GC points if required, recording the corresponding original Instruction*.
2. Recording the physical location of each llvm.gcroot'd alloca.

This is somewhat implicit in the design we want. The only question is how to identify these at runtime. In theory, the code generator already knows what the gc points are and where the pointers on the stack are located. For a "cooperative" code generator, the codegen should emit tables (similar to EH tables) that describe this, and the llvm code generator callback should enumerate these.

Later, a MachineFunctionPass could walk the MachineFunction to lower these nodes. Live roots can be identified using domination information.

These are optimizations that can be implemented in the code generator, there are many others as well. For the time being, we don't even have real dominator info in the code generator though..

-Chris

Chris Lattner wrote:

I'm not sure you want to incorporate ocaml's runtime per se; its object model is very unique. :slight_smile: How do you envision LLVM GC support going?

No idea - I know little about the ocaml GC. I assumed it was generic enough to be reused in other contexts. If not, then disregard my comments.

The OCaml GC is quite generic. Basically it exposes a model where values all uniformly represented by a N-bit machine word (N=32 or 64) which can denote either a (N-1)-bit integer (with the lsb set to 1), a pointer to a block controlled by the GC, or a pointer outside the GC area (though the last possibility is considered bad style).

A GC block comes with a 1-word header that describes its length and also contain a small 8-bit tag (plus a few bits for the GC). In OCaml, the tag is used in particular to store the constructor for union types (discriminated sums), but the runtime system is really agnostic about the meaning of the tag except for a few special values which denote special kinds of blocks such as:
- byte buffers (not traced by the GC, used to represent OCaml strings), - custom blocks (native data not controlled by the GC, plus a pointer to a table of custom functions to implement finalization and generic operations like equality/hashing/serialization),
- lazy pointers (the GC is able to shortcut chains of forwarding pointers to forced values),
- arrays of weak pointers,...

Of course, this model makes some assumptions, but it could certainly be useful to support languages other than OCaml!

-- Alain

The two major problems I had really boil down to identifying GC points in machine code and statically identifying live roots at those GC points, both problems common to many collection techniques. Looking at the problem from that perspective makes the problem much more tractable, actually…

Chris,

This is much more generic than what I was originally thinking, but what do you think of providing a facility like this? That could enable several kinds of conservative collectors.

A thinko; I meant accurate, not conservative.

I’m not sure I follow what you mean.

My initial thought was more akin to getting the labels I needed and representing the frametable at the source level. That’s a terrible idea. :slight_smile: It makes much more sense for the code generator to cooperate with GC.

The code generator would need to cooperate by:

  1. Inserting nodes for GC points if required, recording the corresponding original Instruction*.
  2. Recording the physical location of each llvm.gcroot’d alloca.

This is somewhat implicit in the design we want. The only question is how to identify these at runtime. In theory, the code generator already knows what the gc points are and where the pointers on the stack are located. For a “cooperative” code generator, the codegen should emit tables (similar to EH tables) that describe this, and the llvm code generator callback should enumerate these.

Yes, that’s what I’m working on. Currently I have:

  • Suppressed llvm.gcroot lowering in the LowerGC pass [may wish to remove the pass from the default pipeline entirely]
  • Added a GCInfo analysis
  • Recorded in GCInfo the stack object indices for roots

Presumably, then, another pass can come along and use the analysis to emit GC tables.

I’m still rummaging around the code generators trying to determine an approach to identifying GC points in machine code which enables liveness analysis. Perhaps I should punt on that for the moment in the interests of getting something dumb working (sans liveness analysis), though.

Later, a MachineFunctionPass could walk the MachineFunction to lower these nodes. Live roots can be identified using domination information.

These are optimizations that can be implemented in the code generator, there are many others as well. For the time being, we don’t even have real dominator info in the code generator though…

Would it be sufficient to use domination information from the LLVM representation for liveness analysis?

— Gordon

I'm not sure I follow what you mean.

My initial thought was more akin to getting the labels I needed and representing the frametable at the source level. That's a terrible idea. :slight_smile: It makes much more sense for the code generator to cooperate with GC.

All the GC needs is information about what pointers are live in each stack trace, right? This is all details of implementation of how it extracts that. The GC should not be tied to the implementation of the code generator.

This is somewhat implicit in the design we want. The only question is how to identify these at runtime. In theory, the code generator already knows what the gc points are and where the pointers on the stack are located. For a "cooperative" code generator, the codegen should emit tables (similar to EH tables) that describe this, and the llvm code generator callback should enumerate these.

Yes, that's what I'm working on. Currently I have:

- Suppressed llvm.gcroot lowering in the LowerGC pass [may wish to remove the pass from the default pipeline entirely]

Right.

- Added a GCInfo analysis
- Recorded in GCInfo the stack object indices for roots

Presumably, then, another pass can come along and use the analysis to emit GC tables.

Makes sense!

I'm still rummaging around the code generators trying to determine an approach to identifying GC points in machine code which enables liveness analysis. Perhaps I should punt on that for the moment in the interests of getting something dumb working (sans liveness analysis), though.

Please start with something simple: we can work up optimizations once the simple stuff is really solid.

> Later, a MachineFunctionPass could walk the MachineFunction to lower > these nodes. Live roots can be identified using domination information.

These are optimizations that can be implemented in the code generator, there are many others as well. For the time being, we don't even have real dominator info in the code generator though..

Would it be sufficient to use domination information from the LLVM representation for liveness analysis?

No, because the machine code cfg can be significantly different. We really do need dom info on machine code.

-Chris

Okay. That makes the increment very simple. I'll work up a patch.

— Gordon