Integrating LLVM in an existing project

Hi everyone,

After some time hacking on LLVM, let me introduce myself :-)
I'm a PhD student at the French university Pierre et Marie Curie in Paris. I work on a project
called the "Virtual Virtual Machine" project. You can find some (dated) information on the
website http://vvm.lip6.fr.

Basically it's a "low level virtual machine" :-) with a just-in-time compiler called the VPU, an execution environment
(GC + threads), and a parser for a lisp-like language that is translated to the VPU's internal bytecode.

On top of this execution environment we have implemented a Java virtual machine and a .NET virtual machine. They
are both functional, and we achieve reasonable performance (about a third of the speed of IBM's JVM or Mono).

We think, however, that our just-in-time compiler is the main limit on better performance. Our register allocator
is really simple, and we don't have any basic optimization passes. So we decided to take a look at LLVM and see if it was
possible to translate the VPU's internal bytecode to LLVM bytecode. After porting LLVM to Linux/PPC, and adding some
functionality to LLVM (like knowing the required code size of a method before allocating memory for it), we can now execute
a large amount of code in our lisp-like language.

So the next step was to execute our Java and .NET virtual machines on top of the new LLVM-based execution environment. They are both
implemented in the lisp-like language, so we expected few or no changes, which was indeed the case. However, when executing Java or .NET applications we ran into the problem of exception handling.

Exception handling is not integrated into our execution environment: the JIT is not aware of it. Exceptions are therefore handled at the application level with setjmp and longjmp. When a method with exception handlers is compiled, we set labels (start, end, handler) in the compiler for each handler, and after compilation we grab the addresses of these labels in the generated code. When an exception is thrown, we compare the current instruction pointer with all (start, end) pairs of the current method. If the IP is in the interval and the exception type matches, we longjmp to an instruction in the method's code which then jumps to the handler. If not, we look at the calling method's handlers, and so on until we reach the end of the backtrace.

This algorithm does not work with LLVM, because creating labels (which corresponds to creating basic blocks) does not imply that the label (i.e. the basic block) will have an address. Even with optimizations disabled, LLVM does not emit some basic blocks (presumably because they are dead).

We cannot use LLVM's current exception handling; see http://www.nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt.
We cannot use the llvm.dbg.stoppoint feature either, because it is not implemented in LLVM's JIT.

So we are stuck :-). However, we would really like to see what performance gains we can get with LLVM.

So here are a few questions whose answers will help me work through this issue:
1) Is Chris' exception handling note actually implemented, or is it still planned? And how difficult do you expect it to be? (Even though I have implemented some things in LLVM, I am still not entirely comfortable with the code.)
2) llvm.dbg.stoppoint: how far along is its implementation?
3) Getting the address of basic blocks: is there a workaround?

Thanks a lot for your answers. And don't hesitate to ask me for more info if anything in my explanations is unclear.

Best,
Nicolas

Hi Nicolas,

> Hi everyone,
>
> After some time hacking on LLVM, let me introduce myself :-)
> I'm a PhD student at the French university Pierre et Marie Curie in
> Paris. I work on a project called the "Virtual Virtual Machine"
> project. You can find some (dated) information on the website
> http://vvm.lip6.fr.
>
> Basically it's a "low level virtual machine" :-) with a just-in-time
> compiler called the VPU, an execution environment (GC + threads), and a
> parser for a lisp-like language that is translated to the VPU's
> internal bytecode.

Interesting project. I wish you could talk about it at the Developer's
Meeting (http://llvm.org/DevMtgMay2007.html) :-)

.. snip ..

> So here are a few questions whose answers will help me work through
> this issue:
> 1) Is Chris' exception handling note actually implemented, or is it
> still planned?

I have signed up to implement this (PR1269), just as Chris' note
states. HLVM needs it for much the same reason that VVM does. I hope to
address this in late April. I'm not sure if it will make it into the 2.0
release (if it does, it will be close).

> And how difficult do you expect it to be? (Even though I have
> implemented some things in LLVM, I am still not entirely comfortable
> with the code.)

Well, it's not conceptually difficult. At the LLVM IR level it's pretty
easy: add an unwind target to a basic block and remove the invoke
instruction. The hard part is dealing with the consequences of those two
changes. They affect many passes and all targets. The changes required
are individually not too bad, but there are a lot of them. See the work
plan in PR1269.

> 2) llvm.dbg.stoppoint: how far along is its implementation?

This is fully implemented for static compilation and works well with GDB
(i.e. you can debug a program generated by LLVM quite well). For the JIT,
the reason it's not implemented is that it's unclear how the
information would be used. Would one want to run LLI under GDB and
actually debug the executed program? If so, where does all the debug
information go? GDB doesn't read debug information out of the LLVM IR
representation :-) We don't know of a way to generate it on the fly in a
form GDB can consume (it prefers DWARF in a file).

Another option is to implement a debugger directly in LLI that knows
about LLVM debug information. This is somewhat like the Java approach
(running a debug version of the VM).

So, the question really is, how do you want to use this in the JIT?

> 3) Getting the address of basic blocks: is there a workaround?

I don't know. I'll let someone else answer this. This has been
discussed several times but I wasn't paying attention to the
answers/results ;-)

> Thanks a lot for your answers. And don't hesitate to ask me for more
> info if anything in my explanations is unclear.

It was quite clear! Thanks for letting us know.

Reid.

Hi Reid

Reid Spencer wrote:

> Interesting project. I wish you could talk about it at the Developer's
> Meeting (http://llvm.org/DevMtgMay2007.html) :-)

I wish I could! Unfortunately there is very little chance I'll get the
funding to go to the US in May.

> I have signed up to implement this (PR1269), just as Chris' note
> states. HLVM needs it for much the same reason that VVM does. I hope to
> address this in late April. I'm not sure if it will make it into the 2.0
> release (if it does, it will be close).
That's great news! I'll look at the PR to see if I can help.

> 2) llvm.dbg.stoppoint: how far along is its implementation?
>
> So, the question really is, how do you want to use this in the JIT?

Didn't see that one coming :-) Maybe I want to use it the way I want to
use basic blocks for getting addresses of instructions. My only concern
is to be sure that a list of instructions is generated between one label
and another, and to know the addresses of these labels.

Thanks Reid!

Best,
Nicolas

Hi Reid

Reid Spencer wrote:
>
> Interesting project. I wish you could talk about it at the Developer's
> Meeting (http://llvm.org/DevMtgMay2007.html) :-)
>

> I wish I could! Unfortunately there is very little chance I'll get the
> funding to go to the US in May.

Yes, long way to go on short notice.

>
> I have signed up to implement this (PR1269), just as Chris' note
> states. HLVM needs it for much the same reason that VVM does. I hope to
> address this in late April. I'm not sure if it will make it into the 2.0
> release (if it does, it will be close).
>

> That's great news! I'll look at the PR to see if I can help.

That would be welcome :-)

>> 2) llvm.dbg.stoppoint: how far along is its implementation?
>
> So, the question really is, how do you want to use this in the JIT?
>

> Didn't see that one coming :-) Maybe I want to use it the way I want to
> use basic blocks for getting addresses of instructions. My only concern
> is to be sure that a list of instructions is generated between one label
> and another, and to know the addresses of these labels.

I don't think debug stop points are a very useful mechanism for this
purpose. I don't know if LLVM supports taking the address of a label,
since generally the only thing you can/should use a label for is a
branch/switch. Can you do this kind of processing before code generation
(at the LLVM IR level)?

If not, then please wait for Chris or someone to chime in on the "can we
get the address of a basic block" issue.

You might also find this thread useful:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2004-August/001782.html

Reid.

> On top of this execution environment we have implemented a Java virtual
> machine and a .NET virtual machine. They are both functional, and we
> achieve reasonable performance (about a third of the speed of IBM's JVM
> or Mono).

Cool.

> This algorithm does not work with LLVM, because creating labels (which
> corresponds to creating basic blocks) does not imply that the label
> (i.e. the basic block) will have an address. Even with optimizations
> disabled, LLVM does not emit some basic blocks (presumably because they
> are dead).

Right.

> We cannot use LLVM's current exception handling; see
> http://www.nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt.

Why not?

> We cannot use the llvm.dbg.stoppoint feature either, because it is not
> implemented in LLVM's JIT.

llvm.dbg.stoppoint isn't really what you want; it is meant for debug info support.

> So here are a few questions whose answers will help me work through
> this issue:
> 1) Is Chris' exception handling note actually implemented, or is it
> still planned? And how difficult do you expect it to be? (Even though I
> have implemented some things in LLVM, I am still not entirely
> comfortable with the code.)

There are two separate issues here:

1. catching exceptions from function calls
2. catching exceptions from "non call" instructions like divides, etc.

The LLVM IR, as it stands now, is expressive enough to describe #1, but not #2. The notes above talk solely about how to extend the IR to support #2. Reid has an interest in implementing #2 at some point, which will be a great improvement for some applications.

There is an extra wrinkle here though. Zero-cost C++ exception handling (i.e. #1) is about 3/4 of the way implemented in the C front-end and code generator, but it is not yet completed (notably, JIT support is missing). This work is also currently stalled.

If you'd be interested in helping with any of these projects, please let me know!

-Chris

Hi Chris,

Chris Lattner wrote:

>> We cannot use LLVM's current exception handling; see
>> http://www.nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt.
>
> Why not?

Like you say, it's not functional for non-call instructions. Besides, having to change
all CallInsts to InvokeInsts is just too much pain in our current VM.

> There are two separate issues here:
>
> 1. catching exceptions from function calls
> 2. catching exceptions from "non call" instructions like divides, etc.
>
> The LLVM IR, as it stands now, is expressive enough to describe #1, but not #2. The notes above talk solely about how to extend the IR to support #2. Reid has an interest in implementing #2 at some point, which will be a great improvement for some applications.
>
> There is an extra wrinkle here though. Zero-cost C++ exception handling (i.e. #1) is about 3/4 of the way implemented in the C front-end and code generator, but it is not yet completed (notably, JIT support is missing). This work is also currently stalled.

Actually, why is it missing? What's the difference between the code generator and the JIT?

> If you'd be interested in helping with any of these projects, please
> let me know!
The thing is, I know very little about exception handling (our setjmp/longjmp mechanism in our VMs
is functional but not really optimized) and next to nothing about LLVM's. I looked at Reid's notes on implementing #2,
and I'm not sure I can help without a month or two (of spare time) to learn LLVM's internals.

However, I'd be interested in looking at how exception handling can be implemented in the JIT.
What's the current situation?

Nicolas

> Like you say, it's not functional for non-call instructions. Besides,
> having to change all CallInsts to InvokeInsts is just too much pain in
> our current VM.

ok.

> Actually, why is it missing? What's the difference between the code
> generator and the JIT?

There are two things missing:

1. Testing and working out the set of remaining bugs
2. Extending the JIT to emit the EH tables to memory somewhere, and
    register them with the EH runtime.

> The thing is, I know very little about exception handling (our
> setjmp/longjmp mechanism in our VMs is functional but not really
> optimized) and next to nothing about LLVM's.

I don't know, you seem to know something about the code generator :-)

> I looked at Reid's notes on implementing #2, and I'm not sure I can
> help without a month or two (of spare time) to learn LLVM's internals.

Right, I wouldn't suggest tackling this unless you are willing to see it through all the way.

> However, I'd be interested in looking at how exception handling can be
> implemented in the JIT. What's the current situation?

The static code generator works for many simple cases, but it is currently disabled. To enable it, uncomment this line in llvm-gcc/gcc/llvm-convert.cpp:

//#define ITANIUM_STYLE_EXCEPTIONS

Based on that, you should be able to compile simple C++ code that throws and catches exceptions. The next step would be to make a .bc file, run it through the JIT, and see how it explodes :-)

-Chris

Hi Chris,

> The static code generator works for many simple cases, but it is
> currently disabled. To enable it, uncomment this line in
> llvm-gcc/gcc/llvm-convert.cpp:
>
> //#define ITANIUM_STYLE_EXCEPTIONS
>
> Based on that, you should be able to compile simple C++ code that
> throws and catches exceptions. The next step would be to make a .bc
> file, run it through the JIT, and see how it explodes :-)

the compiler fails to build if you do that :-/
The attached patch helps a bit but it needs more work.
Also, I suppose you might need to uncomment this bit in llvm-backend.cpp
as well:
// Disabled until PR1224 is resolved.
  //if (flag_exceptions)
  // Args.push_back("--enable-eh");

Some comments on the patch:
(1)
       new UnreachableInst(CurBB);
+ } else {
+ new UnwindInst(UnwindBB);
     }
-#endif
+#else
     new UnwindInst(UnwindBB);
+#endif

This avoids generating an unwind instruction straight after an unreachable
instruction, i.e. two terminators in a row.

(2)

- FuncCPPPersonality = cast<Function>(
+ FuncCPPPersonality =
     TheModule->getOrInsertFunction("__gxx_personality_v0",
                                    Type::getPrimitiveType(Type::VoidTyID),
- NULL));
- FuncCPPPersonality->setLinkage(Function::ExternalLinkage);
- FuncCPPPersonality->setCallingConv(CallingConv::C);
+ NULL);

- FuncUnwindResume = cast<Function>(
+ FuncUnwindResume =
     TheModule->getOrInsertFunction("_Unwind_Resume",
                                    Type::getPrimitiveType(Type::VoidTyID),
                                    PointerType::get(Type::Int8Ty),
- NULL));
- FuncUnwindResume->setLinkage(Function::ExternalLinkage);
- FuncUnwindResume->setCallingConv(CallingConv::C);
+ NULL);

When compiling the C++ file in which __gxx_personality_v0 is defined,
getOrInsertFunction returns a bitcast of the real function (a bitcast,
because the prototype of the real function is not "void (void)"). This
is a constant expression, thus cast<Function> asserts. Also, there seems
to be no need to set the CC and linkage because the values set are the
defaults. By the way, I like the use of the C++-only __gxx_personality_v0
here!

(3)

+ TypeInfo = BitCastToType(TypeInfo, PointerType::get(Type::Int8Ty));

This argument is, as far as I can see, just a "cookie" and certainly need
not be an i8*; hence the bitcast. Also, since it is a cookie, shouldn't
the intrinsic take an opaque* rather than an i8*?

Ciao,

Duncan.

eon.diff (2.6 KB)

> the compiler fails to build if you do that :-/
> The attached patch helps a bit but it needs more work.
> Also, I suppose you might need to uncomment this bit in llvm-backend.cpp
> as well:
> // Disabled until PR1224 is resolved.
> //if (flag_exceptions)
> // Args.push_back("--enable-eh");

PR1224 is resolved. Do you want to try enabling this and see if stuff continues to work?

Your patch looks great, I applied it:

> + TypeInfo = BitCastToType(TypeInfo, PointerType::get(Type::Int8Ty));
>
> This argument is, as far as I can see, just a "cookie" and certainly need
> not be an i8*; hence the bitcast. Also, since it is a cookie, shouldn't
> the intrinsic take an opaque* rather than an i8*?

The bitcast shouldn't hurt anything. It's good to have standardized specific prototypes. For example, memcpy is declared as taking i8*'s, which invariably requires bitcasts as well.

Thanks Duncan,

-Chris