Strategy for leveraging LLVM optimizations in a VM

Hi –

I’m still very much a newbie with LLVM, but I’m hoping to use it to compile into native Intel code a set of sources that combines bytecodes for my own custom VM with Intel code written directly in assembly language.

In an earlier exchange, I discovered that LLVM does not perform any optimizations on Intel assembly language code. This would be an interesting thing to add, I believe, and I may make some progress in that direction as I proceed. In the meantime, I’m wondering whether the following makes sense: write two separate translation mechanisms, one that translates the assembler code associated with each of my bytecodes into a sequence of LLVM intermediate representation (IR) instructions, and another that does the same for instructions coded directly in Intel assembly (i.e., there would be a one-to-one correspondence between Intel assembly language statements and LLVM IR statements, with real register assignments replaced by LLVM virtual register assignments).

If I did this, I would then have instruction sequences (whether originating in my bytecode or in Intel assembly language) that could be submitted to the LLVM optimizer. Does this make sense to any of you old hands?

Thanks.

Mike
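As a rough illustration of the register-renaming idea in the proposal above, here is a minimal Python sketch. It is hypothetical throughout: the function `raise_block`, the `%v1`-style value names, and the tiny opcode subset (32-bit `movl`, `addl`, `subl`, `imull`, `andl`, `orl`, `xorl` in AT&T syntax, register or immediate sources only) are assumptions for illustration. It deliberately ignores flags, memory operands, and control flow. Each write to a physical register produces a fresh SSA value, so the output is ordinary LLVM IR text that the optimizers could work on.

```python
"""Hypothetical toy raiser: straight-line AT&T x86 subset -> LLVM IR text.

Each write to a physical register creates a fresh SSA virtual register.
Assumptions: 32-bit registers only, no memory operands, no flags, no jumps.
"""

import re

# Assumed opcode subset mapped to LLVM IR binary operators.
BINOPS = {"addl": "add", "subl": "sub", "imull": "mul",
          "andl": "and", "orl": "or", "xorl": "xor"}
REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi")


def raise_block(asm_lines, inputs, output):
    """Translate straight-line asm into a single LLVM IR function.

    inputs: registers that hold the function arguments on entry.
    output: register whose final value is returned.
    """
    current = {}  # physical register -> SSA name (or constant) it holds now
    body = []
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"%v{counter}"

    def operand(text):
        text = text.strip()
        if text.startswith("$"):  # AT&T immediate such as $5
            return text[1:]
        reg = text.lstrip("%")
        if reg not in current:
            raise ValueError(f"register {reg} read before it was written")
        return current[reg]

    for reg in inputs:
        current[reg] = f"%{reg}.in"

    for line in asm_lines:
        m = re.fullmatch(r"\s*(\w+)\s+([^,]+),\s*%(\w+)\s*", line)
        if not m or m.group(3) not in REGS:
            raise ValueError(f"unsupported instruction: {line!r}")
        op, src, dst = m.groups()
        if op == "movl":
            # A move emits no IR: the destination simply names the same value.
            current[dst] = operand(src)
        elif op in BINOPS:
            # AT&T order: "op src, dst" means dst = dst <op> src.
            name = fresh()
            body.append(f"  {name} = {BINOPS[op]} i32 "
                        f"{operand('%' + dst)}, {operand(src)}")
            current[dst] = name
        else:
            raise ValueError(f"unsupported opcode: {op}")

    params = ", ".join(f"i32 %{r}.in" for r in inputs)
    return "\n".join([f"define i32 @raised({params}) {{", *body,
                      f"  ret i32 {operand('%' + output)}", "}"])
```

For example, `raise_block(["movl %edi, %eax", "addl %esi, %eax", "imull $3, %eax"], ["edi", "esi"], "eax")` yields a function containing one `add`, one `mul`, and a `ret`; the `movl` disappears because it only renames a value. Text like this could then be run through `opt` (for instance `opt -S -O2`) to let LLVM's optimizers at it.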

If your x86 assembly is sufficiently simple, I don't see any reason why
you couldn't programmatically raise it back up to LLVM IR. People
have tried this in the past (qemu, I think? I can't remember), and it
usually results in considerable slowdowns. I'd imagine that if your
asm is sufficiently restricted, for example if you don't need to worry
about arithmetic flags, the x87 FPU stack, or arbitrary control flow,
then it should be easy to do without introducing any slowdowns.

Reid
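The point above about arithmetic flags can be made concrete. The hypothetical Python sketch below shows what raising an `addl`/`adcl` pair (a 64-bit add done in 32-bit halves) might produce: the carry flag has no implicit counterpart in LLVM IR, so it is materialized explicitly here with the `llvm.uadd.with.overflow.i32` intrinsic. The helper names `raise_add_adc` and `simulate` are illustrative assumptions, and `simulate` is only a Python model of the emitted IR's arithmetic.

```python
"""Hypothetical sketch: raising an add/adc pair needs explicit carry modelling.

x86 (AT&T):  addl %ebx, %eax   # low halves; sets the carry flag (CF)
             adcl %edx, %ecx   # high halves; adds CF back in
"""

# A real module must also declare the intrinsic before calling it.
DECLARE = "declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)"


def raise_add_adc(lo_a, lo_b, hi_a, hi_b):
    """Return LLVM IR lines for a 64-bit add split into 32-bit halves."""
    return [
        # Low half: the intrinsic returns the sum and a carry-out bit (CF).
        f"%lo.pair = call {{ i32, i1 }} @llvm.uadd.with.overflow.i32("
        f"i32 {lo_a}, i32 {lo_b})",
        "%lo = extractvalue { i32, i1 } %lo.pair, 0",
        "%cf = extractvalue { i32, i1 } %lo.pair, 1",
        # High half: adc = operand + operand + CF, so widen CF to i32 first.
        f"%hi.sum = add i32 {hi_a}, {hi_b}",
        "%cf.i32 = zext i1 %cf to i32",
        "%hi = add i32 %hi.sum, %cf.i32",
    ]


def simulate(lo_a, lo_b, hi_a, hi_b):
    """Python model of the IR above (unsigned 32-bit wraparound)."""
    mask = 0xFFFFFFFF
    lo_full = lo_a + lo_b
    lo, cf = lo_full & mask, lo_full >> 32  # cf is 0 or 1 for 32-bit inputs
    hi = (hi_a + hi_b + cf) & mask
    return lo, hi
```

For example, `simulate(0xFFFFFFFF, 1, 0, 0)` returns `(0, 1)`: the low half wraps to zero and the carry lands in the high half. Every flag that a later instruction reads needs similar extra IR, which helps explain why unrestricted asm is much harder to raise cleanly.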

IIRC, raising x86 code back up to LLVM IR is what libcpu does.

http://www.libcpu.org/wiki/Main_Page

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-- Jean-Daniel

Another option for handling the assembly language fragments is to convert them into inline assembly code within the LLVM IR that you generate. This might be worth exploring if converting the assembly code into LLVM IR is either difficult or causes a performance hit.

– John T.
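A minimal sketch of the inline-assembly route, assuming the fragment computes one 32-bit result from 32-bit inputs: emit an LLVM IR function whose body is a single inline-asm call. `wrap_inline_asm` is a hypothetical helper. In LLVM's inline-asm template, `$0`, `$1`, ... name operands and a literal `$` is written `$$`; the `~{flags}` clobber mirrors what clang emits for x86 asm that modifies EFLAGS.

```python
"""Hypothetical sketch: wrapping a hand-written x86 fragment as LLVM IR
inline assembly instead of raising it to ordinary IR instructions."""


def wrap_inline_asm(name, asm_text, constraints, n_inputs):
    """Return an LLVM IR function whose body is one inline-asm call.

    asm_text refers to operands as $0, $1, ...; a literal "$" (such as
    an AT&T immediate) must be written "$$". constraints is an LLVM
    constraint string such as "=r,0,r,~{flags}". Assumes i32 operands
    and that asm_text contains no double quotes or backslashes.
    """
    params = ", ".join(f"i32 %a{i}" for i in range(n_inputs))
    return (
        f"define i32 @{name}({params}) {{\n"
        f'  %r = call i32 asm "{asm_text}", "{constraints}"({params})\n'
        "  ret i32 %r\n"
        "}"
    )
```

With `wrap_inline_asm("add2", "addl $2, $0", "=r,0,r,~{flags}", 2)`, operand `$0` is the output, `$1` is tied to it by the `0` constraint (so it starts out holding `%a0`), and `$2` holds `%a1`. The trade-off is that the optimizers treat the asm text as opaque and will not optimize inside it, although they can still optimize the surrounding IR.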

CC’ing the mailing list.

The inline assembly feature is described in the LLVM Language Reference Manual. I’m not really familiar with what the LLVM optimizers will do with inline assembly code; all I really know is that there’s a well-defined interface between LLVM code and inline assembly code that allows the optimizers to “do the right thing”. For example, an inline assembly fragment can specify that it needs one register for input and two for outputs, and the LLVM code generator will assign physical registers for you; or you can declare that the fragment uses %eax, and the LLVM code generator won’t keep any value it needs in %eax across the fragment. It’s basically the same as GCC inline assembly.

– John T.
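The register interface described above (one input, two outputs, a clobbered %eax) is expressed in LLVM IR through a constraint string. The hypothetical helpers below (`constraint_string`, `asm_call`, both illustrative names) sketch how a front end might build one, assuming all operands are `i32`: outputs come first as `=r`, then inputs as `r`, then clobbers as `~{reg}`, and several outputs come back as a struct.

```python
"""Hypothetical sketch: building LLVM inline-asm constraint strings.

Outputs are listed first ("=r"), then inputs ("r"), then clobbered
registers ("~{eax}"). Several outputs come back as a struct value.
"""


def constraint_string(n_outputs, n_inputs, clobbers=()):
    parts = ["=r"] * n_outputs + ["r"] * n_inputs
    # "~{reg}" tells the code generator the fragment overwrites reg.
    parts += [f"~{{{reg}}}" for reg in clobbers]
    return ",".join(parts)


def asm_call(asm_text, n_outputs, inputs, clobbers=()):
    """Emit one inline-asm call on i32 operands (assumption: all i32).

    An asm with no outputs usually also needs the "sideeffect" keyword so
    it is not deleted as dead code; that is omitted here for brevity.
    """
    if n_outputs == 0:
        ret, lhs = "void", ""
    elif n_outputs == 1:
        ret, lhs = "i32", "%res = "
    else:
        ret, lhs = "{ " + ", ".join(["i32"] * n_outputs) + " }", "%res = "
    args = ", ".join(f"i32 {value}" for value in inputs)
    cons = constraint_string(n_outputs, len(inputs), clobbers)
    return f'{lhs}call {ret} asm "{asm_text}", "{cons}"({args})'
```

For the one-input, two-output case with %eax clobbered, `constraint_string(2, 1, ["eax"])` gives `"=r,=r,r,~{eax}"`, and the two results would be read back with `extractvalue { i32, i32 } %res, 0` and `, 1`. If a fragment writes an output before it has finished reading its inputs, the early-clobber form `=&r` keeps the allocator from giving them the same register.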