MC-JIT

Together with Jan Sjodin (in copy of this email), we begin an
implementation of the JIT with MC. The idea, suggested by Jan, is to
develop a MCJIT in parallel of the current JIT and to keep the two
implementations until (at least) the new MC one is mature enough.
Currently code is kept on gitorious
(http://gitorious.org/llvm-mc-jit/llvm-mc-jit).

Following this, a boolean "bool MCJIT = false" has been introduced to
the "static ExecutionEngine *create(...)" method, allowing the lli
tool to use the MCJIT with the optional flag "-mcjit".

The MCJIT can now execute little functions with no relocation (like
add(a,b) => a+b), to do that some modifications have been made :
- The addPassesToEmitMC method has been added to TargetMachine class.
It fills the MCContext pointer so we can build the MCJITStreamer.
- The Finish method of MCAssembler have a new optional argument
"Writer" to allow a custom MCJITObjectWriter to be used.

Can you give us some feedbacks on the general idea and on this 2
particular hooks ?

Currently MCJIT has one unittest and the binary size is quite big
(~110MB in debug...) because before using the MC framework we need to
call "InitializeAllTargets()" and friends (same as llc does). For the
JIT, we need only the "host" backend and "InitializeHostTarget()"-like
method doesn't seem to exist. Do you have an opinion on this ?

Attached you will find the patch introducing the first MCJIT draft.
Can you comment on this ?

Jan and Olivier.

mc_jit.patch (55.4 KB)

I can't get it to apply with either 'git apply' or patch. Something
weird about the directory names.

- Michael Spencer

It works with "patch -p0 <mc_jit.patch" on my svn checkout.
I generate it with "git diff --no-prefix" as it seems some svn client
requires it.

Olivier.

Ah. He may have tried p1 since that's what git normally requires.

Though the .gitignore directory stuff likely failed to apply :slight_smile:

-eric

Hi Olivier,

Together with Jan Sjodin (in copy of this email), we begin an
implementation of the JIT with MC. The idea, suggested by Jan, is to
develop a MCJIT in parallel of the current JIT and to keep the two
implementations until (at least) the new MC one is mature enough.
Currently code is kept on gitorious
(http://gitorious.org/llvm-mc-jit/llvm-mc-jit).

Great, we would really love to kill to the old JIT and move everything
to the MC framework.

Following this, a boolean "bool MCJIT = false" has been introduced to
the "static ExecutionEngine *create(...)" method, allowing the lli
tool to use the MCJIT with the optional flag "-mcjit".

The MCJIT can now execute little functions with no relocation (like
add(a,b) => a+b), to do that some modifications have been made :

Cool!

In the context of the JIT, there really is no such thing as a
relocation, just fixups. I'm not completely sure what the right
approach is, but the JIT should be able to fully resolve all of the
symbols that are being used in the module. We may need some extra
interfaces to allow the JIT to tell the MCAssembler about the address
of some external symbols though.

- The addPassesToEmitMC method has been added to TargetMachine class.
It fills the MCContext pointer so we can build the MCJITStreamer.
- The Finish method of MCAssembler have a new optional argument
"Writer" to allow a custom MCJITObjectWriter to be used.

Ok. We could also make the streamer setup the right ObjectWriter,
since the ObjectStreamer should always know what that is.

Can you give us some feedbacks on the general idea and on this 2
particular hooks ?

Seems reasonable, but I haven't looked at the code yet. I would
suggest trying to split your work up into separate patches which
implement incremental pieces of functionality, then submitting them to
llvm-commits for review. That is much easier for us to deal with than
large monolithic patches out of tree.

Currently MCJIT has one unittest and the binary size is quite big
(~110MB in debug...) because before using the MC framework we need to
call "InitializeAllTargets()" and friends (same as llc does). For the
JIT, we need only the "host" backend and "InitializeHostTarget()"-like
method doesn't seem to exist. Do you have an opinion on this ?

You shouldn't need to initialize all targets to use MC, you should be
able to just use InitializeHostTarget(). If not, its probably a bug
somewhere.

- Daniel

Some boring style comments:
- whack trailing whitespace
- spaces, not tabs
- the methods in MCJITStreamer.cpp should probably have blank lines between them

There seems to be an ownership problem of the MCJITObjectWriter. If I
understand the code correctly, the assembler Finish method takes
ownership of the Writer parameter, which presumably is needed to JIT
two functions.

+1 for separate patches, or alternatively, can we just commit this and
begin developing this in-tree? Then other people can hack on it
before it's considered "ready". :slight_smile:

There's also a lot of copying stuff out of the JIT going on. The
MCJIT doesn't need to reimplement things like the JITEventListener
interface, for example. I think you should either:
- Maybe make a svn branch and replace the JIT in-place.
- Try to reuse the support classes whenever possible.

Regarding the memory management strategy, I'd like to take this
opportunity to try to do something more sane than the current JIT.
Instead of streaming the code directly into RWX memory, it would be
better to stream to some other resizeable in-memory data structure,
and allocate the RWX memory once the size is known. You can probably
wait on this one, but I'd like to see this eventually.

Reid

In the context of the JIT, there really is no such thing as a
relocation, just fixups. I'm not completely sure what the right
approach is, but the JIT should be able to fully resolve all of the
symbols that are being used in the module. We may need some extra
interfaces to allow the JIT to tell the MCAssembler about the address
of some external symbols though.

OK, I'm still experimenting with this. But it seems that I just need
to modify the "FixedValue" parameter of the RecordRelocation method.
More on this in future patches. :slight_smile:

There seems to be an ownership problem of the MCJITObjectWriter. If I
understand the code correctly, the assembler Finish method takes
ownership of the Writer parameter, which presumably is needed to JIT
two functions.

Ok, it's not really clear, but with the patch :
- If no Writer is provided to the Finish method, the MCAssembler takes
ownership of the Writer it creates
- If a Writer is provided, the ownership is not taken by MCAssembler

Ok. We could also make the streamer setup the right ObjectWriter,
since the ObjectStreamer should always know what that is.

Yep ! It will clarify the ownership thing. But my patch was to disturb
as little as possible the other streamers. I can implement it like
this if you prefer.

+1 for separate patches, or alternatively, can we just commit this and
begin developing this in-tree? Then other people can hack on it
before it's considered "ready". :slight_smile:

No problem for me and Jan, we can work out or in tree, even if it is
probably more convenient in tree.

- Try to reuse the support classes whenever possible.

This was planned like this.

Regarding the memory management strategy, I'd like to take this
opportunity to try to do something more sane than the current JIT.
Instead of streaming the code directly into RWX memory, it would be
better to stream to some other resizeable in-memory data structure,
and allocate the RWX memory once the size is known. You can probably
wait on this one, but I'd like to see this eventually.

The plan was to allocate just the necessary size of memory, as after
the layout, memory size needed for the generated function should be
known.

Seems reasonable, but I haven't looked at the code yet. I would
suggest trying to split your work up into separate patches which
implement incremental pieces of functionality, then submitting them to
llvm-commits for review. That is much easier for us to deal with than
large monolithic patches out of tree.

Ok. Sorry for the too big patch. Attached is the first patch adding
only 2 hooks on TargetMachine and on MCAssembler. Style should be LLVM
compliant. Apply it with "patch -p0".

You shouldn't need to initialize all targets to use MC, you should be
able to just use InitializeHostTarget(). If not, its probably a bug
somewhere.

Ah. Sorry never mind ! I finally found the "InitializeNativeTarget"
method which do what I need.

Olivier.

mc_jit_01.patch (4.39 KB)

+ // Make sure the code model is set.
+ setCodeModelForStatic();

For the JIT, this should be setCodeModelForJIT; it makes a difference
on, for example, x86-64. I'd also suggest changing the name of the
method appropriately.

- llvm::OwningPtr<MCObjectWriter> Writer(getBackend().createObjectWriter(OS));
- if (!Writer)
- report_fatal_error("unable to create object writer!");
+ MCObjectWriter *Writer = Writer_;
+ if (Writer == 0) {
+ Writer = getBackend().createObjectWriter(OS);
+ if (!Writer)
+ report_fatal_error("unable to create object writer!");
+ }

It would be cleaner if you kept around the OwningPtr, but only
assigned to it in the case where the writer take ownership; that way,
you wouldn't have to mess with explicitly deleting the writer. Also,
the surrounding code uses two spaces for indentation, not four.

I think it looks fine otherwise.

-Eli

New patch taking Eli's comments into account.

Olivier.

mc_jit_01.patch (4.31 KB)

Comments inline. If you have commit access, I'd fire away. If not, I can.

diff --git include/llvm/MC/MCAssembler.h include/llvm/MC/MCAssembler.h
index 07ca070..afff96e 100644
--- include/llvm/MC/MCAssembler.h
+++ include/llvm/MC/MCAssembler.h
@@ -669,7 +669,9 @@ public:
- void Finish();
+ /// \arg Writer is used for custom object writer (as the MCJIT does),
+ /// if not specified it is automatically created from backend

Nit: End complete sentences end with a period.

+ void Finish(MCObjectWriter *Writer = 0);
diff --git include/llvm/Target/TargetMachine.h
include/llvm/Target/TargetMachine.h
index 227499b..a115877 100644
--- include/llvm/Target/TargetMachine.h
+++ include/llvm/Target/TargetMachine.h
@@ -244,6 +244,15 @@ public:
                                           bool = true) {
     return true;
   }

There wasn't a newline here, but there should be for consistency. The comment
below should look like the rest of the Doxygen comments:
/// methodName - Use complete sentences starting with caps and ending with
/// periods.

+ /// get machine code emitted. This method should returns true if fails.
+ /// It fills the MCContext Ctx pointer used to build MCStreamer.
+ ///
+ virtual bool addPassesToEmitMC(PassManagerBase &PM,
+ MCContext *&Ctx,
+ CodeGenOpt::Level OptLevel,
+ bool DisableVerify = true) {
+ return true;
+ }

Ditto.

+ /// get machine code emitted. This method should returns true if fails.
+ /// It fills the MCContext Ctx pointer used to build MCStreamer.
+ ///
+ virtual bool addPassesToEmitMC(PassManagerBase &PM,
+ MCContext *&Ctx,
+ CodeGenOpt::Level OptLevel,
+ bool DisableVerify = true);

Ditto.

+/// get machine code emitted. This method should returns true if fails.
+/// It fills the MCContext Ctx pointer used to build MCStreamer.
+///
+bool LLVMTargetMachine::addPassesToEmitMC(PassManagerBase &PM,
+ MCContext *&Ctx,
+ CodeGenOpt::Level OptLevel,
+ bool DisableVerify) {

Args above should be aligned with column after opening paren.

+ // Add common CodeGen passes.
+ if (addCommonCodeGenPasses(PM, OptLevel, DisableVerify, Ctx))
+ return true;
+ // Make sure the code model is set.
+ setCodeModelForJIT();

New patch. Thanks for all of your comments !

Comments inline. If you have commit access, I'd fire away. If not, I can.

I don't have commit access, if you find it ok, please commit it. :slight_smile:

Olivier.

mc_jit_01.patch (4.68 KB)

Committed in r109080. Thanks for the patches! :slight_smile:

Reid

Any progress with this?
gitorious page shows the last update on Jul 27, 2010.

Yuri

There's basics for an MC JIT implemented now, but it's not yet full featured enough to replace the old JIT. Have a look at ExecutionEnginer/RuntimeDyld and ExecutionEngine/MCJIT.

It's usable enough for some things; for example, dynamic expression evaluation in lldb. The biggest gaps are more thorough relocation support and lazy compilation.

-Jim

Does it, or does it plan to write binary PIC code into the continuous memory area which can be saved and restored if needed? I was thinking, maybe I am mistaken, that the plan was to write code in memory as ELF which can be exported/imported as such. Is this the case?

Yuri

The current implementation uses MachO binaries in memory. There's no reason they couldn't be serialized to disk. The llvm-rtdyld tool gives an example of using the MCJIT RuntimeDyld support to load an object file from memory and use it directly.

The idea is that the object module will be loaded/JITed, then registered with the RuntimeDyld which will extract the functions, data, etc.. from the object and register them with the JIT memory manager and perform any relocations necessary. That will allow having the running program be in a separate address space (or a remote target) than where the compilation/loading is taking place, for example.

For platforms that want ELF instead of MachO, it should be straightforward to implement a similar structure with proper subclassing and such of the RuntimeDyld interfaces. That's not there, currently, however.

-Jim

What I see now is that on ELF platform MCJIT writes ELF module in memory but then later tries to parse it with MachOObject and it breaks with "not a Mach object file (invalid magic)" message. This is inconsistency: it could have written Mach-O anyway even on ELF architecture just to make it work and later when ELFObject class is implemented switch both writing and reading sides to ELF.

Yuri