Lightweight code loader

Hi list,

The short version of my question: Is it easy to make a lightweight
(read: small in size) linker/loader for code produced by the LLVM
JIT? Does it even make sense to do so?

The longer version: Suppose I have some llvm bytecode module A, and I
want to load and use that code in some runtime. The two obvious ways
to do that are a) use the LLVM jit, or b) compile the module into a
dynamic library and load it then.

Option b, as I understand, doesn't work on Windows. Option a requires
having the LLVM jit linked to the runtime. This seems reasonable, but
as it turns out, the LLVM jit is several times larger than our runtime
(yes, the Release version): ~5 M vs ~.5 M.

Would it be possible (i.e., relatively straightforward) to do the
following: Take the code in module A, compile it with the JIT (since
we cannot make libraries on Windows), and save the resulting binary
goo in some file. Later (in a different instance of the runtime), with
some much smaller loader, read in the file and link that code to
the runtime. The platforms we care about are x86, ppc, and sparc v9.

While it would be nice to be able to build a runtime that only
contains the absolute minimum, this seems rather non-trivial, but
maybe I am wrong. If this is indeed involved, fraught with danger (i.e.,
would make it impossible to debug) etc., let me know as well. If there
is some other clever (or obvious) way to accomplish the same thing,
I'd love to hear any ideas.

> Would it be possible (i.e., relatively straightforward) to do the
> following: Take the code in module A, compile it with the JIT (since
> we cannot make libraries on Windows), and save the resulting binary
> goo in some file. Later (in a different instance of the runtime), with
> some much smaller loader, read in the file and link that code to
> the runtime. The platforms we care about are x86, ppc, and sparc v9.

Yes, this would be straightforward. The output of the code generator is a bunch of bytes of machine code plus the relocations to perform on it, which is exactly what you need for this.
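As a rough illustration of "bytes of machine code plus relocations", here is a minimal C++ sketch. Every name in it (`RelocKind`, `Relocation`, `RelocatableCode`, `applyRelocations`) is hypothetical and not part of LLVM, and the two relocation kinds are simplified x86-style assumptions with 32-bit addresses; real targets define their own relocation types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical relocation kinds, assumed for illustration only.
enum class RelocKind : std::uint8_t {
  Abs32,   // store the 32-bit address of the symbol
  PCRel32  // store the symbol address relative to the end of the 4-byte field
};

struct Relocation {
  std::uint32_t offset;  // position in the code buffer to patch
  RelocKind kind;
  std::string symbol;    // name to resolve at load time
};

struct RelocatableCode {
  std::vector<std::uint8_t> bytes;  // raw machine code from the code generator
  std::vector<Relocation> relocs;   // fix-ups to perform on those bytes
};

// Patch `code.bytes` as if the buffer were loaded at address `base`,
// resolving symbol names through `symbols`. Values are stored in host
// byte order, which assumes the loader runs on the target machine.
void applyRelocations(RelocatableCode &code, std::uint32_t base,
                      const std::map<std::string, std::uint32_t> &symbols) {
  for (const Relocation &r : code.relocs) {
    assert(static_cast<std::size_t>(r.offset) + 4 <= code.bytes.size() &&
           "relocation outside code buffer");
    std::map<std::string, std::uint32_t>::const_iterator it =
        symbols.find(r.symbol);
    if (it == symbols.end())
      throw std::runtime_error("unresolved symbol: " + r.symbol);
    std::uint32_t value = it->second;
    if (r.kind == RelocKind::PCRel32)
      value -= base + r.offset + 4;  // displacement from the end of the field
    std::memcpy(&code.bytes[r.offset], &value, 4);
  }
}
```

With a buffer loaded at 0x1000, a `PCRel32` relocation at offset 1 against a symbol at 0x2000 would store 0x2000 - 0x1005 = 0xFFB, and an `Abs32` relocation simply stores the symbol's address.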

> While it would be nice to be able to build a runtime that only
> contains the absolute minimum, this seems rather non-trivial, but
> maybe I am wrong. If this is indeed involved, fraught with danger (i.e.,
> would make it impossible to debug) etc., let me know as well. If there
> is some other clever (or obvious) way to accomplish the same thing,
> I'd love to hear any ideas.

Another option might be to just fix building .dll's on Windows :)

-Chris

> While it would be nice to be able to build a runtime that only
> contains the absolute minimum, this seems rather non-trivial, but
> maybe I am wrong. If this is indeed involved, fraught with danger (i.e.,
> would make it impossible to debug) etc., let me know as well. If there
> is some other clever (or obvious) way to accomplish the same thing,
> I'd love to hear any ideas.

> Another option might be to just fix building .dll's on Windows :)

I will be working on a COFF backend after the NASMW one. Then possibly direct PE generation.

That's the plan anyway, if I can get the LLVM C++ frontend built; otherwise I will have to skip doing that and use the older precompiled binary frontend. This is what is holding me up at the moment.

Testing is going to be the biggie: doing proper ABI support against MSC's calling conventions.

Aaron

It looks like I am going to wind up writing something like this. Will
you guys accept a patch?

It seems like the result will be a much-simplified version of the
JIT/JIT.cpp and JIT/JITEmitter.cpp, as well as some utility to write
the code and relocations into a file.

There will also need to be a loader, which will depend on as little of
the framework as possible.

This seems fairly useful - it breaks the reliance on the system
assembler for dynamic systems. What do you guys think?

Alexander,

Yes, a patch like that would be accepted. Fewer dependencies = good :)

Some notes on doing this:

(1) Please make sure you use the std c++ iostream libraries for doing
I/O. No native calls (we end up with portability problems). If you need
something that must be ported, please add it to lib/System

(2) You should also use the sys::Path class (include/llvm/System/Path.h)
for handling paths in a platform independent way.

(3) Your patch should be an incremental add to LLVM, not removing any
existing JIT functionality.

If it is warranted, consider creating a bugzilla bug to track this as it
makes our life easier for release notes, patch collection, etc.
Otherwise, just send the patch to this list.

Thanks,

Reid.

> Alexander,
>
> Yes, a patch like that would be accepted. Fewer dependencies = good :)
>
> Some notes on doing this:
>
> (1) Please make sure you use the std c++ iostream libraries for doing
> I/O. No native calls (we end up with portability problems). If you need
> something that must be ported, please add it to lib/System

Sure. What counts as a 'native call'? :)

> (2) You should also use the sys::Path class (include/llvm/System/Path.h)
> for handling paths in a platform independent way.
>
> (3) Your patch should be an incremental add to LLVM, not removing any
> existing JIT functionality.

I was actually going to make something orthogonal to the JIT. It would
not support nearly as much functionality as the JIT (though it would
have similar code).

Here is a very loose use case - does this seem reasonable?

> Alexander,
>
> Yes, a patch like that would be accepted. Fewer dependencies = good :)
>
> Some notes on doing this:
>
> (1) Please make sure you use the std c++ iostream libraries for doing
> I/O. No native calls (we end up with portability problems). If you need
> something that must be ported, please add it to lib/System

> Sure. What counts as a 'native call'? :)

Sorry. "native" as in anything specific to the operating system. For
example, if you need to #include <sys/...> then it's probably "native".

> (2) You should also use the sys::Path class (include/llvm/System/Path.h)
> for handling paths in a platform independent way.
>
> (3) Your patch should be an incremental add to LLVM, not removing any
> existing JIT functionality.

> I was actually going to make something orthogonal to the JIT. It would
> not support nearly as much functionality as the JIT (though it would
> have similar code).

Actually, that's even better as it's a completely separate thing.

> Here is a very loose use case - does this seem reasonable?
>
> ****
>
>   #include modulecompiler
>
>   #include module_loader
>
>   const char * countdown_function =
>     "int %countdown (int %AnArg) {\n"
>     " %result = call fastcc int %foo-int (int %AnArg) \n"
>     " ret int %result\n"
>     "}\n"
>     "fastcc int %foo-int (int %AnArg) {\n"
>     "EntryBlock:\n"
>     " %cond = setle int %AnArg, 2\n"
>     " br bool %cond, label %return, label %recur\n"
>     "return:\n"
>     " ret int 1\n"
>     "recur:\n"
>     " %sub1 = sub int %AnArg, 1\n"
>     " %result = tail call fastcc int %foo-int (int %sub1)\n"
>     " ret int %result\n"
>     "}\n";
>
>   Module * M = parseAsmText(countdown_function);
>
>   RelocatableCode * code = compileModule(M);
>
>   WriteRelocatableCodeToFile (code,"file");
>
>   // now load. This part should be as light as possible - that is, very small
>   // in size of the compiled binary. The resulting "module" is fairly static, not like
>   // standard llvm modules.
>
>   // should be able to get pointers to functions and change value of
>   // global pointers.
>
>   NativeModule * mod = ReadAndRelocateCodeFromFile ("file");
>
>   int (* countdown) (int);
>
>   countdown = (int (*)(int)) GetPointerToFunction("countdown",mod);
>
>   printf("result of countdown: %d\n", countdown (1000000));
>
> ***

The above looks pretty succinct to me. The only comment I have is that
you might want to wrap the new functions in a C++ class. At the very
least, please declare them in the llvm namespace.

> There are a few problems that I need some input on.
>
> First, I want the loader to be as tiny as possible. So, I don't want
> to link in VMCore and friends. Is it possible to just link in
> selected object files instead of entire libraries?

Yes, but it will require a makefile change. Right now VMCore only builds
a re-linked object (all object files coalesced into one bigger one).
That's not what you want. You want to build an archive so that the
individual object files are linked in, and only what you need. However,
I'm pretty certain that because of inter-dependencies you'll probably
end up with all or most of VMCore anyways unless you're using only a few
very specific items. To build an archive library, just add:

ARCHIVE_LIBRARY := 1

to the makefile at the top and then link with "LLVMCore.a" instead of
just "LLVMCore".

> Second, there is functionality that the loader needs to have that
> depends on VMCore, but doesn't actually need it for my purposes. The
> main thing is the 'relocate' function in each (System)JITInfo.cpp
> file. I would like to be able to get the correct JITInfo object
> (really just the function) without having to link in extra stuff,
> instantiate modules, targets, etc. Ideally this would not require
> duplicating any code :) How does one go about doing this?

You'd have to break the pieces out that you need into separate
compilation units (.cpp files), ensure there are no dependencies on the
rest of the definitions, and then use the ARCHIVE_LIBRARY trick above to
ensure you only load what's necessary to resolve symbols.

FYI, you might want to try out our IRC channel where people hang out
most of the time .. might get you some answers faster and allow us to
discuss this more effectively than by email.

Just put irc://irc.oftc.net/llvm into Mozilla if you have it.

Reid

Whoops, I screwed up on my advice there. The keyword to build an
archive library is actually BUILD_ARCHIVE, strangely enough. Please use

BUILD_ARCHIVE := 1

Thanks,

Reid.

> something that must be ported, please add it to lib/System
>
> Sure. What counts as a 'native call'? :)

You are allowed to use anything in the *C++* standard library without fear. This includes <cstdlib> and friends. Anything not in the C++ standard must be dealt with by lib/System.

> (3) Your patch should be an incremental add to LLVM, not removing any
> existing JIT functionality.
>
> I was actually going to make something orthogonal to the JIT. It would
> not support nearly as much functionality as the JIT (though it would
> have similar code).

Yeah, that makes sense.

> Here is a very loose use case - does this seem reasonable?
>
> ****
>
> #include modulecompiler
>
> #include module_loader
>
> const char * countdown_function =
>    "int %countdown (int %AnArg) {\n"
>    " %result = call fastcc int %foo-int (int %AnArg) \n"
>    " ret int %result\n"
>    "}\n"
>    "fastcc int %foo-int (int %AnArg) {\n"
>    "EntryBlock:\n"
>    " %cond = setle int %AnArg, 2\n"
>    " br bool %cond, label %return, label %recur\n"
>    "return:\n"
>    " ret int 1\n"
>    "recur:\n"
>    " %sub1 = sub int %AnArg, 1\n"
>    " %result = tail call fastcc int %foo-int (int %sub1)\n"
>    " ret int %result\n"
>    "}\n";
>
> Module * M = parseAsmText(countdown_function);
>
> RelocatableCode * code = compileModule(M);
>
> WriteRelocatableCodeToFile (code,"file");

Yes, something like this makes sense.

> // now load. This part should be as light as possible - that is, very small
> // in size of the compiled binary. The resulting "module" is fairly static, not like
> // standard llvm modules.
>
> // should be able to get pointers to functions and change value of
> // global pointers.

What do you mean by changing the value of global pointers? Do you mean external globals or the address of things defined in the module? The first case shouldn't be hard, the latter might be tougher (not sure).

> NativeModule * mod = ReadAndRelocateCodeFromFile ("file");

Probably also needs some way to get information on the address of symbols.

> int (* countdown) (int);
>
> countdown = (int (*)(int)) GetPointerToFunction("countdown",mod);
>
> printf("result of countdown: %d\n", countdown (1000000));

This makes sense.
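The symbol-address lookup mentioned above could be as small as a name-to-offset table kept next to the relocated image. The sketch below is only an assumption about how such a table might look: `NativeModule` here is a hypothetical struct (not the type from the use case), `lookupSymbol` is an invented helper, and the image is an ordinary byte vector, so a real loader would still have to place the code in executable memory before calling into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical loaded module: relocated bytes plus a symbol table.
struct NativeModule {
  std::vector<std::uint8_t> image;             // relocated code and data
  std::map<std::string, std::size_t> symbols;  // symbol name -> offset in image
};

// Return the address of `name` inside the loaded image,
// or nullptr if the module does not define that symbol.
void *lookupSymbol(NativeModule &mod, const std::string &name) {
  std::map<std::string, std::size_t>::const_iterator it =
      mod.symbols.find(name);
  if (it == mod.symbols.end())
    return nullptr;
  assert(it->second < mod.image.size() && "symbol offset outside image");
  return mod.image.data() + it->second;
}
```

The same table would serve both uses raised in the thread: fetching a function's address and locating a global so that its value can be read or changed.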

> ***
>
> There are a few problems that I need some input on.
>
> First, I want the loader to be as tiny as possible. So, I don't want
> to link in VMCore and friends. Is it possible to just link in
> selected object files instead of entire libraries?

No, not really.

I think what you really want to do is write two things:

1. A traditional LLVM library that looks similar to the JIT stuff and
    writes out the file. This would write it out in some well-defined
    format that preserves the code and the relocations.
2. A very simple and light-weight library (maybe written in C?) that has
    no dependencies on the other llvm libraries, including VMCore, Support
    and maybe lib/System.

    This library just mmaps the well-defined file written by #1, applies
    the relocations, and away it goes. The problems with this are: a) it
    will have to be ported to different platforms if it doesn't use
    lib/System. b) it will have to be ported to different targets if it
    doesn't use the target-specific lib/Target support for relocations.
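To make the "well-defined format" idea concrete, here is a hedged sketch of a writer and reader for a toy image format. The layout (magic number, code bytes, then relocation records), the magic value, and all names (`CodeImage`, `Reloc`, `writeImage`, `readImage`) are invented for illustration and are not an existing LLVM format. It uses only standard C++ iostreams, in line with the portability advice earlier in the thread, and reads into memory instead of using mmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Reloc { std::uint32_t offset; std::uint8_t kind; std::string symbol; };
struct CodeImage { std::vector<std::uint8_t> bytes; std::vector<Reloc> relocs; };

const std::uint32_t kMagic = 0x4C4A4F42u;  // arbitrary, hypothetical file tag

// Integers are stored little-endian so the file does not depend on the host.
void putU32(std::ostream &out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

std::uint32_t getU32(std::istream &in) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    int c = in.get();
    if (c == std::char_traits<char>::eof())
      throw std::runtime_error("truncated image");
    v |= static_cast<std::uint32_t>(c) << (8 * i);
  }
  return v;
}

void writeImage(std::ostream &out, const CodeImage &img) {
  assert(img.bytes.size() <= 0xFFFFFFFFu && img.relocs.size() <= 0xFFFFFFFFu);
  putU32(out, kMagic);
  putU32(out, static_cast<std::uint32_t>(img.bytes.size()));
  out.write(reinterpret_cast<const char *>(img.bytes.data()),
            static_cast<std::streamsize>(img.bytes.size()));
  putU32(out, static_cast<std::uint32_t>(img.relocs.size()));
  for (const Reloc &r : img.relocs) {
    putU32(out, r.offset);
    out.put(static_cast<char>(r.kind));
    putU32(out, static_cast<std::uint32_t>(r.symbol.size()));
    out.write(r.symbol.data(), static_cast<std::streamsize>(r.symbol.size()));
  }
}

// Note: a real loader should validate the sizes it reads before allocating.
CodeImage readImage(std::istream &in) {
  if (getU32(in) != kMagic)
    throw std::runtime_error("bad magic");
  CodeImage img;
  img.bytes.resize(getU32(in));
  in.read(reinterpret_cast<char *>(img.bytes.data()),
          static_cast<std::streamsize>(img.bytes.size()));
  std::uint32_t count = getU32(in);
  for (std::uint32_t i = 0; i < count; ++i) {
    Reloc r;
    r.offset = getU32(in);
    int kind = in.get();
    if (kind == std::char_traits<char>::eof())
      throw std::runtime_error("truncated image");
    r.kind = static_cast<std::uint8_t>(kind);
    r.symbol.resize(getU32(in));
    in.read(&r.symbol[0], static_cast<std::streamsize>(r.symbol.size()));
    if (!in)
      throw std::runtime_error("truncated image");
    img.relocs.push_back(r);
  }
  return img;
}
```

The reader depends on nothing beyond the C++ standard library, which is the property the light-weight loader is after; applying the relocations would still need target-specific code, as point b) notes.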

At this stage, the natural question is: why write #2 at all?

In particular, if your well-defined format written by #1 is something like, say, the ELF .so format, you just turn #2 into dlopen/dlsym and friends (which just mmaps and relocates). This seems like the right way to go unless you have a strong reason not to. This also solves the problems with #2 listed above.

> Second, there is functionality that the loader needs to have that
> depends on VMCore, but doesn't actually need it for my purposes. The
> main thing is the 'relocate' function in each (System)JITInfo.cpp
> file.

Yes, this is really a symptom of #2 above. The harder part is the target-specific stuff that knows how to apply the target-specific relocations.

> I would like to be able to get the correct JITInfo object (really just the function) without having to link in extra stuff, instantiate modules, targets, etc. Ideally this would not require duplicating any code :) How does one go about doing this?

No good way. :(

-Chris