Primer with LLVM

Hi, everybody:

I am a beginner with LLVM; in fact, today was the first day I used it.

I have several questions about LLVM:

Can I use LLVM to compile several files (bytecode) and scripts (char*), and link
them with external libraries, generating *only* one executable (all in
memory)?

Can I invoke external functions from guest (LLVM-generated) code when those
functions exist in the host code (the code that executes the guest)?

And a problem:

I made a little application that uses the functions llvm::LinkModules and
llvm::LinkLibraries which, I suppose, live in the libLLVMLinker.a archive, but the
gcc linker reports 'undefined' errors for these functions. I don't use the
LLVM makefile system because it is too complex to incorporate at this point of
my work to be useful at the moment. I use the libraries directly
in my own Makefile. I tried to explicitly list **all** the LLVM libraries and
it doesn't work :frowning:

Does anybody have an example of an application that uses these functions
to compile and execute a multi-file application in memory? (lli isn't useful
for me.)

Thanks in advance, and apologies for such basic questions.

Hi, everybody:

Hi Francisco

I am a beginner with LLVM; in fact, today was the first day I used it.

Welcome!

I have several questions about LLVM:

If you haven't already, a good place to start is the Getting Started
Guide, at http://llvm.cs.uiuc.edu/docs/GettingStarted.html

Can I use LLVM to compile several files (bytecode) and scripts (char*), and link
them with external libraries, generating *only* one executable (all in
memory)?

I'm not exactly sure what you're trying to do, but I believe you can do
what you want. The llvm-link tool will link together multiple bytecode
files. lli will generate code in memory (JIT compilation) and execute
it. Since the bulk of the implementation of these tools is in the LLVM
libraries, you can create your own process to do the same sorts of
things.

I'm not quite sure what you mean by "scripts (char*)", but if you mean
LLVM assembly then this is possible too. The ParseAssemblyFile function
(include/llvm/Assembly/Parser.h) can be used to turn LLVM assembly
into a Module*, which is then suitable for linking via the linker
interface (include/llvm/Linker.h).
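
For example, here is a rough (untested) sketch, assuming the 1.x-era
signatures in those two headers, of parsing one assembly file and linking it
into an existing module:

    // Untested sketch -- check llvm/Assembly/Parser.h and llvm/Linker.h for
    // the exact signatures in your tree; error reporting differs by version.
    // (Parser.h may also have a string-based entry point if the assembly
    // lives in memory rather than in a file.)
    #include "llvm/Module.h"
    #include "llvm/Assembly/Parser.h"
    #include "llvm/Linker.h"
    #include <iostream>
    #include <string>
    using namespace llvm;

    // Parse one .ll file and merge it into Dest.  Returns true on error.
    bool linkAssemblyFileInto(Module *Dest, const std::string &Filename) {
      Module *Src = ParseAssemblyFile(Filename);
      if (!Src) return true;

      std::string ErrorMsg;
      bool Failed = LinkModules(Dest, Src, &ErrorMsg);
      if (Failed)
        std::cerr << "link error: " << ErrorMsg << "\n";
      delete Src;
      return Failed;
    }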

Can I invoke external functions from guest (LLVM-generated) code when those
functions exist in the host code (the code that executes the guest)?

Yes. The lli interpreter/JIT compiler will resolve symbols within its
own executable or dynamically via a loadable plug-in (-load option).
Some platforms don't support this currently (Cygwin for example).
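
Concretely, that means the guest module only needs a declaration of the
function; the definition lives in the host. A minimal host-side sketch (the
name hostPrint is made up for illustration):

    // The guest code just declares this function; the JIT resolves the
    // symbol against the host process.  Note that the symbol usually has to
    // be exported from the executable (e.g. link the host with -rdynamic on
    // Linux).  "hostPrint" is a made-up example name.
    #include <cstdio>

    extern "C" void hostPrint(int Value) {
      std::printf("guest says: %d\n", Value);
    }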

And a problem:

I made a little application that uses the functions llvm::LinkModules and
llvm::LinkLibraries which, I suppose, live in the libLLVMLinker.a archive,

that is correct.

but the
gcc linker reports 'undefined' errors for these functions.

The only thing I can think of is dependencies that libLLVMLinker has.
You will probably want a link line that looks something like:

gcc -o myapp myapp.o -lLLVMLinker -lLLVMArchive -lLLVMBCReader \
      -lLLVMBCWriter -lLLVMCore -lLLVMSupport -lLLVMbzip2 -lLLVMSystem

To get examples of these library specifications, look at the llvm-ld and
gccld tools' Makefiles.

I don't use the
LLVM makefile system because it is too complex to incorporate at this point of
my work to be useful at the moment.

That's unfortunate to hear. We've tried to make it as dead simple as
possible. The typical end-user Makefile for LLVM is about 3 or 4 lines
long. What do you find confusing? Have you read the Makefile Guide?

I use the libraries directly
in my own Makefile. I tried to explicitly list **all** the LLVM libraries and
it doesn't work :frowning:

The order in which the libraries are specified on the link command is
important. There are numerous inter-dependencies amongst the libraries,
and you need to understand them before this kind of approach will work.
It's unlikely that you would want *all* the LLVM libraries in any tool.

Does anybody have an example of an application that uses these functions
to compile and execute a multi-file application in memory? (lli isn't useful
for me.)

The closest thing we have is the HowToUseJIT example program. I believe
Duraid Madina is working on something similar. You might want to check
with him (duraid at octopus dot com dot au)

Thanks in advance, and apologies for such basic questions.

Glad to help. I hope you find LLVM useful.

Reid

Francisco,

I just added a section to the Using Libraries document that shows which
libraries depend on which other libraries. You might find this useful.
Please see:

http://llvm.cs.uiuc.edu/docs/UsingLibraries.html#dependencies

Reid

Hi again, and thanks (Reid) for your fast response:

Yes, it works!!! I only had to change the order of the libraries in the Makefile.

My software can now compile assembly and bytecode
(from a buffer or a file) and link them with a set of libraries. It seems to
work perfectly (I don't generate code yet).

My real aim is to have a process (host) that executes several non-JIT binaries
(guests), each one in its own thread (not forked!! each one with a main
function). Guest and host have interdependencies in both directions. This is
part of my doctoral work.

Until now I was using TCC (the Tiny C Compiler), but it has a big shortcoming: it
isn't reentrant. It can only have one generated program at a time. For this
reason I decided to use LLVM (two days ago) and rebuild the whole project. 8-|

The LLVM Makefile system is great!! But I have no time right now to convert all my
current Makefiles. Maybe later.

I have noticed that LLVM has memory leaks; even a bare main with a simple
"return 0;" linked against the LLVM libraries reports memory leaks under
valgrind. Is that normal?

Now I have another problem: I have a Module and I need to generate an iostream
(in memory) with native x86 code (maybe ELF/COFF) to be executed later (inside
the guest process space, without fork!!). I studied llc and lli, but they
don't help me much. Any ideas? Is anybody working on something like that?

Sample code would be gratefully received :slight_smile:

Hi again, and thanks (Reid) for your fast response:

Glad to help.

Yes, it works!!! I only had to change the order of the libraries in the Makefile.

Great!

My software can now compile assembly and bytecode
(from a buffer or a file) and link them with a set of libraries. It seems to
work perfectly (I don't generate code yet).

Cool. Next step: generate code! :slight_smile:

My real aim is to have a process (host) that executes several non-JIT binaries
(guests), each one in its own thread (not forked!! each one with a main
function). Guest and host have interdependencies in both directions. This is
part of my doctoral work.

Until now I was using TCC (the Tiny C Compiler), but it has a big shortcoming: it
isn't reentrant. It can only have one generated program at a time. For this
reason I decided to use LLVM (two days ago) and rebuild the whole project. 8-|

Careful here. Much of LLVM is not re-entrant (yet!) either. However, if
you've written your own execution engine and made it synchronize wisely,
you should be okay. Making LLVM re-entrant is on the "to do" list, we
just haven't had the time/manpower yet. If you find things in LLVM that
would help make it more re-entrant, we'd gladly accept patches for it.

The LLVM Makefile system is great!!

:slight_smile:

But I have no time right now to convert all my
current Makefiles. Maybe later.

Okay, sounds reasonable.

I have noticed that LLVM has memory leaks; even a bare main with a simple
"return 0;" linked against the LLVM libraries reports memory leaks under
valgrind. Is that normal?

Yes, unfortunately it is. LLVM uses quite a few static variables because
they are needed throughout the program's life span. They are constructed
at static initialization time and never modified or freed so they show
up as "leaks" in valgrind. Generally this isn't a problem. I would like
to see this cleared up eventually too; it's one of the stumbling blocks
of making LLVM re-entrant as well. However, several important features
of the LLVM Core (like the type system) depend on global variables. For
example, a Type* for any given type is allocated exactly once so that
types can be compared by pointer equality. To make that magic happen, we
need to keep track of types globally and a static variable is currently
used for that.
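
For instance, a tiny sketch using the 1.x-era spellings Type::IntTy and
PointerType::get (later releases spell these differently):

    // Because types are uniqued globally, structurally identical types are
    // literally the same object and can be compared with ==.
    #include "llvm/DerivedTypes.h"
    #include <cassert>
    using namespace llvm;

    void typeUniquingExample() {
      const Type *A = PointerType::get(Type::IntTy);   // "int*"
      const Type *B = PointerType::get(Type::IntTy);   // "int*" again
      assert(A == B && "identical types share one Type object");
    }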

Now I have another problem: I have a Module and I need to generate an iostream
(in memory) with native x86 code (maybe ELF/COFF) to be executed later (inside
the guest process space, without fork!!). I studied llc and lli, but they
don't help me much. Any ideas? Is anybody working on something like that?

There are two approaches that could be used here:

1. lli-style. Save the Module as bytecode and dynamically compile it at
    runtime. Your thread would: read the bytecode, convert it to x86
    code in memory and then execute it directly from memory.

2. llc-style. Use llc to convert the module to assembly code, then use
    GCC (or as and ld with a little more effort) to create a dynamically
    loadable shared object (.so or .dll). You can then instantiate a
    DynamicLibrary object to load your compiled library into your
    thread (include/llvm/System/DynamicLibrary.h). Once loaded, your
    thread can use DynamicLibrary::GetAddressOfSymbol to find "main" and
    start executing it.
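
The loading end of approach 2 might look roughly like this (untested;
"guest.so" and the entry-point signature are placeholders, and the error
handling details vary between versions):

    // Untested sketch: load a shared object built from llc output and call
    // its entry point.  See include/llvm/System/DynamicLibrary.h.
    #include "llvm/System/DynamicLibrary.h"
    #include <iostream>
    using namespace llvm;

    int runGuest() {
      sys::DynamicLibrary Guest("guest.so");     // load the compiled guest

      typedef int (*MainFn)(int, char **);
      void *Addr = Guest.GetAddressOfSymbol("main");
      if (!Addr) {
        std::cerr << "'main' not found in guest.so\n";
        return -1;
      }

      char *Argv[] = { const_cast<char *>("guest"), 0 };
      return reinterpret_cast<MainFn>(Addr)(1, Argv);
    }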

Sample code would be gratefully received :slight_smile:

Unfortunately, I don't know of any code that does this. bugpoint is
close, but it uses fork(2).

Reid.

In addition to what Reid said, I would like to point out that
llvm/tools/lli/lli.cpp is not the entire JIT compiler, a lot of it lives
in llvm/lib/ExecutionEngine/* and llvm/lib/ExecutionEngine/JIT/* .
Together, they provide a lot of the machinery that you're looking for.

The other point is that the JIT, when it writes out code to memory, is
no longer relocatable, i.e. it's bound to the addresses where it's
written, with all symbol references resolved. LLVM does support
relocations, but if you need to save binary code and then load it in
different address spaces, or write out elf/coff binaries directly, you
will need to do some work such as creating a BinaryObjectCodeEmitter (to
parallel the current MachineCodeEmitter) and properly write out the
correct binary image, with relocations intact.

Really, the JIT isn't my goal. I prefer to use a native execution engine; and OK, I
don't need to save the generated Module, so it always lives in memory, and if LLVM
doesn't generate relocatable code that's fine for me.

About LLVM's reentrancy limitations: I can turn my own code - which builds
the Module - into a critical section, so I think that is enough; but I need to
know whether several independent Modules can live in the same address space
and run at the same time (I assume they can).

It would be great if we added several "patterns" to the documentation showing
how to do things with LLVM. It would shorten the learning curve for beginners
like me, avoiding basic errors and mistakes. If I reach a good level with LLVM
I can write these.

Do you have a sample that generates code into memory?

To turn LLVM into a reentrant library, we could move all global symbols into
a structure which would be the LLVM context. All (OK, not all) constructors
would take this context as their first parameter, so it would be easy to
lock/unlock the context and allow critical sections in the code.

Happy new year :slight_smile:

It would be great if we added several "patterns" to the documentation showing
how to do things with LLVM. It would shorten the learning curve for beginners
like me, avoiding basic errors and mistakes. If I reach a good level with LLVM
I can write these.

I'm not sure if I understand what you mean. Are you looking for an
"LLVM programmer's guide"? If so, we have one of those:

  http://llvm.cs.uiuc.edu/docs/ProgrammersManual.html

There are a lot of documents, both for extending LLVM and building new
projects _with_ LLVM here:

  http://llvm.cs.uiuc.edu/docs/

Do you have a sample that generates code into memory?

Take a look at llvm/examples/HowToUseJIT and llvm/examples/Fibonacci in
the LLVM distribution, or online via cvsweb.

> It would be great if we added several "patterns" to the documentation showing
> how to do things with LLVM. It would shorten the learning curve for beginners
> like me, avoiding basic errors and mistakes. If I reach a good level with LLVM
> I can write these.

I'm not sure if I understand what you mean. Are you looking for an
"LLVM programmer's guide"? If so, we have one of those:

  http://llvm.cs.uiuc.edu/docs/ProgrammersManual.html

I am a recent LLVM user, and I found LLVM hard to use at first sight. It would
be nice if people like me could have "howto"-style documents that are
code-oriented (versus manual-oriented): something like code extracts for common
(and basic) problems.

The last URL is a document about generating code from scratch up to a "Module",
and I need to go from a "Module" to native code....

There are a lot of documents, both for extending LLVM and building new
projects _with_ LLVM here:

  http://llvm.cs.uiuc.edu/docs/

... and none of these covers this topic.

(If I am wrong, correct me!!)

> Do you have a sample that generates code into memory?

Take a look at llvm/examples/HowToUseJIT and llvm/examples/Fibonacci in
the LLVM distribution, or online via cvsweb.

Looking................ (each dot is a minute)

Well, these examples are JIT-based. I really need to generate native code into
memory, extract the address of a function (e.g. "main") and execute it.

--
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu

Apologies for my inexperience with LLVM, but I *really* can't find an example
that generates code into memory from a "Module".

It would be great if we added several "patterns" to the documentation showing
how to do things with LLVM. It would shorten the learning curve for beginners
like me, avoiding basic errors and mistakes. If I reach a good level with LLVM
I can write these.

I'm not sure if I understand what you mean. Are you looking for an
"LLVM programmer's guide"? If so, we have one of those:

  http://llvm.cs.uiuc.edu/docs/ProgrammersManual.html

I am a recent LLVM user, and I found LLVM hard to use at first sight. It would
be nice if people like me could have "howto"-style documents that are
code-oriented (versus manual-oriented): something like code extracts for common
(and basic) problems.

That is exactly what the ProgrammersManual *is*. In particular, this section:
http://llvm.cs.uiuc.edu/docs/ProgrammersManual.html#common

is exactly the sort of thing you're looking for. Note that we don't have every possible subject documented, so if you see a missing subject, please send in a patch to the documentation.

The last URL is a document about generating code from scratch up to a "Module",
and I need to go from a "Module" to native code....

No, it's not. Working with LLVM Modules is the same whether you do it at runtime
or at compile time, and whether you are generating C, LLVM, or native machine
code. All of the same principles apply.

There are a lot of documents, both for extending LLVM and building new
projects _with_ LLVM here:

  http://llvm.cs.uiuc.edu/docs/

... and none of these covers this topic.

(If I am wrong, correct me!!)

I'm not sure I understand exactly what you want to do, so let me restate your objective to make sure I have it correct. You want to build a representation of a program in memory, then emit machine code to memory, then get the address of the entry point and call it. Is this correct?

Do you have a sample that generates code into memory?

Take a look at llvm/examples/HowToUseJIT and llvm/examples/Fibonacci in
the LLVM distribution, or online via cvsweb.

Looking................ (each dot is a minute)

Well, these examples are JIT-based. I really need to generate native code into
memory, extract the address of a function (e.g. "main") and execute it.

If my characterization of your goals (above) is correct, the JIT is exactly what you want.

--
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu

Apologies for my inexperience with LLVM, but I *really* can't find an example
that generates code into memory from a "Module".

That is the JIT.

-Chris

>>> It would be great if we added several "patterns" to the documentation showing
>>> how to do things with LLVM. It would shorten the learning curve for beginners
>>> like me, avoiding basic errors and mistakes. If I reach a good level with LLVM
>>> I can write these.
>>
>> I'm not sure if I understand what you mean. Are you looking for an
>> "LLVM programmer's guide"? If so, we have one of those:
>>
>> http://llvm.cs.uiuc.edu/docs/ProgrammersManual.html
>

That is not what I need; this document shows how to generate a representation
of code, and I start my work from a bytecode/assembly file (maybe in
memory).

In pseudo code:

(0) Create an empty "Module"
(1) Load a source file (which contains bytecode/assembly code), maybe from memory
(2) Compile it into a "Module"
(3) Link the latter with the first one.
(4) Go to step (1) until there are no more sources
//At this point I have a "Module" which contains the bytecode
//from the sources
(5) Generate native (x86) code from the generated module
(6) Locate the entry point (main?)
(7) Execute it

(Apologies for my English)

I have points 0-4 working, but I am confused about point 5 and maybe 6.
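
In code, my steps (0)-(4) look roughly like this (simplified, error handling
omitted; the signatures are the ones from llvm/Bytecode/Reader.h and
llvm/Linker.h and may differ by version):

    // Load each bytecode file and link it into one accumulated Module.
    #include "llvm/Module.h"
    #include "llvm/Bytecode/Reader.h"
    #include "llvm/Linker.h"
    #include <string>
    #include <vector>
    using namespace llvm;

    Module *loadAndLinkAll(const std::vector<std::string> &Files) {
      Module *Composite = new Module("composite");      // step (0)
      for (unsigned i = 0; i != Files.size(); ++i) {
        Module *M = ParseBytecodeFile(Files[i]);        // steps (1)-(2)
        if (!M) continue;                               // real code: report it
        std::string Err;
        LinkModules(Composite, M, &Err);                // step (3)
        delete M;
      }                                                 // step (4): loop
      return Composite;
    }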

> I am a recent LLVM user, and I found LLVM hard to use at first sight. It would
> be nice if people like me could have "howto"-style documents that are
> code-oriented (versus manual-oriented): something like code extracts for common
> (and basic) problems.

That is exactly what the ProgrammersManual *is*. In particular, this
section:
http://llvm.cs.uiuc.edu/docs/ProgrammersManual.html#common

Yes, I agree. But only for a single LLVM stage; not for the compile, link and
execute stages.

is exactly the sort of thing you're looking for. Note that we don't have
every possible subject documented, so if you see a missing subject, please
send in a patch to the documentation.

Ok.

> The last URL is a document about generating code from scratch up to a "Module",
> and I need to go from a "Module" to native code....

No, it's not. Working with LLVM Modules is the same whether you do it at runtime
or at compile time, and whether you are generating C, LLVM, or native machine
code. All of the same principles apply.

>> There are a lot of documents, both for extending LLVM and building new
>> projects _with_ LLVM here:
>>
>> http://llvm.cs.uiuc.edu/docs/
>
> ... and none of these covers this topic.
>
> (If I am wrong, correct me!!)

I'm not sure I understand exactly what you want to do, so let me restate
your objective to make sure I have it correct. You want to build a
representation of a program in memory, then emit machine code to memory,
then get the address of the entry point and call it. Is this correct?

All correct, except the first step. I don't need to build the program
representation; in fact I read it from a file or memory (bytecode/assembly).

>>> Do you have a sample that generates code into memory?
>>
>> Take a look at llvm/examples/HowToUseJIT and llvm/examples/Fibonacci in
>> the LLVM distribution, or online via cvsweb.
>>
>
> Looking................ (each dot is a minute)
>
> Well, these examples are JIT-based. I really need to generate native code into
> memory, extract the address of a function (e.g. "main") and execute it.

If my characterization of your goals (above) is correct, the JIT is
exactly what you want.

But the JIT generates code **just in time**, and that is slower than generating
everything at once and then executing it. I need a real compiler, not an
execution environment. The latter will be built by me.

>> --
>> Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu
>
> Apologies for my inexperience with LLVM, but I *really* can't find an example
> that generates code into memory from a "Module".

That is the JIT.

Are you sure?

> Now I have another problem: I have a Module and I need to generate an iostream
> (in memory) with native x86 code (maybe ELF/COFF) to be executed later (inside
> the guest process space, without fork!!). I studied llc and lli, but they
> don't help me much. Any ideas? Is anybody working on something like that?

There are two approaches that could be used here:

1. lli-style. Save the Module as bytecode and dynamically compile it at
    runtime. Your thread would: read the bytecode, convert it to x86
    code in memory and then execute it directly from memory.

Dear developers:

What are the classes involved in converting bytecode to x86 code directly,
without just-in-time compilation (all at once)?

I have points 0-4 working, but I am confused about point 5 and maybe 6.

[snip and reorder]

(5) Generate native (x86) code from the generated module

The JIT currently is built to generate native code for a given module, a
function-at-a-time. That means that first, main() is generated, and
anything main() calls is not. As soon as main() calls anything that is
in the Module that is NOT yet code-generated, it will be code-generated
on-the-fly (aka just-in-time).

If you want to generate ALL the code for the entire Module at once, you
will have to do one of the following:

1. a) Compile to .asm file (using something like LLC)
   b) Assemble the code using system assembler -> .o
   c) Link it -> executable
   d) Run executable

OR

2. Modify the JIT to not run a function-at-a-time, but generate ALL the
code for ALL the functions. Note that this isn't supported at this
time, so you will have to modify the ExecutionEngine
(llvm/lib/ExecutionEngine/* and llvm/lib/ExecutionEngine/JIT/*) to do
this.

(6) Locate the entry point (main?)
(7) Execute it

llvm/tools/lli/lli.cpp ->
    Function *Fn = MP->getModule()->getMainFunction();
    if (!Fn) {
      std::cerr << "'main' function not found in module.\n";
      return -1;
    }
    
    // Run main...
    int Result = EE->runFunctionAsMain(Fn, InputArgv, envp);

See

llvm/lib/ExecutionEngine/ExecutionEngine.cpp for runFunctionAsMain()

There is currently no such thing. Please see the other email I just
sent for details as to what your choices are.

> I have points 0-4 working, but I am confused about point 5 and maybe 6.
[snip and reorder]
> (5) Generate native (x86) code from the generated module

The JIT currently is built to generate native code for a given module, a
function-at-a-time. That means that first, main() is generated, and
anything main() calls is not. As soon as main() calls anything that is
in the Module that is NOT yet code-generated, it will be code-generated
on-the-fly (aka just-in-time).

Yes, I know.

If you want to generate ALL the code for the entire Module at once, you
will have to do one of the following:

1. a) Compile to .asm file (using something like LLC)
   b) Assemble the code using system assembler -> .o
   c) Link it -> executable
   d) Run executable

Slowest.

OR

2. Modify the JIT to not run a function-at-a-time, but generate ALL the
code for ALL the functions. Note that this isn't supported at this
time, so you will have to modify the ExecutionEngine
(llvm/lib/ExecutionEngine/* and llvm/lib/ExecutionEngine/JIT/*) to do
this.

:frowning:

Well, I assumed that LLVM could generate native code: a raw memory segment
that contains the machine code of a whole module.

> (6) Locate the entry point (main?)
> (7) Execute it

llvm/tools/lli/lli.cpp ->
    Function *Fn = MP->getModule()->getMainFunction();
    if (!Fn) {
      std::cerr << "'main' function not found in module.\n";
      return -1;
    }

    // Run main...
    int Result = EE->runFunctionAsMain(Fn, InputArgv, envp);

Yes, yes, I know it.

> What are the classes involved in converting bytecode to x86 code directly,
> without just-in-time compilation (all at once)?

There is currently no such thing. Please see the other email I just
sent for details as to what your choices are.

Last question (I promise you):

Can LLVM generate ELF or COFF binaries from a Module into an iostream?

1. a) Compile to .asm file (using something like LLC)

...

2. Modify the JIT to not run a function-at-a-time, but generate ALL the
code for ALL the functions. Note that this isn't supported at this

...

OR, call ExecutionEngine::getPointerToFunction on every function in the module, causing them to be compiled before the code starts.
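
An untested sketch of that approach (the ExecutionEngine creation details may
differ between versions):

    // Force the JIT to emit native code for every function up front, then
    // find main and run it (compare the lli.cpp excerpt quoted above).
    #include "llvm/Module.h"
    #include "llvm/ModuleProvider.h"
    #include "llvm/Function.h"
    #include "llvm/ExecutionEngine/ExecutionEngine.h"
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace llvm;

    int compileAllAndRun(ModuleProvider *MP,
                         const std::vector<std::string> &Args,
                         const char * const *envp) {
      ExecutionEngine *EE = ExecutionEngine::create(MP, /*ForceInterpreter=*/false);

      // Pre-compile every function that has a body, so nothing is generated
      // lazily once the guest starts running.
      Module *M = MP->getModule();
      for (Module::iterator F = M->begin(), E = M->end(); F != E; ++F)
        if (!F->isExternal())
          EE->getPointerToFunction(&*F);

      Function *MainFn = M->getMainFunction();
      if (!MainFn) {
        std::cerr << "'main' function not found in module.\n";
        return -1;
      }
      return EE->runFunctionAsMain(MainFn, Args, envp);
    }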

-Chris

No, not yet. It is something we would like to do at some point, but haven't implemented yet.

-Chris