LLVM Newb: Getting started

A few days ago Craig Black suggested in the D newsgroup
that someone create a D <http://www.digitalmars.com/d/index.html>
frontend for LLVM. I had never heard of LLVM before, but its design
captured me immediately when I read the documentation. I was always
scared by GCC: a great piece of software, but very badly
documented, and in its own way not very KISS.

Since I'm currently working on a clean ABI draft for D, I thought that
implementing a D frontend for LLVM would also be a good test case for
my ABI drafts, so I agreed to implement that frontend.

Today I built LLVM from source (though not the GCC frontend yet)
and played a bit with the assembly language, which brought up the
following questions:

* How can I define parts of the ABI that cover calling conventions?

* What is the LLVM way of creating object files?
From what I've tried so far, llvm-ld will generate an executable.
llc produces target-specific assembly; there is an option to output
object files, but this crashes on my system.

* What is the best way to add support for new object file types? One
thing that a lot of people in the D community are hoping for is the
ability to compile modules into self-contained objects, which can
be redistributed without additional header files and be linked
either statically or dynamically into the final program, kind of like
Pascal units. This of course requires putting additional information
into the created binary. I'd also like to try to make it
create CLI binaries
<http://www.ecma-international.org/publications/standards/Ecma-335.htm>.

And last but not least: which parts of the documentation must I
definitely read first to get started hacking on LLVM? (I'll
read all of it of course, but I'd like to focus right now.)

Hi Wolfgang,

> A few days ago Craig Black suggested in the D newsgroup
> that someone create a D <http://www.digitalmars.com/d/index.html>
> frontend for LLVM. I had never heard of LLVM before, but its design
> captured me immediately when I read the documentation. I was always
> scared by GCC: a great piece of software, but very badly
> documented, and in its own way not very KISS.

Welcome to LLVM :)

> Since I'm currently working on a clean ABI draft for D, I thought that
> implementing a D frontend for LLVM would also be a good test case for
> my ABI drafts, so I agreed to implement that frontend.

Wonderful!

> Today I built LLVM from source (though not the GCC frontend yet)

If you're making your own front end, you probably won't need it :)

> and played a bit with the assembly language, which brought up the
> following questions:
>
> * How can I define parts of the ABI that cover calling conventions?

LLVM supports calling conventions, if that's what you mean. You can even
create your own, but you'd have to implement them in the code
generators.

> * What is the LLVM way of creating object files?
> From what I've tried so far, llvm-ld will generate an executable.
> llc produces target-specific assembly; there is an option to output
> object files, but this crashes on my system.

llvm-ld should generate an executable, but alas it is an unfinished
piece of work. Generally what we do is:

llc input.bc -o output.s
gcc output.s -o program

I've been meaning to finish the llvm-ld and llvmc tools for some time
now but other things have consistently gotten in the way. One of these
days they'll be working.

Note that you should be able to write a "D" configuration file for llvmc
that sets up the various stages of compiling D programs, up through
linking. Unfortunately, there isn't much good documentation on it and
the file format is not good. There's an open bug to clean this all up,
but it is not, as yet, implemented.

> * What is the best way to add support for new object file types? One
> thing that a lot of people in the D community are hoping for is the
> ability to compile modules into self-contained objects, which can
> be redistributed without additional header files and be linked
> either statically or dynamically into the final program, kind of like
> Pascal units. This of course requires putting additional information
> into the created binary.

If you can create platform independent code from D, then you should be
able to just use the LLVM bytecode representation. Most of the tools put
this format out. Note that LLVM bytecode files are only as platform
independent as the language front end used to create them. The bytecode
files can be distributed and then targeted to any of the platforms that
LLVM supports. The bytecode format even retains which libraries it still
needs to link against so you should be able to turn the bytecode into an
executable fairly readily.

> I'd also like to try to make it
> create CLI binaries
> <http://www.ecma-international.org/publications/standards/Ecma-335.htm>.

Sounds interesting. How much like Microsoft CLR is it? You could
probably create a backend for CLI quite readily. See the "C" Backend
that turns LLVM IR into C99 code. You can see it here:
http://llvm.org/cvsweb/cvsweb.cgi/llvm/lib/Target/CBackend/Writer.cpp?rev=1.282&content-type=text/x-cvsweb-markup

Writing a backend to support CLI output should be on the same order of
magnitude as that file. It's not particularly difficult, thanks to the
regularity and simplicity of the LLVM IR.

> And last but not least: which parts of the documentation must I
> definitely read first to get started hacking on LLVM? (I'll
> read all of it of course, but I'd like to focus right now.)

Try this:

http://llvm.org/pubs/2006-04-25-GelatoLLVMIntro.html
http://llvm.org/docs/GettingStarted.html
http://llvm.org/docs/LangRef.html
http://llvm.org/docs/CommandGuide/index.html
http://llvm.org/docs/ProgrammersManual.html
http://llvm.org/docs/GetElementPtr.html
http://llvm.org/docs/FAQ.html

Reid.

> If you're making your own front end, you probably won't need it :)

Well, I wanted to play around with it to see how my older programs
perform with it.

So far I have managed to get some programs running with the following scheme:

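for src in $infiles; do
    # compile the source file to LLVM bytecode
    llvm-gcc -o "$src.bc" -c "$src"
    # run the optimizer over the bytecode
    opt -f -o "$src.bc.opt" "$src.bc"
    # generate target assembly ($src.bc.opt.s)
    llc -f "$src.bc.opt"
    # assemble into a native object file
    gcc -c "$src.bc.opt.s"
done

# link all objects into the final program
gcc -o "$name" *.o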

However until now I always had to compile the source file with the
main function directly with gcc, so that the __main function (to
initialize the globals and static constructors) gets created.

I did not manage to make llvm-gcc create __main.

> If you can create platform independent code from D, then you should
> be able to just use the LLVM bytecode representation.

D itself is platform independent, but since D is also aimed at system
programming it has an inline assembler. However, one can supply
different kinds of assembler through conditional compilation, so the
following would be completely valid:

void foo() {
    version(x86) {
        asm {
            /* x86 assembler */
        }
    } else version(llvm) {
        asm {
            /* llvm assembler */
        }
    } else {
        /* plain D implementation */
    }
}

<http://www.digitalmars.com/d/iasm.html>
<http://www.digitalmars.com/d/version.html>

Any ideas how this could be integrated with LLVM? IMHO the most naive
approach would be to add labels where the asm begins and ends. The
platform-dependent assembler would then be separated from the code LLVM
gets to see and be processed by an external assembler into object
code. Later in the process the linker would reinsert the assembled
object code.

If you don't mind the output being target specific, you can just put
the inline asm into the llvm bytecode files. LLVM has growing support
for inline asm; there is no reason to split them out, unless you want
to capture all the above versions of the asm and not pick one until
late in compilation.

Andrew

Hello, Wolfgang.

> I did not manage to make llvm-gcc create __main.

Code for static construction and destruction is highly platform dependent.
Usually it lives in the crt.o binary, which is compiled during the main gcc
build cycle.

For example, on Linux we only need to emit some code into specially named
sections, and this code will automatically be called by the system loader. By
contrast, for mingw32 we emit "__main" directly.

You can change the default behaviour by modifying the
X86DAGToDAGISel::EmitSpecialCodeForMain() routine located in the
lib/Target/X86/X86ISelDAGToDAG.cpp file (I'm assuming you're using an
x86 target).
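
As a rough illustration of the "__main" scheme, here is a toy Python model (not LLVM or GCC code). It assumes the usual GCC arrangement in which main begins with a call to __main, and __main runs the static constructors once. All names in it (static_ctor, run_static_ctors, init_globals) are made up for the sketch.

```python
from typing import Callable, List

# Stands in for the list of static constructors collected at link time.
static_ctors: List[Callable[[], None]] = []
# Records the order in which things run, so the effect is visible.
log: List[str] = []
_ctors_done = False


def static_ctor(fn: Callable[[], None]) -> Callable[[], None]:
    """Register fn as a static constructor (hypothetical helper)."""
    static_ctors.append(fn)
    return fn


def run_static_ctors() -> None:
    """Plays the role of __main: run every constructor, but only once."""
    global _ctors_done
    if _ctors_done:
        return
    _ctors_done = True
    for ctor in static_ctors:
        ctor()


@static_ctor
def init_globals() -> None:
    log.append("ctor: init_globals")


def main() -> int:
    # On a "__main" platform the compiler inserts this call at the top of main.
    run_static_ctors()
    log.append("main body")
    return 0
```

If the call at the top of main is missing, nothing ever runs the constructors, which matches the symptom of globals staying uninitialized.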

> > If you're making your own front end, you probably won't need it :)

> Well, I wanted to play around with it to see how my older programs
> perform with it.
>
> So far I have managed to get some programs running with the following scheme:
>
> for src in $infiles; do
> llvm-gcc -o $src.bc -c $src ;
> opt -f -o $src.bc.opt $src.bc ;
> llc -f $src.bc.opt ;
> gcc -c $src.bc.opt.s ;
> done
>
> gcc -o $name *.o

If you're using llvm-gcc4, you can reduce this to:

llvm-gcc -O2 $infiles -o $name

> However until now I always had to compile the source file with the
> main function directly with gcc, so that the __main function (to
> initialize the globals and static constructors) gets created.
>
> I did not manage to make llvm-gcc create __main.

I think Anton already answered this.

> > If you can create platform independent code from D, then you should
> > be able to just use the LLVM bytecode representation.

> D itself is platform independent, but since D is also aimed at system
> programming it has an inline assembler. However, one can supply
> different kinds of assembler through conditional compilation, so the
> following would be completely valid:
>
> void foo() {
>     version(x86) {
>         asm {
>             /* x86 assembler */
>         }
>     } else version(llvm) {
>         asm {
>             /* llvm assembler */
>         }
>     } else {
>         /* plain D implementation */
>     }
> }
>
> <http://www.digitalmars.com/d/iasm.html>
> <http://www.digitalmars.com/d/version.html>
>
> Any ideas how this could be integrated with LLVM?

As John mentioned, you should probably just use LLVM's inline assembly
capability. It is already being used to compile the inline assembly in
the Linux kernel and the C library. While inline assembly capability is
not fully complete, it is sufficient for most uses. Please see these
documents:
http://llvm.org/docs/LangRef.html#moduleasm
http://llvm.org/docs/LangRef.html#inlineasm
http://llvm.org/doxygen/classllvm_1_1InlineAsm.html

> IMHO the most naive
> approach would be to add labels where the asm begins and ends. The
> platform-dependent assembler would then be separated from the code LLVM
> gets to see and be processed by an external assembler into object
> code. Later in the process the linker would reinsert the assembled
> object code.

The most naive approach is to use the LLVM support for this :)

Reid.

I would eventually like to support "codegen-time comparisons against the target" in a more formal way. The most straightforward way to do this is to compile this into LLVM code like:

if (llvm.target.matches("x86"))
  ...
else if (llvm.target.matches("ppc"))
  ...
else
  ...

The "..." can use the standard LLVM inline asm facility as others have mentioned and the 'plain D impl' can compile to plain llvm code. In order to get the dynamic target checks working properly and efficiently, you'd make an llvm intrinsic (e.g. "llvm.target.matches") and have that intrinsic lowered to a constant by the code generator. Dead code elim will delete the unreachable cases, so you'll get the right code for each target.

-Chris
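
The scheme above can be sketched as a toy Python model. This is an illustration only: llvm.target.matches is the hypothetical intrinsic from the message, not an existing LLVM intrinsic, and Branch, TargetMatches, lower_intrinsics, eliminate_dead_branches and compile_for are names invented here. The front end emits every variant, the code generator folds each intrinsic call to a constant, and dead code elimination keeps the first branch whose condition is true.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TargetMatches:
    """Stands in for a call to the hypothetical llvm.target.matches intrinsic."""
    name: str


@dataclass
class Branch:
    cond: Union[TargetMatches, bool]  # a plain bool once the intrinsic is lowered
    body: str                         # placeholder for inline asm or plain code


def lower_intrinsics(chain: List[Branch], target: str) -> List[Branch]:
    """Code generator step: replace each intrinsic call with a constant."""
    lowered = []
    for b in chain:
        cond = b.cond.name == target if isinstance(b.cond, TargetMatches) else b.cond
        lowered.append(Branch(cond, b.body))
    return lowered


def eliminate_dead_branches(chain: List[Branch]) -> str:
    """Keep the first branch with a constant-true condition; the rest is dead."""
    for b in chain:
        if b.cond is True:
            return b.body
    raise ValueError("no reachable branch")


def compile_for(chain: List[Branch], target: str) -> str:
    return eliminate_dead_branches(lower_intrinsics(chain, target))


# The if / else if / else chain from the message, as the front end would emit it.
FOO = [
    Branch(TargetMatches("x86"), "x86 inline asm"),
    Branch(TargetMatches("ppc"), "ppc inline asm"),
    Branch(True, "plain D implementation"),  # the final else
]
```

Calling compile_for(FOO, "ppc") selects the ppc variant, while a target with no matching branch falls through to the plain implementation. The point of the design is that the bytecode carries every variant, and the choice is made only at code generation time.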