Compile programs with the LLVM Compiler as a GSoC project

hi,

I am writing about one of the LLVM project ideas for GSoC this year.

I was looking into the ideas listed under “improving the current system” and found “Compile programs with the LLVM Compiler” interesting. I would like to compile one of the large code bases that has not yet been compiled with LLVM and convert its build system to be compatible with the LLVM Programs testsuite.

I have two questions first:

  • Is this a suitable project for GSoC?
  • What software has already been compiled with LLVM, and what has not, so that I can identify possible candidates for the project?

Thank you

Me:

My name is Kumaripaba Atukorala and I am a computer science and engineering undergraduate [1]. I’m interested in compiler technology and have experience in C/C++ and Java programming [2].

[1] - http://www.cse.mrt.ac.lk

[2] - http://paba50.googlepages.com

hi,
This e-mail is written to involve some of the project ideas in LLVM in GSOC this year.
I was looking in to the ideas mentioned under improving current system and found the idea of “Compile programs with the LLVM Compiler” to be interesting. I would like to compile one of the large code bases that have not yet been compiled with LLVM and convert the build system to be compatible with the LLVM Programs testsuite.

But I have several doubts to be clarified. They are listed below.

  • I would like to know whether this is a suitable project for GSOC?
  • What software has already been compiled with LLVM and what are not; so that I can identify the possible candidates for the project?

I think this would be a great project. However, I would rephrase it to be more concrete.

How about taking a Linux distro like Red Hat or Gentoo, or whatever you are familiar or comfortable with, and trying to compile the whole thing with llvm-gcc? As part of the GSoC project, you could file bug reports for any issues you hit and help track down problems.

-Chris

Chris Lattner wrote:

hi,
This e-mail is written to involve some of the project ideas in LLVM
in GSOC this year.
I was looking in to the ideas mentioned under improving current
system and found the idea of "Compile programs with the LLVM
Compiler" to be interesting. I would like to compile one of the large
code bases that have not yet been compiled with LLVM and convert the
build system to be compatible with the LLVM Programs testsuite.

But I have several doubts to be clarified. They are listed below.

    * I would like to know whether this is a suitable project for GSOC?
    * What software has already been compiled with LLVM and what are
      not; so that I can identify the possible candidates for the
      project?

I think this would be a great project. However, I would rephrase it
to be more concrete.

How about taking a linux distro like redhat or gentoo or whatever you
are familiar or comfortable with, and try compiling the whole thing
with llvm-gcc? As part of the GSoC project, you could file bug
reports for any issues you hit and help track down problems.

Excellent idea!

When testing large code bases built with llvm and trying to track down
where a problem is, it would be very useful to have an automated tool
to help: something similar to 'git bisect', or bugpoint, but for many
source files.

For example: build the entire code base with gcc and get some "expected
output" (run make check, ....), then do the same with llvm-gcc. If they
differ, start tracking down (automatically!) which source files the
problem is in. You build half the code with llvm and half with gcc. If
it breaks, you build 1/4 with llvm and 3/4 with gcc; if it doesn't
break, you build 3/4 with llvm and 1/4 with gcc, and so on. The state
of each step should be logged by the tool, because I, for example,
would certainly forget which build worked and which one didn't.
It would make sense to cache previously built files. An easy way to do
that would be to build everything with one compiler, then back up and
remove one half and build it with the other compiler (just run make
with the correct compiler; it will rebuild the missing files). Then
restore the half, remove a quarter, and repeat.
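
A minimal Python sketch of the halving procedure described above. The `is_broken` callback is hypothetical: it is assumed to rebuild the project with the given files compiled by llvm-gcc and everything else by gcc, run the checks, and return True when the output differs from the gcc baseline. The sketch also assumes that a single source file is responsible for the failure.

```python
from typing import Callable, List, Sequence, Set, Tuple


def bisect_culprit(
    files: Sequence[str],
    is_broken: Callable[[Set[str]], bool],
) -> Tuple[str, List[Tuple[int, bool]]]:
    """Narrow a failure down to one source file by repeated halving.

    Returns the suspected file and a log of (files built with llvm, broken?)
    for every trial build, so nobody has to remember which build worked.
    """
    if not files:
        raise ValueError("need at least one source file")
    known_good: Set[str] = set()   # files already shown to be fine with llvm
    candidates = list(files)       # files that may still hide the problem
    log: List[Tuple[int, bool]] = []
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        trial = known_good | set(half)   # build these with llvm, the rest with gcc
        broken = is_broken(trial)
        log.append((len(trial), broken))
        if broken:
            candidates = half            # the problem is in the half just added
        else:
            known_good |= set(half)      # keep this half on llvm from now on
            candidates = candidates[len(half):]
    return candidates[0], log
```

With N source files this needs about log2(N) trial builds, and the returned log records which trials passed and which failed.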

If this tool could be a drop-in wrapper for CC/CXX, it would be
excellent, since nearly every autotooled package could be tested this way.
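
One possible shape for such a wrapper, sketched in Python. Nothing here is an existing tool: `pick_command`, the default compiler names, and the idea of handing the wrapper a list of sources to build with llvm-gcc are assumptions made for illustration.

```python
import os
from typing import Iterable, List, Sequence

# Suffixes treated as C/C++ source files (an assumption; extend as needed).
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx")


def pick_command(
    args: Sequence[str],
    llvm_sources: Iterable[str],
    test_cc: str = "llvm-gcc",
    ref_cc: str = "gcc",
) -> List[str]:
    """Choose the compiler for one CC invocation and return the full command.

    If any source file on the command line is listed in llvm_sources, the
    compiler under test is used; otherwise (including pure link steps) the
    reference compiler is used. The arguments are passed through unchanged.
    """
    wanted = {os.path.normpath(p) for p in llvm_sources}
    sources = [a for a in args if a.endswith(SOURCE_SUFFIXES)]
    use_test = any(os.path.normpath(s) in wanted for s in sources)
    return [test_cc if use_test else ref_cc, *args]
```

A script installed as CC would call `pick_command(sys.argv[1:], ...)` and exec the result. Where the file list comes from (for example, a file named by an environment variable) is left open here.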

P.S.: to avoid duplicate bug reports, I think filing a "meta" bug that
lists as dependencies all bugs that affect package X would be useful.

Best regards,
--Edwin

Thank you.
I’ll take all these valuable points into consideration and come up with my proposal for
this project.

Kumaripaba

Hi Chris,

How about taking a linux distro like redhat or gentoo or whatever you
are familiar or comfortable with, and try compiling the whole thing
with llvm-gcc? As part of the GSoC project, you could file bug
reports for any issues you hit and help track down problems.

They may seem a bit large and daunting. How about Linux from Scratch?
If it's completed, more could be added on.

    http://www.linuxfromscratch.org/

Cheers,

Ralph.

hi,
Several questions arose after I read through the information provided in the earlier mails:

I think this would be a great project. However, I would rephrase it
to be more concrete.

How about taking a linux distro like redhat or gentoo or whatever you
are familiar or comfortable with, and try compiling the whole thing
with llvm-gcc? As part of the GSoC project, you could file bug
reports for any issues you hit and help track down problems.

  1. I thought of compiling the gcc compiler with llvm, since it is easier to make test cases to test the system. Is gcc already built with llvm? If so, my second option is the Linux kernel. What is your opinion on this?

Excellent idea!

When testing large code bases built with llvm, and trying to track down
where the problem is it would be very useful to have an automated tool
to help. Something similar to ‘git bisect’, or bugpoint but for many
source files.

For example: built entire code with gcc, get some “expected output” (run
make check, …), same for llvm-gcc. If they differ, start tracking
down (automatically!) in which source files the problem is. Then you
build half code with llvm, half with gcc. If it breaks, you build 1/4
llvm, 3/4 gcc; if it doesn’t break you build 3/4 llvm, 1/4 gcc, and so
on. The situation should be logged by a tool, because for example I
would certainly forget which build worked, and which one didn’t.
It would make sense to cache files previously built, an easy way to do
that would be to build everything with one compiler, then backup&remove
one half, and built it with the other compiler (just run make with the
correct compiler, it will rebuild the missing files). Then restore the
half, remove a quarter, repeat.

If this tool could be a drop-in wrapper for CC/CXX, it would be
excellent, since nearly every autotooled package could be tested this way.

  2. You mentioned using a tool to test the system that I’ll be building with LLVM. Do I have to develop this tool from scratch, or are there existing tools that I can make use of?

Thank you,
Kumaripaba

Kumaripaba Miyurusara Atukorala wrote:

hi,
Several doubts aroused after I read through all the information
provided in former mails. They are

    > I think this would be a great project. However, I would rephrase it
    > to be more concrete.
    >
    > How about taking a linux distro like redhat or gentoo or whatever you
    > are familiar or comfortable with, and try compiling the whole thing
    > with llvm-gcc? As part of the GSoC project, you could file bug
    > reports for any issues you hit and help track down problems.

1) I thought of taking the gcc compiler and compiling it with llvm
since it is easier to make test cases to test the system. Is gcc
compiler already built with llvm?

Yes, llvm-gcc is bootstrapped (so it is compiled with llvm).

if so I have the linux kernel as the second option. What is your
opinion on this?

That could be a large task. I succeeded in building a kernel with llvm a
while ago, but it didn't boot (neither did a UML kernel). You would
need very good knowledge of how the kernel works to find out what is
wrong. This is one place the tool I suggested could come in handy ;)

    Excellent idea!

    When testing large code bases built with llvm, and trying to track
    down where the problem is it would be very useful to have an
    automated tool to help. Something similar to 'git bisect', or
    bugpoint but for many source files.

2) you've mentioned about using a tool to test the system that I'll be
building with LLVM. Do I have to develop this tool from the scratch or
are there any existing tools that can be made use of?

Maybe bugpoint. But you should discuss the design of the tool with Chris.

Best regards,
--Edwin

How exactly can LLVM be used with autotooled packages (with -emit-llvm)? I’ve tried setting CC, CXX, CFLAGS, and CXXFLAGS. I usually can’t get past “./configure” because it tries to compile test programs to make sure gcc works. Adding the “-c” option allows it to compile, but the output is filename.o instead of the a.out that the script expects. Without the “-c” option I get ld errors:
ld: Unknown command line argument ‘-m’. Try: ‘/home/ssoria/llvm/gcc/ld --help’
ld: Unknown command line argument ‘-dynamic-linker’. Try: ‘/home/ssoria/llvm/gcc/ld --help’
ld: Unknown command line argument ‘-emit-llvm’. Try: ‘/home/ssoria/llvm/gcc/ld --help’
collect2: ld returned 1 exit status

Sean Soria wrote:

    If this tool could be a drop-in wrapper for CC/CXX, it would be
    excellent, since nearly every autotooled package could be tested
    this way.

How exactly can LLVM be used with autotooled packages (with
-emit-llvm)? I've tried setting CC, CXX, CFLAGS, CXXFLAGS. I usually
can't get past "./configure" because it tries to compile test programs
to make sure gcc works. Adding the "-c" option allows it to compile
but it outputs as filename.o instead of a.out as the script expects.
Without the "-c" option I get ld errors:
ld: Unknown command line argument '-m'. Try:
'/home/ssoria/llvm/gcc/ld --help'
ld: Unknown command line argument '-dynamic-linker'. Try:
'/home/ssoria/llvm/gcc/ld --help'
ld: Unknown command line argument '-emit-llvm'. Try:
'/home/ssoria/llvm/gcc/ld --help'
collect2: ld returned 1 exit status

This should work (if you used --program-prefix=llvm-; otherwise just
give the full path to the gcc you built):

    CC=llvm-gcc CXX=llvm-g++ ./configure

Note that llvm-gcc4.x will generate native code, and you won't see the
intermediate IR files.

If you need the IR files, you can try -O4 (but linking doesn't work on
Linux yet in this case), or use a wrapper script (utils/ccc from clang
does the job with little adjustments).
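
A rough Python sketch of what such a wrapper script might do with its arguments. This is not how utils/ccc works; `adjust_args` is a hypothetical helper. It assumes the ld errors quoted above come from -emit-llvm reaching the link step through CFLAGS, so it passes the flag only on compile-only invocations (-c or -S).

```python
from typing import List, Sequence


def adjust_args(args: Sequence[str]) -> List[str]:
    """Return compiler arguments with -emit-llvm applied only where it is safe.

    Compile-only invocations (-c or -S) get exactly one -emit-llvm appended.
    Any other invocation, such as a compile-and-link test run by configure,
    has the flag removed so it never reaches the linker.
    """
    rest = [a for a in args if a != "-emit-llvm"]
    if "-c" in rest or "-S" in rest:
        return rest + ["-emit-llvm"]
    return rest
```

The object files produced this way contain LLVM IR rather than native code, so the final link still needs a linker that understands them; the sketch does not cover that part.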

Best regards,
--Edwin

1) I thought of taking the gcc compiler and compiling it with llvm
since it is easier to make test cases to test the system. Is gcc
compiler already built with llvm?

The frontend of GCC is already part of the LLVM project. It is used
inside of LLVM to compile C++ and C to an intermediate format that the
LLVM infrastructure then takes to machine code.
IOW the frontend is already built, the backend (which generates machine
code) isn't used (and as far as I can tell from a quick look at the
llvm-gcc sources, it was pruned from the sources, too).

if so I have the linux kernel as the second option.

The Linux kernel requires some very special precautions.
* It does not link against the standard C runtime (which expects things
like an environment and command-line parameters, stuff that's not
available when booting).
* It requires some tiny but essential bits of assembly for things like virtual memory and scheduling. (I don't know whether LLVM's bitcode files can carry machine code; if yes, this would probably not be a problem.)

Regards,
Jo

Hello,

1) I thought of taking the gcc compiler and compiling it with llvm
since it is easier to make test cases to test the system. Is gcc
compiler already built with llvm? if so I have the linux kernel as the
second option. What is your opinion on this?

Just my 2 cents. I'd strongly suggest selecting Gentoo or something
similar for this. It's pretty compiler-oriented: you can have several
versions of compilers in the system and smoothly select which one to
use.

We routinely compile Linux with llvm (and do LTO and custom transforms
on it), so that would not be novel. However, several existing
optimizations break the linux kernel (and several bits of the linux
kernel are buggy and just happen to work with gcc (i.e. their
correctness depends on getting a pseudo-random value from reading an
uninitialized variable)). Tracking down and distilling minimal test
cases for the broken optimizations would be really useful (and very
painful).

There are really two ways to do this. The first is to do it manually:
find the optimization that breaks the kernel, find the function, see
what it does that causes the breakage, etc. Or you could extend bugpoint
to be able to launch an external tool that performs the final linking
and testing of the bytecode. This would be nice because bugpoint would
then give the tool two pieces, and the tool would assemble the two
pieces into a booting kernel, run the kernel in an emulator and report
back to bugpoint on whether it succeeded or failed.
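
The external tool in the second approach could look roughly like the Python sketch below. Everything in it is assumed for illustration: `boots_ok`, the link and emulator commands, and the marker string are hypothetical, and how bugpoint would invoke the tool and read its answer is left open.

```python
import subprocess
from typing import Optional, Sequence


def _as_text(data: Optional[bytes]) -> str:
    """Decode captured output, treating missing output as empty."""
    return data.decode(errors="replace") if data else ""


def boots_ok(
    link_cmd: Sequence[str],
    boot_cmd: Sequence[str],
    marker: str,
    timeout_s: float = 120.0,
) -> bool:
    """Link the pieces, boot the result, and report success or failure.

    link_cmd assembles the two pieces into a kernel image; boot_cmd runs it
    in an emulator. The boot counts as a success if `marker` (for example a
    login prompt) shows up in the emulator output before the timeout.
    """
    if subprocess.run(list(link_cmd)).returncode != 0:
        return False  # could not even assemble a kernel
    try:
        out = subprocess.run(
            list(boot_cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        ).stdout
    except subprocess.TimeoutExpired as exc:
        out = exc.stdout  # an emulator that keeps running is killed here
    return marker in _as_text(out)
```

A command-line front end would turn the boolean into an exit status (0 for success, non-zero for failure), which is the usual way for a test script to report back to a driver.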

Obviously the second one would be a more useful addition to the llvm
toolchain, whereas the first method would be invaluable but hard manual
debugging.

Andrew

I should add that the second method would allow bugpoint to work on
gui programs, which would be a nice thing.

Andrew