Compiler Driver Requirements & Design (Comments Solicited!)

LLVMers,

As part of my work on bug 353: Create Front End Framework And Compiler
Driver (http://llvm.cs.uiuc.edu/PR353), I'm starting a discussion on the
design and requirements of the compiler driver. If you have comments on
this, by all means PLEASE chime in. This is by no means cast in stone.
The results of the ensuing discussion will be documented in PR353 (and
elsewhere) and I'll use it as my guide in implementing the compiler
driver.

If your comments are limited in scope, please place the section number
and title in the subject line so we can have independent lines of
discussion on sub-topics. Thanks.

CONTENTS:

> 2. MODE OF OPERATION
>
> The driver will simply read its command line arguments, read its
> configuration data, and invoke the compilation, linking, and
> optimization tools necessary to complete the user's request. Its basic

I'm not sure that I agree with this. Compilers need to be extremely
predictable and simple. In particular, saying:

llvmgcc x.c y.c z.c

should invoke exactly the same tools as:

llvmgcc x.c -c
llvmgcc y.c -c
llvmgcc z.c -c
llvmgcc x.o y.o z.o

I don't necessarily think that you're contradicting this, I just wanted to
make sure we're on the same page.
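To make the invariant concrete, here is a minimal C++ sketch, assuming a hypothetical `planBuild` function and `Step` record (neither is an existing LLVM API) with placeholder tool names. Each source file expands to its own compile step producing a `.o`, and a single link step consumes all objects, so one combined command plans exactly the same steps as separate `-c` runs followed by a link.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one tool invocation chosen by the driver.
struct Step {
    std::string tool;                  // placeholder names: "compile" or "link"
    std::vector<std::string> inputs;
    std::string output;

    bool operator==(const Step& other) const {
        return tool == other.tool && inputs == other.inputs && output == other.output;
    }
};

bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Plan "driver f1 f2 ..." as a fixed sequence: one compile step per .c file,
// in command-line order, then (unless compileOnly, i.e. -c) one link step
// over every object. Only .c and .o inputs are handled in this sketch.
std::vector<Step> planBuild(const std::vector<std::string>& files, bool compileOnly) {
    std::vector<Step> steps;
    std::vector<std::string> objects;
    for (const std::string& f : files) {
        if (endsWith(f, ".o")) {
            objects.push_back(f);  // already an object: nothing to compile
            continue;
        }
        assert(endsWith(f, ".c"));
        std::string obj = f.substr(0, f.size() - 2) + ".o";
        steps.push_back(Step{"compile", {f}, obj});
        objects.push_back(obj);
    }
    if (!compileOnly)
        steps.push_back(Step{"link", objects, "a.out"});
    return steps;
}
```

Because the plan depends only on the argument list, a `-v`-style "no op" mode could simply print it instead of running it.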

> 4. SIMILAR OPTIONS AS GCC
>
> Certain common GCC options should be supported in order to make the
> driver appear familiar to users of GCC. In particular, the following
> options are important to preserve:

Very important, I agree.

> Additionally, we should have options to:
> * generate analysis reports à la the LLVM analyze tool

I'm not certain how useful this would be. It would add complexity to the
driver that is of arguable use. If anything I would make this the last
priority: the people who use 'analyze' are compiler developers, not end
users.

> * have a "no op" mode like -v where it just reports what it would do
> * have a language specific help utility based on suffixes. For example,
>   --help ll would list the options applicable to *.ll input files. This
>   would extend to source languages too (e.g. --help c for C help or
>   --help f for FORTRAN help). The generated help info would be specific
>   for the given language, after the config files have been read thus
>   allowing the output to vary depending on the driver's configuration.
> * Support the -- option to terminate command line options and indicate
>   the remaining options are files to be processed. This
> * Support command line configuration (override config files on the
>   command line) either by specifying a config file or using special
>   configuration options.
> * each option should have short (-X) and long (--language) variants
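Two of the behaviors listed above, the `--` terminator and short/long option pairs, can be sketched in a few lines of C++. The option table (`-v`/`--verbose`, `-h`/`--help`) and the `parseArgs` name are hypothetical, and options that take a value (such as `-o file`) are left out to keep it short.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct ParsedArgs {
    std::vector<std::string> options;  // normalized to the long spelling
    std::vector<std::string> files;
};

// Hypothetical short -> long table; the real option set is still undecided.
const std::map<std::string, std::string> kLongName = {
    {"-v", "--verbose"},
    {"-h", "--help"},
};

ParsedArgs parseArgs(const std::vector<std::string>& argv) {
    ParsedArgs out;
    bool onlyFiles = false;  // becomes true once a bare "--" has been seen
    for (const std::string& arg : argv) {
        if (onlyFiles || arg.empty() || arg[0] != '-') {
            out.files.push_back(arg);
        } else if (arg == "--") {
            onlyFiles = true;  // the terminator itself is not recorded
        } else {
            auto it = kLongName.find(arg);
            // Short forms are mapped to their long form; anything else is kept as-is.
            out.options.push_back(it != kLongName.end() ? it->second : arg);
        }
    }
    return out;
}
```

After `--`, even a name like `-odd.c` is treated as an input file rather than an option.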

Sure.

> 5. BASIC/STANDARD COMPILATION TASKS
>
> The driver will perform basic tasks such as compilation, optimization,
> and linking. The following definitions are suggested, but more could be
> supported.

There has been a lot of discussion/confusion on IRC relating to what
actually will go into .s or .o files. In particular, some people were
arguing that if we output a .o file, that it should only contain native
code. This means that these two commands would do very different things:

llvmgcc x.c -o x.o # compile to native .o
llvmgcc x.c -o x.bc # compile to bytecode

I have to say that I *strenuously* object to this behavior. In
particular, this would require all users to change their makefiles to get
IPO/lifelong optzn support from LLVM, violating one of the main goals of
the system.

There are a couple of things that people brought up (including wrapping
.bc files in ELF sections, generating .o files containing native
code+.bc), but here is the proposal that I like best: :)

I don't think that anything should change w.r.t. the contents of .o files.
In particular, .o files should contain LLVM bytecode without wrappers or
anything fancy around them. The big problem with this is compiler
interoperability, in particular, mixing .o files from various compilers
(e.g. a native GCC) will not work (e.g. 'ld' will barf when it hits an
LLVM .o file).

Personally I don't see a problem with this. We already have "llvm aware"
replacements for many system tools, including ld, nm, and a start for ar.
These tools could be made 'native aware', so that 'llvm-ld x.o b.o' would
do the right thing for mixed native and llvm .o files. Imagine an
llvm-objdump tool that either runs the native objdump program or llvm-dis
depending on the file type.
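The "native aware" tools above need some way to tell the two kinds of .o apart. One possibility, sketched in C++, is to sniff the first four bytes of each input. ELF files begin with `0x7F 'E' 'L' 'F'`; that LLVM bytecode begins with the ASCII signature `llvm` is an assumption of this sketch, and the backend names only illustrate the hypothetical `llvm-objdump` dispatch rather than running anything.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

enum class FileKind { LLVMBytecode, ELFObject, Unknown };

// Classify a file from its leading bytes.
FileKind classifyHeader(const std::string& head) {
    if (head.size() < 4) return FileKind::Unknown;
    if (head.compare(0, 4, "llvm") == 0) return FileKind::LLVMBytecode;    // assumed signature
    if (head.compare(0, 4, "\x7f" "ELF") == 0) return FileKind::ELFObject;  // ELF magic
    return FileKind::Unknown;
}

// Read up to four bytes from disk and classify them (Unknown if unreadable).
FileKind classifyFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char buf[4] = {};
    in.read(buf, sizeof buf);
    return classifyHeader(std::string(buf, static_cast<std::size_t>(in.gcount())));
}

// Which tool a hypothetical llvm-objdump wrapper would hand the file to.
const char* objdumpBackend(FileKind kind) {
    switch (kind) {
        case FileKind::LLVMBytecode: return "llvm-dis";
        case FileKind::ELFObject:    return "objdump";
        default:                     return nullptr;  // unsupported input
    }
}
```

The `"\x7f" "ELF"` literal is split on purpose: written as one literal, `\x7fE` would be parsed as a single hex escape.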

The one major thing that I want to fix is the current kludge of using
llvmgcc -S or llvmgcc -c to control whether the compile-time optimizer is
run. The only reason we did this was because it was easy, and a new
compiler driver is exactly what we need to fix this. In particular, I
would really like to see something like this:

llvmgcc X.c -S # compiles, runs gccas, emits an *optimized* .ll file
llvmgcc X.c -c # Same as -S, but now in .bc form instead of .ll form
llvmgcc X.c -On -S # "no" optimization, emit a 'raw' .ll file
llvmgcc X.c -On -c # "no" optimization, emit a 'raw' .bc file

Basically, today's equivalents to these are:

llvmgcc X.c -c -o - | llvm-dis > X.s
llvmgcc X.c -c
llvmgcc X.c -S
llvmgcc X.c -S -o - | llvm-as > X.o

The ability to capture the raw output of a front-end is very useful and
important, but it should be controlled with -O options, not -S/-c. Also,
llvmgcc -O0 is not necessarily the same as -On, because some optimizations
actually speed up compilation (e.g., dead code elim).
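The proposal decouples the two questions: `-S`/`-c` choose only the output form (.ll or .bc), and `-On` alone turns the optimizer off. A C++ sketch with hypothetical names (`planCompile`, `Plan`) that also records, for each combination, the current command from the "today's equivalents" list it would replace:

```cpp
#include <cassert>
#include <string>

enum class Output { Assembly /* -S: .ll */, Bytecode /* -c: .bc */ };

struct Plan {
    bool runOptimizer;   // true unless -On was given
    std::string suffix;  // ".ll" or ".bc"
    std::string today;   // equivalent command with the current llvmgcc
};

Plan planCompile(Output out, bool rawFrontEnd /* -On */) {
    Plan p;
    p.runOptimizer = !rawFrontEnd;  // -S/-c no longer affect optimization
    p.suffix = (out == Output::Assembly) ? ".ll" : ".bc";
    if (out == Output::Assembly && !rawFrontEnd)
        p.today = "llvmgcc X.c -c -o - | llvm-dis > X.s";
    else if (out == Output::Bytecode && !rawFrontEnd)
        p.today = "llvmgcc X.c -c";
    else if (out == Output::Assembly)
        p.today = "llvmgcc X.c -S";
    else
        p.today = "llvmgcc X.c -S -o - | llvm-as > X.o";
    return p;
}
```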

Anyway, these are just some high-level ideas.

-Chris

> 2. MODE OF OPERATION
> ====================
> The driver will simply read its command line arguments, read its
> configuration data, and invoke the compilation, linking, and
> optimization tools necessary to complete the user's request. Its basic

I'm not sure that I agree with this. Compilers need to be extremely
predictable and simple. In particular, saying:

llvmgcc x.c y.c z.c

should invoke exactly the same tools as:

llvmgcc x.c -c
llvmgcc y.c -c
llvmgcc z.c -c
llvmgcc x.o y.o z.o

I don't necessarily think that you're contradicting this, I just wanted to
make sure we're on the same page.

I'm not contradicting anything here. The driver will select a completely
deterministic, simple, and direct sequence of commands in a well defined
order. My analogy to the SQL query optimizer was just that: an analogy.
It's not going to look for the "best" solution; it'll just be coded with
the best strategies built in and be completely predictable from there.

> 4. SIMILAR OPTIONS AS GCC
> =========================
> Certain common GCC options should be supported in order to make the
> driver appear familiar to users of GCC. In particular, the following
> options are important to preserve:

Very important, I agree.

> Additionally, we should have options to:
> * generate analysis reports à la the LLVM analyze tool

I'm not certain how useful this would be. It would add complexity to the
driver that is of arguable use. If anything I would make this the last
priority: the people who use 'analyze' are compiler developers, not end
users.

True, I'll drop it.

> 5. BASIC/STANDARD COMPILATION TASKS
> ===================================
> The driver will perform basic tasks such as compilation, optimization,
> and linking. The following definitions are suggested, but more could be
> supported.

There has been a lot of discussion/confusion on IRC relating to what
actually will go into .s or .o files. In particular, some people were
arguing that if we output a .o file, that it should only contain native
code. This means that these two commands would do very different things:

llvmgcc x.c -o x.o # compile to native .o
llvmgcc x.c -o x.bc # compile to bytecode

I have to say that I *strenuously* object to this behavior. In
particular, this would require all users to change their makefiles to get
IPO/lifelong optzn support from LLVM, violating one of the main goals of
the system.

There are a couple of things that people brought up (including wrapping
.bc files in ELF sections, generating .o files containing native
code+.bc), but here is the proposal that I like best: :)

I don't think that anything should change w.r.t. the contents of .o files.
In particular, .o files should contain LLVM bytecode without wrappers or
anything fancy around them. The big problem with this is compiler
interoperability, in particular, mixing .o files from various compilers
(e.g. a native GCC) will not work (e.g. 'ld' will barf when it hits an
LLVM .o file).

Personally I don't see a problem with this. We already have "llvm aware"
replacements for many system tools, including ld, nm, and a start for ar.
These tools could be made 'native aware', so that 'llvm-ld x.o b.o' would
do the right thing for mixed native and llvm .o files. Imagine an
llvm-objdump tool that either runs the native objdump program or llvm-dis
depending on the file type.

Okay, above is agreed.

The one major thing that I want to fix is the current kludge of using
llvmgcc -S or llvmgcc -c to control whether the compile-time optimizer is
run. The only reason we did this was because it was easy, and a new
compiler driver is exactly what we need to fix this. In particular, I
would really like to see something like this:

llvmgcc X.c -S # compiles, runs gccas, emits an *optimized* .ll file
llvmgcc X.c -c # Same as -S, but now in .bc form instead of .ll form

Okay, but what's the default -Ox option that gets applied? -O2? -O1?
It's not clear from this what the default is. To mimic GCC, such a
command line would produce very little, if any, optimization.

llvmgcc X.c -On -S # "no" optimization, emit a 'raw' .ll file
llvmgcc X.c -On -c # "no" optimization, emit a 'raw' .bc file

That's fine. -On, I suppose, is basically "absolutely no optimization
passes", but what is -O0 (oh zero)? A synonym for -On? Some minimal
optimization?

Basically, today's equivalents to these are:

llvmgcc X.c -c -o - | llvm-dis > X.s
llvmgcc X.c -c
llvmgcc X.c -S
llvmgcc X.c -S -o - | llvm-as > X.o

Are these supposed to match the four above? The use of llvmgcc is
confusing me here. In future discussion, when you mean the future
driver, please write it as "driver" (or the actual name if it's decided by
then).

So one problem with this is that there's no way to emit a native .o
file? I thought one of the goals you wanted for the driver was to allow
an invoked compiler tool to generate as much as possible, including
native object files (.o) such as ELF. This would imply from the last
example that:

driver X.c -On -c would produce:

llvmgcc X.c -S -o - | llvm-as | llc | gas > X.o

But, your scheme doesn't seem to permit this?

The ability to capture the raw output of a front-end is very useful and
important, but it should be controlled with -O options, not -S/-c. Also,
llvmgcc -O0 is not necessarily the same as -On, because some optimizations
actually speed up compilation (e.g., dead code elim).

Okay, you answered my question above. Perhaps you can define the
specific passes that should be included in -O0.

As for capturing the raw output of a front-end, GCC has the -E option
(well, at least for the pre-processor). Do we want to do that?

Anyway, these are just some high-level ideas.

Your thoughts, if any, on the other topics would be very much
appreciated.

Thanks,

Reid.

> The one major thing that I want to fix is the current kludge of using
> llvmgcc -S or llvmgcc -c to control whether the compile-time optimizer is
> run. The only reason we did this was because it was easy, and a new
> compiler driver is exactly what we need to fix this. In particular, I
> would really like to see something like this:
>
> llvmgcc X.c -S # compiles, runs gccas, emits an *optimized* .ll file
> llvmgcc X.c -c # Same as -S, but now in .bc form instead of .ll form

Okay, but what's the default -Ox option that gets applied? -O2? -O1?
It's not clear from this what the default is. To mimic GCC, such a
command line would produce very little, if any, optimization.

That's orthogonal to the discussion, but my current thought is that giving
no -O option should default to the equivalent of -O1 or maybe -O2. A lot
of people simply "forget" to use -O options, and we want to do reasonably
well, but not be too clever :)

> llvmgcc X.c -On -S # "no" optimization, emit a 'raw' .ll file

That's fine. -On, I suppose, is basically "absolutely no optimization
passes", but what is -O0 (oh zero)? A synonym for -On? Some minimal
optimization?

My thought is that -On is the straight output from the front-end. -O0
would be the fastest possible compile-time. These are almost certainly
different, as simple optimizations like DCE can reduce compile times by
reducing disk I/O. -On would probably only be useful to compiler people.

> Basically, today's equivalents to these are:
>
> llvmgcc X.c -c -o - | llvm-dis > X.s
> llvmgcc X.c -c
> llvmgcc X.c -S
> llvmgcc X.c -S -o - | llvm-as > X.o

Are these supposed to match the four above? The use of llvmgcc is
confusing me here. In future discussion, when you mean the future
driver, please write it as "driver" (or the actual name if it's decided by
then).

Yes, sorry, ok s/llvmgcc/driver/

So one problem with this is that there's no way to emit a native .o
file? I thought one of the goals you wanted for the driver was to allow
an invoked compiler tool to generate as much as possible, including
native object files (.o) such as ELF. This would imply from the last
example that:

driver X.c -On -c would produce:

llvmgcc X.c -S -o - | llvm-as | llc | gas > X.o

But, your scheme doesn't seem to permit this?

Again, whether or not to generate native code is another orthogonal issue,
but one that may be tied into the -O options (e.g. produce native code at
levels -O2 and below). When generating native code, the compiler driver
would run llc and gas as appropriate. Of course some day we will be able
to write .o files directly, so we can skip a step. :)
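The stage order for native output might be modeled as below. This C++ sketch lists tool names only (flags omitted apart from the `-S` in Reid's example) and follows his `llvm-as | llc | gas` pipeline; using `gccas` instead of `llvm-as` when optimizing is an assumption drawn from the description of `-S` running gccas, and `toolChain`/`asPipeline` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Tool sequence for one translation unit (names only, no real flags).
std::vector<std::string> toolChain(bool optimize, bool native) {
    std::vector<std::string> tools = {"llvmgcc -S"};  // front end emits raw .ll
    tools.push_back(optimize ? "gccas" : "llvm-as");  // assemble to .bc, optionally optimizing
    if (native) {
        tools.push_back("llc");  // bytecode -> native assembly
        tools.push_back("gas");  // native assembly -> native .o
    }
    return tools;
}

// Render the chain as a shell-style pipeline, e.g. for a -v style report.
std::string asPipeline(const std::vector<std::string>& tools) {
    std::string s;
    for (std::size_t i = 0; i < tools.size(); ++i) {
        if (i != 0) s += " | ";
        s += tools[i];
    }
    return s;
}
```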

> The ability to capture the raw output of a front-end is very useful and
> important, but it should be controlled with -O options, not -S/-c. Also,
> llvmgcc -O0 is not necessarily the same as -On, because some optimizations
> actually speed up compilation (e.g., dead code elim).

Okay, you answered my question above. Perhaps you can define the
specific passes that should be included in -O0.

I would think -constprop -simplifycfg -mem2reg -dce, though that's
just a first thought :)
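The levels discussed in this exchange could be tabulated as follows. In this C++ sketch (C++17, hypothetical names), `-On` runs no passes, `-O0` uses the first-guess list above, and omitting `-O` falls back to `-O1` as suggested earlier. The thread does not settle the `-O1`/`-O2` pass lists, so the sketch returns `std::nullopt` for them rather than inventing any.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class OptLevel { None /* -On */, O0, O1, O2 };

// Map a flag to a level; an empty string means no -O option was given.
std::optional<OptLevel> parseOptLevel(const std::string& flag) {
    if (flag.empty() || flag == "-O1") return OptLevel::O1;  // proposed default
    if (flag == "-On") return OptLevel::None;
    if (flag == "-O0") return OptLevel::O0;
    if (flag == "-O2") return OptLevel::O2;
    return std::nullopt;  // level not discussed in the thread
}

std::optional<std::vector<std::string>> passesFor(OptLevel level) {
    switch (level) {
        case OptLevel::None:
            return std::vector<std::string>{};  // straight front-end output
        case OptLevel::O0:
            return std::vector<std::string>{"-constprop", "-simplifycfg", "-mem2reg", "-dce"};
        default:
            return std::nullopt;  // -O1/-O2 pass lists are still undecided
    }
}
```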

As for capturing the raw output of a front-end, GCC has the -E option
(well, at least for the pre-processor). Do we want to do that?

-E should continue to be a preprocessor flag. We're really talking about
controlling the amount of optimization here, which is why I like the idea
of -On.

> Anyway, these are just some high-level ideas.

Your thoughts, if any, on the other topics would be very much
appreciated.

Sure. I'm surprised no one else has chimed in :)

-Chris

I just had a chance to read some of the follow-up comments on Reid's initial document. I agree with Chris's discussion below of what is needed for users to get IPO/lifelong opt'n via LLVM without extensive changes to Makefiles, and about what .o files should contain. This is in perfect agreement with what I just said about how users should view LLVM.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/

Compiling InstrSelectorEmitter.cpp
InstrSelectorEmitter.cpp: In member function `virtual void llvm::InstrSelectorEmitter::run(std::ostream&)':
InstrSelectorEmitter.cpp:1295: internal compiler error: in convert_from_eh_region_ranges_1, at except.c:1159
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/ccbMwLuD.out file, please attach this to your bugreport

As is my usual approach to internal compiler errors I tried to compile it a second time and received the same error. If anyone knows what is going on and how to fix it I would appreciate it ;)

~Patrick

> Compiling InstrSelectorEmitter.cpp
> InstrSelectorEmitter.cpp: In member function `virtual void
> llvm::InstrSelectorEmitter::run(std::ostream&)':
> InstrSelectorEmitter.cpp:1295: internal compiler error: in
> convert_from_eh_region_ranges_1, at except.c:1159
> Please submit a full bug report,

GCC 3.3.2 is not compatible with LLVM, sorry!

-Chris

> with preprocessed source if appropriate.
> See <URL:http://bugzilla.redhat.com/bugzilla> for instructions.
> Preprocessed source stored into /tmp/ccbMwLuD.out file, please attach this to your bugreport
> </snip>
>
> As is my usual approach to internal compiler errors I tried to compile it a second time and received the same error. If anyone knows what is going on and how to fix it I would appreciate it ;)
>
> ~Patrick

-Chris

> Compiling InstrSelectorEmitter.cpp
> InstrSelectorEmitter.cpp: In member function `virtual void
> llvm::InstrSelectorEmitter::run(std::ostream&)':
> InstrSelectorEmitter.cpp:1295: internal compiler error: in
> convert_from_eh_region_ranges_1, at except.c:1159
> Please submit a full bug report,

GCC 3.3.2 is not compatible with LLVM, sorry!

FWIW, GCC 3.4.0 works and is installed on the research machines in
/home/vadve/shared/localtools/fc1.

-Chris

> with preprocessed source if appropriate.
> See <URL:http://bugzilla.redhat.com/bugzilla> for instructions.
> Preprocessed source stored into /tmp/ccbMwLuD.out file, please attach this to your bugreport
> </snip>
>
> As is my usual approach to internal compiler errors I tried to compile it a second time and received the same error. If anyone knows what is going on and how to fix it I would appreciate it ;)
>
> ~Patrick

-Chris

--
http://llvm.cs.uiuc.edu/
Chris Lattner's Homepage

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev

-- John T.

Yes. I agree now too. We settled that one on IRC a few days ago. I was
just bringing a traditional compilation mindset to the table. I'm over
it :)

Reid.

How does one move instructions from one basic block to another? I tried
this:
(IB is an Instruction* as is current_last, current_BB is a BasicBlock*)

      IB->getParent()->getInstList().remove(IB);
      current_BB->getInstList().insert(current_last, IB);

and I get this assertion:
Assertion `V->getParent() == 0 && "Value already in a container!!"' failed.

it seems to me that remove should remove it from its container....

You want to use `erase' instead of `remove'.
See http://llvm.cs.uiuc.edu/docs/ProgrammersManual.html#schanges_deleting

The problem you're having is that remove returns the instruction and
invalidates its argument. "IB" no longer points to the instruction; in
fact, it gets invalidated by the operation. Try this:

  Instruction *I = IB->getParent()->getInstList().remove(IB);
  current_BB->getInstList().insert(current_last, I);

Alternatively, you can use splice, which is a little more efficient:

  current_BB->getInstList().splice(current_last, IB->getParent()->getInstList(),
                                   IB);

splice is a bit more efficient than remove/insert because it doesn't
bother to take the symbol out of the function symbol table (unless of
course you're splicing into a different function). Good
descriptions of splice can be found in STL references for the list class.
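The same contrast can be seen with plain `std::list` in C++ (this is the standard container, not LLVM's instruction list, whose symbol-table bookkeeping has no counterpart here). `splice` relinks the existing node into the destination list, so nothing is copied or destroyed and an iterator or pointer to the moved element stays valid; `std::list::erase`, by contrast, destroys the node. `moveBySplice` is a hypothetical helper.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

using InstList = std::list<std::string>;

// Move the element at `pos` in `from` to just before `before` in `to`,
// without copying or destroying it. Returns an iterator to the moved
// element, which now refers into `to`.
InstList::iterator moveBySplice(InstList& to, InstList::iterator before,
                                InstList& from, InstList::iterator pos) {
    to.splice(before, from, pos);  // relinks the node; `pos` stays valid
    return pos;
}
```

For example, splicing the second element of one list to the front of another leaves the element's address unchanged, which is the property the remove/insert route has to work around.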

-Chris