Clang universal driver project state

Hi,

What is the current state of the Clang universal driver project
(http://clang.llvm.org/UniversalDriver.html)? Is anybody working on
that project now?

Recently I started implementing support for one more MIPS toolchain.
Its directory tree is similar to that of the Code Sourcery toolchain but
has some differences as well. Sure, I can add a few more folder names to
the clang driver and make the GCC detection algorithm more complicated.
But I do not think that is the right way.

That's why I'd like to implement configuration file support in the clang
driver. My current plan is to put the following settings in this file
and use them in the driver:
- target triple
- include files search paths (-I)
- library search paths (-L)
- program and files search paths (-B)
- explicit paths for files and / or programs

The user will be able to select a configuration file with the
--target=<config name> option.

I also want to write a utility that runs a specified gcc with
-print-multi-directory / -print-prog-name / ... options and generates
the driver's configuration file.
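
A minimal sketch of how such a utility might query gcc, assuming Python and the standard GCC -print-* driver options. The selection of queries, the names QUERIES, run_gcc and probe_toolchain, and the dictionary output format are assumptions made for illustration; nothing here is the configuration format discussed in this thread.

```python
import subprocess

# GCC queries used to discover the toolchain layout. These -print-* options
# are standard GCC driver options; which ones to record is an assumption
# of this sketch.
QUERIES = {
    "multi_directory": ["-print-multi-directory"],
    "sysroot": ["-print-sysroot"],
    "assembler": ["-print-prog-name=as"],
    "linker": ["-print-prog-name=ld"],
    "libgcc": ["-print-libgcc-file-name"],
}


def run_gcc(gcc, args):
    """Run gcc with the given arguments and return its stdout, stripped."""
    result = subprocess.run(
        [gcc, *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def probe_toolchain(gcc, target_flags=(), run=run_gcc):
    """Collect layout facts for one combination of target flags.

    The `run` parameter exists only so the probing logic can be exercised
    without a real gcc installed.
    """
    return {
        key: run(gcc, [*target_flags, *query])
        for key, query in QUERIES.items()
    }
```

Calling probe_toolchain("mips-linux-gnu-gcc", ["-EL", "-msoft-float"]) would repeat the queries for one flag combination; a generator would loop over every combination of interest.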

I plan to send a more formal RFC next week. But if someone is working on
similar tasks or has any comments or questions right now, I would be
happy to hear them.

Hi,

What is the current state of the Clang universal driver project
(http://clang.llvm.org/UniversalDriver.html)? Is anybody working on
that project now?

I don't think so.

Recently I started implementing support for one more MIPS toolchain.
Its directory tree is similar to that of the Code Sourcery toolchain but
has some differences as well. Sure, I can add a few more folder names to
the clang driver and make the GCC detection algorithm more complicated.
But I do not think that is the right way.

That's why I'd like to implement configuration file support in the clang
driver. My current plan is to put the following settings in this file
and use them in the driver:
- target triple
- include files search paths (-I)
- library search paths (-L)
- program and files search paths (-B)
- explicit paths for files and / or programs

The user will be able to select a configuration file with the
--target=<config name> option.

Can you implement this using the existing response file support, rather
than coming up with a new config file format?

Also I want to code an utility that runs specified gcc with

Unfortunately, the response file format does not satisfy all the
requirements.

First, I need fine-grained control over include directories. I need to
distinguish regular system include directories, include directories
with extern "C" semantics, and C++ system include directories.

The next requirement might be specific to MIPS toolchains. For these
toolchains the include directories, library search paths, etc. depend
not only on the target name, such as mips-linux-gnu or mipsel-linux-gnu,
but on command line arguments too. Sure, we could write a separate
configuration file for each combination of target and command line
options. But I see two problems here: a) a huge number of configuration
files; b) end users prefer standard command line options like -mips16,
-msoft-float, ... to something like -target mipsel-16-softfloat. I hope
to implement a configuration file that contains options for the front
end, assembler, and linker, with the driver's command line arguments
triggering their selection.
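
The selection step described above can be pictured as a lookup keyed on driver flags. This is a minimal sketch in Python, not a proposed file format: the VARIANTS table, the select_variant name, and the directory names are hypothetical, and only the flags -mips16 and -msoft-float come from the message.

```python
# Hypothetical table: each entry pairs a set of driver flags with the
# multilib subdirectory it selects. The directory names are illustrative
# only and do not describe any real toolchain.
VARIANTS = [
    ({"-mips16", "-msoft-float"}, "mips16/soft-float"),
    ({"-mips16"}, "mips16"),
    ({"-msoft-float"}, "soft-float"),
    (set(), "."),
]


def select_variant(argv):
    """Return the subdirectory of the most specific matching entry.

    An entry matches when all of its flags appear in argv; the entry
    requiring the most flags wins. The empty entry always matches, so
    there is always a result.
    """
    present = set(argv)
    matches = [(flags, subdir) for flags, subdir in VARIANTS if flags <= present]
    return max(matches, key=lambda match: len(match[0]))[1]
```

With a table like this, one file can cover many option combinations while the user keeps typing the standard flags.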

Hi,

What is the current state of the Clang universal driver project
(http://clang.llvm.org/UniversalDriver.html)? Is anybody working on
that project now?

Recently I started implementing support for one more MIPS toolchain.
Its directory tree is similar to that of the Code Sourcery toolchain but
has some differences as well. Sure, I can add a few more folder names to
the clang driver and make the GCC detection algorithm more complicated.
But I do not think that is the right way.

I disagree. I think this is the right way.

That's why I'd like to implement configuration file support in the clang
driver. My current plan is to put the following settings in this file
and use them in the driver:
- target triple
- include files search paths (-I)
- library search paths (-L)
- program and files search paths (-B)
- explicit paths for files and / or programs

The user will be able to select a configuration file with the
--target=<config name> option.

There is not universal agreement that this is the right long-term
direction. I'm moderately opposed to complicating the distribution and
installation model of Clang. I'm very opposed to worsening the state of
getting distributions to both:
1) Pick a common and consistent layout of toolchains, and/or
2) Work with the upstream community to add support for their layout.

I strongly suspect that doing this the way you are suggesting will have
this result.

The design I have advocated for in the past can be summed up as:

1) Don't start a new driver. Just refactor the current one.
2) Lift the common and often duplicated "configuration" activities from C++
code to a simple table file processed by tablegen to produce C++ code that
implements automatic detection of various configurations.
3) Continue to bake the database of targets and supported layouts into the
driver itself, but now with a very low overhead for adding another config.

We could eventually add support for loading some elements of these configs
at runtime, but I would like to avoid that if at all possible, and at least
ensure that 99.99% of the time it isn't needed.
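
The table-driven shape of this design can be illustrated with a short sketch. The message proposes TableGen input that generates C++ inside the driver; the Python below only illustrates the idea of keeping layouts as data and running one generic detection loop over them. The LAYOUTS entries, their names, and the marker paths are hypothetical.

```python
import os

# Hypothetical layout table: a name plus the directories, relative to the
# toolchain root, that must exist for the layout to be recognized.
LAYOUTS = [
    ("layout-a", ["lib/gcc", "libc/usr/include"]),
    ("layout-b", ["lib/gcc", "sysroot/usr/include"]),
]


def detect_layout(root):
    """Return the name of the first layout whose markers all exist under root.

    Returns None when no known layout matches.
    """
    for name, markers in LAYOUTS:
        if all(os.path.isdir(os.path.join(root, marker)) for marker in markers):
            return name
    return None
```

Adding support for another layout then means adding one table row rather than new detection code.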

I also want to write a utility that runs a specified gcc with
-print-multi-directory / -print-prog-name / ... options and generates
the driver's configuration file.

This doesn't ensure they are correct or sensible, just that they happen to
work today. I would prefer actually being principled about the layouts we
support.

I would also *really* like to see more pressure on people to produce
toolchains in a common and widely used layout. Everyone's life would be
much simpler in that world.

Hi Chandler,

The world of compilers for embedded platforms and cross compilers is what it is.

Asking the whole embedded programming world to conform to what you think it should be is not going to work, regardless of who is right.

For a given compiler, there are so many platforms, toolchain variants, header file variants, and library variants. You can't stuff all of this in the clang driver.

The driver part of clang is a mess right now. There are tons of hardcoded things that should not be there, even for simple host/target pairs like linux/linux or bsd/bsd. Everyone struggles to do even the simplest driver work, and this kind of activity should be a no-brainer defined by some simple ascii-based tables that can be edited.

Tablegen is definitely the wrong place. Much of this needs to be configurable after the compiler is built.

Gcc deals with this by using configure, and essentially you have to build special compilers for many combinations, and even then it is not so easy to keep straight. Using various configure parameters you can control all of this, but then you have to specially build N compiler binaries, even on the same host.

Clang/LLVM tossed out this notion of many compilers on a given host but has not adequately addressed the rest of the configuration problem: headers, toolchains, libraries, etc.

Simon has already been through this exercise with some very involved toolchains for MIPS, and he knows the issues here.

This is a very simple and clean solution he is proposing.
Clang should be able to read from a simple ascii file where the components of the target infrastructure are located: i.e. header files, assembler, linker, libraries, etc.

Simon has the time to work on this now, has deep knowledge of the details of this problem, and did an excellent job of teaching clang about the Code Sourcery toolchain for MIPS, which I think is maybe the most involved of any of the gcc embedded targets.

I think we should focus on what, if anything, really will not work with his scheme.

The rest of Clang/LLVM is evolving and maybe the driver will too but I can't understand the motivation towards making the current mess there even messier by putting more information there.

Reed

This is a very simple and clean solution he is proposing.
Clang should be able to read from a simple ascii file where the components
of the target infrastructure are located: i.e. header files, assembler,
linker, libraries, etc.

Why can't the files be response files? The driver already communicates
all the header info to cc1 via command lines. Having command lines for
disabling autodetect and selecting the linker, library layout, etc is
good for testing anyway. With those in place, response files should
work, including the idea of writing a tool that runs gcc and extracts a
configuration file.

Cheers,
Rafael

This is a very simple and clean solution he is proposing.
Clang should be able to read from a simple ascii file where the components
of the target infrastructure are located: i.e. header files, assembler,
linker, libraries, etc.

Why can't the files be response files? The driver already communicates
all the header info to cc1 via command lines. Having command lines for
disabling autodetect and selecting the linker, library layout, etc is
good for testing anyway.

Yep, this is what I was wondering/asking at the social last night. If
all these options are (or should be) command line arguments for clang
anyway, all we really want is the ability to specify a file containing
command line arguments. This seems slightly different from a "response
file": I mean literally just a file containing command line args, plus
a command line option such as "-args foo.txt" that substitutes the
contents of the file as if they were arguments written at that position
(so they override and can be overridden as usual).
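
The substitution described here can be sketched in a few lines of Python. Note that "-args" is the hypothetical option floated in this message, not an existing clang flag, and the expand_args and read_text names are made up for the sketch.

```python
import shlex


def read_text(path):
    """Read a whole file as text."""
    with open(path) as handle:
        return handle.read()


def expand_args(argv, read=read_text):
    """Replace each '-args FILE' pair with the arguments stored in FILE.

    The file's arguments are spliced in at the position of the option, so
    earlier arguments can be overridden by the file and the file can be
    overridden by later arguments, following the usual last-one-wins rule.
    """
    expanded = []
    i = 0
    while i < len(argv):
        if argv[i] == "-args" and i + 1 < len(argv):
            # Split the file contents the way a POSIX shell would,
            # ignoring '#' comments.
            expanded.extend(shlex.split(read(argv[i + 1]), comments=True))
            i += 2
        else:
            expanded.append(argv[i])
            i += 1
    return expanded
```

Because the expansion is purely positional, no new precedence rules are needed beyond the ones the driver already applies to its argument list.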

Simon will have to explain all the details here.

He implemented all of this in Clang for the Code Sourcery (Mentor) toolchain, and at least for MIPS, that is probably about as complicated a toolchain as you are going to get. The multidimensional matrix of processor types, headers, libraries, tools, etc. is pretty large.

Right now, we have a command line option -gcc-toolchain=<name> where you give the root of the Code Sourcery distribution, and then all the standard gcc MIPS options passed to clang will interact with that toolchain properly.

Also, it's not necessary to convert all of what is already in Clang over to this new scheme, although I think that if done properly, people would want to transition to this and we could clean up a lot of code in the driver.

People are free to do as they always have done there.

It's needed though for the embedded toolchain folks.

We prototyped this several years back in an alternate driver that we wrote in Python.

Reed

I believe that clang does actually have flags for this functionality (e.g.
-internal-externc-isystem). Does `clang -help | grep -A1 isystem` have
flags for all you need?

-- Sean Silva

First of all, thanks for all your responses. Let me sum up.

1. In general, response files solve the problem, though they have some
shortcomings:
  a) The MIPS toolchain requires ~35 separate response files to cover
all possible combinations of options. Some paths in these files are the
same, but the user has to duplicate them in each file.
  b) Response files are independent of command line options. The user
has to use them consistently.

2. As far as I understand, there is no objection to implementing a tool
that runs gcc and creates a response file acceptable to clang.

Let's postpone (at least temporarily) the discussion of the driver's
configuration file format. I'll write a tool for response file
generation. Maybe that will close the issue. Any suggestions and ideas
are welcome.

Regards,
Simon
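
One way a generator could deal with shortcoming (a) is to keep the shared arguments in one place and emit a file per combination. This is a minimal Python sketch under stated assumptions: the COMMON values, the two option axes, the .rsp file naming, and the generate_response_files name are all hypothetical, and a real MIPS toolchain would have more axes plus per-variant paths.

```python
from itertools import product

# Arguments shared by every variant (hypothetical values).
COMMON = ["--target=mips-linux-gnu", "-B/opt/toolchain/bin"]

# Hypothetical option axes; each choice contributes its own arguments.
AXES = {
    "endian": {"be": ["-EB"], "el": ["-EL"]},
    "float": {"hard": ["-mhard-float"], "soft": ["-msoft-float"]},
}


def generate_response_files():
    """Return {file name: contents}, one entry per combination of choices."""
    files = {}
    axis_names = list(AXES)
    for combo in product(*(AXES[name].items() for name in axis_names)):
        # combo holds one (choice label, extra arguments) pair per axis.
        label = "-".join(choice for choice, _ in combo)
        args = COMMON + [arg for _, extra in combo for arg in extra]
        files[f"{label}.rsp"] = "\n".join(args) + "\n"
    return files
```

A generated file would then be used as, for example, clang @el-soft.rsp file.c. Shortcoming (b) remains: nothing ties the chosen file to the other options on the command line.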

I don't know anything about response files, but from what you are saying, it sounds like it might also be possible to write a standalone tool for generating them. Maybe a simple Python script would be easiest.

I agree that, for now, being able to derive this from a gcc toolchain is the most useful, since it's unlikely that anyone is currently doing LLVM work on a target that does not also have a normal gcc-like toolchain with binutils, libc, etc. In that case, the gcc command line options that report the specifics of the toolchain would be the most helpful input to the tool you propose.