RFC: Clang driver redesign

Usecase 3: Clang developer, developing
--------------------------------------

Wants no functionality to change - things keep working as normal [1]

Usecase 4: Apple/Darwin developer, using fat binaries
-----------------------------------------------------

Requires fat-binary support. This entails multiple "-arch" arguments
being supported. [1]

.. note::

   Describe this some more?

Functional requirements
-----------------------

The following requirements follow from the use cases above and attempt
to formalise those use cases more precisely.

[1] No functional regressions
   The driver **must** be able to be configured such that it can parse
   command lines that the current Clang driver accepts. The driver
   **must** invoke all subtools in the same manner as the current Clang
   driver, with the possible exception of obtuse, undefined, legacy or
   otherwise incorrect behaviour, permission for which must be obtained
   from the mailing list and documented in a subsection of this
   document for decision tracking.

I honestly have to disagree with this one. Much of the
horribleness in the current driver comes from compatibility with GCC. I
believe that we should really have two drivers, one being the 'nice'
driver, and one being the compatibility driver. To be honest, I
consider the POSIX specifications for CC rather irritating as well, but
I'm willing to concede POSIX compatibility. Naturally, it should be
easy to change both at once so that we don't have
ridiculous levels of maintenance being performed, but I'm of the opinion
that the current model is predicated on enough annoyances
that trying to promote a compatible compiler is not a good approach
(the first example that comes to mind is -Wall).

I'm sure people will disagree with me, though.

[3] Extensibility
   All parts of the driver that interact with the outside
   environment (such as interpreting command lines and launching
   subtools) **must** be able to have their behaviour easily modified.

   While there is no requirement for this to be possible without
   source changes, there **could** be scope for allowing dynamically
   loadable modules (in the spirit of ``opt -load``) to change the
   driver's behaviour at invocation time.

Oh no, spec files. ;)

Sean

Having recently taught C++, I strongly agree. The two most obvious problems are:

a) Too low a default warning level

b) The 'clang' command's stupid treatment of C++ (inherited from gcc) where it will compile it, but not link in the standard library. I have to teach students that if they see the error message:

Undefined symbols for architecture x86_64:
  "std::ios_base::Init::Init()", referenced from:
      ___cxx_global_var_init in z-pj7RVn.o
  "std::ios_base::Init::~Init()", referenced from:
      ___cxx_global_var_init in z-pj7RVn.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

It means that they tried compiling C++ with clang instead of clang++. Almost everyone hits it at some point, and it's impossible to diagnose without googling or help.

Chris

I fully agree.

I’ve tried adding options in the past, and the segregation of options is quite… baffling. Some -fXXX options influence LLVM while others influence Clang!?

The dual Options.td cc1Options.td is quite nice too…

The GCC compatibility is great for a drop-in replacement, but I certainly see no harm in building a “pure” clang driver, where options are segregated according to Clang use cases.

This would also allow implementing a sane option syntax, like for example a hierarchical option parser:

--codegen-stackframe-limit=50

Where “stackframe-limit=50” is dispatched to the “codegen” plugin, which itself dispatches “limit=50kB” to its stackframe object, which sets its “limit” attribute to 50000 (for example).

This makes it easy to:

  • group options right where they are used
  • avoid name collisions
  • have plugins register their own set of options
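The dispatch described above can be sketched in a few lines. This is only an illustration, not the parser mentioned in this post: the `Codegen` and `StackFrame` classes, the `dispatch` function, and the reading of a bare `50` as kilobytes are all assumptions made for the example.

```python
class StackFrame:
    """Leaf object owning a single 'limit' attribute (hypothetical)."""

    def __init__(self):
        self.limit = None

    def set_option(self, path, value):
        if path != ["limit"]:
            raise KeyError("unknown stackframe option: " + "-".join(path))
        # Assumption for illustration: a bare number means kilobytes.
        self.limit = int(value) * 1000


class Codegen:
    """Plugin that forwards options to the objects it owns (hypothetical)."""

    def __init__(self):
        self.children = {"stackframe": StackFrame()}

    def set_option(self, path, value):
        if not path or path[0] not in self.children:
            raise KeyError("unknown codegen option: " + "-".join(path))
        # Strip one level of the hierarchy and pass the rest down.
        self.children[path[0]].set_option(path[1:], value)


def dispatch(plugins, argument):
    """Split '--a-b-c=v' into a path and hand it to the plugin named 'a'."""
    if not argument.startswith("--") or "=" not in argument:
        raise ValueError("expected --name=value, got " + argument)
    name, value = argument[2:].split("=", 1)
    path = name.split("-")
    if path[0] not in plugins:
        raise KeyError("no plugin registered for " + path[0])
    plugins[path[0]].set_option(path[1:], value)


plugins = {"codegen": Codegen()}
dispatch(plugins, "--codegen-stackframe-limit=50")
```

Because each plugin only sees the part of the name below its own prefix, two plugins can both have a `limit` option without colliding. One limitation of splitting on `-` as done here is that component names cannot themselves contain hyphens.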

I have been working on such a parser in my spare time, and I don’t mind giving the code away as a basis:
=> automatic handling of Integers, Strings, and Enums (which must provide some specific functions for conversion and listing the available values)
=> automatic handling of Boolean arguments (at the moment, I use =yes or =no, but it would be trivial to parse --no- as meaning =no provided that no plugin tries to grab the “no” namespace)

Ah, and I also have a configuration file parser which sets the options objects before the command line is parsed… (modelled after the config python module)

There’s one particular set of use cases that this brings up that I’d like to mention:

“Less knobs for users to use”

I realize it’s an example, and a good way of describing one way of evaluating options. However, the idea behind this option in particular is something I explicitly don’t want to do. I believe we want fewer knobs, not more. We don’t want users mucking with things like inlining heuristics, the size of the stack, and whether they want to unroll 4 loops or 3. These are the kinds of decisions that the optimizer should be able to handle, and people working on the compiler have their own ways of mucking with these sorts of options.

-eric

Wrong; to successfully build the cellspu tblgen files on Windows x64, one had to increase the stack size for tblgen not to crash. I’m quite certain this use case is far from unique. I do agree with hiding low-level options that casual users shouldn’t need to know about (although they should be well documented), but not having them at all is really shooting yourself in the foot…

Ruben

Then the port is broken. You shouldn’t need a compiler option for this.

-eric

Agreed, but that doesn’t change the fact that there will always be deliberate, controlled cases where these kinds of things are required. Removing that compiler option interface would only limit Clang’s power. And there will always be compiler bugs that could be effectively worked around by using these kinds of options.

Ruben

Did you file a bug? Did you fix the bad behavior? Or did you just look for an undocumented option that would work around it and have left that in? Even if you did the right thing most users won’t. Supporting and maintaining those sorts of behaviors with explicit command line options is exactly why those kinds of options shouldn’t exist.

A generic interface for short term fixes (perhaps a reason for the plugins that James mentioned) might be OK, but the general driver should expose as little of the backend as possible.

-eric

OK, workarounds are a bad reason.

The optimizer will not always make the best decisions, especially when it processes numerical data that is unknown at compile time. Backend inlining options can improve performance if the user knows what to optimize for. I agree there are most probably a lot of mucky options, but sometimes fine-grained control beyond what the backend itself can provide is very much wanted, if not necessary.

All I’m trying to say is a (too) dumbed down interface can be harmful (to adoption, usefulness, adaptability, research…) as much as too many obscure options can lead to misuse.

Ruben

PS: the stack size option was for the linker, making it a bit irrelevant in light of the current discussion, I wrongfully picked it up from a previous message.

From a scientific-programming perspective, I think that this is the
wrong way to approach the problem. Although I certainly understand the
desire to decrease the maintenance burden by restricting the number of
public-facing options, tuning things like loop-unrolling limits is
often necessary for squeezing the last bit of performance out of some
scientific code. There is a tendency to think, "but people should not
spend their time doing that"; and for the most part that is true. But
many places now have autotuners that attempt to find optimal compiler
parameters for specific routines on specific input data, and those
autotuners often work with multiple compilers, so the options that
they're tuning must be available through the command-line interface.

I think, however, that it is important to make a distinction between the
options that are designed to be public facing and those that are not.
For example, a public option could look like: -floop-unrolling-limit=200
while a non-public, could-change-at-any-time option could look like:
-finternal:loop-unrolling-limit=200. Options that have meanings that are
(mostly) independent of the underlying implementation, such as how many
instructions can be in an unrolled loop, stack sizes, etc. should be
made public. Other options should not, but should still be made
available. In this way, clang/LLVM will be as friendly as possible to
regular users, performance engineers, and compiler developers.
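As a rough sketch of how a driver could tell the two classes apart, assuming the `-f` and `-finternal:` spellings used in the examples above (the `classify` helper is hypothetical and not part of any existing driver):

```python
INTERNAL_PREFIX = "-finternal:"
PUBLIC_PREFIX = "-f"


def classify(argument):
    """Return (visibility, name, value) for one '-f...' style argument."""
    # Check the longer prefix first, since it also starts with "-f".
    if argument.startswith(INTERNAL_PREFIX):
        visibility, body = "internal", argument[len(INTERNAL_PREFIX):]
    elif argument.startswith(PUBLIC_PREFIX):
        visibility, body = "public", argument[len(PUBLIC_PREFIX):]
    else:
        raise ValueError("not an -f option: " + argument)
    name, sep, value = body.partition("=")
    return (visibility, name, value if sep else None)


public = classify("-floop-unrolling-limit=200")
internal = classify("-finternal:loop-unrolling-limit=200")
```

With a split like this, a driver could document and keep stable only the options classified as public, while an autotuner that opts in to the internal namespace accepts that those options may change at any time.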

-Hal

GCC compatibility is and has always been crucial to the viability of Clang, *especially* in the driver, which needs to deal with many years of accumulated cruft in makefiles and command lines. Unlike with language compatibility, where we can differ from GCC to better adhere to a language standard, GCC's driver *is* the standard for most *nix systems out there. You won't win the hearts and minds of users if you tell them to change all of their makefiles before they can even try Clang.

By all means, please make it easier to build and distribute cross compilers, but any Clang driver that does not provide GCC compatibility is likely to be a non-starter [*].

  - Doug

[*] The natural exception would be a driver designed for compatibility with a different compiler, e.g., a Clang that accepts Microsoft CL command-line syntax.

This isn't completely true. gcc is the standard for building *open
source packages* across platforms, which has only become important
recently. Your conclusion is correct that tweaking Makefiles cannot be
a requirement for adopting clang. But the assumption that most users
are gcc option gurus is wrong, and clang suffers from that mentality.

The majority of users, build maintainers, need some superficial level
of compatibility. As long as the build doesn't break, they're happy.

The details of invoking subtools, diagnostics, fine control of
optimizations, and target specific flags are important to hackers
who are already dealing with something broken or requiring
extraordinary optimization. Then there are the people on this list,
who simply want to waste less time repeating the horrible
trial-and-error process that it takes to reproduce a problem or enable
an experimental feature. Our experience matters too.

My own experience parallels the community in general. I've used gcc
far more than any other compiler and always appreciated having it when
I need to port a missing library. But I never cared about tweaking any
command line options other than "-g -O0". For any real compiler
development, performance work, or debugging I used the platform
vendor's native compiler which always had a sane, well documented
option set. Very unlike gcc.

My experience adopting a mostly undocumented and seemingly
obfuscated clang driver for development was violently traumatic and
still causes me grief. I'm guessing there are two reasons: (1) the
assumption that all compiler developers must have been gcc/llvm-gcc
developers at some point, and (2) the evil idea that a driver should be
designed primarily to prevent build engineers from using anything
beyond the minimal option set, which results in burying the rest of
the functionality in an indecipherable web of driver code.

Moving to a driver that compartmentalizes gcc compatibility would be
fantastic. We could finally focus on providing a sensible command-line
interface for the clang community. I'm not necessarily advocating two
drivers so much as a clear and formal segregation between
first-class clang options and gcc compatibility options. This is sort of
the intent behind the unfortunate "-cc1" and "-mllvm" flags. But they
only add considerable confusion from my point of view.

And while I'm in evangelism mode, we need a strict requirement that
every decision within the compiler that can be impacted by the
environment, including target data and library versions, should be
formalized and captured by an option framework. These options
could be either printed and replayed on the command line or potentially
embedded in bitcode. I have no problem disabling these options in
release builds if that's really the way to solve the inertial-QA-team problem.

And how difficult would it be to record those options + compiler version
in the obj file? Really!
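To make the "print and replay" idea concrete, here is a minimal sketch. The JSON layout, the function names, and the `--name=value` spelling are assumptions chosen for illustration; nothing here reflects an existing Clang or LLVM format.

```python
import json


def record_invocation(compiler_version, options):
    """Serialize the compiler version and every resolved option."""
    return json.dumps(
        {"compiler_version": compiler_version, "options": options},
        sort_keys=True,
    )


def replay_command(record, program="clang"):
    """Rebuild an argument list from a record made by record_invocation."""
    data = json.loads(record)
    args = [program]
    for name, value in sorted(data["options"].items()):
        # Hypothetical spelling: every captured option is '--name=value'.
        args.append("--{}={}".format(name, value))
    return args


record = record_invocation("x.y", {"example-option": "1", "another": "abc"})
command = replay_command(record)
```

The same string could equally be stored in a section of the object file or embedded in bitcode; the sketch only shows that capturing and replaying is cheap once every decision is expressed as an option.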

Taking it one step further, we should have a single driver that
supports decomposition of the major compilation stages. For example,
I'm currently unable to use basic codegen diagnostics without
hand-inserting instrumentation because I often can't force the llc
driver to produce the same code as the clang driver.

Believe me, experimental compiler development doesn't need to be this
difficult.

-Andy

Clang's gcc-compatible options (and extensions) are one of its best
features, from a user point of view. All one's tweaking worked out
(sometimes painfully) for gcc pretty much just magically works with
clang! This is really incredibly refreshing, and makes clang look
good ("wow, they actually thought about the users!").

-Miles

The flipside is that it is impossible to even consider a situation where getting things to work doesn’t involve painful tweaking. Calling this “thinking about the users” smacks of Stockholm syndrome.

What’s wrong with having a gcc compatibility layer for the driver which just translates from one option set to the other? I’m thinking of something like:

clang-gcc [gcc options]
clang-g++ [g++ options]

clang [clang options]
clang++ [clang++ options]

If anything, the clang-gcc and clang-g++ could just be shell script wrappers around clang.

Using shell for argument conversion can introduce a noticeable performance penalty.

... and doesn't work on Windows :)

Good point(s). In that case, it can be a trivial C++ program which does find/replace on the arguments and execs clang with the interpreted arguments. The point is that it doesn’t have to be anything fancy.
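A sketch of that find/replace wrapper follows, written in Python only to keep it short (the post proposes C++). The flag spellings in `TRANSLATIONS` are made up for the example and are not real gcc or clang options.

```python
import os

# Hypothetical table: compatibility spelling -> native spelling.
TRANSLATIONS = {
    "--old-style-flag": "--new-style-flag",
}


def translate(args, table=TRANSLATIONS):
    """Replace each known argument and pass everything else through."""
    return [table.get(arg, arg) for arg in args]


def main(argv):
    """Exec clang with the translated arguments; does not return on success."""
    new_args = ["clang"] + translate(argv[1:])
    os.execvp(new_args[0], new_args)


translated = translate(["--old-style-flag", "-c", "a.c"])
```

A real wrapper would call `main(sys.argv)` at startup. A one-to-one table like this only covers options that map directly; options that change meaning depending on their neighbours would need more than find/replace.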

> This isn't completely true. gcc is the standard for building *open
> source packages* across platforms, which has only become important
> recently.

I have no idea what you mean by "recently", because open-source code is the #1 priority for Clang, and always has been. Vendors have other means of moving their users to Clang, but Clang only wins mind-share in the open-source world by being compatible (to lower the barrier to entry) and being better (to convince people to stay).

> Your conclusion is correct that tweaking Makefiles cannot be
> a requirement for adopting clang. But the assumption that most users
> are gcc option gurus is wrong, and clang suffers from that mentality.

I have no sympathy for the GCC option gurus. They'll be unhappy anyway, because we're going to accept and ignore all of the codegen-tweaking options they like to play with anyway. But, they don't really matter.

I do have sympathy for the people who accepted patches from the GCC option gurus, and now have makefiles with weird options they don't understand. Drop-in compatibility with GCC command-line syntax is important to get them

> The majority of users, build maintainers, need some superficial level
> of compatibility. As long as the build doesn't break, they're happy.

Agreed.

> Moving to a driver that compartmentalizes gcc compatibility would be
> fantastic.

I agree.

> We could finally focus on providing a sensible command-line
> interface for the clang community.

Users won't care; they're best served by just providing GCC compatibility. It could certainly be helpful for us as Clang/LLVM developers, and perhaps for vendors who want to vend a different command-line interface.

> I'm not necessarily advocating two
> drivers so much as a clear and formal segregation between
> first-class clang options and gcc compatibility options.
>
> This is sort of
> the intent behind the unfortunate "-cc1" and "-mllvm" flags. But they
> only add considerable confusion from my point of view.

-cc1 is historical cruft; it could and should be replaced by something more sane.

  - Doug

Hello,

> What's wrong with having a gcc compatibility layer for the driver which just translates from one option set to the other. I'm thinking of something like:
>
> clang-gcc [gcc options]
> clang-g++ [g++ options]
>
> clang [clang options]
> clang++ [clang++ options]
>
> If anything, the clang-gcc and clang-g++ could just be shell script wrappers around clang.

Just use hard links and have clang inspect the program name on startup. It already does this to automatically pass the
C++ standard library to the linker when called as clang++ (IIRC).
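The program-name check could look roughly like the sketch below. The `clang-gcc` and `clang-g++` names come from the proposal quoted above; the `driver_mode` function and its return shape are assumptions for illustration, not the existing driver's logic.

```python
import os


def driver_mode(argv0):
    """Return (language, gcc_compat) based on the name used to invoke us."""
    name = os.path.basename(argv0)
    # Drop a Windows-style suffix so 'clang++.exe' behaves like 'clang++'.
    if name.lower().endswith(".exe"):
        name = name[:-4]
    language = "c++" if name.endswith("++") else "c"
    # Assumption: the compatibility links are named clang-gcc / clang-g++.
    gcc_compat = name in ("clang-gcc", "clang-g++")
    return (language, gcc_compat)


modes = [driver_mode(n) for n in ("/usr/bin/clang++", "clang-gcc", "clang")]
```

With hard links, all four names refer to the same binary, so there is no extra process launch and nothing shell-specific that would fail on Windows.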

Jonathan