Configuration files

Hi all,

I would like to propose implementing configuration files in clang. The idea is to read a set of options before the options specified on the command line in every tool invocation. These options are taken from a file that is either searched for in a predefined place or whose location is specified by an environment variable.

A few words about a particular problem this facility can solve. Clang issues many warnings; some point to potential errors, others merely draw attention to code that can cause problems in some circumstances but is otherwise innocent. For example, the warning -Wundefined-var-template is issued for template usages that cannot be instantiated implicitly and are not mentioned in explicit instantiation declarations. Such usage is not an error or bad style, but the warning may be helpful for a user in a complex case, such as the one described in PR24425. Other users find this warning annoying and would prefer to have it off by default (see the discussion in http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20160808/167354.html). Different categories of users like to have different sets of warnings enabled by default.

If configuration files were supported by clang, a user could put -Wno-undefined-var-template into such a file, and this would have the same effect as if the unwanted warning were turned off by default. No changes to build scripts would be required.

Configuration files are supported by the Intel Compiler (https://software.intel.com/en-us/node/522780), and I propose similar behavior for clang.

By default the configuration file is searched for in the directory where the tool executable resides. For clang the file would be ‘clang.cfg’; for another tool named ‘foo’, ‘foo.cfg’. A user can specify any location for this file by setting the environment variable ‘CLANGCFG’ (or ‘FOOCFG’), which should contain the path to the configuration file. If the configuration file does not exist, it is silently ignored. The content of the configuration file is treated as part of the command line of each tool invocation, preceding any arguments specified on the command line.
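
A minimal sketch of the lookup described above, written in Python for brevity rather than in the driver's C++. The function names `find_config` and `effective_args` are made up for illustration, and splitting the file content with `shlex` is an assumption; the proposal does not fix the quoting rules.

```python
import os
import shlex


def find_config(tool_path, env):
    """Return the config file path for a tool, or None if there is none."""
    # 'clang' -> 'clang.cfg' / 'CLANGCFG', 'foo' -> 'foo.cfg' / 'FOOCFG'.
    tool = os.path.splitext(os.path.basename(tool_path))[0]
    var = tool.upper() + "CFG"
    if var in env:
        candidate = env[var]
    else:
        exe_dir = os.path.dirname(os.path.abspath(tool_path))
        candidate = os.path.join(exe_dir, tool + ".cfg")
    # A missing file is silently ignored.
    return candidate if os.path.isfile(candidate) else None


def effective_args(tool_path, user_args, env):
    """Config options come first, so options on the command line can override them."""
    cfg = find_config(tool_path, env)
    if cfg is None:
        return list(user_args)
    with open(cfg) as f:
        return shlex.split(f.read()) + list(user_args)
```

With a `clang.cfg` containing `-Wno-undefined-var-template` next to the executable, `effective_args(".../clang", ["-Wall", "a.c"], os.environ)` would return the warning option followed by the user's arguments.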

Does implementing this facility make sense?

This has been discussed before, but it died out a bit. I am personally
in favour of this proposal, but I think this could be expanded just a
bit and do much more.

One big problem we have is that GNU cross-compilation toolchains are
created in different ways, with different options, paths, libraries,
support by different distributions, so it's practically impossible for
the Clang driver to keep up with all of them. The end result is that
users have to fiddle with --sysroot, -L, -I, -l, triple names, etc.
This gets even more complicated when adding compiler-rt, libc++ and
especially libunwind. Also bad are the choices between GNU and LLVM
tools for native and cross compilation, which drives the number of
possibilities times the number of distribution-specific configurations
through the roof.

So, I'd like to extend your proposal...

You said:
* If no config file is found, do nothing
* If the default config is found, pre-pend (not append, because of
last-seen effect) all the options to the user's command line

I'd add:
* If there's a system default (/etc/llvm/default.cfg), silently use that
* If there's a user default (~/.llvm/default.cfg), silently *replace*
the system config
* If Clang uses the option "--config foo.cfg", *replace* any other,
with reverse search path (first local dir, then ~/.llvm, then
/etc/llvm)
* Allow the user to omit the ".cfg" extension, or not even create it
in the first place (I prefer it that way, see below)
* Clang -v and -### to show which configuration it's using (full path)
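
A rough Python sketch of the lookup rules in the two lists above. `resolve_config` and its parameters are hypothetical names, and the choice not to fall back to a default when an explicit `--config` is not found is an assumption, since the list does not say what happens in that case.

```python
import os


def resolve_config(config_arg=None, cwd=".", user_dir="~/.llvm", system_dir="/etc/llvm"):
    """Return the single config file to use, or None.

    Files replace each other; nothing is merged or inherited.
    """
    user_dir = os.path.expanduser(user_dir)
    if config_arg is not None:
        # Explicit --config: local dir first, then user dir, then system dir.
        # The ".cfg" extension may be omitted, or the file may not have one.
        if config_arg.endswith(".cfg"):
            names = [config_arg]
        else:
            names = [config_arg + ".cfg", config_arg]
        for directory in (cwd, user_dir, system_dir):
            for name in names:
                path = os.path.join(directory, name)
                if os.path.isfile(path):
                    return path
        return None  # assumption: no fallback to a default config
    # No --config: a user default silently replaces the system default.
    for directory in (user_dir, system_dir):
        path = os.path.join(directory, "default.cfg")
        if os.path.isfile(path):
            return path
    return None
```

Whatever path this returns is what `-v` and `-###` would print under the last bullet.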

This would allow one to create a number of configurations to help
users get what they want, for example:
* Use Compiler-RT and libc++ installed on the system by choosing
"--config rt-libcxx", hiding all distro-specific choices (-L, -I) from
the user
* Cross-compile to ARM based on the distro's multi-arch config with
"--config cross-armv7l-linux-gnueabihf"
* Only enable "-Werror" on a final build, not development ones by
making the "--config werror" optional in CMake/Make files (ex. based
on -DDEBUG flags)

It would also help developers reproduce bug reports, if the user
provides their config file (when you don't have a crash script).

Finally, it would give us time to refactor the Clang driver for
cross-compilation by hiding the complexity behind the config file
while it's too ugly and also allow it to move without upsetting too
many users.

cheers,
--renato

I’d also add:

I’d like to see {foo}-clang interpreted as equivalent to clang --config {foo}. We currently understand {triple}-clang, but this isn’t enough to describe most cross-compile toolchains. I’d love to be able to drop a sysroot (maybe a cross-linker) and a config file in a system with a working clang and have cross compilation.

We should also probably automatically silence unused options from config files. For example, if my cross-compile toolchain passes some extra linker arguments with -Wl,-foo, then I don’t want every single line in my build to complain that -Wl is unused in conjunction with -c.

David

> I’d like to see {foo}-clang interpreted as equivalent to clang --config {foo}. We currently understand {triple}-clang, but this isn’t enough to describe most cross-compile toolchains. I’d love to be able to drop a sysroot (maybe a cross-linker) and a config file in a system with a working clang and have cross compilation.

That's a good point, too. If there are no {foo} config files in the
search path, assume it's a triple, and fall back to the current
behaviour.
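
A small Python sketch of that fallback. `interpret_prefix` is a hypothetical helper; `config_exists` stands in for whatever search-path lookup is eventually chosen, and only the plain `-clang` suffix is handled here.

```python
import os


def interpret_prefix(argv0, config_exists):
    """Map '{foo}-clang' to ('config', foo), ('triple', foo) or ('none', None).

    config_exists is a callable that says whether a config named foo is
    on the search path.
    """
    name = os.path.basename(argv0)
    if not name.endswith("-clang"):
        return ("none", None)
    prefix = name[: -len("-clang")]
    if not prefix:
        return ("none", None)
    if config_exists(prefix):
        return ("config", prefix)
    # No such config: assume the prefix is a triple, as the driver does today.
    return ("triple", prefix)
```

A distro symlink such as `myboard-clang` would then behave like `clang --config myboard` as soon as a `myboard` config is installed, and like a triple prefix otherwise.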

It's up to the distros to create the config files and symlinks to
their own configs. Users can do the same on their own ~/bin PATHs.

> We should also probably automatically silence unused options from config files. For example, if my cross-compile toolchain passes some extra linker arguments with -Wl,-foo, then I don’t want every single line in my build to complain that -Wl is unused in conjunction with -c.

That's going to be a bit harder. Since the proposal is to just
pre-pend the config, it'd be *really* easy and transparent to
implement this in the driver. But if we need to know where the option
came from and then act differently, this would need pervasive changes
to many different places in the driver, and different toolchain
emulation (darwin, freebsd, linux, etc) may behave differently, since
they implement the same logic mostly on their own.

I'd consider this wish-list, more than part of the initial proposal.

cheers,
--renato

I've been using something similar for quite a while in ELLCC (http://ellcc.org), my clang-based cross compilation environment. The way I've implemented it is that if the clang executable is named {xxx}-ecc, the file xxx.cfg is read. Otherwise the argument to the -target option is checked to see if it is the name of a config file. In either case, I put the options in the config file on the command line as if they were entered at the point on the command line where the name or -target option is, so that following options override those in the config file.
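
The in-place expansion can be sketched roughly as follows in Python. This is not ELLCC's code: `splice_config` is a made-up name, `configs` is a dictionary standing in for the config files on disk, and dropping the `-target name` pair once it has been expanded is an assumption.

```python
def splice_config(argv, configs):
    """Expand config options at the point where the config is named.

    argv[0] is the executable name; configs maps a config name to its
    list of options.
    """
    exe, args = argv[0], list(argv[1:])
    base = exe.rsplit("/", 1)[-1]
    if base.endswith("-ecc") and base[: -len("-ecc")] in configs:
        # The executable name precedes every argument, so its options go first.
        return [exe] + configs[base[: -len("-ecc")]] + args
    out = [exe]
    i = 0
    while i < len(args):
        if args[i] == "-target" and i + 1 < len(args) and args[i + 1] in configs:
            out.extend(configs[args[i + 1]])  # takes the place of '-target name'
            i += 2
        else:
            out.append(args[i])
            i += 1
    return out
```

Options that follow the expansion point come later on the resulting command line, so they override the config, while options before it can be overridden by the config.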

I also use the config files to tell the driver how to find libraries, etc.
I used YAML as the format, just because LLVM had a YAML parser. Here's an example of what one of the config files looks like: http://ellcc.org/viewvc/svn/ellcc/trunk/libecc/config/ppc64el-linux?view=markup

It has proven to be very handy to support a wide variety of targets.

-Rich

That's very clean. I'm sure we could use a similar style.

cheers,
--renato

> Content of configuration file
> is treated as a part of command line of each tool invocation preceding any
> arguments specified in command line.

I believe that if the replacement behaviour is the default, then there
ought to be an "inherit" behaviour (like include/include-next).

IMO that would be prohibitively expensive, hard to debug and hard to
get users to do anything useful if the system already has a list of
system configurations.

It would also not make sense without an "include" directive, since an
order of inheritance would have to be assumed / hacked through
instead.

Instead, it should be very easy to copy the system config and change a
few things for your use case on your home dir.

cheers,
--renato

> I believe that if the replacement behaviour is the default, then there ought
> to be an "inherit" behaviour (like include/include-next).

I'm worried that, as a user's environment gains more Clang-based
configurations (say, for different targets or different projects), the
copy/paste becomes unmanageable. As it is, the "home dir" modified copy of
the system config solution already assumes that the user is not working off
a home directory shared via network for use on systems where there is no
common "system config".

I agree it’s an issue, both ways.

Cheers,
Renato

As a side note, you can already get this behavior of overriding the compiler options without changing the build settings by using the environment variable CCC_OVERRIDE_OPTIONS.

I’m in favor of something like this. But, I think one important point to take account of is the interplay between the configuration selection and the effective triple.

E.g., let’s say you have a linux distro that supports both “i386-linux-gnu” and “x86_64-linux-gnu”. The typical way that users request the former target is via “$CC -m32”. If you want a configuration option specific to “i386-linux-gnu”, how do you accomplish that, in a way that works with the use of “-m32”?

It had been discussed previously that perhaps configuration options should be applied at two levels:

  1. Add options based on a configuration name.
  2. After adding those options, add more options based on the combination of (config name, effective triple). That is: using the result of “computeTargetTriple(Triple, Args)”.
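
A toy Python sketch of the two levels. `effective_triple` is only a stand-in for computeTargetTriple that understands `-m32`/`-m64` on x86; the two lookup tables and the ordering (level 2 options placed before the user's own arguments) are assumptions made for illustration.

```python
def effective_triple(default_triple, args):
    """Toy stand-in for computeTargetTriple: only -m32/-m64 on x86 is handled."""
    arch, _, rest = default_triple.partition("-")
    for a in args:  # the last -m32/-m64 wins
        if a == "-m32" and arch == "x86_64":
            arch = "i386"
        elif a == "-m64" and arch == "i386":
            arch = "x86_64"
    return arch + "-" + rest


def apply_config(name, default_triple, user_args, by_name, by_name_and_triple):
    """Level 1: options for the config name. Level 2: options for (name, triple)."""
    level1 = by_name.get(name, [])
    # The triple is computed only after the level 1 options are known.
    triple = effective_triple(default_triple, level1 + list(user_args))
    level2 = by_name_and_triple.get((name, triple), [])
    return level1 + level2 + list(user_args)
```

With this shape, `$CC -m32` on an x86_64 host picks up the options registered for `(name, "i386-linux-gnu")` without the user naming a different configuration.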

While a system default can be somewhat reasonable, I'm *strongly*
opposed to magic user defaults. Speaking with a cross-OS packaging
system in mind that allows unprivileged operation, it is begging for
another set of difficult to impossible to reproduce failure conditions.
In short: Only pull in configuration files if they are either explicitly
requested OR come from a fixed location in the compiler installation
prefix. Most importantly, do not just randomly include configuration
files in the current directory either. That can easily result in hard to
trace down race conditions. The advantage gained by saving a command
line option doesn't justify the cost, IMO.

Joerg

> In short: Only pull in configuration files if they are either explicitly
> requested OR come from a fixed location in the compiler installation
> prefix.

That works, too, at the packaging level. But for users overriding the
behaviour (which was the OP's point), you need either a config dir
(~/.llvm) or the local dir.

> Most importantly, do not just randomly include configuration
> files in the current directory either.

Don't Windows users rely on this? I only mentioned it because of that.
Also, if Clang -v tells where the config is coming from, wouldn't that
help users find the culprit?

> That can easily result in hard to
> trace down race conditions. The advantage gained by saving a command
> line option doesn't justify the cost, IMO.

This is one of the points against inheritance of rules, which makes
this problem even worse.

But it should be OK to define only two (system, user) hard-coded
locations for config files.

cheers,
--renato

I’m inclined to agree. And I’d note that we already support @file as a command line argument to read additional flags from a file, which seems to address at least a significant component of this.

One place where we seem to be missing a good answer is the default mapping from targets to sysroots and associated paths. We currently hardcode a lot of this into the compiler itself; perhaps an installation-wide or command-line-provided config file for that would be more useful.

I don't think we have any such mapping, e.g. you are required to provide
the sysroot explicitly?

Joerg

I was referring to the associated paths (gcc include path etc) rather than the sysroot itself, sorry for being unclear.

Thanks to all for feedback!

I tried to summarize the feedback in the form of user-visible description (possibly a part of user documentation) and random set of development specific notes.

------ User documentation ------

Using clang as part of a toolchain, especially for cross-compilation, requires setting up a large number of parameters: locations of headers and libraries, the set of enabled warnings, triple names, etc. Changing from a debug to a release build, from host processor to accelerator, or any other change of build configuration requires substantial changes to these parameters. To help maintain option sets for various build variants, sets of options may be combined into configurations and stored in configuration files. Selecting a particular configuration file activates all the options it represents.

Configuration file may be selected in several ways:

  • Using the command line option --config,
  • By setting the environment variable CLANGCFG,
  • As the default configuration.

Only one way may be used. If the option --config is specified, CLANGCFG is not checked and the default configuration is not applied, even if the requested configuration is not found. Similarly, if the variable CLANGCFG exists, the default configuration is never applied.

The command line option --config expects as its argument either the full path to a configuration file or the name of a configuration, for instance:

  --config /home/user/cfgs/testing.cfg
  --config debug
  --config debug.cfg

If a full path is specified, options are read from that file. If the configuration is specified by name, with the optional suffix “.cfg”, the corresponding configuration file is searched for in the following directories, in this order:

  • ~/.llvm
  • /etc/llvm

If the option --config is absent and the environment variable CLANGCFG is set, the content of CLANGCFG is used as the full path to the configuration file. If CLANGCFG is empty, no config file is used and no diagnostic is produced.

If neither --config nor CLANGCFG is specified, the default config file is searched for in the following order:

  • ~/.llvm/clang.cfg
  • /etc/llvm/clang.cfg
  • clang.cfg in the directory where clang executable resides.
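
The selection rules above could be sketched in Python like this. `select_config` is a hypothetical name, and treating a `--config` argument as a path whenever it contains a directory component is an assumption; the description only distinguishes "full path" from "name".

```python
import os


def select_config(config_opt, env, exe_dir, home_dir, etc_dir="/etc/llvm"):
    """Return (path_or_None, how); the three ways are mutually exclusive."""
    user_dir = os.path.join(home_dir, ".llvm")
    if config_opt is not None:
        if os.path.dirname(config_opt):
            # Has a directory part: use it as a path, no searching.
            found = config_opt if os.path.isfile(config_opt) else None
        else:
            name = config_opt if config_opt.endswith(".cfg") else config_opt + ".cfg"
            candidates = [os.path.join(d, name) for d in (user_dir, etc_dir)]
            found = next((p for p in candidates if os.path.isfile(p)), None)
        return found, "--config"  # CLANGCFG and defaults are not consulted
    if "CLANGCFG" in env:
        path = env["CLANGCFG"]
        # An empty value means: no config file, no diagnostic.
        ok = bool(path) and os.path.isfile(path)
        return (path if ok else None), "CLANGCFG"
    for path in (os.path.join(user_dir, "clang.cfg"),
                 os.path.join(etc_dir, "clang.cfg"),
                 os.path.join(exe_dir, "clang.cfg")):
        if os.path.isfile(path):
            return path, "default"
    return None, "default"
```

Returning the selection method alongside the path leaves it to the caller to decide whether a missing `--config` file is an error, which is one of the open questions in the notes.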

A configuration file is a sequence of compiler options specified on one or several lines. Lines starting with ‘#’, possibly preceded by space characters, are comments and are ignored. Lines starting with ‘#’ in the first column immediately followed by ‘:’ are reserved for directives. The file may reference other files using the usual syntax for response files: @included_file. Example of a configuration file:

# Frontend options
-std=c++14 -fcxx-exceptions

# Directories
@include-dirs.cfg
@library-dirs

The name of the active configuration file and the options it provided can be obtained by calling clang with the options ‘-v’ or ‘-###’.
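
A minimal Python sketch of a reader for the file format described above. `read_config` is a hypothetical name; tokenizing with `shlex`, resolving relative `@file` references against the including file, and rejecting recursive includes are all assumptions that the description leaves open.

```python
import os
import shlex


def read_config(path, _active=None):
    """Return (options, directives) read from a config file.

    Lines whose first non-blank character is '#' are comments, '#:' in
    the first column marks a reserved directive, and '@file' pulls in
    another file at that point.
    """
    active = set() if _active is None else _active
    real = os.path.realpath(path)
    if real in active:
        raise ValueError("recursive include of " + path)
    active.add(real)
    options, directives = [], []
    with open(path) as f:
        for line in f:
            if line.startswith("#:"):
                directives.append(line[2:].strip())
                continue
            if line.lstrip().startswith("#"):
                continue
            for token in shlex.split(line):
                if token.startswith("@") and len(token) > 1:
                    # Assumption: relative to the directory of the including file.
                    inc = os.path.join(os.path.dirname(path), token[1:])
                    sub_options, sub_directives = read_config(inc, active)
                    options += sub_options
                    directives += sub_directives
                else:
                    options.append(token)
    active.discard(real)
    return options, directives
```

The directives are only collected here, not interpreted, which matches the idea that ‘#:’ lines are reserved for future use.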

------ End of user documentation ------

Notes:

  1. What should the name of the default config file be, ‘default.cfg’ or ‘clang.cfg’? If some tool other than clang uses the same mechanism, the name ‘clang.cfg’ looks more appropriate.
  2. Should the compiler emit a warning (or even an error) if the specified configuration file is not found? Obviously the absence of a default config file should be silently ignored, as should an empty CLANGCFG variable. But what if the configuration specified by --config is not found? This looks like a severe error, as the user's expectations are broken.
  3. The default config file may be searched for in the directory where the clang executable resides. Should a configuration specified by --config be searched for in this directory as well? I would say no, because a user who needs to tune configurations may prepare them in the home directory or use those provided by the installation in system directories. The ability to place a default config into the binary directory is convenient for compiler developers and CI tools, which may use several variants of the compiler simultaneously.
  4. The format of the proposed config file is in fact a GNU response file. The YAML format mentioned by Richard looks nice but requires development of a proper parser. I would propose implementing the simple format first. In future other formats can be supported; we can distinguish formats by a directive like ‘#:format=yaml’ or even automatically.
  5. Some comments may be reserved for future use. The format directive mentioned above is an example. The sequence ‘#:’ proposed as the marker is entirely arbitrary; any other prefix may be used. For now the compiler may warn if it sees such a comment. Probably we do not need to bother about future extensions now.
  6. Response files in the form @file can imitate an include directive. If a user does not want to copy-and-paste pieces of config files, he may structure them using this feature.

Sorry for being late to the discussion,

I think automatic user defaults are a bad idea: Right now when you invoke clang without any additional options you know that you are in a well known state. Build systems and projects rely on this; having unknown compiler options active because the user put them into his configuration file is a recipe for disaster IMO! The example below, "-std=c++14 -fcxx-exceptions", already illustrates a situation in which settings are used that can break the compilation of many projects!

- I'm fine with a commandline option triggering reading a config file (@file)
- Using a different executable name/symlinks to trigger loading of config files may be fine as well (build systems should not pick up myarch-clang by accident)
- Automatically picking up options from user or system configuration files is too dangerous IMO!

- Matthias

> If full path is specified, options are read from that file. If
> configuration is specified by name with optional suffix ".cfg",
> corresponding configuration file is searched in the directories in the
> following order:
> - ~/.llvm
> - /etc/llvm

I see no advantage in the additional complexity of new directory
searching logic when the user explicitly provides a config name.

> If the option --config is absent, and environment variable CLANGCFG is set,
> content of CLANGCFG is used as full path to configuration file. If CLANGCFG
> is empty, config file is not used, no diagnostic produced.

I'm not a fan of environment variables for compilers either. You
normally can't specify empty environment variables: they are undefined.

> If neither --config nor CLANGCFG are specified, default config file is
> searched in the following order:
> - ~/.llvm/clang.cfg
> - /etc/llvm/clang.cfg
> - clang.cfg in the directory where clang executable resides.

I'm against a default configuration look up in the home directory. As
stated earlier, that would essentially force me to always provide a
--config argument in build systems to get consistent rules. /etc/llvm
should be the equivalent of --sysconfdir to configure in cmake. I'm
against the last as the question of where an executable is located is
quite problematic in many situations. If anything, the -B option should
be honored here.

> Configuration file is the sequence of compiler options specified in one or
> several lines. Lines started with '#' possibly prepended with space
> characters are comments, they are ignored. Lines started with '#' in the
> first column immediately followed by ':' are reserved for directives. The
> file may reference other files using usual syntax for response files:
> @included_file. Example of configuration file:

Please don't mix comments with semantics in one syntax form.

Joerg