A couple of questions about include search paths

First, does anyone know why the search paths take one of the following two sequences?

If no C_INCLUDE_DIRS are provided:

  1. /usr/local/include
  2. <builtin-headers>
  3. <massive platform-specific location set>
  4. /usr/include

If C_INCLUDE_DIRS is provided:

  1. /usr/local/include
  2. <builtin-headers>
  3. C_INCLUDE_DIRS

The thing that I don’t get here is ‘/usr/local/include’. Why do we always do this, unconditionally, and before the builtin headers? It means that if your libc’s ‘limits.h’ is in /usr/local/include, the wrong thing happens with the builtin ‘limits.h’ include_next directive. I propose moving ‘/usr/local/include’ down to live with (and precede) ‘/usr/include’.
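To make the limits.h problem concrete, here is a toy C++ model of header lookup. It is a sketch and not clang's HeaderSearch code; the names SearchPath, Files, findFrom and findNext are made up for illustration. `#include` takes the first directory that has the file, and `#include_next` resumes after the directory the current header came from. With /usr/local/include ahead of the builtins, a libc limits.h there shadows the builtin one, and the builtin limits.h's include_next can never reach it.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model: an ordered list of directories plus the (directory, header)
// pairs that exist on disk. Names are hypothetical.
using SearchPath = std::vector<std::string>;
using Files = std::set<std::pair<std::string, std::string>>;

// #include <Name>: index of the first directory, starting at Start, that has Name.
std::optional<std::size_t> findFrom(const SearchPath &Dirs, const Files &Disk,
                                    const std::string &Name, std::size_t Start) {
  for (std::size_t I = Start; I < Dirs.size(); ++I)
    if (Disk.count({Dirs[I], Name}))
      return I;
  return std::nullopt;
}

// #include_next <Name> from a header found in Dirs[Cur]: resume after Cur.
std::optional<std::size_t> findNext(const SearchPath &Dirs, const Files &Disk,
                                    const std::string &Name, std::size_t Cur) {
  return findFrom(Dirs, Disk, Name, Cur + 1);
}
```

With the search list {/usr/local/include, <builtin-headers>, /usr/include} and limits.h present in the first two directories, `findFrom(..., 0)` returns index 0 (the libc copy wins), and `findNext(..., 1)` from the builtin copy finds nothing.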

Second, there is no way to change the relative path used for the builtin headers easily. The resource dir has ‘include’ appended to it, and neither this relative path, nor the resource dir itself can be configured in any way.

The resource dir seems to have two uses: finding static libraries on the Darwin platform, and locating the builtin header files. I’d rather these concepts were independent as one is Darwin-specific, and the other is needed everywhere. We could still use the resource directory as a default for the builtin header directory when on the Darwin platform. Others could specify more likely defaults. My vague plan would be:

  • create a dedicated option for tracking the builtin header location
  • default it to <resource-dir>/include on Darwin, and possibly other platforms initially
  • allow compile-time default override to be specified much like we allow C_INCLUDE_DIRS: BUILTIN_INCLUDE_DIR?
  • investigate a better default for Linux systems; suggestions welcome here.
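Here is a minimal sketch of the lookup order this plan implies. It assumes a runtime option value that is empty when the option is not given, plus a configure-time BUILTIN_INCLUDE_DIR define in the same spirit as C_INCLUDE_DIRS. Both the option and the function name are hypothetical.

```cpp
#include <string>

// Hypothetical: OptionValue is the value of a dedicated builtin-header option
// (empty if absent); BUILTIN_INCLUDE_DIR is an assumed configure-time define.
std::string builtinIncludeDir(const std::string &OptionValue,
                              const std::string &ResourceDir) {
  if (!OptionValue.empty())
    return OptionValue;              // an explicit option always wins
#ifdef BUILTIN_INCLUDE_DIR
  return BUILTIN_INCLUDE_DIR;        // compile-time default override
#else
  return ResourceDir + "/include";   // today's behaviour: <resource-dir>/include
#endif
}
```

Keeping the resource-dir fallback last means Darwin behaves as it does now unless someone opts in to one of the overrides.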

First, does anyone know why the search paths take one of the following two
sequences?
If no C_INCLUDE_DIRS are provided:
1) /usr/local/include
2) <builtin-headers>
3) <massive platform-specific location set>
4) /usr/include

This is to match gcc, except for the massive set which is a
combination of every path used by different distros :-(

If C_INCLUDE_DIRS is provided:
1) /usr/local/include
2) <builtin-headers>
3) C_INCLUDE_DIRS
The thing that I don't get here is '/usr/local/include'. Why do we always do
this, unconditionally, and *before* the builtin headers? It means that if
your libc's 'limits.h' is in /usr/local/include, the wrong thing happens
with the builtin 'limits.h' include_next directive. I propose moving
'/usr/local/include' down to live with (and precede) '/usr/include'.

I don't think that is a good idea. We should match the order that gcc
uses. What could be done is replace the hardcoded /usr/local/include.
The problem is that gcc has different behaviour when it is compiled as
a regular compiler or a cross compiler. To handle both we can probably
add a C_INCLUDE_DIR_PREFIX and search

*) C_INCLUDE_DIR_PREFIX + "/local/include"
*) <builtin-headers>
*) C_INCLUDE_DIR_PREFIX + "/include"
*) C_INCLUDE_DIRS

and for C++ add the C++-specific headers before that.
C_INCLUDE_DIR_PREFIX should default to /usr and be ignored if it is
empty.
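A sketch of that proposal follows. It assumes the C++ directories, the builtin directory and the C_INCLUDE_DIRS list are already split into vectors, and the function name is made up. Prefix stands for C_INCLUDE_DIR_PREFIX. Passing an empty prefix drops both prefix-derived entries, which is the cross-compiler case.

```cpp
#include <string>
#include <vector>

// Hypothetical builder for the proposed order. Prefix plays the role of
// C_INCLUDE_DIR_PREFIX (default "/usr"); empty means "skip the prefix dirs".
std::vector<std::string> proposedSearchPath(const std::vector<std::string> &CXXDirs,
                                            const std::string &Prefix,
                                            const std::string &BuiltinDir,
                                            const std::vector<std::string> &CIncludeDirs) {
  std::vector<std::string> Dirs(CXXDirs);  // C++ headers go before everything
  if (!Prefix.empty())
    Dirs.push_back(Prefix + "/local/include");
  Dirs.push_back(BuiltinDir);
  if (!Prefix.empty())
    Dirs.push_back(Prefix + "/include");
  Dirs.insert(Dirs.end(), CIncludeDirs.begin(), CIncludeDirs.end());
  return Dirs;
}
```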

Second, there is no way to change the relative path used for the builtin
headers easily. The resource dir has 'include' appended to it, and neither
this relative path, nor the resource dir itself can be configured in any
way.
The resource dir seems to have two uses: finding static libraries on the
Darwin platform, and locating the builtin header files. I'd rather these
concepts were independent as one is Darwin-specific, and the other is needed
everywhere. We could still use the resource directory as a default for the
builtin header directory when on the Darwin platform. Others could specify
more likely defaults. My vague plan would be:
- create a dedicated option for tracking the builtin header location
- default it to <resource-dir>/include on Darwin, and possibly other
platforms initially
- allow compile-time default override to be specified much like we allow
C_INCLUDE_DIRS: BUILTIN_INCLUDE_DIR?
- investigate a better default for Linux systems; suggestions welcome here.

Not sure if you need all that. All you should need for finding clang's
headers is a path relative to the clang binary itself. That is what gcc
does. Whether it resolves symlinks is controlled by -no-canonical-prefixes.
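Here is a minimal sketch of "relative to itself" using C++17 std::filesystem. The install layout passed as RelDir is an assumption. The Canonicalize flag only approximates what leaving out -no-canonical-prefixes does in gcc, which is resolving symlinks in the binary's own path. This is not clang's actual driver logic.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: derive the builtin header directory from the
// compiler's own path. RelDir is an assumed layout relative to bin/.
// Canonicalize = true resolves symlinks first (as when
// -no-canonical-prefixes is absent); false keeps the path as invoked.
fs::path builtinDirFromBinary(const fs::path &Argv0, const fs::path &RelDir,
                              bool Canonicalize) {
  fs::path Bin = Canonicalize ? fs::weakly_canonical(Argv0) : fs::absolute(Argv0);
  return (Bin.parent_path() / RelDir).lexically_normal();
}
```

For example, `/opt/llvm/bin/clang` with `../lib/clang/include` gives `/opt/llvm/lib/clang/include` when symlinks are not resolved.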

One thing that could be done to clean up things a bit is split the
various config objects. Each looking something like

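#include <string>
#include <vector>

class IncludeConfig {
public:
  std::vector<std::string> CXXIncludeDirs;       // C++ library headers, searched first
  std::string EarlyIncludeDir;                   // e.g. /usr/local/include, before the builtins
  std::string BuiltinDir;                        // clang's builtin headers
  std::vector<std::string> ExtraCIncludeDirs;    // remaining C system directories
  std::vector<std::string> FrameworkIncludeDirs; // Darwin framework directories
};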

I think this can be used to represent the includes of every
configuration we support. There would be one for each version of
Darwin, one for each linux distro, etc. To make the construction easy
we would need some factories. For example

* Windows: Use getWindowsSDKDir to find the correct SDK path and
build the IncludeConfig.
* Linux: Most Linux distros can build a config with a hard-coded C path
and custom C++ ones.
* Configure-time option: Build a config based on the
C_INCLUDE_DIR_PREFIX and similar variables.
* As a hack, a "combined Linux" config that merges the paths of every
Linux config. This would be the default unless the configure options
were used.
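Here is a rough sketch of the config object with a Linux factory and the "combined Linux" merge. The factory and helper names are hypothetical, and the Linux paths are placeholders. The Windows factory is left out because it would depend on getWindowsSDKDir. searchOrder flattens a config into the order discussed in this thread: C++ dirs, the early dir, the builtins, then the rest.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Same shape as the IncludeConfig above, with public members.
struct IncludeConfig {
  std::vector<std::string> CXXIncludeDirs;
  std::string EarlyIncludeDir;
  std::string BuiltinDir;
  std::vector<std::string> ExtraCIncludeDirs;
  std::vector<std::string> FrameworkIncludeDirs;
};

// Hypothetical factory: hard-coded C paths, distro-specific C++ paths.
IncludeConfig makeLinuxConfig(const std::string &BuiltinDir,
                              std::vector<std::string> CXXDirs) {
  IncludeConfig C;
  C.CXXIncludeDirs = std::move(CXXDirs);
  C.EarlyIncludeDir = "/usr/local/include";
  C.BuiltinDir = BuiltinDir;
  C.ExtraCIncludeDirs = {"/usr/include"};
  return C;
}

// Append the entries of In that Out does not already contain, in order.
static void appendUnique(std::vector<std::string> &Out,
                         const std::vector<std::string> &In) {
  for (const std::string &D : In)
    if (std::find(Out.begin(), Out.end(), D) == Out.end())
      Out.push_back(D);
}

// "Combined Linux" hack: union of every config's directory lists;
// the early and builtin dirs are taken from the first config.
IncludeConfig mergeConfigs(const std::vector<IncludeConfig> &Configs) {
  IncludeConfig M;
  if (Configs.empty())
    return M;
  M.EarlyIncludeDir = Configs.front().EarlyIncludeDir;
  M.BuiltinDir = Configs.front().BuiltinDir;
  for (const IncludeConfig &C : Configs) {
    appendUnique(M.CXXIncludeDirs, C.CXXIncludeDirs);
    appendUnique(M.ExtraCIncludeDirs, C.ExtraCIncludeDirs);
    appendUnique(M.FrameworkIncludeDirs, C.FrameworkIncludeDirs);
  }
  return M;
}

// Flatten a config into the order the preprocessor would search.
std::vector<std::string> searchOrder(const IncludeConfig &C, bool CPlusPlus) {
  std::vector<std::string> Dirs;
  if (CPlusPlus)
    Dirs = C.CXXIncludeDirs;
  if (!C.EarlyIncludeDir.empty())
    Dirs.push_back(C.EarlyIncludeDir);
  Dirs.push_back(C.BuiltinDir);
  Dirs.insert(Dirs.end(), C.ExtraCIncludeDirs.begin(), C.ExtraCIncludeDirs.end());
  Dirs.insert(Dirs.end(), C.FrameworkIncludeDirs.begin(), C.FrameworkIncludeDirs.end());
  return Dirs;
}
```

Merging two Linux configs that differ only in their libstdc++ directory yields both C++ directories followed by a single copy of the shared C paths.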

That is a lot of annoying refactoring, but it should be a step on the
path toward config files. It should also allow a single clang
binary to be used in different ways:

* clang -ccc-config=configure: This will use the contents of the
configure options and can then use a custom libstdc++ for example
* clang -ccc-config=linux-merge: The default hack of merging all linux paths
* clang -ccc-config=ubuntu
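The -ccc-config idea could be as simple as a name-to-factory table. In this sketch the configs are reduced to plain directory lists to keep it self-contained. The table contents are placeholders, and -ccc-config is the flag proposed in this message, not an existing clang option.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

using SearchPath = std::vector<std::string>;
using ConfigFactory = std::function<SearchPath()>;

// Placeholder table keyed by the value given to the proposed -ccc-config flag.
const std::map<std::string, ConfigFactory> &configTable() {
  static const std::map<std::string, ConfigFactory> Table = {
      {"ubuntu", [] { return SearchPath{"/usr/local/include", "<builtin>", "/usr/include"}; }},
      {"configure", [] { return SearchPath{"<builtin>", "<C_INCLUDE_DIRS>"}; }},
  };
  return Table;
}

// Returns the selected search path, or nothing for an unknown config name.
std::optional<SearchPath> selectConfig(const std::string &Name) {
  const auto &Table = configTable();
  auto It = Table.find(Name);
  if (It == Table.end())
    return std::nullopt;
  return It->second();
}
```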

Currently, if you set the configure options, the produced clang cannot
be used with the host includes, for example :-(

Cheers,

First, does anyone know why the search paths take one of the following two
sequences?
If no C_INCLUDE_DIRS are provided:

  1. /usr/local/include
  2. <builtin-headers>
  3. <massive platform-specific location set>
  4. /usr/include

This is to match gcc, except for the massive set which is a
combination of every path used by different distros :-(

But why do we need to do this? Why does GCC want a system header include to precede its builtin includes? I’ve been trying to come up with a reason this would be a good thing, even in the context of a cross compiler, and I can’t think of one.

If C_INCLUDE_DIRS is provided:

  1. /usr/local/include
  2. <builtin-headers>
  3. C_INCLUDE_DIRS

The thing that I don’t get here is ‘/usr/local/include’. Why do we always do
this, unconditionally, and before the builtin headers? It means that if
your libc’s ‘limits.h’ is in /usr/local/include, the wrong thing happens
with the builtin ‘limits.h’ include_next directive. I propose moving
‘/usr/local/include’ down to live with (and precede) ‘/usr/include’.

I don’t think that is a good idea. We should match the order that gcc
uses.

Are you saying things are depending on this specific ordering? (local/include before builtin)

If so, that might be a good reason to keep it even if there is no rationale for this ordering. What things? How do they depend on it?

What could be done is replace the hardcoded /usr/local/include.
The problem is that gcc has different behaviour when it is compiled as
a regular compiler or a cross compiler. To handle both we can probably
add a C_INCLUDE_DIR_PREFIX and search

*) C_INCLUDE_DIR_PREFIX + “/local/include”
*) <builtin-headers>
*) C_INCLUDE_DIR_PREFIX + “/include”
*) C_INCLUDE_DIRS

and for C++ add the C++-specific headers before that.
C_INCLUDE_DIR_PREFIX should default to /usr and be ignored if it is
empty.

This is not the problem I’m trying to solve. I’m trying to understand why we would ever want headers to override the builtin headers.

If we really have to support this, I’m going to ask for an option at build or run time to disable it in order to make a more hermetic compiler.

Second, there is no way to change the relative path used for the builtin
headers easily. The resource dir has ‘include’ appended to it, and neither
this relative path, nor the resource dir itself can be configured in any
way.
The resource dir seems to have two uses: finding static libraries on the
Darwin platform, and locating the builtin header files. I’d rather these
concepts were independent as one is Darwin-specific, and the other is needed
everywhere. We could still use the resource directory as a default for the
builtin header directory when on the Darwin platform. Others could specify
more likely defaults. My vague plan would be:

  • create a dedicated option for tracking the builtin header location
  • default it to <resource-dir>/include on Darwin, and possibly other
    platforms initially
  • allow compile-time default override to be specified much like we allow
    C_INCLUDE_DIRS: BUILTIN_INCLUDE_DIR?
  • investigate a better default for Linux systems; suggestions welcome here.

Not sure if you need all that. All you should need for finding clang’s
headers is a path relative to the clang binary itself. That is what gcc
does. Whether it resolves symlinks is controlled by -no-canonical-prefixes.

Have you looked at how this works in Clang? The relative path for the resource dir contains the clang version number in it. This is completely specialized for the Darwin packaging of it afaict…

One thing that could be done to clean up things a bit is split the
various config objects. Each looking something like

While I don’t have specific objections to this (other than my questions above), it’s not what I’m working on right now. =/ I’d like to make the changes necessary to fix the two issues I’m hitting in making the includes hermetic, and then if someone else is interested in this cleanup, that’d be really cool. I won’t have time to work on it for a month or two.


But why do we need to do this? Why does GCC want a system header include to
precede its builtin includes? I've been trying to come up with a reason this
would be a good thing, even in the context of a cross compiler, and I can't
think of one.

I can't think of one either, but this is an area where it is probably
better to have bug-for-bug compatibility with gcc. In fact, this was
changed in r103912, so I assume the user had found a case where there
was a dependency on the gcc way.

Are you saying things are depending on this specific ordering?
(local/include before builtin)
If so, that might be a good reason to keep it even if there is no rationale
for this ordering. What things? How do they depend on it?

I am not sure. If I were the one designing these things I would
probably not have a /usr/local :-) Mike, do you remember why you
changed this?

This is not the problem I'm trying to solve. I'm trying to understand why we
would ever want headers to override the builtin headers.
If we really *have* to support this, I'm going to ask for an option at build
or run time to disable it in order to make a more hermetic compiler.

The above description is effectively a compile time option since I
suggested ignoring C_INCLUDE_DIR_PREFIX if it is empty. What vanilla
gcc does is ignore the local dirs if it is built as a cross compiler.

Not sure if you need all that. All you should need for finding clang's
headers is a path relative to the clang binary itself. That is what gcc
does. Whether it resolves symlinks is controlled by -no-canonical-prefixes.

Have you looked at how this works in Clang? The relative path for the
resource dir contains the clang version number in it. This is completely
specialized for the Darwin packaging of it afaict..

I see. I would say that the resource dir lookup has to be smarter,
or the installation structure more regular, but I am not familiar with
this particular part of the system.

One thing that could be done to clean up things a bit is split the
various config objects. Each looking something like

While I don't have specific objections to this (other than my questions
above), it's not what I'm working on right now. =/

Same problem I have. It never gets high enough in my priorities :-)

I'd like to make the
changes necessary to fix the two issues I'm hitting in making the includes
hermetic, and then if someone else is interested in this cleanup,
that'd be really cool. I won't have time to work on it for a month or two.

I think the smallest change you could make is to not include
/usr/local if C_INCLUDE_DIRS is set. That is, setting C_INCLUDE_DIRS
would imply gcc's behaviour when cross compiling. This would break
things for users that use C_INCLUDE_DIRS to create a host compiler for
an unusual host. Not sure if there are any such users.
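Here is a sketch of that smallest change, based on the two orderings from the start of this thread. A non-empty C_INCLUDE_DIRS switches to the cross-compiler shape, and /usr/local/include is dropped. The function name and parameters are made up, and PlatformDirs stands in for the per-distro location set.

```cpp
#include <string>
#include <vector>

// Hypothetical: setting C_INCLUDE_DIRS implies gcc's cross-compiler
// behaviour, so /usr/local/include is only searched when it is empty.
std::vector<std::string> defaultSearchPath(const std::string &BuiltinDir,
                                           const std::vector<std::string> &CIncludeDirs,
                                           const std::vector<std::string> &PlatformDirs) {
  std::vector<std::string> Dirs;
  if (CIncludeDirs.empty())
    Dirs.push_back("/usr/local/include");
  Dirs.push_back(BuiltinDir);
  if (CIncludeDirs.empty()) {
    Dirs.insert(Dirs.end(), PlatformDirs.begin(), PlatformDirs.end());
    Dirs.push_back("/usr/include");
  } else {
    Dirs.insert(Dirs.end(), CIncludeDirs.begin(), CIncludeDirs.end());
  }
  return Dirs;
}
```

A host compiler built for an unusual host with C_INCLUDE_DIRS would lose /usr/local/include under this rule. That is the breakage mentioned above.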

Cheers,

I can't think of one either, but this is an area where it is probably
better to have bug-for-bug compatibility with gcc. In fact, this was
changed in r103912, so I assume the user had found a case where there
was a dependency on the gcc way.

To summarize an offline discussion I had with Chandler: my opinion is
that by default clang should follow the system compiler. On Windows it
should read the registry; on Linux it should do what the system
compiler does. On every Linux distro that I know of, that includes
searching /usr/local/include first.

If designing a new system, I would highly recommend trying to do
without /usr/local/include. But that is not the case on existing
systems.

I fully support having a run time or compile time option to change
this. Even better if it is easier to use than gcc's behaviour of doing
things differently if it is a cross compiler.

Cheers,
Rafael

So, from the point of view of cross compilation, I'd like it to be possible to construct any header path you want. I don't really mind what the defaults are, so long as it is possible to construct an exact search path that includes only what you want, with the builtin headers in a controllable place. So combinations of -sysroot, -isysroot, -nostdinc, -nostdlib, -I, -isystem and friends should lead to something predictable. -nostdinc and -nostdlib are not that nice to have to use, but I have had projects where we did that... fortunately my present project works with -sysroot.

Andrew

As a FreeBSD user, I can tell you that our base gcc (4.2.1) does not look into /usr/local/include at all. All system headers are located in /usr/include and all third-party includes go into /usr/local/include. This makes it possible to use, for example, two versions of OpenSSL simultaneously.
So if clang starts searching /usr/local/include first, it would be slightly unexpected.

Indeed, and this is on purpose. Also, the version of Clang imported
into 9-CURRENT has been modified to have the same behaviour.

It will not be the case, though, if you build Clang manually, or install
it from ports.

We have a bug on FreeBSD then. Clang should not search
/usr/local/include if the system compiler does not. I am guessing that
the other BSD systems do the same, so the attached patch should fix
it.
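Here is a sketch of the idea. It is not the attached patch, whose contents aren't reproduced here. It shows a per-OS switch that leaves /usr/local/include out wherever the system compiler does. The enum and function are hypothetical. Treating NetBSD and OpenBSD like FreeBSD follows the guess in this message and is unconfirmed.

```cpp
// Hypothetical host enumeration, just enough for this policy.
enum class HostOS { Linux, FreeBSD, NetBSD, OpenBSD, Other };

// Search /usr/local/include only where the system compiler does.
// FreeBSD's base gcc does not; NetBSD/OpenBSD are assumed to match it.
bool searchUsrLocalInclude(HostOS OS) {
  switch (OS) {
  case HostOS::FreeBSD:
  case HostOS::NetBSD:
  case HostOS::OpenBSD:
    return false;
  case HostOS::Linux:
  case HostOS::Other:
    return true;
  }
  return true;  // not reached; silences missing-return warnings
}
```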

Do the BSD guys think it is OK?

Cheers,
Rafael

t.patch (1.26 KB)

No. The system gcc (i.e. /usr/bin/gcc) does not search /usr/local/include.

Any other gcc searches /usr/local/include as well, so clang is OK.

The thought was that if the clang builtin include position has to change, we might as well place it consistently with gcc's ordering, i.e. after /usr/local/include and before /usr/include. As to why gcc searches /usr/local/include by default, I dare not speculate ;-)

The details on why the clang builtin positioning needed to change (r103912) can be found in the following email. To summarize, particular versions of the libstdc++ headers shipped with Debian, Ubuntu and Fedora use #include_next <stddef.h>, which dictated that clang's stddef.h must be searched after libstdc++.
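Here is a toy illustration of that constraint. It is a sketch and not clang's lookup code. The directory names are placeholders, and in this model only the builtin directory holds a stddef.h. An #include_next issued from a header found in the libstdc++ directory searches only the directories after it, so the builtin directory has to come later in the list.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

using SearchPath = std::vector<std::string>;
using Files = std::set<std::pair<std::string, std::string>>;  // (dir, header)

// #include_next <Name> issued from a header found in Dirs[Cur]:
// only directories after Cur are searched.
std::optional<std::size_t> includeNext(const SearchPath &Dirs, const Files &Disk,
                                       const std::string &Name, std::size_t Cur) {
  for (std::size_t I = Cur + 1; I < Dirs.size(); ++I)
    if (Disk.count({Dirs[I], Name}))
      return I;
  return std::nullopt;
}
```

With {<libstdc++>, /usr/local/include, <builtin>, /usr/include}, an include_next from index 0 lands on the builtin directory at index 2. With the builtin directory first, the include_next from libstdc++ can no longer reach it.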

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20100510/030245.html

--mike-m